It is well known that images carry metadata. However, metadata that matters for anonymity is not limited to images.
PDFs, Office documents, videos, audio files, and compressed files may also retain author names, edit histories, software used, capture dates and times, tag information, comments, internal filenames, and similar information.
Even if something looks removed on screen, the underlying data may remain inside the file. And even after metadata is removed, identity or affiliation may still be inferred from the text, background, audio, on-screen text, or filename.
This article sets out, format by format, what information tends to remain and what to check before publication.
Metadata is information around a file
Metadata is information attached to the file itself.
Separate from document text or the image itself, it may include author, creation date and time, editing software, location information, change history, comments, and similar information.
In anonymity work, metadata becomes a strong clue. That is because even when a name is not written in the text, an author name or organization name may remain inside the file.
| Format | Information that tends to remain | Notes |
| --- | --- | --- |
| PDF | Author, creation date and time, creation software, annotations, embedded files | Pay attention to handling of redaction and annotations |
| Office documents | Author, company name, change history, comments | Hidden sheets and review information may remain |
| Video | Capture date and time, device information, location information, editing software | Audio and backgrounds are also clues |
| Audio | ID3 tags, recording date and time, app information | Check voices and background sounds too |
| Compressed files | Internal filenames, folder names, unneeded files | In-progress work data may be mixed in |
What metadata remains, and where, differs by file format. For that reason, do not conclude that a file is safe on the basis of a single checking method.
Information to watch in PDFs
PDF is a format that makes documents easy to distribute. However, caution is needed when publishing materials anonymously.
PDFs may retain author names, creation dates and times, creation software, titles, annotations, embedded files, form information, and similar data.
Also, even if something appears to be redacted visually, the original text may remain inside the file. If you only draw a black rectangle over the text, the original text may still be extractable by copying or searching.
| What to check | Reason |
| --- | --- |
| Document properties | Author name and creation software may remain |
| Annotations | Comments and editing notes may be visible |
| Embedded files | Original materials or unneeded data may be included |
| Redaction processing | Check whether original text remains internally |
| Filename | Case names, organization names, or real names may be included |
PDFs look like publication-ready files, but their internal structure is not necessarily safe. For high-risk materials, check with local tools and regenerate in a safer form if needed.
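As a rough illustration of a local check, the sketch below scans the raw bytes of a PDF for a few standard key names (`/Author`, `/Creator`, `/Producer`, `/Annots`, `/EmbeddedFile`, `/AcroForm`). The function names `scan_pdf_bytes` and `scan_pdf` are hypothetical, and the approach is only a crude heuristic: many PDFs store objects in compressed streams, so an empty result does not show that a file is clean, and it says nothing about whether redacted text is still present.

```python
# PDF key names that often accompany identifying or leftover data.
# This list is an illustrative assumption, not a complete inventory.
MARKERS = {
    b"/Author": "author name in document properties",
    b"/Creator": "authoring application",
    b"/Producer": "PDF-producing software",
    b"/Annots": "annotations or comments",
    b"/EmbeddedFile": "embedded file attachments",
    b"/AcroForm": "form fields",
}


def scan_pdf_bytes(data: bytes) -> dict:
    """Return {marker: occurrence count} for markers present in the raw bytes."""
    return {m.decode("ascii"): data.count(m) for m in MARKERS if m in data}


def scan_pdf(path: str) -> dict:
    """Read a PDF from disk and scan it without modifying it."""
    with open(path, "rb") as fh:
        return scan_pdf_bytes(fh.read())
```

Any marker that does turn up is a prompt to open the document properties, annotation list, or attachment panel in a viewer and look at what is actually stored there.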
Information to watch in Office documents
Office documents such as Word, Excel, and PowerPoint are formats where work history tends to remain.
They may include author names, company names, change history, comments, hidden sheets, templates, links, collaborative editing history, and similar information.
For whistleblowing or reporting materials, handing over Office documents as-is requires particular care. What may remain is not only the document content, but also who created it and the organizational environment in which it was edited.
| What to check | What it can reveal |
| --- | --- |
| Author | Personal name or account name |
| Company name | Affiliated organization or device settings |
| Change history | Who edited which parts |
| Comments | Internal conversation or traces of judgment |
| Hidden sheets | Data that is not displayed |
| Links | Internal paths or cloud URLs |
For Office documents, it is important not to judge only by the visible page. When preparing a publication version, delete unnecessary history, export to another format, and then check again.
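Modern Office files (.docx, .xlsx, .pptx) are ZIP containers, so some of these fields can be read with a standard library alone. The following is a minimal sketch, assuming the usual part names `docProps/core.xml` and `docProps/app.xml`; the function name `inspect_ooxml` and the report keys are hypothetical, and the comment, hidden-sheet, and tracked-change checks are simple heuristics rather than a full audit.

```python
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}


def inspect_ooxml(source) -> dict:
    """Collect identifying fields from a .docx/.xlsx/.pptx path or file object."""
    report = {}
    with zipfile.ZipFile(source) as zf:
        names = set(zf.namelist())
        if "docProps/core.xml" in names:
            root = ET.fromstring(zf.read("docProps/core.xml"))
            for key, tag in (("creator", "dc:creator"),
                             ("last_modified_by", "cp:lastModifiedBy")):
                node = root.find(tag, NS)
                if node is not None and node.text:
                    report[key] = node.text
        if "docProps/app.xml" in names:
            root = ET.fromstring(zf.read("docProps/app.xml"))
            node = root.find("ep:Company", NS)
            if node is not None and node.text:
                report["company"] = node.text
        # Comment parts live under these prefixes in Word, Excel, and PowerPoint.
        report["has_comments"] = any(
            n.startswith(("word/comments", "xl/comments", "ppt/comments"))
            for n in names
        )
        if "xl/workbook.xml" in names:
            wb = zf.read("xl/workbook.xml")
            report["has_hidden_sheets"] = (
                b'state="hidden"' in wb or b'state="veryHidden"' in wb
            )
        if "word/document.xml" in names:
            doc = zf.read("word/document.xml")
            # Tracked insertions and deletions appear as w:ins / w:del elements.
            report["has_tracked_changes"] = b"<w:ins " in doc or b"<w:del " in doc
    return report
```

Running the same function on the exported publication copy is one way to do the "check again" step, since a clean-looking page does not guarantee these parts are gone.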
Information to watch in video
With video, you need to pay attention to both metadata and content.
Inside the file, the date and time taken, device information, location information, editing software, and encoding information may remain. In addition, the video itself may show faces, backgrounds, signs, uniforms, vehicles, window reflections, notifications, and account names on screen.
Because video contains more information than still images, checking anonymity is also harder.
| What to check | Reason |
| --- | --- |
| Capture date and time | Can be linked to when you were active or to your presence at the scene |
| Location information | Reveals the shooting location |
| Device information | Shows the device model or editing environment |
| Background | Buildings, signs, and scenery reveal location |
| Audio | Voices, ambient sound, and conversations become clues |
| Notifications | Account names or contacts may be visible |
With video, checking the content remains necessary even if metadata is removed. Even in a short video, a few seconds of background or sound can reveal a place or person.
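To make the metadata side concrete, here is a minimal sketch that reads the creation time stored in the `mvhd` box of an MP4 or MOV file. It assumes the common layout in which `mvhd` sits directly inside a top-level `moov` box, and it loads the whole file into memory, which is only reasonable for short clips. The function names are hypothetical, and other fields (location, device, encoder) are stored elsewhere and are not covered.

```python
import struct
from datetime import datetime, timedelta, timezone

# MP4/QuickTime timestamps count seconds from 1904-01-01 UTC.
MP4_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)


def iter_boxes(data: bytes, start: int = 0, end=None):
    """Yield (type, body_start, body_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, kind = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:  # a 64-bit size follows the type field
            if pos + 16 > end:
                break
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif size == 0:  # box runs to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed box, stop rather than guess
        yield kind, pos + header, pos + size
        pos += size


def mvhd_creation_time(data: bytes):
    """Return the movie creation time as an aware datetime, or None."""
    for kind, body, stop in iter_boxes(data):
        if kind != b"moov":
            continue
        for inner, b2, s2 in iter_boxes(data, body, stop):
            if inner != b"mvhd" or s2 - b2 < 12:
                continue
            if data[b2] == 1:  # version 1 stores 64-bit timestamps
                secs = struct.unpack(">Q", data[b2 + 4:b2 + 12])[0]
            else:  # version 0 stores 32-bit timestamps
                secs = struct.unpack(">I", data[b2 + 4:b2 + 8])[0]
            return MP4_EPOCH + timedelta(seconds=secs)
    return None
```

A stored value of zero decodes to 1904-01-01, which usually means the field was never set or was cleared. Either way, this covers only one field, and the picture and sound still need to be reviewed by eye and ear.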
Information to watch in audio
Audio files also have metadata.
MP3 and similar files may contain tag information such as title, artist, album, creation software, and comments. Information about the recording app or editing software may also remain.
Also, with audio, the content itself is a strong clue. People or places may be inferred from voice, speaking style, dialect, background sound, notification sounds, station announcements, car sounds, room echo, and similar details.
| What to check | Reason |
| --- | --- |
| ID3 tags | Title or author information may remain |
| Recording date and time | Can be matched against when you were active |
| Creation app | Becomes a clue to the usage environment |
| Voice | May identify you or the people involved |
| Background sound | Reveals place or situation |
When publishing audio anonymously, metadata removal is not enough. Check voice quality, speaking style, and background sounds too.
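For MP3 files, the presence of tags can be checked with a few lines of code. The sketch below assumes the standard layouts: an ID3v1 tag is the last 128 bytes of the file and begins with `TAG`, while an ID3v2 tag begins the file with `ID3` followed by a size field. The function names `read_id3v1` and `id3v2_size` are hypothetical, and the sketch does not decode individual ID3v2 frames.

```python
def read_id3v1(data: bytes):
    """Parse a trailing 128-byte ID3v1 tag, or return None if absent."""
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return None
    tag = data[-128:]

    def text(chunk: bytes) -> str:
        # Fields are fixed width and padded with NUL bytes or spaces.
        return chunk.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
    }


def id3v2_size(data: bytes):
    """Return the declared ID3v2 tag body size in bytes, or None if absent."""
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    b = data[6:10]
    # The size is a "synchsafe" integer: only 7 bits of each byte are used.
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]
```

A non-zero ID3v2 size or a populated ID3v1 dictionary means there are tags worth opening in a tag editor. Neither result says anything about the voice or background sound in the recording itself.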
Information to watch in compressed files
Compressed files such as ZIP files are used when handing over multiple files together.
With compressed files, pay attention to internal filenames, folder names, unneeded files, and in-progress work data.
For example, even if you intended to include only the PDF you want to publish, the same folder may also include the original Office document, notes, screenshots, or hidden files. Folder names may also include a real name, case name, or company name.
Before handing over a compressed file, always open it and check what is actually inside.
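Listing the entries of a ZIP file before sending it can be scripted. This is a minimal sketch: the function name `review_zip` and the suffix list are illustrative assumptions, and the rules only flag entries for a human to look at. Folder names and filenames still need to be read for real names, case names, or company names.

```python
import zipfile
from pathlib import PurePosixPath

# File types that often indicate source or working material (illustrative).
RISKY_SUFFIXES = {".docx", ".xlsx", ".pptx", ".doc", ".xls", ".ppt", ".tmp", ".bak"}


def review_zip(source) -> list:
    """Return (entry_name, reason) pairs for entries worth a second look."""
    flagged = []
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            path = PurePosixPath(name)
            if any(p.startswith(".") or p == "__MACOSX" for p in path.parts):
                flagged.append((name, "hidden or system file"))
            elif path.suffix.lower() in RISKY_SUFFIXES:
                flagged.append((name, "source or working file"))
    return flagged
```

Printing `zf.namelist()` in full alongside the flagged entries is also worthwhile, because a folder name such as a project or client name will not be caught by suffix rules.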
Basic checking flow
The checking flow is the same regardless of file format.
| Stage | What to check |
| --- | --- |
| 1 | Do not publish the original file directly |
| 2 | Make a publication copy |
| 3 | Check the filename |
| 4 | Check metadata |
| 5 | Check text, images, audio, and backgrounds |
| 6 | Recheck after deletion or conversion |
| 7 | Check how the receiving service will display it to the recipient |
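Stages 2 and 3 of the flow above can be sketched as a small helper that copies the file under a neutral name and reports watch terms found in the original filename. The function name `make_publication_copy`, the `file-` prefix, and the idea of a watch-term list are assumptions for illustration. The copy contains the same bytes, so embedded metadata is still present and the later stages remain necessary.

```python
import secrets
import shutil
from pathlib import Path


def make_publication_copy(original, out_dir, watch_terms=()):
    """Copy `original` into `out_dir` under a neutral random name.

    Returns (path_of_copy, watch_terms_found_in_the_original_filename).
    """
    original = Path(original)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"file-{secrets.token_hex(4)}{original.suffix.lower()}"
    # copyfile copies only the contents; copy2 would also try to preserve
    # filesystem timestamps, which is not wanted here.
    shutil.copyfile(original, target)
    lowered = original.name.lower()
    hits = [term for term in watch_terms if term.lower() in lowered]
    return target, hits
```

For example, `make_publication_copy("source.pdf", "publish", ["surname", "employer"])` would leave the original untouched and return the new path together with any listed terms that appeared in the old name.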
Tools such as ExifTool are used to check metadata across multiple formats.
ExifTool is a widely used tool that runs locally and can read and edit metadata in many formats, including images, videos, and documents. It is also a good starting point for learning what information each file format actually holds. URL: https://exiftool.org/
However, do not stop at tool results alone. Tools help check internal information, but you still need to separately check image backgrounds, document text, audio content, and how the upload destination displays the file's owner.
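If ExifTool is installed locally, its JSON output (`-j`) with group names (`-G`) can be filtered from a script. The sketch below assumes the `exiftool` command is on the PATH; the function names and the `SENSITIVE_HINTS` keyword list are hypothetical choices for illustration, not an official list of risky tags.

```python
import json
import subprocess

# Substrings of tag names that often point at identifying data (illustrative).
SENSITIVE_HINTS = (
    "author", "creator", "artist", "company", "gps",
    "software", "make", "model", "date", "comment",
)


def run_exiftool(path: str) -> dict:
    """Run a locally installed exiftool on one file and return its tags."""
    result = subprocess.run(
        ["exiftool", "-j", "-G", path],
        capture_output=True, text=True, check=True,
    )
    # With -j, exiftool prints a JSON array containing one object per file.
    return json.loads(result.stdout)[0]


def sensitive_tags(tags: dict) -> dict:
    """Keep only tags whose name contains one of the hint substrings."""
    return {
        name: value for name, value in tags.items()
        if any(hint in name.lower() for hint in SENSITIVE_HINTS)
    }
```

Calling `sensitive_tags(run_exiftool(path))` gives a shortlist to review, but the full tag dump is still worth reading, since a keyword filter will miss tags with unexpected names.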
Summary
Metadata is not only an image problem.
PDFs, Office documents, videos, audio files, and compressed files may also retain authors, edit histories, creation software, tags, location information, internal filenames, and similar information.
In anonymity work, check file content and metadata separately. Even if you remove metadata, identity or affiliation may be inferred from text, backgrounds, audio, reflections, filenames, and sharing settings.
Before publication, do not send the original file directly. Make a publication copy, check by format, and recheck after removal. Treat handing over a file as potentially handing over not only the text, but also traces of the creation environment and sharing route.