A file contains more than the body text or image you see on the surface.
A photo may carry the capture time or camera model; a document, the author name or editing software; a PDF, the creation time or document title; a video, the capture device or location information.
This kind of "data about data" is called metadata.
Metadata is useful information. It is used for file management, search, editing, and organization.
However, from the perspective of anonymity, metadata becomes a clue that is easy to overlook.
Hidden Information Left in the File
Metadata is not the file content itself, but information about that file.
For example, photo metadata may include capture time, camera model, lens information, and GPS location information. Document files may retain the author name, company name, edit history, comments, and software used.
| File | Information that may remain | Anonymity point to watch |
| --- | --- | --- |
| Photo | Capture time, location information, camera model | Reveals a place or device |
| Video | Capture time, location information, device information | Connects to a filming location or routine places |
| PDF | Author, title, creation software | A personal name or organization name may remain |
| Office document | Author, company name, edit history, comments | Internal information or worker names may remain |
| Audio | Recording time, software information, tags | Becomes a clue to the recording environment or edit history |
Metadata does not always stand out when a file is opened. That is why it is dangerous.
Even if you carefully blur the visible text, if an author name remains inside the file, anonymity weakens quickly.
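As an illustration of the Office document case, here is a minimal sketch that reads the core properties stored inside an Office Open XML file (.docx, .xlsx, .pptx). Such files are ZIP packages, and the author fields normally live in `docProps/core.xml`. The function name `read_core_properties` and the field labels are chosen for this example only, and the sketch uses nothing beyond the Python standard library.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Label used in the result (hypothetical) -> element to look for.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
    "title": "dc:title",
}


def read_core_properties(source):
    """Return the core properties found in an OOXML package.

    `source` is a file path or a binary file-like object.
    Fields that are absent or empty are left out of the result.
    """
    found = {}
    with zipfile.ZipFile(source) as package:
        if "docProps/core.xml" not in package.namelist():
            return found
        root = ET.fromstring(package.read("docProps/core.xml"))
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            found[label] = element.text
    return found
```

An empty result does not mean the file is clean. The company name and application name are usually stored separately in `docProps/app.xml`, and Word comments in `word/comments.xml`, so this covers only one of several places to look.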
Why Metadata Remains
Metadata is not inherently bad.
Photo apps organize photos by capture time and place. Document creation software manages authors and edit history. PDF creation tools record titles and creation software.
In other words, metadata exists to make work more convenient.
The problem is that this convenient information stays in the file when it is published or shared.
Information that causes no problem while it stays on your own device becomes a clue to your identity or organization once the file is passed to someone else.
Metadata That Matters for Anonymity
The metadata that requires particular care for anonymity is information connected to a person, place, time, device, or organization.
| Type | Example | What it connects to |
| --- | --- | --- |
| Personal information | Author name, username, company name | Real name or affiliation |
| Time information | Creation time, capture time, update time | Activity time, work hours, movement history |
| Location information | GPS, capture location | Routine places, workplace, school, places visited |
| Device information | Camera model, smartphone model, software name | Device or environment used |
| Editing information | Comments, change history, layers | Internal work, people involved, information before revision |
Each of these can look small by itself. However, when combined with post content, image backgrounds, accounts, and publication time, it becomes material that can support identity inference.
Removing Metadata Is Not the End
Removing metadata is important. However, that alone does not make things safe.
For example, even if you remove GPS information from a photo, the location can still be inferred if a station name, shop name, school name, internal notice, uniform, or car license plate appears in the background.
Even if you remove author information from a PDF, the candidate set can narrow if a department name, internal term, or background known only to a small number of people remains in the body text.
| What to check | Where to look | Example |
| --- | --- | --- |
| Metadata | Internal file information | Author, time, location information |
| Visible content | The image or body text itself | Background, signs, writing style, internal information |
| File name | Name used when sharing | Real name, case name, organization name |
| Sharing path | Where it was sent from | Cloud history, email, account |
For anonymity, check metadata and content separately. Looking at only one side is not enough.
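The file name and visible content rows can be partly supported by a simple term check. The sketch below assumes you keep your own watchlist of terms that would point to you (a surname, a department name, a project code). The function name `find_watchlist_hits` and the sample terms are hypothetical. It only catches exact substrings, so it supplements reading the file yourself and does not replace it.

```python
from pathlib import PurePath


def find_watchlist_hits(file_name, visible_text, watchlist):
    """Report which watchlist terms appear in the file name or visible text.

    Matching is case-insensitive substring matching. Returns a list of
    (where, term) tuples in watchlist order.
    """
    name = PurePath(file_name).name.casefold()
    text = visible_text.casefold()
    hits = []
    for term in watchlist:
        needle = term.casefold()
        if needle in name:
            hits.append(("file name", term))
        if needle in text:
            hits.append(("visible content", term))
    return hits
```

For example, calling it with the file name `Report_Yamada_final.pdf`, the text `Prepared by the Sales Planning Dept.`, and the watchlist `["yamada", "sales planning"]` reports one hit in the file name and one in the visible content.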
Order for Checking Before Publication
For metadata checks, fixing the order in advance reduces oversights.
1. Look at the file name
2. Look at the visible content of the file
3. Check the metadata
4. Remove unnecessary metadata
5. Check the file again after removal
6. Check the sharing method and upload destination
What is especially important is checking again after removal.
Some software adds new metadata when saving or converting. Information may also change after editing an image, exporting a PDF, or uploading to the cloud.
It is also important not to confuse the checked file with the file you will publish.
If you check the original file and then save it under another name, export a new file after editing an image, convert it to PDF, or package it into an archive, check the final file for publication again.
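One way to avoid mixing up the checked file and the published file is to compare their content hashes just before upload. The following is a minimal sketch with the standard library. The function names `sha256_of` and `is_same_file_content` are made up for this example. A matching hash only shows that the two files are byte-for-byte identical. It says nothing about whether the file is safe.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_same_file_content(checked_path, publish_path):
    """True if the file you checked and the file you will publish are identical."""
    return sha256_of(checked_path) == sha256_of(publish_path)
```

If the comparison returns `False`, something changed between the check and publication (a re-save, an export, a conversion), and the final file needs to be checked again.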
| Operation | Reason to check again |
| --- | --- |
| Save as | New author information may be added |
| PDF conversion | Creation software or a title may be added |
| Image editing | Editing software or save time may be added |
| Compression | Internal file names or folder names remain |
| Cloud upload | Owner information or history may be added |
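For the compression case, the names and timestamps stored inside a ZIP archive can be listed before sending it. This is a minimal sketch using Python's standard `zipfile` module. The function name `list_archive_entries` is chosen for this example, and other archive formats would need a different approach.

```python
import zipfile


def list_archive_entries(source):
    """Return (stored name, modified time) for each entry in a ZIP archive.

    `source` is a file path or a binary file-like object. The modified time
    is the (year, month, day, hour, minute, second) tuple stored in the archive.
    """
    with zipfile.ZipFile(source) as archive:
        return [(info.filename, info.date_time) for info in archive.infolist()]
```

A stored name such as `Users/example/Project-X/draft.txt` would expose a user name and a project folder even if the document itself has been cleaned.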
Mistakes to Avoid When Handling Metadata
When handling metadata, avoid the following mistakes.
- Assuming it is safe after only changing the file name
- Removing only image GPS and not looking at the background
- Looking only at PDF properties and not checking comments or edit history
- Not checking the converted file again
- Not checking the history or account of a cloud sharing link
- Taking a file created in a real-name environment directly into an anonymous environment
Handling metadata is not finished once you delete a single item. Check the file format, creation software, editing method, and sharing method as well.
Do Not Leave Everything to Tools
Metadata removal tools are useful. However, it is dangerous to think that a file is safe just because it passed through a tool.
The tool may not support the format. It may remove only part of the metadata. Information may remain in the visible body text or background. Another application may add new information after removal.
For metadata measures, treat removal, rechecking, and content checking as a set. Detailed check methods and how to use ExifTool are covered in another article.
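As a rough illustration of rechecking after a tool has run, the sketch below walks the header segments of a JPEG and reports application and comment segments that are still present, such as an Exif block. It is a simplified parser written for this example (the name `find_metadata_segments` and the prefix table are assumptions). It stops at the start of the image data, does not decode any tag values, and is not a substitute for a dedicated tool.

```python
import struct

# Byte prefixes that commonly identify the contents of an APPn segment.
KNOWN_PREFIXES = {
    b"JFIF\x00": "JFIF",
    b"Exif\x00\x00": "Exif",
    b"http://ns.adobe.com/xap/1.0/\x00": "XMP",
    b"Photoshop 3.0\x00": "Photoshop/IPTC",
}


def find_metadata_segments(data):
    """List APPn and comment segments found before the image data of a JPEG.

    `data` is the raw bytes of the file. Returns (label, payload size) tuples.
    """
    if not data.startswith(b"\xff\xd8"):
        raise ValueError("not a JPEG stream")
    found = []
    pos = 2  # skip the start-of-image marker
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xDA:  # start of scan: compressed image data follows
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if 0xE0 <= marker <= 0xEF or marker == 0xFE:
            label = "comment" if marker == 0xFE else f"APP{marker - 0xE0}"
            for prefix, name in KNOWN_PREFIXES.items():
                if payload.startswith(prefix):
                    label = f"{label} ({name})"
            found.append((label, len(payload)))
        pos += 2 + length
    return found
```

Running this on the file before and after removal gives a quick view of what the tool actually stripped. A plain JFIF entry is normal; an `APP1 (Exif)` or XMP entry that is still listed after removal is a reason to look more closely.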
The Weight Changes Depending on the Recipient
Metadata risk changes depending on the recipient.
How thorough the check needs to be differs between a photo sent to family and materials sent as an anonymous tip or submission. It also differs between an image posted to a public social network and an image whose original file is shared through the cloud.
In high-risk situations, also consider extracting only the necessary information rather than handing over the original file as-is. Decide how thorough the metadata check should be according to the purpose and recipient.
Summary
Metadata is not the file content itself, but information about that file.
Photos, videos, PDFs, Office documents, and audio files may retain authors, times, location information, device information, edit history, and similar data.
Metadata is useful information, but for anonymity it becomes a clue connected to identity, place, time, organization, and device.
However, deleting metadata does not make everything safe. You also need to check image backgrounds, body text content, file names, sharing paths, and cloud history separately.
For anonymity, it is important to look at both "information left behind the file" and "information left on the surface."
Related tools
Metadata inspection
ExifTool
An external resource related to this article. Open it only when it fits your situation and threat model.
Why it is listed: It can help with the article topic, but it is outside Anonymity Sense and should be checked before use.