Removing a PDF's metadata does not, by itself, make the file safe.
Metadata removal is important. However, a PDF carries more than metadata: body text, images, annotations, links, redactions, embedded files, the filename, and sharing settings. If only the metadata is removed, information that leads back to a person or organization may remain elsewhere.
This article organizes how to think about PDF metadata removal and what to check after removal.
Make a copy before removal
First, do not work on the original file directly.
Create a publication copy, and perform removal or conversion on that copy. The original file may need to retain editing history or evidentiary value. In consultation, whistleblowing, or legal procedures in particular, think carefully about how to handle the original file.
| Target | Handling | Reason |
| --- | --- | --- |
| Original file | Store in a safe place | For evidentiary value and rechecking |
| Publication copy | Remove metadata or convert | To prepare it for publication |
| Working folder | Avoid personal names and case names | Reduces leakage from file paths |
| Sharing destination | Keep to the minimum necessary | Avoids unnecessary spread |
| Record | Note what was removed | Makes later checking possible |
Metadata removal is one part of the work.
Think about it together with original file management, the publication copy, and checking the sharing destination.
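The copy step above can be sketched in a few lines of Python. This is a minimal sketch, not a complete workflow: the function name `make_publication_copy`, the `doc-` prefix, and the sample filename are hypothetical, and copying by itself removes nothing from inside the PDF.

```python
import shutil
import tempfile
import uuid
from pathlib import Path


def make_publication_copy(original: Path, work_dir: Path) -> Path:
    """Copy `original` into `work_dir` under a neutral name; return the copy's path."""
    work_dir.mkdir(parents=True, exist_ok=True)
    # A random name keeps real names and case names out of the copy's filename.
    target = work_dir / f"doc-{uuid.uuid4().hex[:8]}{original.suffix.lower()}"
    # copyfile copies file contents only (no permissions or timestamps)
    # and opens the original for reading only.
    shutil.copyfile(original, target)
    return target


# Usage with a throwaway file standing in for the original.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "Jane_Doe_complaint.pdf"  # hypothetical name
    original.write_bytes(b"%PDF-1.7\n%%EOF\n")
    copy = make_publication_copy(original, Path(tmp) / "work")
    print(copy.name)  # random name of the form doc-<8 hex characters>.pdf
```

All later removal or conversion would then run against the returned path, never against the original.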
Information to remove
There are several kinds of information that should be removed or checked in a PDF.
| Information | Risk | How to check |
| --- | --- | --- |
| Author | Personal name or OS username appears | View document properties |
| Company name | Affiliated organization is visible | Check properties and original Office file |
| Creation time | Can be correlated with when you were active | View properties |
| Annotations | Comments and internal conversations appear | View annotation list |
| Embedded files | Source materials travel with the PDF | Check attachments and structure |
| Forms | Input information remains | Check form fields |
If the PDF was created from an Office document, also pay attention to the original Office side.
Checking only the PDF is not enough, because author information or change history may remain in the original Office file.
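For the Office side, a `.docx`, `.xlsx`, or `.pptx` file is a ZIP archive whose `docProps/core.xml` and `docProps/app.xml` parts normally hold the creator, last editor, and company. The sketch below reads those three fields with the standard library only. The function name `office_identity_fields` and the sample values are hypothetical, and it does not look at tracked changes or comments, which live in other parts of the archive.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the OOXML property parts.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Archive part -> {label in the result: element to look up}.
FIELDS = {
    "docProps/core.xml": {"creator": "dc:creator", "last_modified_by": "cp:lastModifiedBy"},
    "docProps/app.xml": {"company": "ep:Company"},
}


def office_identity_fields(source) -> dict:
    """Return author-like fields found in an OOXML file (path or file object)."""
    found = {}
    with zipfile.ZipFile(source) as zf:
        names = set(zf.namelist())
        for part, fields in FIELDS.items():
            if part not in names:
                continue
            root = ET.fromstring(zf.read(part))
            for label, tag in fields.items():
                node = root.find(tag, NS)
                if node is not None and node.text:
                    found[label] = node.text
    return found


# Build a tiny in-memory stand-in for a .docx to show the shape of the result.
core = (
    '<cp:coreProperties xmlns:cp="%s" xmlns:dc="%s">'
    "<dc:creator>jdoe</dc:creator><cp:lastModifiedBy>jdoe</cp:lastModifiedBy>"
    "</cp:coreProperties>" % (NS["cp"], NS["dc"])
)
app = '<Properties xmlns="%s"><Company>Example Corp</Company></Properties>' % NS["ep"]
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as zf:
    zf.writestr("docProps/core.xml", core)
    zf.writestr("docProps/app.xml", app)
print(office_identity_fields(sample))
```

An empty result only means these three fields were not found; it says nothing about the body text or revision data.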
Recheck after removal
After using a removal tool, always recheck.
Even if you intended to remove metadata, information may remain in annotations, links, images, body text, or redacted areas.
| What to check | Reason |
| --- | --- |
| Document properties | See whether author and title are gone |
| Text search | See whether hidden words appear in search |
| Copy operation | See whether redacted text can be copied |
| Annotation list | See whether comments remain |
| Links | See whether personal IDs or internal URLs remain |
| Filename | See whether a real name or organization name is included |
It is important not to feel safe just because removal was performed.
Open the file to be published again in a separate environment and check it.
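As a rough first pass before opening the file in a viewer, the raw bytes of the publication copy can be searched for PDF name tokens that usually accompany the items listed above. This is a heuristic sketch with hypothetical names (`MARKERS`, `scan_pdf_markers`). A hit is a reason to look closer; the absence of hits proves nothing, because these tokens can sit inside compressed object streams where a plain byte search cannot see them.

```python
import re

# PDF name tokens that often accompany leftover information.
MARKERS = {
    "author": rb"/Author\b",
    "creation_date": rb"/CreationDate\b",
    "mod_date": rb"/ModDate\b",
    "producer": rb"/Producer\b",
    "xmp_metadata": rb"/Metadata\b",
    "annotations": rb"/Annots\b",
    "embedded_files": rb"/EmbeddedFiles?\b",
    "forms": rb"/AcroForm\b",
    "links": rb"/URI\b",
}


def scan_pdf_markers(data: bytes) -> dict:
    """Return {label: count} for every marker that appears at least once."""
    counts = {label: len(re.findall(pattern, data)) for label, pattern in MARKERS.items()}
    return {label: n for label, n in counts.items() if n}


# Usage on a file (the path is a placeholder):
# from pathlib import Path
# print(scan_pdf_markers(Path("publication-copy.pdf").read_bytes()))
```

Whatever the scan reports, the manual checks in the table still apply, in particular text search and copy tests on redacted areas.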
Tools are helpers
Tools such as qpdf, ExifTool, and mat2 may be used for PDF checking and conversion.
ExifTool is a widely used tool for reading metadata across many file formats.
Tools help check and remove internal information, but information written in body text, backgrounds in images, notifications in screenshots, and owner names on sharing links must be checked separately.
| What tools can do | What tools alone cannot cover |
| --- | --- |
| Check metadata | Proper nouns in text |
| Remove some metadata | Safety of redaction |
| Check file structure | Backgrounds and reflections in images |
| Format conversion | Logs and owner display at the sharing destination |
| Automation | High-risk legal judgment |
For high-risk documents, do not make the publication decision based only on tool output.
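As an illustration of using a tool as a helper rather than as the final judge, the sketch below reads ExifTool's JSON output (`exiftool -j`) and lists which of a few watched tags still carry a value. The tag list and the function names `remaining_tags` and `read_metadata` are assumptions made for this example, and the result covers metadata only, not the right-hand column of the table above.

```python
import json
import shutil
import subprocess

# Tags worth a second look in a publication copy (an assumed, incomplete list).
WATCHED_TAGS = (
    "Author", "Creator", "Producer", "Title",
    "Subject", "Keywords", "CreateDate", "ModifyDate",
)


def remaining_tags(exiftool_json: str, watched=WATCHED_TAGS) -> dict:
    """Return the watched tags that still have a value in `exiftool -j` output."""
    records = json.loads(exiftool_json)
    record = records[0] if records else {}
    return {tag: record[tag] for tag in watched if record.get(tag) not in (None, "")}


def read_metadata(path: str) -> dict:
    """Run ExifTool on `path` and return the watched tags that remain."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool not found on PATH")
    result = subprocess.run(
        ["exiftool", "-j", path], check=True, capture_output=True, text=True
    )
    return remaining_tags(result.stdout)


# Example with a hand-written JSON string in the shape ExifTool emits.
sample = (
    '[{"SourceFile": "doc.pdf", "Author": "jdoe", "Producer": "", '
    '"CreateDate": "2024:01:05 09:12:44+09:00"}]'
)
print(remaining_tags(sample))
```

An empty dictionary here means only that these particular tags are empty; the publication decision still depends on the separate checks of body text, images, and redactions.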
Some information should not be removed
If anonymity is the only concern, it is tempting to remove everything.
However, in victim consultation, whistleblowing, and legal consultation, some information should be preserved as evidence. The original file's creation time, editing history, and transmission route may become important later.
| Information | In publication copy | In consultation/evidence |
| --- | --- | --- |
| Creation time | Consider coarsening or removing | May become a necessary record |
| Author | Remove before publication | May be retained in the original |
| Change history | Remove from publication copy | May be needed to explain the sequence |
| Annotations | Remove from publication copy | May record internal confirmation |
| Original file | Do not publish | Store in a safe place |
This is why the publication file and the stored original are separated.
Deletion necessary for protecting anonymity and preservation necessary for protecting evidence must be considered at the same time.
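One way to keep the stored original checkable later is to note its size, modification time, and a SHA-256 digest before any work starts, reading the file without writing to it. The sketch below is a minimal example; the function name `fingerprint` and the choice of fields are assumptions, and whether such a note has evidentiary weight is a legal question outside its scope.

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def fingerprint(path: Path) -> dict:
    """Describe a file without changing its contents: size, mtime, SHA-256."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:  # read-only access
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    stat = path.stat()
    return {
        "size_bytes": stat.st_size,
        "modified_utc": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        "sha256": digest.hexdigest(),
    }


# Usage with a throwaway file standing in for the original.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "original.pdf"
    original.write_bytes(b"%PDF-1.7\n%%EOF\n")
    print(fingerprint(original))
```

Running the same function again after the publication copy is finished should give the same digest if the original was never touched.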
Name and storage location after removal
Even if metadata is removed, filenames and storage locations may reveal information.
For example, a filename may contain a real name, company name, or case name. A cloud sharing link may display the owner name of a real-name account. A working folder name may appear in a screenshot. Check this information separately from metadata removal.
| What to check | Reason |
| --- | --- |
| Filename | Avoid real names, organization names, and case names |
| Folder name | May appear during screen sharing or compression |
| Cloud owner | May connect to a real-name account |
| Sharing URL | May contain personal IDs or tracking |
| Sending time | Can be compared with activity records |
At the end of removal work, check the state from a third party's point of view.
Even if it looks safe on your own device, the recipient may see the owner name, preview, link card, or filename. If possible, check the final public state in another browser or another environment.
When checking after removal, it is important not to look only at metadata.
Also check proper nouns in the body text, image backgrounds, linked destinations, filenames, and the display name of the upload destination. Removal work must be paired with checking the final publication as a whole.
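The filename, folder, and sharing URL checks can be partly mechanized. The sketch below flags watchlist terms that appear anywhere in a path and lists the query parameter names of a sharing URL so that each can be reviewed by hand. The function names, the watchlist, the path, and the URL are all hypothetical, and the code does not decide which parameters are identifiers or trackers.

```python
from pathlib import PurePath
from urllib.parse import parse_qsl, urlsplit


def flag_name(path: str, watchlist) -> list:
    """Return watchlist terms found in any component of `path` (case-insensitive)."""
    parts = [part.lower() for part in PurePath(path).parts]
    return [term for term in watchlist if any(term.lower() in part for part in parts)]


def query_parameters(url: str) -> list:
    """List the query parameter names of a sharing URL, in order of appearance."""
    return [name for name, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True)]


# Hypothetical examples.
print(flag_name("cases/acme-harassment/Jane_Doe_report.pdf", ["jane", "acme", "smith"]))
print(query_parameters("https://share.example/f/abc123?usp=sharing&uid=4821&utm_source=mail"))
```

The owner name shown on a cloud sharing page cannot be checked this way; it still has to be viewed from another browser or account.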
Record the removal work
For high-risk documents, recording what was removed and what was left can make later checking easier.
However, the record itself also needs caution. If it is stored in a real-name cloud account or on a workplace device, it becomes a clue that connects you to the material.
| What to record | Notes |
| --- | --- |
| Removed items | Do not include in the publication file |
| Items left | Be able to explain why they were left |
| Storage location of original file | Limit to a safe place |
| Publication filename | Do not include personal information |
| Time checked | Be careful about correlation with activity records |
Removal work is part of the publication decision.
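A record covering the items in the table above could be as small as the structure below. This is only one possible shape: the class name `RemovalRecord`, its field names, and the sample values are hypothetical, and storing the check time as a date without a clock time is an assumption made here to limit correlation with activity records.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class RemovalRecord:
    publication_filename: str                     # should contain no personal information
    original_location: str                        # where the untouched original is stored
    removed: list = field(default_factory=list)   # items removed from the copy
    kept: dict = field(default_factory=dict)      # item left in place -> reason
    checked_on: str = ""                          # date only, coarse on purpose


def to_json(record: RemovalRecord) -> str:
    """Serialize the record as JSON with stable key order."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


record = RemovalRecord(
    publication_filename="doc-3f9a1c2e.pdf",
    original_location="encrypted offline drive",
    removed=["author", "company name", "annotations"],
    kept={"creation time": "needed to explain the sequence of events"},
    checked_on=date(2024, 1, 5).isoformat(),
)
print(to_json(record))
```

The resulting file belongs with the stored original, not with the publication copy.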
Verify after removal with another method
Metadata removal is not finished just because the tool ran.
Check with a method different from the tool used for removal. View document properties, select and copy text, open in another PDF viewer, and try downloading in another environment. Using several perspectives makes it easier to notice missed removals or display differences.
| Verification method | Problems it can find |
| --- | --- |
| View document properties | Author, title, creation software |
| Select and copy text | Text under redaction, hidden text |
| Open in another viewer | Remaining annotations or forms |
| Check in another environment | Owner names and sharing display |
| Reread body text | Proper nouns and contextual leaks |
In removal work, prioritize the final state that will be published over the tool's success message.
For anonymity, "I thought I removed it" is the most dangerous state. After removing, check how the file appears in the recipient's environment.
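One additional check that is independent of any removal tool is to count the end-of-file markers in the publication copy. A PDF that was saved incrementally appends each revision after the previous one, so more than one `%%EOF` marker can mean that older content is still in the file. The sketch below is a heuristic with a hypothetical function name: linearized ("fast web view") files can also contain more than one marker, so a count above one is a prompt to rewrite or re-export the file and check again, not proof of a leak.

```python
def revision_hints(data: bytes) -> dict:
    """Count end-of-file markers and note whether the file declares itself linearized."""
    return {
        "eof_markers": data.count(b"%%EOF"),
        "linearized": b"/Linearized" in data,
    }


# Two markers and no linearization hint: worth a closer look.
sample = b"%PDF-1.4\n...first revision...\n%%EOF\n...appended update...\n%%EOF\n"
print(revision_hints(sample))
```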
Summary
PDF metadata removal is important, but it does not make the file safe by itself.
Do not publish the original file directly. Create a publication copy and check author, company name, creation time, annotations, embeddings, forms, and links.
After removal, recheck document properties, search, copy operations, annotations, and filename.
Tools are helpers.
Body text, images, redaction, sharing destinations, and legal risks must be checked separately.