PDFs are widely used for publishing documents and for submitting them.
However, a PDF cannot be judged by appearance alone. Even if the body text does not include a name, the file may still retain an author name, creation software, creation time, annotations, embedded files, and form information internally.
Where anonymity matters, the environment in which the PDF was made and its editing history can become clues, not only its contents.
This article outlines the information that tends to remain in PDFs and what to check before publication.
Information Left in PDFs
A PDF may contain information other than the document itself.
| Information | What it indicates | Anonymity risk |
| --- | --- | --- |
| Author | The username in the OS or document software | A personal name or organization name appears |
| Creation time | When it was made | Connects to activity time or the period when the material was prepared |
| Creation software | Word, LibreOffice, a scanner, and similar tools | Becomes a clue to the work environment |
| Title | Original document name or case name | Internal names remain |
| Annotations | Comments or editing notes | People involved or the decision process appears |
| Embedded files | Source materials or attached data | Unnecessary information gets mixed in |
A PDF looks like a "final version."
However, information from the creation process may remain inside it.
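To make the creation-time row concrete: PDF date strings usually take the form D:YYYYMMDDHHmmSS followed by a UTC offset, so the value can reveal the time zone of the machine as well as the moment of creation. The following is a minimal sketch, assuming you have already read the raw CreationDate value with a local tool. The function name parse_pdf_date is hypothetical, and the sample value is invented for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date string: D:YYYYMMDDHHmmSS followed by Z, or +HH'mm' / -HH'mm'.
# Every field after the year is optional.
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([+\-Z])(?:(\d{2})'?(?:(\d{2})'?)?)?)?$"
)


def parse_pdf_date(value):
    """Return a datetime for a PDF date string, or None if it does not match."""
    m = _PDF_DATE.match(value.strip())
    if m is None:
        return None
    year = int(m.group(1))
    month = int(m.group(2) or 1)
    day = int(m.group(3) or 1)
    hour = int(m.group(4) or 0)
    minute = int(m.group(5) or 0)
    second = int(m.group(6) or 0)
    sign, off_h, off_m = m.group(7), m.group(8), m.group(9)
    if sign is None:
        tz = None  # no offset recorded
    elif sign == "Z":
        tz = timezone.utc
    else:
        delta = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
        tz = timezone(delta if sign == "+" else -delta)
    return datetime(year, month, day, hour, minute, second, tzinfo=tz)


if __name__ == "__main__":
    # Invented sample value: 09:30 local time, nine hours ahead of UTC.
    print(parse_pdf_date("D:20240131093000+09'00'").isoformat())
    # prints 2024-01-31T09:30:00+09:00
```

An offset such as +09:00 narrows the likely region of the author, and the clock time hints at working hours, which is why this single field deserves a look before publication.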
Redaction Failures
One especially dangerous issue with PDFs is redaction failure.
Even if something appears hidden by a black rectangle, text information may remain inside. The original text may be extractable by copying, searching, or disabling layers.
| Common handling | Problem | Safer way to think |
| --- | --- | --- |
| Placing a black shape on top | The original text remains internally | Use a dedicated redaction feature |
| Only turning it into a screenshot | Watch image quality and hidden information | Recheck the necessary range |
| Hiding with annotations | Annotations may be removable | Copy and search after output |
| Deleting pages | Embedded data or history may remain | Regenerate it as a separate file |
| Deleting some parts manually | Oversights happen | Turn the check into a checklist |
For high-risk materials, it is safer not to judge redaction with only one person's eyes.
In legal, reporting, or whistleblowing contexts, review by a specialist or trusted adviser may be necessary.
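The "copy and search after output" idea can be partly automated. The following is a minimal sketch, assuming you have already extracted the text of the redacted output file locally (for example by selecting all and copying, or with a local text-extraction tool) and that you keep a list of the terms that were supposed to be removed. The function names and sample strings are hypothetical.

```python
import re


def normalize(text):
    """Lowercase and drop all whitespace so line breaks cannot hide a match."""
    return re.sub(r"\s+", "", text).lower()


def find_leaked_terms(extracted_text, sensitive_terms):
    """Return the sensitive terms that still occur in the extracted text."""
    haystack = normalize(extracted_text)
    leaked = []
    for term in sensitive_terms:
        needle = normalize(term)
        if needle and needle in haystack:
            leaked.append(term)
    return leaked


if __name__ == "__main__":
    # Invented sample: the name survives under the black rectangle,
    # split across a line break by the text extraction.
    text = "Reported by Jane\nDoe on the second floor."
    print(find_leaked_terms(text, ["Jane Doe", "Acme"]))
```

An empty result only shows that these exact terms are absent from the extractable text. It says nothing about text inside images, misspellings, or other identifying details, so it supplements a human review rather than replacing it.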
Separate Appearance From Internal Information
When checking a PDF, separate appearance from internal information.
The appearance includes body text, images, tables, QR codes, page numbers, backgrounds, and watermarks. The inside includes metadata, annotations, embedded files, forms, and links.
| Where to check | Information to look at | Reason |
| --- | --- | --- |
| Appearance | Body text, images, tables, backgrounds | Check direct personal information |
| Links | URLs, sharing destinations, tracking | Avoid personal IDs and internal URLs |
| Metadata | Author, creation software | Do not expose the work environment |
| Annotations | Comments, review notes | Do not leave internal conversations |
| Embedded content | Attachments, forms | Do not include unnecessary data |
A PDF looks like printed material.
In reality, however, it is a digital file with an internal structure.
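For the links row, a short script can flag URLs that deserve a second look. The following is a minimal sketch, assuming the link targets or body text have already been extracted locally as plain text. The parameter names in TRACKING_PARAMS and the host suffixes in INTERNAL_SUFFIXES are illustrative assumptions, not a complete list, and review_links is a hypothetical name.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Simple URL matcher. It may include trailing punctuation, so review hits by hand.
URL_PATTERN = re.compile(r"https?://[^\s<>()\"']+")

# Illustrative examples only; extend them for your own situation.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}
INTERNAL_SUFFIXES = (".internal", ".local", ".corp", ".lan")


def review_links(text):
    """Return (url, reasons) pairs for URLs that may identify a person or network."""
    findings = []
    for url in URL_PATTERN.findall(text):
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        reasons = []
        params = {key.lower() for key, _ in parse_qsl(parts.query, keep_blank_values=True)}
        hits = sorted(params & TRACKING_PARAMS)
        if hits:
            reasons.append("tracking parameters: " + ", ".join(hits))
        # A host without a dot (for example "intranet") is usually not public.
        if host and (host.endswith(INTERNAL_SUFFIXES) or "." not in host):
            reasons.append("host looks internal")
        if reasons:
            findings.append((url, reasons))
    return findings


if __name__ == "__main__":
    sample = "See https://example.com/doc?utm_source=mail&id=7 and http://wiki.corp/page"
    for url, reasons in review_links(sample):
        print(url, "->", "; ".join(reasons))
```

A URL that passes this check can still identify someone, for instance through a personal share ID in the path, so treat the output as a list of starting points.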
Checking Steps
Before publishing a PDF, the basic rule is not to publish the original file directly.
Create a publication copy and leave only the necessary information.
| Step | What to check |
| --- | --- |
| 1 | Create a publication copy instead of using the original file |
| 2 | Check whether the filename includes a real name, case name, or organization name |
| 3 | Check the document properties |
| 4 | Check annotations, comments, forms, and embedded files |
| 5 | Check whether redacted areas can be copied or searched |
| 6 | Open it again in a different environment and check the display |
| 7 | After uploading, check how it appears to the recipient |
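Step 2 is easy to forget because the filename is not part of the page. The following is a minimal sketch of a filename check, assuming you maintain your own list of names that must not appear. The function name filename_hits and the sample values are hypothetical.

```python
import re
from pathlib import PurePath

# Separators that people commonly use between words in filenames.
_SEPARATORS = re.compile(r"[\s_\-.]+")


def filename_hits(path, sensitive_terms):
    """Return the sensitive terms found in the filename, ignoring case and separators."""
    name = PurePath(path).name.lower()
    squashed = _SEPARATORS.sub("", name)
    hits = []
    for term in sensitive_terms:
        key = _SEPARATORS.sub("", term.lower())
        if key and key in squashed:
            hits.append(term)
    return hits


if __name__ == "__main__":
    print(filename_hits("drafts/Jane_Doe-case42-final.pdf",
                        ["Jane Doe", "Acme Corp", "case 42"]))
```

Only the final path component is checked here. Folder names can also leak when a file is shared as part of an archive or a synced directory.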
Tools such as ExifTool and qpdf are useful for checking.
However, even if a tool removes metadata, separately check information that remains in the body text or images. If you upload the original PDF to an external PDF conversion service, online redaction service, or online metadata checking service, the document contents, access information, and processing time may be passed to that service. For high-risk PDFs, perform checking and conversion in a local environment as much as possible.
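As a local first pass before reaching for a dedicated tool, the raw bytes of a PDF can be scanned for the names that usually introduce this kind of information. The following is a rough sketch under stated assumptions: it only finds markers stored uncompressed, so a PDF that keeps its objects in compressed streams can pass this scan and still contain the data. The function names and the descriptions in MARKERS are illustrative.

```python
from pathlib import Path

# PDF names and markers that commonly introduce internal information.
MARKERS = {
    b"/Author": "author field in the document information",
    b"/Creator": "application that created the source document",
    b"/Producer": "software that produced the PDF",
    b"/CreationDate": "creation time",
    b"/Annots": "annotations",
    b"/EmbeddedFile": "embedded files",
    b"/AcroForm": "form fields",
    b"/URI": "link targets",
    b"<x:xmpmeta": "XMP metadata packet",
}


def scan_pdf_bytes(data):
    """Return {marker: count} for each known marker present in the raw bytes."""
    return {name.decode("ascii"): data.count(name) for name in MARKERS if name in data}


def scan_pdf_file(path):
    """Read a file locally and scan it. Nothing is sent over the network."""
    return scan_pdf_bytes(Path(path).read_bytes())


if __name__ == "__main__":
    # Invented fragment standing in for a real file.
    sample = b"%PDF-1.7\n1 0 obj\n<< /Author (Jane Doe) /Producer (Example) >>\nendobj\n"
    for marker, count in scan_pdf_bytes(sample).items():
        print(marker, count, MARKERS[marker.encode("ascii")])
```

A hit means "look here," not "this is a leak," and an empty result is not proof of a clean file. Rewriting the file in uncompressed form first, for example with qpdf's QDF mode, makes more of the structure visible to a scan like this, and ExifTool remains the more thorough way to read the metadata fields themselves.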
Check the Recipient of the PDF Too
PDF risk is not only inside the file.
Where you upload it, who you send it to, and which account you use for sharing also affect anonymity. If you share from a personal cloud account, the owner name or email address may be visible. If you send it by email, the sender, subject, and time remain.
| Sharing method | Information left | Caution |
| --- | --- | --- |
| Cloud link | Owner name, sharing history | Do not share from a real-name account |
| Email attachment | Sender, subject, time | Consider whether the contact path can be correlated with you |
| Posting site | Upload time, account | Can be linked to the content of the post |
| Chat sending | Remains on the recipient's device | Watch screenshots and forwarding |
| Anonymous submission | Submission destination logs, check time | Assess the trustworthiness of the submission destination |
Making a PDF safer does not only mean cleaning up the inside of the PDF.
Think also about the sharing path, account, sending time, and storage on the recipient side.
Scope Compared With Other Articles
This article covers risks from information left in PDFs.
The flow for actually removing PDF metadata is covered in "Cautions When Removing PDF Metadata." When creating a PDF from an Office document, also check author information and revision history on the original Office file side.
In other words, it is important not to stop after looking only at the PDF.
| Target to inspect | Main checks |
| --- | --- |
| PDF itself | Author, annotations, embedded files, redaction |
| Original Office file | Revision history, comments, company name |
| Images | Backgrounds, reflections, text |
| Sharing path | Owner name, URL, sending time |
| Consultation contact | Handling of evidentiary value and safety |
PDF is a "convenient format for submission" in many situations.
That is exactly why people often submit it as-is in whistleblowing, school or workplace consultations, and tips to reporters. If anonymity matters, check separately where information remains about whoever made, edited, shared, or opened the PDF.
Be Careful With Received PDFs Too
PDF risk is not limited to files you made yourself.
A PDF received from someone else may also retain an author, creation time, creation software, annotations, and embedded files. If you publish a PDF received for reporting or consultation as-is, you may create a path back to the provider.
| Information in a received PDF | Risk |
| --- | --- |
| Author | The provider or organization becomes visible |
| Creation time | The time when the material was prepared becomes visible |
| Annotations | Internal notes remain |
| Embedded files | Source materials get mixed in |
| Filename | Case names or personal names are visible |
For received PDFs especially, check them from the perspective of protecting the provider.
Scanned PDFs Still Contain Clues
Scanning paper into a PDF does not by itself make the document safe.
A scanned PDF may retain the scanner name, creation software, and creation time. In addition, the page itself can retain seals, reception numbers, handwriting, folds, margin notes, and characteristics of the copier. Even an image PDF with no text information can produce clues from its appearance.
| Clue | What it reveals | Caution |
| --- | --- | --- |
| Scanner information | Equipment or environment used | Check metadata |
| Reception number | Internal organizational material | The numbering system can narrow candidates |
| Handwriting | The person who wrote it | People who know them may recognize it |
| Seal or signature | Person or organization | Becomes direct identifying information |
| Margins and folds | How it was handled | The path of the original can be inferred |
A scanned PDF carries the risks of both digital documents and paper documents.
Check not only the metadata but also the page itself, zoomed in.
Summary
A PDF cannot be judged safe by appearance alone.
Author name, creation time, creation software, annotations, embedded files, and form information may remain.
Redaction is also insufficient if judged only by appearance. Handling that leaves the original text internally is dangerous.
For anonymity, check the PDF body text, appearance, internal information, filename, and sharing destination separately.
For high-risk materials, also consider using a trusted adviser or specialist instead of judging alone.