However, PDFs can retain information other than the body text shown on the screen.
Examples include creator names, creation applications, update times, annotations, embedded files, hidden text, and traces of the original document.
When sharing documents anonymously, it is dangerous to think, "It is fine because I converted it to PDF."
This article organizes invisible risks that remain in PDFs and what to check before publication.
PDFs Cannot Be Judged by Appearance Alone
PDF is a convenient format that can display documents like paper.
However, a PDF file can contain information other than the displayed pages.
| Information | Content | Anonymity caution |
| --- | --- | --- |
| Creator | User name or app information for the person who made the document | Can point toward the person or organization |
| Creation time | Time when it was created or updated | Connects to activity time or submission time |
| Annotations | Comments, markers, notes | Editing process or related people become visible |
| Embedded items | Attached files, fonts, images | Original data or creation environment remains |
| Hidden text | OCR, text for copying | Information you thought you blacked out may remain |
Even if a PDF looks clean, the internal information is not necessarily clean.
For anonymity, check the displayed pages separately from the information inside the file.
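As a rough illustration of "information inside the file", the sketch below scans raw PDF bytes for the standard document information keys (`/Author`, `/Creator`, `/Producer`, `/CreationDate`, and so on). It is a minimal sketch under several assumptions: the helper name `scan_info_entries` and the sample bytes are hypothetical, and the regular expression only sees simple, uncompressed literal strings. Metadata stored as hex or UTF-16 strings, inside compressed object streams, or in XMP packets will not be found this way, so an empty result is not evidence that a file is clean.

```python
import re

# Standard keys of the PDF document information dictionary.
INFO_KEYS = ("Title", "Author", "Subject", "Keywords",
             "Creator", "Producer", "CreationDate", "ModDate")

# Matches "/Key (literal string)" where the string has no nested or
# escaped parentheses. Anything more complex is deliberately ignored.
_PATTERN = re.compile(
    rb"/(" + "|".join(INFO_KEYS).encode("ascii") + rb")\s*\(([^()\\]*)\)"
)


def scan_info_entries(pdf_bytes: bytes) -> dict[str, list[str]]:
    """Collect plainly stored information-dictionary values by key."""
    found: dict[str, list[str]] = {}
    for key, value in _PATTERN.findall(pdf_bytes):
        found.setdefault(key.decode("ascii"), []).append(value.decode("latin-1"))
    return found


# Hypothetical fragment of a PDF file, used only for demonstration.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Author (J. Example) /Creator (Example Writer 1.0)\n"
    b"/CreationDate (D:20240102030405+09'00') >>\nendobj\n"
)
entries = scan_info_entries(sample)
for key, values in entries.items():
    print(key, values)
```

For a real file, the bytes could be read with `open(path, "rb").read()`. A dedicated metadata tool will usually see more than this sketch does.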
What Disappears and What Remains When Converting to PDF
When Office documents or images are converted to PDF, some information changes.
However, not all risks disappear.
| What changes through conversion | What may remain |
| --- | --- |
| An editable document becomes a fixed display | Creator, creation app, creation time |
| It is embedded as an image | Text inside the image, background, reflections |
| Fonts and layout are fixed | Embedded fonts or app information |
| Comments may disappear | Annotations or change history may remain in another form |
| It looks like redaction was applied | The underlying text may remain |
Converting to PDF is useful in some situations.
However, "made it a PDF" and "finished checking anonymity" are different things.
Redaction and Hidden Text
One especially dangerous case is hiding information in a way that only looks like redaction.
If you only place a black rectangle on top, the text underneath may remain inside the file.
Even if it cannot be read visually, the original characters may be visible through copying, searching, extraction, or internal analysis.
| Method | Risk |
| --- | --- |
| Overlay a black shape | The underlying text may remain |
| Make text the same color as the background | It may be visible through copying or search |
| Turn it into a screenshot | Image quality degradation, OCR-readable text, and background information remain |
| Use a dedicated redaction feature | Rechecking after processing is necessary |
For documents that require redaction, use a dedicated feature, then search, copy, and check metadata after processing.
For high-risk documents, also consider consulting a specialist or trusted support contact instead of judging only from an article.
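The overlay problem can be shown with a small example. The sketch below builds a hypothetical page content stream that draws a line of text and then paints a black rectangle over it, and shows that the text is still recoverable from the stored bytes. The content stream, the name in it, and the helper `shown_strings` are illustrative assumptions. Real files often use other text operators, hex strings, or font-specific encodings that this simple pattern would miss.

```python
import re
import zlib

# Hypothetical page content stream: show text, then fill a black
# rectangle over the same area. The rectangle only hides the text visually.
content = (
    b"BT /F1 12 Tf 72 720 Td (Reported by: A. Example) Tj ET\n"
    b"0 0 0 rg 70 715 200 16 re f\n"
)


def shown_strings(stream: bytes) -> list[str]:
    """Return simple literal strings passed to the Tj (show text) operator."""
    matches = re.findall(rb"\(([^()\\]*)\)\s*Tj", stream)
    return [m.decode("latin-1") for m in matches]


# Content streams are commonly stored Flate-compressed; inflating the
# stored bytes recovers the original drawing instructions.
stored = zlib.compress(content)
recovered = shown_strings(zlib.decompress(stored))
print(recovered)  # ['Reported by: A. Example']
```

The rectangle is simply one more drawing instruction after the text. Nothing in the file removes the string, which is why a dedicated redaction feature and a recheck are needed.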
Organization Information Left in PDFs
For whistleblowing and reporting materials, organization information inside PDFs becomes a major problem.
Not only creator names, but also templates, department names, file paths, annotations, distribution numbers, watermarks, and page-number formats become clues.
| Clue | What can be learned |
| --- | --- |
| Creator name | Document creator or device account |
| Company name | Affiliated organization or creation environment |
| Template | Department or workflow |
| Annotator | People involved in editing |
| Watermark / distribution number | Distribution destination or source of the material |
A PDF looks like a "final version."
However, precisely because it is a final version, traces of having been made inside an organization may remain.
Why This Is Especially Dangerous for Whistleblowing and Reporting Materials
For whistleblowing and reporting materials, information inside a PDF can show "who could have known this material."
Even if the document itself contains no name, the pool of candidates narrows if the distribution scope, update time, annotations, watermarks, page numbers, document numbers, or department-specific templates remain.
| Clue inside the PDF | What is inferred |
| --- | --- |
| Distribution number | Which department or person received the material |
| Watermark | Information for identifying the viewer or distribution destination |
| Annotator name | Person who reviewed it or related people |
| Update time | Who could have worked at that time |
| Template | Organization, department, or workflow |
Even if this kind of information has no meaning to ordinary readers, it can be a strong clue for people inside the organization.
For anonymity, consider not only whether it is visible to people who do not know, but also how it looks to people who do know.
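Update and creation times are a good example of a clue that looks harmless. PDF date strings follow the pattern `D:YYYYMMDDHHmmSS` followed by an optional UTC offset, so they can reveal a time zone as well as a working hour. The sketch below parses such a string. The function name `parse_pdf_date` and the sample value are assumptions for illustration, and the parser covers only the common forms, not every variant that producers write.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYY[MM[DD[HH[mm[SS]]]]] followed by an optional "Z" or "+HH'mm'" offset.
_PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:(Z)|([+-])(\d{2})'?(\d{2})?'?)?"
)


def parse_pdf_date(value: str) -> datetime:
    """Parse a PDF date string; the result is timezone-aware if an offset is present."""
    m = _PDF_DATE.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"not a recognised PDF date string: {value!r}")
    year, month, day, hour, minute, second, zulu, sign, tz_h, tz_m = m.groups()
    tz = None
    if zulu:
        tz = timezone.utc
    elif sign:
        offset = timedelta(hours=int(tz_h), minutes=int(tz_m or 0))
        tz = timezone(offset if sign == "+" else -offset)
    return datetime(int(year), int(month or 1), int(day or 1),
                    int(hour or 0), int(minute or 0), int(second or 0),
                    tzinfo=tz)


# Hypothetical value of a /ModDate entry.
stamp = parse_pdf_date("D:20240102233045+09'00'")
print(stamp.isoformat())   # 2024-01-02T23:30:45+09:00
print(stamp.utcoffset())   # 9:00:00
```

In this example the value shows late-evening work in a UTC+9 time zone. Someone inside the organization could compare that against who was working at that time.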
Be Careful With Text Extraction From PDFs
Even if a PDF looks like an image visually, it may contain text internally.
In OCR-processed PDFs, search text may be placed behind the scanned image.
Even if redaction or blurring is performed on the image, it is dangerous if the original characters remain in the search text.
| State | What to check |
| --- | --- |
| Scanned PDF | Whether OCR text is included |
| Redacted PDF | Whether original characters appear through copying or search |
| PDF with images | Whether text or backgrounds inside images remain |
| PDF with annotations | Whether annotation text or comments can be extracted |
| Form PDF | Whether input fields or selected states remain |
Before publishing a PDF, do not only read what it looks like. Search, copy, and check metadata.
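The manual search-and-copy check can be approximated locally. The sketch below looks for sensitive terms in the raw bytes and in every stream that can be inflated with zlib. The names `inflate_streams` and `find_terms`, and the sample file built here, are hypothetical. The sample uses text rendering mode 3 (invisible text), which is how an OCR layer is often stored behind a scanned image. Text stored as hex strings or in font-specific encodings will not match a plain search, so a result of `False` is not proof that the term is absent.

```python
import re
import zlib

# Captures the bytes between "stream" and "endstream" for each stream object.
_STREAM = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def inflate_streams(pdf_bytes: bytes) -> list[bytes]:
    """Return the contents of every stream that zlib is able to inflate."""
    inflated = []
    for raw in _STREAM.findall(pdf_bytes):
        try:
            inflated.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            continue  # not Flate data, or a filter this sketch does not handle
    return inflated


def find_terms(pdf_bytes: bytes, terms: list[str]) -> dict[str, bool]:
    """Case-insensitive search in the raw bytes and in inflated streams."""
    haystacks = [h.lower() for h in (pdf_bytes, *inflate_streams(pdf_bytes))]
    return {
        term: any(term.encode("utf-8").lower() in h for h in haystacks)
        for term in terms
    }


# Hypothetical file fragment: invisible text (3 Tr) in a compressed stream.
hidden = b"BT /F1 12 Tf 3 Tr (Internal memo for Example Division) Tj ET"
sample = (
    b"%PDF-1.4\n4 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    + zlib.compress(hidden)
    + b"\nendstream\nendobj\n"
)
result = find_terms(sample, ["example division", "budget"])
print(result)  # {'example division': True, 'budget': False}
```

A viewer's own search and copy functions should still be tried as well, since they decode fonts and encodings that this sketch ignores.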
"Not visible on the screen" is not proof of safety.
Tools That Can Be Used for Checking
If you use online services for PDF checking, conversion, redaction, or metadata removal, the service may receive the original file, its filename, its internal information, and information about where you connected from. Do not upload high-risk documents to external services. Check them in a local environment, and recheck with another method after processing.
Locally run tools such as ExifTool, for inspecting metadata, and qpdf, for checking and converting PDF file structure, are candidates. The official documentation for each tool explains how to use it.
However, using tools does not automatically make a file safe.
You need to read the displayed information and recheck after deletion or regeneration.
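One way to run such local checks from a script is sketched below. It only builds and runs command lines for tools that are already installed. The helper names and file paths are hypothetical. The options used (`exiftool -a -G1`, `qpdf --check`, and `qpdf --qdf --object-streams=disable`) are taken from each tool's documentation as commonly described, and should be verified against the installed version before relying on them.

```python
import shutil
import subprocess


def inspection_commands(pdf_path: str, qdf_out: str) -> list[list[str]]:
    """Build local inspection commands; nothing is uploaded anywhere."""
    return [
        # List metadata tags, including duplicates, grouped by location.
        ["exiftool", "-a", "-G1", pdf_path],
        # Check the file's structure and report problems.
        ["qpdf", "--check", pdf_path],
        # Write an uncompressed, text-editor-friendly copy for manual reading.
        ["qpdf", "--qdf", "--object-streams=disable", pdf_path, qdf_out],
    ]


def run_available(commands: list[list[str]]) -> None:
    """Run each command whose tool is installed and print its output."""
    for argv in commands:
        if shutil.which(argv[0]) is None:
            print(f"skipped (not installed): {argv[0]}")
            continue
        done = subprocess.run(argv, capture_output=True, text=True,
                              errors="replace")
        print(f"$ {' '.join(argv)}\n{done.stdout}{done.stderr}")


# Example call with hypothetical paths:
# run_available(inspection_commands("report.pdf", "report.qdf.pdf"))
```

The output still has to be read by a person. The script only gathers it in one place, and the same commands can be run again on the processed file to confirm that a removal actually took effect.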
Checks Before Publication
Before publishing a PDF, check in the following order.
| Order | What to check | Reason |
| --- | --- | --- |
| 1 | Look at creator and creation time | Check whether they connect to the person or work time |
| 2 | Look at annotations and comments | Check whether editing process or related people remain |
| 3 | Check redacted parts | See whether underlying characters remain |
| 4 | Look at embedded files and images | Check whether original data or separate files remain |
| 5 | Look at the filename | Check whether names, departments, or case names remain |
| 6 | Recheck after deletion | Confirm whether the processing succeeded |
Once published, a PDF is easy to copy.
That is why the checks have to happen before publication, not after.
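The filename step of the checklist is easy to overlook and easy to automate in part. The sketch below flags filenames that contain terms from a watchlist you supply (names, departments, case names). The date-like pattern is an extra assumption added here because dates in a filename can connect to work time. The function name `filename_flags` and the sample filename are hypothetical.

```python
import re
from pathlib import PurePath

# Loose pattern for dates such as 20240105, 2024-01-05, or 2024_01_05.
_DATE_LIKE = re.compile(
    r"(?<!\d)(19|20)\d{2}[-_.]?(0[1-9]|1[0-2])[-_.]?(0[1-9]|[12]\d|3[01])(?!\d)"
)


def filename_flags(filename: str, watchlist: list[str]) -> list[str]:
    """Return reasons why a filename might identify a person or organization."""
    stem = PurePath(filename).stem.lower()
    flags = [f"watchlist term: {term}" for term in watchlist
             if term.lower() in stem]
    if _DATE_LIKE.search(stem):
        flags.append("date-like pattern")
    return flags


flags = filename_flags("AcmeLegal_JDoe_2024-01-05.pdf", ["jdoe", "acmelegal"])
print(flags)
# ['watchlist term: jdoe', 'watchlist term: acmelegal', 'date-like pattern']
```

An empty list only means that none of the listed terms matched. Abbreviations or internal codes that are not on the watchlist still need a manual look.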
Deciding Not to Publish a PDF
For high-risk PDFs, deletion and conversion alone may not be enough.
The content of the material itself may narrow the information source.
For example, if a timeline known only to participants in a certain meeting, abbreviations used only by a specific department, or notation that differs by distribution destination remains, candidates narrow even if metadata is removed.
In this case, instead of publishing the PDF as-is, you need to consider choices such as summarizing the content, generalizing proper names, having only the necessary portion checked by a trusted adviser, or bringing it to a specialist or support contact.
For anonymity, cleaning up a file and being able to publish it are different things.
Summary
PDFs look safe because their appearance is fixed, but internally they can retain creator names, creation times, annotations, embedded items, and hidden text.
Converting to PDF alone does not finish the anonymity check.
Pay particular attention to redaction, annotations, organization information, distribution numbers, and filenames.
Tools such as ExifTool and qpdf are useful for checking, but running a tool does not by itself make a file safe.
Before publication, check the appearance, the internal information, and the filename, and then recheck after any deletion, treating these as a single set.
Related tools
Metadata inspection
ExifTool