
Invisible Risks Left in PDFs

PDFs look safe because their appearance is fixed.

However, PDFs can retain information other than the body text shown on the screen.

Examples include creator names, the application used to create the file, modification times, annotations, embedded files, hidden text, and traces of the original document.

When sharing documents anonymously, it is dangerous to think, "It is fine because I converted it to PDF."

This article sorts out the invisible risks that remain in PDFs and what to check before publication.

PDFs Cannot Be Judged by Appearance Alone

PDF is a convenient format that can display documents like paper.

However, a PDF file can contain information other than the displayed pages.

| Information | Content | Anonymity concern |
| --- | --- | --- |
| Creator | User name of the person who made the document, or the application used | Can point toward the person or organization |
| Creation time | When the file was created or updated | Connects to activity time or submission time |
| Annotations | Comments, highlights, notes | Reveal the editing process or the people involved |
| Embedded items | Attached files, fonts, images | Original data or the creation environment remains |
| Hidden text | OCR text, text layers for copying | Text you thought you blacked out may remain |

Even if a PDF looks clean, the internal information is not necessarily clean.

For anonymity, check the displayed pages separately from the information inside the file.
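
To make "information inside the file" concrete, here is a minimal Python sketch, assuming the metadata is stored as plain, uncompressed literal strings. It searches the raw bytes for the Document Information keys defined in the PDF specification (/Author, /Creator, /Producer, /CreationDate, /ModDate, and so on) and for a few common XMP property names. The function name find_metadata and the XMP list are illustrative choices, not part of any tool. Metadata in compressed object streams, hex-encoded strings, or encrypted files is missed, so an empty result does not mean the file is clean; treat this as a first look, not a replacement for a dedicated tool.

```python
import re
import sys

# Document Information dictionary keys defined in the PDF specification.
INFO_KEYS = ["Title", "Author", "Subject", "Keywords",
             "Creator", "Producer", "CreationDate", "ModDate"]
# Illustrative XMP property names (assumption: real files may use others).
XMP_NAMES = ["dc:creator", "xmp:CreatorTool", "pdf:Producer",
             "xmp:CreateDate", "xmp:ModifyDate", "xmpMM:DocumentID"]


def find_metadata(data: bytes) -> dict:
    """Map each key found to its values (XMP names map to ["present"])."""
    found = {}
    for key in INFO_KEYS:
        # Matches literal strings such as /Author (Jane Doe). Hex strings,
        # nested parentheses, and compressed object streams are not handled.
        pattern = rb"/" + key.encode() + rb"\s*\(((?:\\.|[^\\)])*)\)"
        for m in re.finditer(pattern, data):
            found.setdefault(key, []).append(m.group(1).decode("latin-1"))
    for name in XMP_NAMES:
        # XMP is readable XML only when its stream is stored uncompressed.
        if name.encode() in data:
            found.setdefault(name, []).append("present")
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for key, values in find_metadata(f.read()).items():
            print(f"{key}: {values}")
```

Saved as a script, it takes the PDF path as its only argument and prints each key it finds.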

What Disappears and What Remains When Converting to PDF

When Office documents or images are converted to PDF, some information changes.

However, not all risks disappear.

| What changes through conversion | What may remain |
| --- | --- |
| An editable document becomes a fixed display | Creator, creating application, creation time |
| Content is embedded as an image | Text inside the image, backgrounds, reflections |
| Fonts and layout are fixed | Embedded fonts or application information |
| Comments may disappear | Annotations or change history may remain in another form |
| Text appears to be redacted | The underlying text may remain |
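
Embedded fonts are one place where application information survives conversion: font names can hint at the operating system, language settings, or a corporate typeface. The sketch below, assuming the font dictionaries are not inside compressed object streams, lists /BaseFont names, decodes #xx escapes in PDF names, and strips the subset tag (six uppercase letters and a plus sign, such as ABCDEF+) that PDF writers add to subset fonts. embedded_font_names is a hypothetical helper name.

```python
import re


def embedded_font_names(data: bytes) -> list:
    """Return unique /BaseFont names in order of first appearance."""
    names = []
    for m in re.finditer(rb"/BaseFont\s*/([^\s/\[\]<>(){}%]+)", data):
        name = m.group(1).decode("latin-1")
        # Decode #xx escapes used in PDF names, e.g. Yu#20Mincho -> Yu Mincho.
        name = re.sub(r"#([0-9A-Fa-f]{2})", lambda h: chr(int(h.group(1), 16)), name)
        # Subset fonts carry a tag of six uppercase letters and '+', e.g. ABCDEF+.
        name = re.sub(r"^[A-Z]{6}\+", "", name)
        if name not in names:
            names.append(name)
    return names
```

A font that only one organization uses is as much a clue as a creator name, so read the list rather than just counting it.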

Converting to PDF is useful in some situations.

However, "made it a PDF" and "finished checking anonymity" are different things.

Redaction and Hidden Text

One especially dangerous case in PDFs is information that only looks redacted.

If you only place a black rectangle on top, the text underneath may remain inside the file.

Even if it cannot be read on screen, the original characters may be recovered through copying, searching, text extraction, or analysis of the file's internals.

| Method | Risk |
| --- | --- |
| Overlay a black shape | The underlying text may remain |
| Make text the same color as the background | It may be revealed by copying or searching |
| Convert the page to a screenshot | Image quality degrades, and OCR-readable text and background details remain |
| Use a dedicated redaction feature | The result still needs to be rechecked after processing |

For documents that require redaction, use a dedicated redaction feature, then search the text, try copying, and check the metadata of the processed file.
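
The search part of that recheck can be partly scripted. The Python sketch below assumes you know a word that should have been removed. It inflates Flate-compressed streams with zlib, searches both raw and inflated bytes, and joins literal strings so that text split for spacing, such as [(Con)-20(fidential)] TJ, is still found. A hit means the text is still in the file. A miss proves little: many fonts, especially CID-keyed fonts used for Japanese and other CJK text, store glyph codes rather than readable characters, and filters other than Flate are not decoded. The helper names are hypothetical.

```python
import re
import zlib


def stream_payloads(data: bytes):
    """Yield each stream's raw bytes and, if it inflates with zlib, the inflated bytes."""
    for m in re.finditer(rb"(?<!end)stream\r?\n(.*?)endstream", data, re.DOTALL):
        raw = m.group(1)
        yield raw
        try:
            # decompressobj tolerates the trailing newline before 'endstream'.
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            pass  # not FlateDecode data; other filters are not decoded here


def joined_literals(payload: bytes) -> bytes:
    """Concatenate literal strings so [(Con)-20(fidential)] TJ reads as one word."""
    return b"".join(re.findall(rb"\(((?:\\.|[^\\)])*)\)", payload))


def text_still_present(data: bytes, removed_text: str) -> bool:
    """True if removed_text is still found (case-insensitive for ASCII)."""
    needle = removed_text.encode("utf-8").lower()
    if needle in data.lower():
        return True
    for payload in stream_payloads(data):
        if needle in payload.lower() or needle in joined_literals(payload).lower():
            return True
    return False
```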

For high-risk documents, also consider consulting a specialist or trusted support contact instead of judging only from an article.

Organization Information Left in PDFs

For whistleblowing and reporting materials, organization information inside PDFs becomes a major problem.

Not only creator names, but also templates, department names, file paths, annotations, distribution numbers, watermarks, and page-number formats become clues.

| Clue | What can be learned |
| --- | --- |
| Creator name | Document creator or device account |
| Company name | Affiliated organization or creation environment |
| Template | Department or workflow |
| Annotator | People involved in editing |
| Watermark / distribution number | Distribution destination or source of the material |

A PDF looks like a "final version."

However, precisely because it is a final version, traces of having been made inside an organization may remain.

Why This Is Especially Dangerous for Whistleblowing and Reporting Materials

For whistleblowing and reporting materials, information inside a PDF can show "who could have known this material."

Even if the document itself contains no names, the pool of possible sources narrows if distribution scope, update times, annotations, watermarks, page numbers, document numbers, or department-specific templates remain.

| Clue inside the PDF | What can be inferred |
| --- | --- |
| Distribution number | Which department or person received the material |
| Watermark | Who viewed it or where it was distributed |
| Annotator name | Reviewers or other people involved |
| Update time | Who could have been working at that time |
| Template | Organization, department, or workflow |

Even if this kind of information has no meaning to ordinary readers, it can be a strong clue for people inside the organization.

For anonymity, consider not only what outsiders can see, but also what people who know the context can recognize.

Be Careful With Text Extraction From PDFs

Even if a PDF looks like an image visually, it may contain text internally.

In OCR-processed PDFs, an invisible, searchable text layer may be placed behind the scanned image.

Even if redaction or blurring is performed on the image, it is dangerous if the original characters remain in the search text.
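
An OCR text layer is commonly drawn with text rendering mode 3, which the PDF specification defines as neither filling nor stroking the text, so it never shows on screen. The Python sketch below, assuming content streams are uncompressed or Flate-compressed, reports whether any stream sets "3 Tr". has_invisible_text is a hypothetical name. A positive result means there is text you cannot see; copy it or extract it to read what it actually says.

```python
import re
import zlib


def stream_payloads(data: bytes):
    """Yield each stream's raw bytes and, if it inflates with zlib, the inflated bytes."""
    for m in re.finditer(rb"(?<!end)stream\r?\n(.*?)endstream", data, re.DOTALL):
        raw = m.group(1)
        yield raw
        try:
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            pass  # not FlateDecode data; left undecoded


def has_invisible_text(data: bytes) -> bool:
    """True if a content stream sets text rendering mode 3 ("3 Tr": no fill, no stroke)."""
    # The lookbehind stops "13 Tr" or "0.3 Tr" from matching.
    invisible_mode = re.compile(rb"(?<![\d.])3\s+Tr\b")
    return any(invisible_mode.search(p) for p in stream_payloads(data))
```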

| State | What to check |
| --- | --- |
| Scanned PDF | Whether OCR text is included |
| Redacted PDF | Whether the original characters appear when copying or searching |
| PDF with images | Whether text or backgrounds inside images remain |
| PDF with annotations | Whether annotation text or comments can be extracted |
| Form PDF | Whether entered values or selected options remain |
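
For the annotation and form rows, a rough inventory can come from counting marker names in the raw bytes. The sketch below assumes an uncompressed file structure, and its marker list is an illustrative selection, not an exhaustive one. It also collects literal /Contents, /T, and /V strings: in markup annotations /T conventionally holds the name of the person who added the annotation, while in form fields /T is the field name and /V the entered value. inventory and MARKERS are hypothetical names.

```python
import re

# Illustrative markers worth a manual look; not an exhaustive list.
MARKERS = {
    "annotations": rb"/Annots\b",
    "comment-type annotations": rb"/Subtype\s*/(?:Text|FreeText|Highlight|Underline|StrikeOut|Ink)\b",
    "watermark annotations": rb"/Subtype\s*/Watermark\b",
    "embedded files": rb"/EmbeddedFiles?\b",
    "file attachment annotations": rb"/FileAttachment\b",
    "form fields": rb"/AcroForm\b",
    "JavaScript": rb"/(?:JavaScript|JS)\b",
}


def inventory(data: bytes) -> dict:
    """Count markers and collect literal /Contents, /T and /V strings."""
    counts = {label: len(re.findall(p, data)) for label, p in MARKERS.items()}
    strings = {}
    for key in ("Contents", "T", "V"):
        # /Contents (text) in annotations; /T is the annotation author by
        # convention, or a form field name; /V is a form field value.
        pattern = rb"/" + key.encode() + rb"\s*\(((?:\\.|[^\\)])*)\)"
        values = [v.decode("latin-1") for v in re.findall(pattern, data)]
        if values:
            strings[key] = values
    return {"counts": counts, "strings": strings}
```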

Before publishing a PDF, do not just look at it: search the text, try copying, and check the metadata.

"Not visible on the screen" is not proof of safety.

Tools That Can Be Used for Checking

If you use online services for PDF checking, conversion, redaction, or metadata removal, the service may receive the original file, the filename, the file's internal information, and information about where you are connecting from. Do not upload high-risk documents to external services. Check them in a local environment, and recheck with another method after processing.

ExifTool is sometimes used to check PDF metadata.

URL: https://exiftool.org/

qpdf is another option. It is a tool for inspecting and transforming the structure of PDF files, and its official documentation explains how to use it.

URL: https://qpdf.readthedocs.io/
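
Simple searches of a PDF's raw bytes are blind to some structures. The Python check below, with a hypothetical function name, flags three common limits: compressed object streams (/Type /ObjStm), encryption (/Encrypt), and image filters, where text can exist only as pixels. For the first case, the qpdf documentation describes QDF mode; a command such as `qpdf --qdf --object-streams=disable input.pdf inspect.pdf` writes an uncompressed copy that is easier to inspect locally (the file names here are placeholders).

```python
import re


def scan_limits(data: bytes) -> list:
    """List reasons a plain byte search of this PDF may miss information."""
    notes = []
    if re.search(rb"/Type\s*/ObjStm\b", data):
        notes.append("object streams: some objects, possibly metadata, are compressed")
    if re.search(rb"/Encrypt\b", data):
        notes.append("encryption: strings and streams are not readable as stored")
    if re.search(rb"/(?:DCTDecode|JPXDecode|CCITTFaxDecode|JBIG2Decode)\b", data):
        notes.append("image data: text inside images exists only as pixels")
    return notes
```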

However, using tools does not automatically make a file safe.

You need to read the displayed information and recheck after deletion or regeneration.
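
Rechecking matters partly because of incremental updates: a PDF can be saved by appending changed objects and a new trailer to the end of the file, which leaves the earlier version, deleted metadata included, intact in the bytes before it. The rough Python sketch below, with hypothetical function names, counts %%EOF markers and returns the earlier versions. Linearized (web-optimized) files may also contain two markers, so a count above one is a prompt to look closer, not proof. Rewriting the whole file, for example with qpdf, typically keeps only the current revision, but check the output again.

```python
import re


def revision_count(data: bytes) -> int:
    """Number of %%EOF markers; above 1 suggests incremental updates (or linearization)."""
    return data.count(b"%%EOF")


def earlier_revisions(data: bytes) -> list:
    """Byte prefixes ending at each earlier %%EOF, i.e. older saved versions."""
    ends = [m.end() for m in re.finditer(rb"%%EOF", data)]
    return [data[:end] for end in ends[:-1]]
```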

Checks Before Publication

Before publishing a PDF, check in the following order.

| Order | What to check | Reason |
| --- | --- | --- |
| 1 | Creator and creation time | Whether they connect to the person or their working hours |
| 2 | Annotations and comments | Whether the editing process or people involved remain |
| 3 | Redacted parts | Whether the underlying characters remain |
| 4 | Embedded files and images | Whether original data or separate files remain |
| 5 | Filename | Whether names, departments, or case names remain |
| 6 | Recheck after deletion | Whether the processing actually succeeded |
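
Steps 5 and 6 can be partly scripted with a list of terms you do not want to leak, such as your name, your department, or a case name. The sketch below uses a hypothetical helper name. It NFKC-normalizes the filename and the terms so that full-width and half-width forms compare equal, and it searches the raw bytes in both UTF-8 and UTF-16BE, since PDF text strings often use the latter. Compressed streams are not searched, so run it on the processed file and still check the file in a viewer.

```python
import unicodedata
from pathlib import Path


def sensitive_hits(path: str, terms: list, data: bytes = b"") -> list:
    """Return (location, term) pairs for terms found in the filename or raw bytes."""
    filename = unicodedata.normalize("NFKC", Path(path).name).lower()
    lowered = data.lower()  # lowercases ASCII bytes only
    hits = []
    for term in terms:
        t = unicodedata.normalize("NFKC", term).lower()
        if t in filename:
            hits.append(("filename", term))
        # PDF text strings are often UTF-16BE; plain literals are often ASCII/UTF-8.
        if any(enc in lowered for enc in (t.encode("utf-8"), t.encode("utf-16-be"))):
            hits.append(("file bytes", term))
    return hits
```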

PDFs are easy to copy and redistribute once published, so checking before publication is important.

Deciding Not to Publish a PDF

For high-risk PDFs, deletion and conversion alone may not be enough.

The content of the material itself may narrow the information source.

For example, if a timeline known only to participants in a certain meeting, abbreviations used only by a specific department, or wording that differs by distribution destination remains, the pool of possible sources narrows even if the metadata is removed.

In this case, instead of publishing the PDF as-is, consider options such as summarizing the content, generalizing proper names, having only the necessary portion checked by a trusted adviser, or taking it to a specialist or support contact.

For anonymity, a cleaned-up file is not the same as a file that is safe to publish.

Summary

PDFs look safe because their appearance is fixed, but internally they can retain creator names, creation times, annotations, embedded files, and hidden text.

Converting to PDF alone does not finish the anonymity check.

Pay particular attention to redaction, annotations, organization information, distribution numbers, and filenames.

Tools such as ExifTool and qpdf are useful for checking, but using a particular tool does not by itself make a file safe.

Before publication, check the appearance, the internal information, and the filename, and recheck after deletion, as a set.

Related tools

Metadata inspection

ExifTool

An external resource related to this article. Open it only when it fits your situation and threat model.

Why it is listed: It can help with the article topic, but it is outside Anonymity Sense and should be checked before use.

URL: https://exiftool.org/

Metadata removal

MAT2

URL: https://0xacab.org/jvoisin/mat2

PDF inspection

qpdf

URL: https://qpdf.readthedocs.io/
