PDF Metadata Risks

PDFs are often used for material that is published or submitted to someone else.

However, a PDF cannot be judged by appearance alone. Even if the body text does not include a name, the file may still retain an author name, creation software, creation time, annotations, embedded files, and form information internally.

For anonymity, the clues are not limited to the PDF's contents: the environment in which the PDF was made and its editing history can also point to a person.

This article summarizes the information that tends to remain in PDFs and the points to check before publication.

Information Left in PDFs

A PDF may contain information other than the document itself.

| Information | What it indicates | Anonymity risk |
| --- | --- | --- |
| Author | The username in the OS or document software | A personal name or organization name appears |
| Creation time | When it was made | Connects to activity time or the period when the material was prepared |
| Creation software | Word, LibreOffice, a scanner, and similar tools | Becomes a clue to the work environment |
| Title | Original document name or case name | Internal names remain |
| Annotations | Comments or editing notes | People involved or the decision process appears |
| Embedded files | Source materials or attached data | Unnecessary information gets mixed in |
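
As a rough illustration of how such fields can sit in a file, the following Python sketch searches the raw bytes of a PDF for a few common keys of the document information dictionary. It is a heuristic under stated assumptions, not a full PDF parser: the function name `scan_info_strings` and the sample bytes are hypothetical, and it only sees values stored as plain literal strings. It misses hex strings, UTF-16 encoded values, encrypted files, compressed object streams, and XMP metadata, so an empty result does not mean the file is clean.

```python
import re

# Keys of the PDF document information dictionary that commonly carry
# identifying details.
INFO_KEYS = ("Author", "Creator", "Producer", "Title", "CreationDate", "ModDate")


def scan_info_strings(pdf_bytes: bytes) -> dict[str, list[str]]:
    """Return literal-string values found next to common Info keys.

    Heuristic only: values that are hex-encoded, compressed, or encrypted
    are not found.
    """
    found: dict[str, list[str]] = {}
    for key in INFO_KEYS:
        # Match "/Key (value)" where the value has no unescaped parentheses.
        pattern = re.compile(
            rb"/" + key.encode("ascii") + rb"\s*\(((?:\\.|[^\\()])*)\)"
        )
        values = [m.group(1).decode("latin-1") for m in pattern.finditer(pdf_bytes)]
        if values:
            found[key] = values
    return found


# Hypothetical fragment of an uncompressed Info dictionary.
sample = (
    b"<< /Author (Jane Example) /Creator (Writer) "
    b"/CreationDate (D:20240115093000+09'00') >>"
)
for key, values in scan_info_strings(sample).items():
    print(key, values)
```

For a real file, the bytes could be read with `open(path, "rb").read()`. A dedicated tool such as ExifTool reads far more cases than this sketch does.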

A PDF looks like a "final version."

However, information from the creation process may remain inside it.

Redaction Failures

One especially dangerous issue with PDFs is redaction failure.

Even if text appears hidden behind a black rectangle, the text itself may remain in the file. The original text may then be recoverable by copying, searching, or disabling layers.

| Common handling | Problem | Safer way to think |
| --- | --- | --- |
| Placing a black shape on top | The original text remains internally | Use a dedicated redaction feature |
| Only turning it into a screenshot | Watch image quality and hidden information | Recheck the necessary range |
| Hiding with annotations | Annotations may be removable | Copy and search after output |
| Deleting pages | Embedded data or history may remain | Regenerate it as a separate file |
| Deleting some parts manually | Oversights happen | Turn the check into a checklist |

For high-risk materials, it is safer to have more than one person verify the redaction.

In legal, reporting, or whistleblowing contexts, review by a specialist or trusted adviser may be necessary.
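
The "copy and search after output" check can be partly automated. The sketch below assumes that the text of the output PDF has already been extracted by a separate local tool and is available as a string; the function name `find_leaked_terms` and the sample strings are hypothetical. It reports which redacted terms still appear in that text. A clean result only covers extractable text: it says nothing about text visible in images, and the whitespace-insensitive comparison can produce false positives.

```python
import re
import unicodedata


def _normalize(text: str) -> str:
    """Fold case and Unicode forms, and collapse whitespace to single spaces."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return re.sub(r"\s+", " ", folded).strip()


def find_leaked_terms(extracted_text: str, redacted_terms: list[str]) -> list[str]:
    """Return the redacted terms that still appear in the extracted text."""
    haystack = _normalize(extracted_text)
    # Also compare with all whitespace removed, to catch terms that the
    # extractor split across lines or padded with spaces.
    compact_haystack = haystack.replace(" ", "")
    leaked = []
    for term in redacted_terms:
        needle = _normalize(term)
        if not needle:
            continue  # skip empty terms, which would match everything
        if needle in haystack or needle.replace(" ", "") in compact_haystack:
            leaked.append(term)
    return leaked


# Hypothetical text extracted from a PDF whose redaction failed.
text_from_pdf = "Report prepared by Jane\nExample for the review board."
print(find_leaked_terms(text_from_pdf, ["Jane Example", "Project Falcon"]))
```

Any term this returns means the redaction left the original text in the file, and the publication copy should be regenerated with a dedicated redaction feature.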

Separate Appearance From Internal Information

When checking a PDF, separate appearance from internal information.

The appearance includes body text, images, tables, QR codes, page numbers, backgrounds, and watermarks. The inside includes metadata, annotations, embedded files, forms, and links.

| Where to check | Information to look at | Reason |
| --- | --- | --- |
| Appearance | Body text, images, tables, backgrounds | Check direct personal information |
| Links | URLs, sharing destinations, tracking | Avoid personal IDs and internal URLs |
| Metadata | Author, creation software | Do not expose the work environment |
| Annotations | Comments, review notes | Do not leave internal conversations |
| Embedded content | Attachments, forms | Do not include unnecessary data |

A PDF looks like printed material.

In reality, however, it is a digital file with an internal structure.
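
To make the "internal structure" concrete, the following sketch counts a few PDF name tokens in the raw bytes of a file. These tokens (`/Annots`, `/EmbeddedFile`, `/AcroForm`, `/URI`, `/Metadata`, `/JavaScript`) are standard PDF names, but the function `count_internal_markers` and the sample bytes are hypothetical, and the approach is only a first-pass hint. Tokens stored inside compressed object streams are invisible to it, so a zero count is not proof of absence; rewriting the file in an uncompressed form first (qpdf's `--qdf` mode is one option) makes this kind of scan more useful.

```python
# PDF name tokens that point at internal content a viewer may not display.
MARKERS = {
    "annotations": b"/Annots",
    "embedded files": b"/EmbeddedFile",
    "form fields": b"/AcroForm",
    "links": b"/URI",
    "XMP metadata": b"/Metadata",
    "JavaScript": b"/JavaScript",
}


def count_internal_markers(pdf_bytes: bytes) -> dict[str, int]:
    """Count occurrences of each marker token in the raw bytes of a PDF."""
    return {label: pdf_bytes.count(token) for label, token in MARKERS.items()}


# Hypothetical uncompressed fragment: one page with a link annotation.
sample = (
    b"1 0 obj << /Type /Page /Annots [4 0 R] >> endobj "
    b"4 0 obj << /Subtype /Link /A << /S /URI "
    b"/URI (https://intranet.example/doc) >> >> endobj"
)
for label, count in count_internal_markers(sample).items():
    if count:
        print(f"{label}: {count}")
```

A nonzero count is a prompt to open that part of the file with a proper inspection tool, not a verdict by itself.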

Checking Steps

Before publishing a PDF, the basic rule is not to publish the original file directly.

Create a publication copy and leave only the necessary information.

| Step | What to check |
| --- | --- |
| 1 | Create a publication copy instead of using the original file |
| 2 | Check whether the filename includes a real name, case name, or organization name |
| 3 | Check the document properties |
| 4 | Check annotations, comments, forms, and embedded files |
| 5 | Check whether redacted areas can be copied or searched |
| 6 | Open it again in a different environment and check the display |
| 7 | After uploading, check how it appears to the recipient |
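
Step 2 is easy to overlook and simple to script. The sketch below checks a filename against a list of terms you supply yourself, such as your real name, a case name, or an organization name. The function `filename_findings` and the example path are hypothetical, and the comparison ignores case, spaces, and punctuation, which means short terms may match by coincidence.

```python
import re
from pathlib import PurePath


def filename_findings(path: str, sensitive_terms: list[str]) -> list[str]:
    """Return the sensitive terms that appear in the file name (not the directory)."""

    # Compare on letters and digits only, so "Jane_Example" matches "jane example".
    def squash(value: str) -> str:
        return re.sub(r"[\W_]+", "", value.casefold())

    stem = squash(PurePath(path).stem)
    return [term for term in sensitive_terms if squash(term) and squash(term) in stem]


print(
    filename_findings(
        "drafts/Report_Jane-Example_v3_final.pdf",
        ["Jane Example", "Acme Corp"],
    )
)
```

Only the final path component is checked here, on the assumption that directory names are not shared along with the file.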

Tools such as ExifTool and qpdf are useful for checking.

However, even if a tool removes metadata, separately check information that remains in the body text or images. If you upload the original PDF to an external PDF conversion service, online redaction service, or online metadata checking service, the document contents, access information, and processing time may be passed to that service. For high-risk PDFs, perform checking and conversion in a local environment as much as possible.
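
One way to keep the check local is to call an installed copy of ExifTool from a script instead of uploading the file anywhere. The sketch below is a minimal wrapper, assuming ExifTool is installed and on the PATH; the function name `read_metadata_locally` is hypothetical. It uses ExifTool's `-j` option, which prints the metadata as JSON, and returns `None` when the executable is not found.

```python
import json
import shutil
import subprocess
from typing import Optional


def read_metadata_locally(pdf_path: str, exiftool: str = "exiftool") -> Optional[dict]:
    """Run a locally installed ExifTool and return its JSON output for one file.

    Returns None when the executable is not found, so the caller can fall
    back to another local check instead of an online service.
    """
    executable = shutil.which(exiftool)
    if executable is None:
        return None
    # "-j" asks ExifTool for JSON output: a list with one object per file.
    completed = subprocess.run(
        [executable, "-j", pdf_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ExifTool reports a failure
    )
    return json.loads(completed.stdout)[0]
```

A call such as `read_metadata_locally("publication_copy.pdf")` returns a dictionary whose keys depend on what the file contains. The file never leaves the machine, which is the point of the design.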

Check the Recipient of the PDF Too

PDF risk is not only inside the file.

Where you upload it, who you send it to, and which account you use for sharing also affect anonymity. If you share from a personal cloud account, the owner name or email address may be visible. If you send it by email, the sender, subject, and time remain.

| Sharing method | Information left | Caution |
| --- | --- | --- |
| Cloud link | Owner name, sharing history | Do not share from a real-name account |
| Email attachment | Sender, subject, time | Look at contact-path correlation |
| Posting site | Upload time, account | Connects to post content |
| Chat sending | Remains on the recipient's device | Watch screenshots and forwarding |
| Anonymous submission | Submission destination logs, check time | Look at the trustworthiness of the submission destination |

Making a PDF safer does not only mean cleaning up the inside of the PDF.

Think also about the sharing path, account, sending time, and storage on the recipient side.

Scope Compared With Other Articles

This article covers risks from information left in PDFs.

The flow for actually removing PDF metadata is covered in "Cautions When Removing PDF Metadata." When creating a PDF from an Office document, also check author information and revision history on the original Office file side.

In other words, it is important not to stop after looking only at the PDF.

| Target to inspect | Main checks |
| --- | --- |
| PDF itself | Author, annotations, embedded files, redaction |
| Original Office file | Revision history, comments, company name |
| Images | Backgrounds, reflections, text |
| Sharing path | Owner name, URL, sending time |
| Consultation contact | Handling of evidentiary value and safety |

PDF is a "convenient format for submission" in many situations.

That is exactly why people often submit it as-is in whistleblowing, school or workplace consultations, and reporting tips. If anonymity matters, separately check where information about the person who made, edited, shared, and opened the PDF remains.

Be Careful With Received PDFs Too

PDF risk is not limited to files you made yourself.

A PDF received from someone else may also retain an author, creation time, creation software, annotations, and embedded files. If you publish a PDF received for reporting or consultation as-is, you may create a path back to the provider.

| Information in a received PDF | Risk |
| --- | --- |
| Author | The provider or organization becomes visible |
| Creation time | The time when the material was prepared becomes visible |
| Annotations | Internal notes remain |
| Embedded files | Source materials get mixed in |
| Filename | Case names or personal names are visible |

For received PDFs especially, check them from the perspective of protecting the provider.
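
The creation time deserves particular attention because of how it is stored. PDF date strings follow the pattern `D:YYYYMMDDHHmmSS` followed by an optional UTC offset such as `+09'00'`, so the field can reveal not only when the provider prepared the material but also a likely time zone. The sketch below parses such a string; the function name `parse_pdf_date` and the sample value are hypothetical, and real files sometimes contain malformed dates, for which this sketch simply returns `None`.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# D:YYYY, then optional MM DD HH mm SS, then optional "Z" or a signed offset.
_PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:(Z)(?:00'?(?:00'?)?)?|([+-])(\d{2})'?(\d{2})?'?)?"
)


def parse_pdf_date(value: str) -> Optional[datetime]:
    """Parse a PDF date string; return None if it is malformed."""
    match = _PDF_DATE.fullmatch(value.strip())
    if match is None:
        return None
    year, month, day, hour, minute, second, zulu, sign, off_h, off_m = match.groups()
    try:
        tz = None  # no offset recorded: keep the result timezone-naive
        if zulu:
            tz = timezone.utc
        elif sign:
            offset = timedelta(hours=int(off_h), minutes=int(off_m or 0))
            tz = timezone(offset if sign == "+" else -offset)
        return datetime(
            int(year), int(month or 1), int(day or 1),
            int(hour or 0), int(minute or 0), int(second or 0),
            tzinfo=tz,
        )
    except ValueError:
        return None  # out-of-range field, for example month 13


stamp = parse_pdf_date("D:20240115233000+09'00'")
print(stamp.isoformat())  # local time together with the recorded UTC offset
print(stamp.utcoffset())  # the offset alone can narrow down a region
```

A late-night local time combined with a specific offset is exactly the kind of detail that connects a file to a person's working pattern, which is why this field should be reviewed before a received PDF is republished.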

Scanned PDFs Still Contain Clues

Scanning paper into a PDF does not make it safe.

A scanned PDF may retain the scanner name, creation software, and creation time. In addition, the page itself can retain seals, reception numbers, handwriting, folds, margin notes, and characteristics of the copier. Even an image PDF with no text information can produce clues from its appearance.

| Clue | What it reveals | Caution |
| --- | --- | --- |
| Scanner information | Equipment or environment used | Check metadata |
| Reception number | Internal organizational material | The numbering system can narrow candidates |
| Handwriting | The person who wrote it | People who know them may recognize it |
| Seal or signature | Person or organization | Becomes direct identifying information |
| Margins and folds | How it was handled | The path of the original can be inferred |

A scanned PDF has risks from both digital documents and paper documents.

Check not only the metadata but also the page itself at high zoom.

Summary

A PDF cannot be judged safe by appearance alone.

Author name, creation time, creation software, annotations, embedded files, and form information may remain.

Redaction likewise cannot be verified by appearance alone. Any method that leaves the original text inside the file is dangerous.

For anonymity, check the PDF body text, appearance, internal information, filename, and sharing destination separately.

For high-risk materials, also consider using a trusted adviser or specialist instead of judging alone.

Related tools

Metadata inspection

ExifTool

An external resource related to this article. Open it only when it fits your situation and threat model.

Why it is listed: It can help with the article topic, but it is outside Anonymity Sense and should be checked before use.

URL : https://exiftool.org/

Metadata removal

MAT2

URL : https://0xacab.org/jvoisin/mat2

PDF inspection

qpdf

URL : https://qpdf.readthedocs.io/
