
Removing PDF metadata

Notes when removing PDF metadata

Removing a PDF's metadata does not by itself make the file safe.

Metadata removal is important, but a PDF contains more than metadata: body text, images, annotations, links, redactions, embedded files, a filename, and sharing settings. If only the metadata is removed, information that leads back to a person or organization may remain elsewhere.

This article lays out how to approach PDF metadata removal and what to check afterward.

Make a copy before removal

First, do not work on the original file directly.

Create a publication copy, and perform removal or conversion on that copy. The original file may need to retain editing history or evidentiary value. In consultation, whistleblowing, or legal procedures in particular, think carefully about how to handle the original file.

| Target | Handling | Reason |
| --- | --- | --- |
| Original file | Store in a safe place | For evidentiary value and rechecking |
| Publication copy | Remove metadata or convert | To prepare it for publication |
| Working folder | Avoid personal names and case names | Reduces leakage from file paths |
| Sharing destination | Keep to the minimum necessary | Avoids unnecessary spread |
| Record | Keep a record of what was removed | Makes later checking possible |

Metadata removal is one part of the work.

Think about it together with original file management, the publication copy, and checking the sharing destination.
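The copy step can be sketched in Python as follows. This is a minimal sketch using only the standard library; the function names `sha256_of` and `make_publication_copy`, the working folder, and the neutral filename are hypothetical choices for illustration. It copies file contents only and compares SHA-256 digests, so you can later confirm that the original was not modified.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_publication_copy(original: Path, work_dir: Path, neutral_name: str) -> Path:
    """Copy `original` into `work_dir` under a neutral filename.

    shutil.copyfile copies contents only, so the copy does not inherit
    the original's modification time or permission bits.
    """
    work_dir.mkdir(parents=True, exist_ok=True)
    target = work_dir / neutral_name
    if target.exists():
        raise FileExistsError(f"refusing to overwrite {target}")
    shutil.copyfile(original, target)
    # Sanity check: the copy must be byte-identical before any processing.
    if sha256_of(original) != sha256_of(target):
        raise OSError("copy does not match the original")
    return target
```

For example, `make_publication_copy(Path("original.pdf"), Path("work"), "document.pdf")` returns the path on which all removal and conversion should then run. Noting the digest of the original at this point makes it possible to confirm later that the stored original stayed unchanged.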

Information to remove

There are several kinds of information that should be removed or checked in a PDF.

| Information | Risk | How to check |
| --- | --- | --- |
| Author | Personal name or OS username appears | View document properties |
| Company name | Affiliated organization is visible | Check properties and original Office file |
| Creation time | Connects to activity time | View properties |
| Annotations | Comments and internal conversations appear | View annotation list |
| Embedded files | Source materials are mixed in | Check attachments and structure |
| Forms | Input information remains | Check form fields |

If the PDF was created from an Office document, also pay attention to the original Office side.

Inspecting only the PDF will not show author information or change history that remains in the original Office file.
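As a rough first pass over the kinds of information listed above, the raw bytes of a PDF can be scanned for the name tokens that usually accompany them. The sketch below is a heuristic under stated assumptions: the marker list and the function name `scan_pdf_markers` are illustrative, and names stored inside compressed object streams will not be seen, so a count of zero is not proof that the item is absent. A dedicated PDF tool or the viewer's own property and annotation panels remain the more reliable check.

```python
import re

# PDF name tokens that often accompany the kinds of information listed above.
# This list is an illustrative assumption, not a complete inventory.
MARKERS = {
    "author": rb"/Author\b",
    "creator": rb"/Creator\b",
    "producer": rb"/Producer\b",
    "creation_time": rb"/CreationDate\b",
    "annotations": rb"/Annots\b",
    "embedded_files": rb"/EmbeddedFiles?\b",
    "forms": rb"/AcroForm\b",
    "links": rb"/URI\b",
    "xmp_metadata": rb"<x:xmpmeta\b",
}


def scan_pdf_markers(data: bytes) -> dict[str, int]:
    """Count occurrences of each marker in the raw bytes of a PDF."""
    return {name: len(re.findall(pattern, data)) for name, pattern in MARKERS.items()}
```

Calling `scan_pdf_markers(open("document.pdf", "rb").read())` gives one count per category. Any non-zero count is a prompt to look at that part of the file by hand.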

Recheck after removal

After using a removal tool, always recheck.

Even if you intended to remove metadata, information may remain in annotations, links, images, body text, or redacted areas.

| What to check | Reason |
| --- | --- |
| Document properties | See whether author and title are gone |
| Text search | See whether hidden words appear in search |
| Copy operation | See whether redacted text can be copied |
| Annotation list | See whether comments remain |
| Links | See whether personal IDs or internal URLs remain |
| Filename | See whether a real name or organization name is included |

Do not assume the file is safe just because a removal step was performed.

Open the file to be published again in a separate environment and check it.

Tools are helpers

Tools such as qpdf, ExifTool, and mat2 may be used for PDF checking and conversion.

ExifTool is a widely used tool for checking metadata across many file formats.

URL : https://exiftool.org/

However, tools are not all-powerful.

Tools help check and remove internal information, but information written in body text, backgrounds in images, notifications in screenshots, and owner names on sharing links must be checked separately.

| What tools can do | What tools alone cannot cover |
| --- | --- |
| Check metadata | Proper nouns in text |
| Remove some metadata | Safety of redaction |
| Check file structure | Backgrounds and reflections in images |
| Format conversion | Logs and owner display at the sharing destination |
| Automation | High-risk legal judgment |

For high-risk documents, do not make the publication decision based only on tool output.
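To illustrate the "check metadata" role of a tool, the sketch below reads a file with ExifTool's JSON output (`exiftool -json`) and keeps only fields that tend to identify a person or organization. It assumes the `exiftool` command is installed and on the PATH. The tag set in `IDENTIFYING_TAGS` and the function names are assumptions chosen for illustration; extend the set for your own documents.

```python
import json
import shutil
import subprocess

# Tag names ExifTool commonly reports for PDF files; treat this set as an
# assumption and extend it for your own documents.
IDENTIFYING_TAGS = {"Author", "Creator", "Producer", "Title", "Subject",
                    "Keywords", "CreateDate", "ModifyDate"}


def identifying_fields(exiftool_json: str) -> dict[str, object]:
    """Pick out identifying tags from the output of `exiftool -json`."""
    records = json.loads(exiftool_json)  # a list with one object per file
    found = {}
    for record in records:
        for tag, value in record.items():
            if tag in IDENTIFYING_TAGS:
                found[tag] = value
    return found


def read_metadata(path: str) -> dict[str, object]:
    """Run ExifTool on `path`; requires the exiftool command to be installed."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool not found on PATH")
    completed = subprocess.run(
        ["exiftool", "-json", path], capture_output=True, text=True, check=True
    )
    return identifying_fields(completed.stdout)
```

An empty result only means ExifTool reported none of those tags. It says nothing about body text, images, redaction, or the sharing destination, which still need the separate checks described above.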

Some information should not be removed

If anonymity is your only concern, you may want to remove all information.

However, in victim consultation, whistleblowing, and legal consultation, some information should be preserved as evidence. The original file's creation time, editing history, and transmission route may become important later.

| Information | In publication copy | In consultation/evidence |
| --- | --- | --- |
| Creation time | Consider blurring or removing | May become a necessary record |
| Author | Remove before publication | May be retained in the original |
| Change history | Remove from publication copy | May be needed to explain the sequence |
| Annotations | Remove from publication copy | May record internal confirmation |
| Original file | Do not publish | Store in a safe place |

This is why the publication file and the stored original are separated.

Deletion necessary for protecting anonymity and preservation necessary for protecting evidence must be considered at the same time.

Name and storage location after removal

Even if metadata is removed, filenames and storage locations may reveal information.

For example, a filename may contain a real name, company name, or case name. A cloud sharing link may display the owner name of a real-name account. A working folder name may appear in a screenshot. Check this information separately from metadata removal.

| What to check | Reason |
| --- | --- |
| Filename | Avoid real names, organization names, and case names |
| Folder name | May appear during screen sharing or compression |
| Cloud owner | Connects to a real-name account |
| Sharing URL | May contain personal IDs or tracking |
| Sending time | Can be compared with activity records |

At the end of removal work, check the state from a third party's point of view.

Even if it looks safe on your own device, the recipient may see the owner name, preview, link card, or filename. If possible, check the final public state in another browser or another environment.

When checking after removal, it is important not to look only at metadata.

Also check proper nouns in the body text, image backgrounds, linked destinations, filenames, and the display name of the upload destination. Removal work must be paired with checking the final publication as a whole.
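The filename and folder-name checks can be partly automated. The following is a minimal sketch, assuming you maintain your own list of sensitive terms (real names, organization names, case names); the function name `path_leaks` and the example terms are hypothetical. It covers path components only. The cloud owner display and link previews still have to be checked by hand from a third party's point of view.

```python
from pathlib import PurePath


def path_leaks(path: str, sensitive_terms: list[str]) -> list[tuple[str, str]]:
    """Return (path component, term) pairs where a term appears in the path.

    Matching is case-insensitive and substring-based, so short terms can
    produce false positives; review each hit by hand.
    """
    hits = []
    for part in PurePath(path).parts:
        lowered = part.casefold()
        for term in sensitive_terms:
            if term and term.casefold() in lowered:
                hits.append((part, term))
    return hits
```

For instance, `path_leaks("/home/jdoe/AcmeCorp-case/report.pdf", ["jdoe", "acmecorp"])` flags both the user folder and the case folder, which is a cue to move the publication copy to a neutrally named location before compressing or sharing it.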

Record the removal work

For high-risk documents, recording what was removed and what was left can make later checking easier.

However, the record itself also needs caution. If it is stored in a real-name cloud account or on a workplace device, it becomes a clue connecting you with the material you processed.

| What to record | Notes |
| --- | --- |
| Removed items | Do not include in the publication file |
| Items left | Be able to explain why they were left |
| Storage location of original file | Limit to a safe place |
| Publication filename | Do not include personal information |
| Time checked | Be careful about correlation with activity records |
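One possible shape for such a record is sketched below. The class name `RemovalRecord` and its field names are illustrative assumptions, not a standard format. The time field is deliberately left empty by default, since a precise timestamp can itself correlate with activity records.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RemovalRecord:
    """What was done to one publication copy. Field names are illustrative."""
    publication_filename: str
    removed_items: list[str] = field(default_factory=list)
    # Maps each retained item to the reason it was kept.
    items_left: dict[str, str] = field(default_factory=dict)
    original_location_note: str = ""
    # Empty by default: a precise timestamp can correlate with activity records.
    checked_on: str = ""

    def to_json(self) -> str:
        """Serialize the record as indented JSON text."""
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)
```

A record such as `RemovalRecord("document.pdf", removed_items=["Author"], items_left={"Title": "needed for context"})` can be serialized with `to_json()` and stored alongside the original, not with the publication file.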

Removal work is part of the publication decision.

Verify after removal with another method

Metadata removal is not finished just because the tool ran.

Check with a method different from the tool used for removal. View document properties, select and copy text, open in another PDF viewer, and try downloading in another environment. Using several perspectives makes it easier to notice missed removals or display differences.

| Verification method | Problems it can find |
| --- | --- |
| View document properties | Author, title, creation software |
| Select and copy text | Text under redaction, hidden text |
| Open in another viewer | Remaining annotations or forms |
| Check in another environment | Owner names and sharing display |
| Reread body text | Proper nouns and contextual leaks |

In removal work, prioritize the final state that will be published over the tool's success message.

For anonymity, "I thought I removed it" is the most dangerous state. After removing, check how the file appears in the recipient's environment.

Summary

PDF metadata removal is important, but it does not make the file safe by itself.

Do not publish the original file directly. Create a publication copy and check author, company name, creation time, annotations, embeddings, forms, and links.

After removal, recheck document properties, search, copy operations, annotations, and filename.

Tools are helpers.

Body text, images, redaction, sharing destinations, and legal risks must be checked separately.

Related tools

The following external resources relate to this article. They are outside Anonymity Sense, so check each one before use, and open it only when it fits your situation and threat model.

Archive check: Wayback Machine
URL : https://web.archive.org/

Metadata inspection: ExifTool
URL : https://exiftool.org/

Metadata removal: MAT2
URL : https://0xacab.org/jvoisin/mat2

PDF inspection: qpdf
URL : https://qpdf.readthedocs.io/
