Metadata Risks in Office and PDF Files

Office files and PDFs are among the materials most often handled in whistleblowing.

Meeting minutes, contracts, reports, email attachments, spreadsheets, presentation materials, and scanned PDFs can all be strong evidence, but they are also formats in which metadata and edit history tend to remain.

In whistleblowing, the issue is not only the file contents, but also how the file was created, who edited it, and what environment it came from.

Information left in Office files

Word, Excel, and PowerPoint files can retain authors, last saved by, company names, comments, tracked changes, hidden sheets, embedded objects, and similar information.

Information | Risk
Author / last saved by | Real names or internal accounts appear
Comments | Names of people involved and review contents remain
Tracked changes | It becomes clear who edited what
Hidden sheets | Data that is not displayed remains
Embedded files | Separate materials or internal information are included
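
As a rough illustration of where the author-related properties in the table live: .docx, .xlsx, and .pptx files are ZIP archives, and Office conventionally stores these properties in the parts docProps/core.xml and docProps/app.xml. The sketch below uses only the Python standard library. The function name read_office_properties is hypothetical, and the part paths are the conventional locations, not a guarantee for every file. It only reads the file, but it is still best run on a copy.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the OOXML property parts.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Conventional part paths (an assumption) and the fields to look for in each.
WANTED = {
    "docProps/core.xml": {"creator": "dc:creator", "lastModifiedBy": "cp:lastModifiedBy"},
    "docProps/app.xml": {"company": "ep:Company", "application": "ep:Application"},
}


def read_office_properties(path):
    """Return the non-empty author-related properties found in an OOXML file."""
    found = {}
    with zipfile.ZipFile(path) as archive:
        present = set(archive.namelist())
        for part, fields in WANTED.items():
            if part not in present:
                continue
            root = ET.fromstring(archive.read(part))
            for label, tag in fields.items():
                value = root.findtext(tag, namespaces=NS)
                if value:
                    found[label] = value
    return found
```

An empty result means only that these particular fields are empty. It says nothing about comments, tracked changes, or embedded files, which are stored elsewhere in the archive.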

In an Office file, the visible page is not the whole content.

Especially in Excel, be careful with hidden sheets, filters, comments, formulas, and external links.

Office files easily carry work-in-progress information.

Traces of collaborative editing, review, comments, tracked changes, templates, and external links can all remain.

Even if a document looks finished on the surface, traces of editing may remain inside the file.

In Excel, hidden sheets, hidden rows, rows hidden by filters, formulas, defined names, and external data connections can all expose data that is not visible on screen.
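
Some of these Excel items can be listed without opening the workbook in a spreadsheet program. The sketch below is a minimal example under stated assumptions: it reads xl/workbook.xml from an .xlsx archive and reports sheets whose state attribute is "hidden" or "veryHidden", the defined names, and any parts under xl/externalLinks/. The function name list_hidden_workbook_items is hypothetical. Hidden rows, filtered rows, and formulas live in the individual sheet parts and are not covered here.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main SpreadsheetML elements, in ElementTree's {uri} form.
MAIN = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def list_hidden_workbook_items(path):
    """Report hidden sheets, defined names, and external link parts in an .xlsx file."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
        names = archive.namelist()

    hidden_sheets = [
        (sheet.get("name"), sheet.get("state"))
        for sheet in root.iter(MAIN + "sheet")
        if sheet.get("state") in ("hidden", "veryHidden")
    ]
    defined_names = [item.get("name") for item in root.iter(MAIN + "definedName")]
    external_link_parts = [n for n in names if n.startswith("xl/externalLinks/")]

    return {
        "hidden_sheets": hidden_sheets,
        "defined_names": defined_names,
        "external_link_parts": external_link_parts,
    }
```

A "veryHidden" sheet cannot be unhidden from the normal sheet menu, so it is easy to miss when checking by eye.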

In PowerPoint, pay attention to speaker notes, hidden slides, embedded images, and templates.

In Word, check tracked changes, comments, headers, footers, and document properties.
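
The Word and PowerPoint items above can be given a similar first-pass check. The sketch below is hedged in several ways. It assumes the conventional part names (word/comments.xml, word/document.xml, ppt/notesSlides/, ppt/slides/slideN.xml). It assumes the usual w: and p: namespace prefixes that Office writes. It uses simple pattern matching, not full XML parsing. The function name find_review_traces is hypothetical. Headers, footers, and document properties are not covered.

```python
import re
import zipfile


def find_review_traces(path):
    """Collect signs of comments, tracked changes, speaker notes, and hidden slides."""
    report = {
        "comments_part": False,
        "insertions": 0,
        "deletions": 0,
        "notes_parts": [],
        "hidden_slides": [],
    }
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if name == "word/comments.xml":
                report["comments_part"] = True
            elif name == "word/document.xml":
                xml = archive.read(name).decode("utf-8", errors="replace")
                # Tracked insertions and deletions are wrapped in w:ins / w:del.
                # The \b keeps w:delText and w:insideH from being counted.
                report["insertions"] = len(re.findall(r"<w:ins\b", xml))
                report["deletions"] = len(re.findall(r"<w:del\b", xml))
            elif name.startswith("ppt/notesSlides/") and name.endswith(".xml"):
                report["notes_parts"].append(name)
            elif re.fullmatch(r"ppt/slides/slide\d+\.xml", name):
                xml = archive.read(name).decode("utf-8", errors="replace")
                # A hidden slide carries show="0" (or "false") on its root element.
                if re.search(r'<p:sld\b[^>]*\bshow="(?:0|false)"', xml):
                    report["hidden_slides"].append(name)
    return report
```

A notes part can exist even when the notes are empty, so treat each entry as something to open and read, not as proof that notes contain text.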

Information left in PDFs

PDFs are often assumed to be safe because they look like final versions.

However, PDFs can also retain author, creation software, creation date and time, edit history, annotations, bookmarks, embedded files, and OCR text.

Information | Risk
Author | Original document or worker becomes visible
Creation software | The environment where it was created can be inferred
Annotations / comments | Review history and names remain
OCR text | Text you thought you redacted may remain
Embedded files | Original materials or attachment information are included
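
A crude first pass over a PDF is to count well-known PDF name keys in the raw bytes. The sketch below assumes nothing beyond the Python standard library. The names scan_pdf_markers and PDF_MARKERS are hypothetical. The limitation is significant: many PDFs store objects in compressed streams, so a count of zero is not evidence that the information is absent. Treat hits as a prompt for a closer look with a dedicated tool.

```python
# PDF name keys and what each one hints at.
PDF_MARKERS = {
    b"/Author": "author in the document information dictionary",
    b"/Creator": "application that created the source document",
    b"/Producer": "software that produced the PDF",
    b"/CreationDate": "creation date and time",
    b"/ModDate": "modification date and time",
    b"/Annots": "annotations or comments on a page",
    b"/Outlines": "bookmarks",
    b"/EmbeddedFile": "attached files (also matches /EmbeddedFiles)",
    b"/OCProperties": "optional content, i.e. layers that can be hidden",
    b"/Metadata": "XMP metadata stream",
}


def scan_pdf_markers(data):
    """Count occurrences of each marker in the raw bytes of a PDF."""
    return {key.decode("ascii"): data.count(key) for key in PDF_MARKERS}


def scan_pdf_file(path):
    """Convenience wrapper that reads the file and scans its bytes."""
    with open(path, "rb") as handle:
        return scan_pdf_markers(handle.read())
```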

Simply converting to PDF does not make it safe.

Poorly applied redaction or pixelation can also leave the original text recoverable.

PDFs invite complacency because they look like finished distribution copies.

In addition to the items listed above, a PDF can also contain hidden layers.

If redaction only places a black rectangle over the text, the underlying characters may remain.

Even if a page appears to be an image, a text layer (for example, from OCR) may remain behind it.

When preparing a PDF for publication, check not only its appearance, but also copyable text, annotations, attachments, and properties.

For redacted areas, check that they are not visible when copied, searched, selected, or opened with another tool.
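
One way to make that check repeatable is to extract the text from the publication copy with whatever tool you trust (copying everything from a viewer or running a text extractor) and then search it for the terms you meant to remove. The sketch below covers only the comparison step. The function name surviving_terms and the term list are hypothetical. Whitespace is ignored and case is folded, because extracted PDF text often breaks words across lines.

```python
import re
import unicodedata


def _normalize(text):
    """Fold case and compatibility forms, then drop all whitespace."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return re.sub(r"\s+", "", folded)


def surviving_terms(extracted_text, sensitive_terms):
    """Return the sensitive terms that still appear in the extracted text."""
    haystack = _normalize(extracted_text)
    survivors = []
    for term in sensitive_terms:
        needle = _normalize(term)
        if needle and needle in haystack:
            survivors.append(term)
    return survivors
```

An empty result only shows that this extraction did not find the terms. Repeat it with a second tool, since different tools read text layers differently.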

Especially dangerous points in whistleblowing

In whistleblowing, metadata can be dangerous even if it does not directly show a name.

If creation time, version numbers, department names, document numbers, names in comments, or traces of distribution remain, the flow of the material becomes visible.

Information that remains | What can be inferred
Version number | When and to whom the material was distributed
Document number | Managing department or material classification
Commenter | Related department or review staff
Hidden data | Information not meant to be published
Creation date and time | When someone touched the material

If the publisher releases materials carelessly, not only the whistleblower but also the people involved and unrelated employees may be drawn in.

In whistleblowing, there are people who look for the source of the material.

Those people look not only at the body text, but also at version numbers, distribution destinations, commenters, document numbers, templates, and creation times.

For example, if wording that exists only in the latest version is published, people who had access to the latest version may be suspected.

If commenter names remain, related departments and review paths become visible.

Metadata affects not only the whistleblower, but also document creators, departments that received the material, reviewers, and co-editors.

The publisher has a responsibility not to publish received materials as-is.

Cautions for checking and processing

When handling Office files and PDFs, separate files for checking, storage, and publication.

Carelessly processing an original file that is needed as evidence can undermine its evidentiary value later. At the same time, a publication file must not carry unnecessary information.

Stage | Caution
Receiving | Do not casually open the original file in your everyday environment
Checking | Look at properties, comments, tracked changes, and hidden elements
Storage | Separate the original file and the publication copy
Processing | Check the method for redaction, deletion, and conversion
Rechecking | Check whether information remains in the publication file

Specific tools for checking and removing metadata are covered in another article.

Here, understand that format conversion alone does not make a file safe.

The original file may be important as evidence.

Therefore, directly processing and overwriting the original file can affect evidentiary value and verifiability.

On the other hand, unnecessary information must not remain in the publication file.

For this reason, separate the original file, working copy, and publication copy.

File type | How to handle it
Original file | Store safely to preserve evidentiary value
Working copy | Use for checking and processing
Publication copy | Remove unnecessary information and recheck
Consultation copy | Adjust the scope shown to lawyers or specialists
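
The separation between original and working copy can be backed by a checksum, so that it is possible to show later that the original was not altered. The sketch below is one possible approach, not a prescribed procedure. The function names, the directory layout, and the neutral filename "working" are all assumptions made for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_working_copy(original, work_dir):
    """Copy the original into work_dir under a neutral name and verify the copy."""
    original = Path(original)
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)

    original_digest = sha256_of(original)
    # A neutral name keeps the original filename out of later outputs.
    working = work_dir / ("working" + original.suffix)
    shutil.copyfile(original, working)  # copies contents only

    if sha256_of(working) != original_digest:
        raise RuntimeError("working copy does not match the original")
    return working, original_digest
```

Recording the returned digest alongside the stored original makes it possible to recompute it later and confirm the file is unchanged.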

For high-risk whistleblowing, do not rely on an article alone; also consider consulting a lawyer, news organization, or trusted support contact.

Whether a file should be deleted or preserved relates not only to anonymity, but also to evidentiary value and legal risk.

Pre-publication check

Before publishing Office files or PDFs, check the following in order.

  • Does the filename contain a real name, department name, or project name?
  • Do properties still contain author, company name, or last saved by?
  • Do comments, tracked changes, or annotations remain?
  • Are there hidden sheets, hidden slides, or speaker notes?
  • Does text remain under PDF redactions?
  • Did you recheck the converted file in another environment?
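
The first item on the list, the filename, is simple enough to check mechanically. The sketch below is a minimal example. The function name filename_hits is hypothetical, and the terms passed in (names, department names, project names) are whatever applies to your own material.

```python
from pathlib import PurePath


def filename_hits(filename, sensitive_terms):
    """Return the sensitive terms that appear in the filename, ignoring case."""
    name = PurePath(filename).name.casefold()
    return [term for term in sensitive_terms if term.casefold() in name]
```

For example, filename_hits("2024_Finance_JDoe_v3.pdf", ["jdoe", "finance", "legal"]) returns ["jdoe", "finance"], which would mean the file needs renaming before publication.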

Checking does not end after one pass.

Recheck after processing, after conversion, and immediately before publication.

Especially after converting to PDF, treat it as a separate file from the source document, and recheck properties, annotations, and copyable text.

Summary

Office files and PDFs can retain authors, last saved by, company names, comments, tracked changes, hidden sheets, annotations, OCR text, and embedded files.

In whistleblowing, this information can point to the whistleblower, the departments involved, the path the material took, and how widely it was distributed.

Simply converting to PDF does not make it safe.

Separate the original file, checking copy, and publication copy, and check metadata and invisible elements before publication.

Office files and PDFs can be strong evidence, while also being file formats that describe where they came from.

Related tools

The following are external resources related to this article. They are outside Anonymity Sense and should be checked before use. Open them only when they fit your situation and threat model.

  • ExifTool (metadata inspection): https://exiftool.org/
  • MAT2 (metadata removal): https://0xacab.org/jvoisin/mat2
  • qpdf (PDF inspection): https://qpdf.readthedocs.io/
  • SecureDrop (whistleblower submission): https://securedrop.org/
  • GlobaLeaks (whistleblower platform): https://globaleaks.org/
