Author Information in Documents, PDFs, and Office Files
When handling documents for whistleblowing, do not look only at the body text.
PDF, Word, Excel, and PowerPoint files, as well as scanned or image-based materials, may retain author information, edit history, comments, templates, file paths, and distribution identifiers.
Even if you remove a name from the body text, anonymity breaks if the author's name remains inside the file.
This article outlines the author information in documents, PDFs, and Office files that calls for particular caution when whistleblowing.
What Is Author Information?
Author information is information attached to a file, such as the author, company name, editor, creation software, and creation date/time.
Office documents and PDFs may retain information that is not visible from the document's appearance.
| Information | What it shows |
| --- | --- |
| Author name | Real name or internal account name |
| Company name | Affiliated organization |
| Last modified by | Person involved in editing |
| Creation date/time | When the material was created |
| File path | Department name, user name, device name |
| Comments | Internal conversations or responsible people's names |
These may remain as properties or metadata even if they do not appear on the screen when the file is opened.
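As a rough illustration of where these properties live, the sketch below reads the property parts of a .docx, .xlsx, or .pptx file. It assumes the modern Office Open XML format, in which the file is a ZIP archive that usually carries `docProps/core.xml` and `docProps/app.xml`. The function name `read_ooxml_properties` is hypothetical, and the sketch does not cover older binary formats such as .doc. Run it on a checking copy, not on the original.

```python
import zipfile
import xml.etree.ElementTree as ET

# Property parts commonly present in an Office Open XML package.
PROPERTY_PARTS = ("docProps/core.xml", "docProps/app.xml")


def read_ooxml_properties(path_or_file):
    """Return {tag: text} for every non-empty element in the property parts."""
    found = {}
    with zipfile.ZipFile(path_or_file) as package:
        names = package.namelist()
        for part in PROPERTY_PARTS:
            if part not in names:
                continue
            root = ET.fromstring(package.read(part))
            for element in root.iter():
                text = (element.text or "").strip()
                if text:
                    # ElementTree prefixes tags with "{namespace}"; keep the local name.
                    found[element.tag.split("}")[-1]] = text
    return found
```

Typical keys to look for in the result are `creator`, `lastModifiedBy`, `created`, `modified`, and `Company`. An empty result only means these two parts held nothing. It does not mean the file is free of identifying information.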
Cautions for Office Files
Word, Excel, and PowerPoint files readily retain change history and comments.
In internal documents especially, reviewer names, department names, revision history, hidden sheets, and embedded objects can all become identifying clues.
| Office file information | Risk |
| --- | --- |
| Change history | Shows who edited which part |
| Comments | Responsible people's names and internal conversations remain |
| Hidden sheets | Data not visible on the surface remains |
| Embedded files | Original materials or separate files are included |
| Template | Organization name or department name is visible |
Sending Office files to an outside party as-is is dangerous.
Create a publication copy and check for unnecessary history and comments. However, in situations where evidentiary value is needed, preserve the original before processing.
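The check described above can be partly automated. The following is a minimal sketch, again assuming Office Open XML packages. The function name `scan_ooxml_package` is hypothetical. The part names and the `w:ins` / `w:del` prefix follow what Word and Excel usually write, so treat this as a first pass, not a guarantee. A file that produces no findings can still contain identifying content.

```python
import re
import zipfile


def scan_ooxml_package(path_or_file):
    """Return a list of findings about comments, tracked changes, hidden sheets, and embedded files."""
    findings = []
    with zipfile.ZipFile(path_or_file) as package:
        names = package.namelist()
        for name in names:
            lowered = name.lower()
            # e.g. word/comments.xml, xl/comments1.xml, ppt/comments/comment1.xml
            if "comments" in lowered and lowered.endswith(".xml"):
                findings.append(f"comments part: {name}")
            # e.g. word/embeddings/oleObject1.bin
            if "/embeddings/" in lowered:
                findings.append(f"embedded object: {name}")
        if "word/document.xml" in names:
            body = package.read("word/document.xml").decode("utf-8", "replace")
            # Tracked insertions and deletions are usually stored as w:ins / w:del.
            if re.search(r"<w:(ins|del)\b", body):
                findings.append("tracked changes in word/document.xml")
        if "xl/workbook.xml" in names:
            workbook = package.read("xl/workbook.xml").decode("utf-8", "replace")
            for tag in re.findall(r"<sheet\b[^>]*>", workbook):
                state = re.search(r'\bstate="(hidden|veryHidden)"', tag)
                name = re.search(r'\bname="([^"]*)"', tag)
                if state:
                    label = name.group(1) if name else "(unnamed)"
                    findings.append(f"{state.group(1)} sheet: {label}")
    return findings
```

The function only reads the package, so it is safe to run on a checking copy before deciding what the publication copy should contain.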
Cautions for PDFs
PDFs may look safe, but information can remain inside them.
They may include author, creation software, annotations, forms, embedded text, text under redaction, and attached files.
| PDF information | Caution |
| --- | --- |
| Author | Real name or organization name remains |
| Annotations | Comments and review history are visible |
| Embedded text | Text under redaction may remain |
| Forms | Input history or field names remain |
| Attached files | Separate files may be included inside the PDF |
Simply converting to PDF does not make a file safe.
Check redaction, annotations, properties, and embedded information.
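One crude way to start that check is to look for well-known PDF names in the raw bytes of a checking copy. The sketch below is a heuristic only. Many PDFs store objects in compressed streams, so a marker can be present without appearing in the raw bytes, and absence here proves nothing. The function name `scan_pdf_markers` and the labels are hypothetical. The marker names themselves (`/Author`, `/Annots`, `/AcroForm`, `/EmbeddedFile`) are standard PDF names.

```python
# Standard PDF names mapped to a short human-readable label.
PDF_MARKERS = {
    b"/Author": "author field",
    b"/Creator": "creating application",
    b"/Producer": "PDF producer",
    b"/Annots": "annotations",
    b"/AcroForm": "form fields",
    b"/EmbeddedFile": "attached files",
    b"<x:xmpmeta": "XMP metadata packet",
}


def scan_pdf_markers(data: bytes) -> dict:
    """Return {label: occurrence count} for each marker found in the raw PDF bytes."""
    return {
        label: data.count(marker)
        for marker, label in PDF_MARKERS.items()
        if marker in data
    }
```

A hit tells you where to look more closely in a proper PDF viewer or metadata tool. A miss should not be read as a clean result.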
Be Careful With Redaction Failures
In whistleblowing materials, names or numbers may be redacted.
However, if you only place a black rectangle on top, the text underneath may remain. It may still be possible to copy the text from the PDF, find it with search, or reveal it by removing the covering layer.
| Redaction failure | What happens |
| --- | --- |
| Only placing a shape over text | Original text appears through copying or search |
| Only hiding with an annotation | It becomes visible when annotations are hidden |
| Insufficient imaging | Text can be read at high resolution |
| Removing only part of it | It is inferred from surrounding information |
| Including the original file | Unprocessed materials remain |
Do not judge redaction by appearance alone.
Open the publication file in another app, and check copying, search, and properties.
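The copy-and-search check can be made repeatable with a small helper. The sketch below assumes you have already obtained the text of the publication file by some other means, for example select-all and copy in a second viewer. It does not extract text from a PDF itself. The function name `find_redaction_leaks` is hypothetical.

```python
import re


def _normalize(text: str) -> str:
    # Collapse whitespace and ignore case so line breaks do not hide a match.
    return re.sub(r"\s+", " ", text).casefold().strip()


def find_redaction_leaks(extracted_text: str, redacted_terms: list) -> list:
    """Return the redacted terms that still appear in the extracted text."""
    haystack = _normalize(extracted_text)
    leaks = []
    for term in redacted_terms:
        needle = _normalize(term)
        if needle and needle in haystack:
            leaks.append(term)
    return leaks
```

Any term returned is still recoverable from the file and the redaction should be redone. An empty list covers only exact matches in the text layer. It says nothing about text that is legible in the image or inferable from context.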
Limits of Screenshots
Some people think making a screenshot of a document makes it safe.
Screenshots may reduce some metadata, but they are not a universal solution. Notifications, account names, times, OS UI, browser tabs, and filenames may appear on the screen. Image files may also retain creation time or device information.
| What remains in screenshots | Caution |
| --- | --- |
| Notifications | Real-name accounts or other services appear |
| Browser tabs | Pages or services being viewed are visible |
| Time | Can be correlated with activity history |
| Filename | Case names or department names are visible |
| Image metadata | Creation time or device information may remain |
Converting a document to a screenshot does not remove every problem the original had.
The screenshot itself must be checked separately as a publication image.
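For PNG screenshots, part of that check can be sketched with the standard library alone. A PNG file is a signature followed by chunks, and the chunk types `tEXt`, `zTXt`, `iTXt`, `eXIf`, and `tIME` can carry text, EXIF data, or a modification time. The function name `list_png_metadata_chunks` is hypothetical. The sketch does not validate CRCs, does not handle JPEG or other formats, and cannot see anything visible in the pixels, such as notifications or tabs.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Ancillary chunk types that can hold text, EXIF data, or a timestamp.
METADATA_CHUNKS = {b"tEXt", b"zTXt", b"iTXt", b"eXIf", b"tIME"}


def list_png_metadata_chunks(data: bytes) -> list:
    """Return (chunk_type, payload) pairs for metadata-bearing chunks in a PNG."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    found = []
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[offset:offset + 8])
        payload = data[offset + 8:offset + 8 + length]
        if chunk_type in METADATA_CHUNKS:
            found.append((chunk_type.decode("ascii"), payload))
        if chunk_type == b"IEND":
            break
        offset += 8 + length + 4
    return found
```

Pass it the bytes of a checking copy, for example `list_png_metadata_chunks(open(path, "rb").read())` with a path of your own.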
Separate Checking From Preservation
In whistleblowing, thinking only about removing metadata is dangerous.
The evidentiary value of the original may matter. Information about when a material was created, who created it, and its authenticity may be needed.
| File | Handling |
| --- | --- |
| Original | Store safely to preserve evidentiary value |
| Checking copy | Inspect metadata and content |
| Submission copy | Match the recipient-side procedure |
| Publication copy | Remove unnecessary personal information |
What to remove and what to keep changes depending on the purpose.
Providing information to a news organization, consulting a lawyer, and public posting require different judgments.
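One way to keep the original separate from the copies you work on is to record a hash of the original and only ever edit a duplicate. The sketch below is an illustration under assumptions: the function names `sha256_of` and `make_working_copy` and the `<role>_<filename>` naming scheme are hypothetical, and a hash only shows that file contents are unchanged. Whether it is sufficient for evidentiary purposes is a question for a lawyer or the recipient.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def make_working_copy(original, workdir, role: str):
    """Copy `original` into `workdir` as '<role>_<name>'; return (copy_path, original_hash)."""
    original = Path(original)
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    original_hash = sha256_of(original)
    copy_path = workdir / f"{role}_{original.name}"
    # copyfile duplicates contents only; the original is opened read-only.
    shutil.copyfile(original, copy_path)
    if sha256_of(original) != original_hash:
        raise RuntimeError("original changed while it was being copied")
    return copy_path, original_hash
```

Calling it once per role, for example with `"checking"`, `"submission"`, and `"publication"`, gives one copy per purpose. Keep the recorded hash together with the safely stored original.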
Do Not Judge by Tools Alone
Metadata checking tools are useful.
ExifTool is a representative tool used to check metadata in many file formats.
However, information visible to tools is not the only risk. Internal terminology in the body text, document numbers, distribution scope, phrasing habits, and the small number of people who could have obtained the file also narrow down the candidates.
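If ExifTool is installed, it can be called from a short script so that every checking copy is inspected the same way. The sketch below assumes the `exiftool` executable is on the PATH. The function names `exiftool_command` and `read_metadata` are hypothetical. The options used are ExifTool's `-json` (JSON output), `-a` (allow duplicate tags), and `-G1` (prefix each tag with its group name).

```python
import json
import shutil
import subprocess


def exiftool_command(path) -> list:
    """Build the ExifTool command line used to dump metadata as JSON."""
    return ["exiftool", "-json", "-a", "-G1", str(path)]


def read_metadata(path):
    """Return ExifTool's metadata for one file as a dict, or None if ExifTool is missing."""
    if shutil.which("exiftool") is None:
        return None
    result = subprocess.run(
        exiftool_command(path), capture_output=True, text=True, check=True
    )
    # With -json, ExifTool prints a JSON array containing one object per file.
    return json.loads(result.stdout)[0]
```

The output lists only what ExifTool can parse. The content-based clues described above, such as internal terminology or document numbers, still have to be reviewed by reading the material.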
Follow the Recipient's Procedure
It is also important not to over-process whistleblowing materials on your own judgment alone.
News organizations, lawyers, support groups, and reporting desks may each have procedures for receiving materials and policies for evidence preservation. Before submission, check what format to send, whether the original is needed, and whether an anonymized copy is acceptable.
| Recipient | What to check |
| --- | --- |
| News organization | Anonymous submission channel and material format |
| Lawyer | Original preservation, evidentiary value, whether processing is allowed |
| Support group | Safe consultation route, necessary materials |
| Internal reporting desk | Identity verification, logs, retaliation risk |
| Administrative agency | Submission format, procedure, handling of identity information |
Safe document handling changes depending on the purpose and destination.
Do not decide by yourself that "deleting makes it safe." Check both the necessary evidentiary value and anonymity.
If the judgment is difficult, first find a safe way to consult the recipient, and do so before submitting anything.
Document handling means dealing with anonymity and evidentiary value at the same time.
Prioritizing only one of them leaves risks that are hard to account for later.
Summary
Documents, PDFs, and Office files may retain author information, edit history, comments, file paths, annotations, and hidden data.
Even if you remove a name from the body text, anonymity breaks if a real name or organization name remains inside the file.
In whistleblowing, separate originals, checking copies, submission copies, and publication copies. For materials where evidentiary value is needed, do not process them carelessly; check with a specialist or recipient.
Metadata checking tools are useful, but you also need to check whether the content itself, or the limited set of people who received the file, could be used to identify you.
Related tools
Metadata inspection
ExifTool
An external resource related to this article. Open it only when it fits your situation and threat model.
Why it is listed: It can help with the article topic, but it is outside Anonymity Sense and should be checked before use.