Author and organization information left in document files
Document files retain information other than the body text.
This applies to Word, Excel, PowerPoint, and PDF files, as well as internal templates and exported materials. These may include author names, organization names, department names, device names, template names, revision history, comments, and parts of internal paths.
In whistleblowing, this information can connect a file to the whistleblower or their organization.
Before sending a document, you need to check not only how it looks, but also the information inside the file.
What is author information?
Document creation software may record the author or company name in the file.
This includes the user name, organization name, last saved by, editors, template, creation date and time, and update date and time.
| Information | Risk |
| --- | --- |
| Author name | A real name or internal account appears |
| Company or organization name | The affiliated organization becomes clear |
| Last saved by | Shows who last touched the file |
| Creation and update date/time | Can be compared with activity times or work records |
| Template name | Shows the department or type of internal document |
Even if you remove a name from the body text, the name may remain in the file properties.
Author information may not be visible on the screen where the document is opened. For that reason, even if the body text appears problem-free, a real name, company name, or device name may remain in the file details or properties. This is information that document creation software saves automatically for collaboration or management.
In whistleblowing or anonymous submission, this automatic saving becomes a risk. Simply "removing the name from the body text" does not mean you have checked the whole document.
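As a rough illustration of where these properties live: .docx, .xlsx, and .pptx files are ZIP packages, and the author, last-saved-by, company, and template values are normally stored in the parts `docProps/core.xml` and `docProps/app.xml`. The sketch below lists whatever non-empty values it finds there. The function name `read_ooxml_properties` is hypothetical, and the sketch does not cover older binary formats such as .doc or custom property parts.

```python
import zipfile
import xml.etree.ElementTree as ET

# Property parts used by Office Open XML packages (.docx, .xlsx, .pptx).
PROPERTY_PARTS = ("docProps/core.xml", "docProps/app.xml")


def read_ooxml_properties(path):
    """Return {property_name: value} for non-empty document properties."""
    found = {}
    with zipfile.ZipFile(path) as package:
        names = set(package.namelist())
        for part in PROPERTY_PARTS:
            if part not in names:
                continue
            root = ET.fromstring(package.read(part))
            for element in root.iter():
                text = (element.text or "").strip()
                if text:
                    # Drop the "{namespace}" prefix ElementTree puts on tags.
                    found[element.tag.split("}")[-1]] = text
    return found
```

Calling `read_ooxml_properties("report.docx")` on a typical file is likely to show keys such as `creator`, `lastModifiedBy`, `Company`, and `Template`. An empty result only means these two parts held nothing. It does not mean the file is clean.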
Where organization information remains
Organization information does not remain only in properties.
It may remain in document headers, footers, comments, revision history, internal links, embedded images, internal templates, and information added during PDF conversion.
| Where it remains | Example |
| --- | --- |
| File properties | Author, company name, last saved by |
| Comments | Reviewer names, conversation history |
| Revision history | Who edited which parts |
| Headers and footers | Department name, document number, confidentiality classification |
| Internal links | Internal server names, folder names |
| Embedded images | Original image metadata or filenames |
Internal materials are more likely to retain information in their internal structure than in the visible appearance of the document.
Internal templates require particular care. The header, footer, document number, approval fields, confidentiality classification, department name, internal links, fonts, and logo can reveal the organization or department. Even if you blur the body text, the source may still be visible from the template structure.
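To see what a template contributes outside the body text, one option is to read the header and footer parts of a .docx directly. The sketch below assumes the usual layout, in which headers and footers are stored as `word/header1.xml`, `word/footer1.xml`, and so on. The helper name `header_footer_text` is hypothetical. It returns only text runs, so logos, fonts, and field codes need a separate look.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace; <w:t> elements hold visible text runs.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
HEADER_FOOTER = re.compile(r"word/(header|footer)\d*\.xml$")


def header_footer_text(path):
    """Return {part_name: text} for each header or footer part in a .docx."""
    result = {}
    with zipfile.ZipFile(path) as package:
        for name in package.namelist():
            if HEADER_FOOTER.match(name):
                root = ET.fromstring(package.read(name))
                runs = [t.text or "" for t in root.iter(W_NS + "t")]
                result[name] = "".join(runs).strip()
    return result
```

Department names, document numbers, and confidentiality labels that come from a template usually show up in this output even when the body text has been rewritten.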
Information also remains in embedded images and graphs. The filename of a pasted image, original data, link destination, or creation software information may remain. Check a document file in three separate layers: the body text, the accompanying information, and the embedded elements.
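Embedded elements and link destinations can be listed without opening the document in an editor. The sketch below assumes an Office Open XML package, where pasted images and embedded objects are stored under `media/` and `embeddings/` folders and links to outside locations are recorded in `.rels` parts with `TargetMode="External"`. The function name `embedded_and_linked` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def embedded_and_linked(path):
    """Return (media_parts, external_targets) for an OOXML package."""
    media, external = [], []
    with zipfile.ZipFile(path) as package:
        for name in package.namelist():
            # Pasted images and embedded objects are stored in these folders.
            if "/media/" in name or "/embeddings/" in name:
                media.append(name)
            elif name.endswith(".rels"):
                root = ET.fromstring(package.read(name))
                for rel in root.iter(REL_NS + "Relationship"):
                    # External targets can expose server names and folder paths.
                    if rel.get("TargetMode") == "External":
                        external.append(rel.get("Target"))
    return sorted(media), sorted(external)
```

Part names are often generic, such as `image1.png`, so the original filename may instead survive in the image's alternative text or in the image file's own metadata. Each listed media file is worth checking separately.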
Why this is dangerous in whistleblowing
In whistleblowing, author information does not always point directly to the whistleblower.
Even so, organization names, department names, version numbers, templates, and comments reveal where the material came from and who may have had access to it.
| Information left behind | What may be inferred |
| --- | --- |
| Department name | Which department the material belongs to |
| Version number | When the material is from and who received it |
| Commenter | People involved or the approval route |
| Internal path | Storage location or project name |
| Document number | Managing department or material classification |
If the receiving side does not check before publication, not only the whistleblower but also other people involved may be affected.
In whistleblowing, a party trying to identify the source does not look only at "who created it." Who could view the material? Which department's template is it? Which period is this version from? Who commented on it? Which folder was it stored in? Each answer narrows the range of people who could have accessed the file.
Information in document files affects not only the whistleblower, but also the material creator, reviewers, recipients, and approvers. Before publication, check whether any information leads back to the people involved.
How to think about checking
Check document files from multiple angles.
Look at properties, comments, revision history, headers and footers, embedded files, and information after PDF conversion.
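For comments and revision history in a .docx, one possible check is to look for the elements Word uses to record them. Comments are stored as `w:comment` elements and tracked insertions and deletions as `w:ins` and `w:del`, each usually carrying a `w:author` attribute. The sketch below collects those author names. The function name `review_authors` is hypothetical, and the sketch does not cover other formats or every kind of tracked change.

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
# <w:comment> holds a comment; <w:ins> and <w:del> mark tracked changes.
TRACKED = ("comment", "ins", "del")


def review_authors(path):
    """Return {element_kind: set_of_author_names} found in a .docx."""
    authors = {kind: set() for kind in TRACKED}
    with zipfile.ZipFile(path) as package:
        for name in package.namelist():
            if not (name.startswith("word/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(package.read(name))
            for kind in TRACKED:
                for element in root.iter(W_NS + kind):
                    author = element.get(W_NS + "author")
                    if author:
                        authors[kind].add(author)
    return authors
```

Any non-empty set means names of reviewers or editors are still inside the file, even if no comment balloon or change mark is visible on screen.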
Whether information from the original document remains
Changing the document format does not necessarily remove information.
Even if you convert Word to PDF, author information or information derived from the original file may remain.
Conversion may reduce information, but it does not always remove it. Even conversions such as Word to PDF, Excel to image, or PowerPoint to PDF may leave author information, embedded fonts, comments, hidden sheets, or filenames. Check the converted file as a separate file too.
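A converted PDF can be given a quick first look with a plain byte scan for the document information entries `/Author`, `/Creator`, `/Producer`, and `/Title`. This is a deliberately crude sketch with a hypothetical function name, `scan_pdf_info`. It finds only uncompressed literal strings, so it misses values stored in compressed object streams, hex strings, or XMP metadata, and it cuts off values that contain unescaped parentheses.

```python
import re

# Keys of the PDF document information dictionary that often carry identity hints.
INFO_KEYS = (b"Author", b"Creator", b"Producer", b"Title")
# Matches "/Key (literal string)"; escaped characters inside the string are allowed.
PATTERN = re.compile(rb"/(" + b"|".join(INFO_KEYS) + rb")\s*\(((?:\\.|[^\\)])*)\)")


def scan_pdf_info(path):
    """Return a list of (key, raw_value) pairs found by a plain byte scan."""
    with open(path, "rb") as handle:
        data = handle.read()
    return [(key.decode("ascii"), value.decode("latin-1"))
            for key, value in PATTERN.findall(data)]
```

A hit shows that information carried over from the original file. No hits proves nothing, so a dedicated metadata tool is still needed for the final check.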
Responsibility of the receiving side
The risk in document files is not only a problem for the sender. News organizations, support groups, consultation desks, co-editors, and other recipients also need to check. If you share or publish a received file as-is, you spread information about the sender and people involved.
| What the receiving side checks | Reason |
| --- | --- |
| Properties | Whether author or organization names remain |
| Comments and revision history | Whether names of people involved or past text are visible |
| Body text and headers | Whether there are document numbers, departments, or distribution ranges |
| Embedded elements | Whether original information from images or tables remains |
| Publication version | Whether information remains after conversion |
To handle documents safely, check them three times: before sending, after receiving, and before publication. At every stage, it is important not to judge by appearance alone.
Be careful with information you thought you removed
In documents, information removed from the screen may remain somewhere else. Even if you think you deleted comments, they may remain in revision history. Even if you remove a name from the body text, it may remain in properties. Even if you paste an image again, the original filename may remain. Even if you convert to PDF, author information may remain.
For that reason, check the revised document again. Keep the pre-edit and post-edit files separate, and run the check on the exact file that will actually be distributed, not only on an earlier copy prepared for publication.
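One way to re-check a final Office file is to search every part of the package for strings you believe you removed, such as a name, a department, or an old filename. The sketch below is a minimal version of that idea, with a hypothetical function name `find_leftovers`. It assumes an Office Open XML file (a ZIP package) and compares raw bytes, so it can miss text that the editor split across several runs or stored in escaped form.

```python
import zipfile


def find_leftovers(path, needles):
    """Return {needle: [part_names]} for strings still present in the package."""
    hits = {}
    with zipfile.ZipFile(path) as package:
        for name in package.namelist():
            data = package.read(name)
            for needle in needles:
                # Try UTF-8 and UTF-16LE, since parts may use either encoding.
                encodings = (needle.encode("utf-8"), needle.encode("utf-16-le"))
                if any(encoded in data for encoded in encodings):
                    hits.setdefault(needle, []).append(name)
    return hits
```

For example, `find_leftovers("final.docx", ["Example Author"])` would report the property part if the name survives there. A hit is a clear signal to go back and clean the file. An empty result is only one check among several.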
Document files may retain author names, organization names, last-saved-by information, revision history, comments, templates, internal links, and similar information.
In whistleblowing, these become clues to the whistleblower, department, source of the material, and distribution range.
Even if you remove names from the body text, information may remain inside the file.
Before sending a document, before sharing received materials, and before publication, check properties, revision history, comments, headers and footers, and embedded elements.
It is important not to judge document files by appearance alone.
Related tools
Metadata inspection
ExifTool
An external tool for inspecting file metadata. It is not part of Anonymity Sense, so check it before use and open it only when it fits your situation and threat model.