Metadata in PDF, Office, video, and audio files

Metadata in images is well known. However, the metadata that matters for anonymity is not limited to images.

PDFs, Office documents, videos, audio files, and compressed files may also retain author names, edit histories, software used, capture dates and times, tag information, comments, internal filenames, and similar information.

Even if you think you removed something visually, other information may remain inside the file. Also, even if you remove metadata, identity or affiliation may still be inferred from the text, background, audio, text visible on screen, or filename.

This article sets out, by file format, what information tends to remain and what to check before publication.

Metadata is information around a file

Metadata is information attached to the file itself.

Separate from document text or the image itself, it may include author, creation date and time, editing software, location information, change history, comments, and similar information.

In anonymity work, metadata is a strong clue: even when no name appears in the text, an author name or organization name may remain inside the file.

Format | Information that tends to remain | Caution
PDF | Author, creation software, annotations, embedded files | Pay attention to handling of redaction and annotations
Office documents | Author, company name, change history, comments | Hidden sheets and review information may remain
Video | Capture date and time, device information, location information, editing software | Audio and backgrounds are also clues
Audio | ID3 tags, recording date and time, app information | Check voices and background sounds too
Compressed files | Internal filenames, folder names, unneeded files | In-progress work data may be mixed in

Metadata appears differently depending on the file format. For that reason, do not conclude that a file is safe on the basis of a single checking method.

Information to watch in PDFs

PDF is a format that makes documents easy to distribute. However, caution is needed when publishing materials anonymously.

PDFs may retain author names, creation dates and times, creation software, titles, annotations, embedded files, form information, and similar data.

Also, even if something appears to be redacted visually, the original text may remain inside the file. If you only draw a black rectangle over the text, the text underneath may still be extractable by copying or searching.

What to check | Reason
Document properties | Author name and creation software may remain
Annotations | Comments and editing notes may be visible
Embedded files | Original materials or unneeded data may be included
Redaction processing | Check whether original text remains internally
Filename | Case names, organization names, or real names may be included

PDFs look like publication-ready files, but their internal structure is not necessarily safe. For high-risk materials, check with local tools and regenerate in a safer form if needed.
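As a rough illustration of checking document properties locally, the sketch below scans raw PDF bytes for common document information keys such as /Author and /Producer. The function name `scan_pdf_info` is hypothetical, and the approach is an assumption made for brevity: it only sees uncompressed literal strings, so it misses values stored in compressed object streams, hex strings, or XMP packets. An empty result therefore does not mean the file is clean.

```python
import re

# Keys commonly found in the PDF document information dictionary.
INFO_KEYS = (b"Author", b"Creator", b"Producer", b"Title", b"CreationDate", b"ModDate")

# Matches "/Key (literal string)", allowing backslash escapes inside the string.
# Limitation: nested unescaped parentheses would cut the value short.
_INFO_PATTERN = re.compile(
    rb"/(" + b"|".join(INFO_KEYS) + rb")\s*\(((?:\\.|[^\\)])*)\)"
)


def scan_pdf_info(data: bytes) -> dict[str, list[str]]:
    """Return every literal-string value found for the known info keys."""
    found: dict[str, list[str]] = {}
    for key, value in _INFO_PATTERN.findall(data):
        found.setdefault(key.decode("ascii"), []).append(value.decode("latin-1"))
    return found


if __name__ == "__main__":
    sample = b"1 0 obj\n<< /Author (Jane Example) /Producer (ExampleWriter 1.0) >>\nendobj"
    for key, values in scan_pdf_info(sample).items():
        print(key, values)
```

A hit from this scan is a reason to regenerate the file. For a fuller check, a dedicated tool such as ExifTool or qpdf is the safer choice.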

Information to watch in Office documents

Office documents such as Word, Excel, and PowerPoint are formats where work history tends to remain.

They may include author names, company names, change history, comments, hidden sheets, templates, links, collaborative editing history, and similar information.

For whistleblowing or reporting materials, handing over Office documents as-is requires particular care. The file may record not only the document content but also who created it and in which organizational environment it was edited.

What to check | What it may reveal
Author | Personal name or account name
Company name | Affiliated organization or device settings
Change history | Who edited which parts
Comments | Internal conversation or traces of judgment
Hidden sheets | Data that is not displayed
Links | Internal paths or cloud URLs

For Office documents, it is important not to judge only by the visible page. When preparing a publication version, delete unnecessary history, export to another format, and then check again.
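Modern Office files (.docx, .xlsx, .pptx) are ZIP containers, so some of the items above can be inspected with the Python standard library. The sketch below is a minimal example under that assumption. The function name `ooxml_summary` and the report keys are hypothetical. The byte checks for tracked changes (`<w:ins`, `<w:del`) and hidden sheets (`state="hidden"`) rely on the conventional markup and are hints only, not proof of absence.

```python
import zipfile
import xml.etree.ElementTree as ET

# Parts that usually hold author, last editor, company, and application name.
PROPERTY_PARTS = ("docProps/core.xml", "docProps/app.xml")


def ooxml_summary(source) -> dict:
    """Summarize identifying traces in a .docx/.xlsx/.pptx file or file object."""
    report = {"properties": {}, "review_parts": [],
              "tracked_changes": False, "hidden_sheets": False}
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()
        for part in PROPERTY_PARTS:
            if part not in names:
                continue
            # Only parse files you already trust enough to open locally.
            for elem in ET.fromstring(zf.read(part)).iter():
                text = (elem.text or "").strip()
                if text:
                    tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace
                    report["properties"][tag] = text
        # Comment parts such as word/comments.xml or xl/comments1.xml.
        report["review_parts"] = [n for n in names if "comments" in n.lower()]
        if "word/document.xml" in names:
            body = zf.read("word/document.xml")
            report["tracked_changes"] = b"<w:ins " in body or b"<w:del " in body
        if "xl/workbook.xml" in names:
            workbook = zf.read("xl/workbook.xml")
            report["hidden_sheets"] = (b'state="hidden"' in workbook
                                       or b'state="veryHidden"' in workbook)
    return report
```

Calling `ooxml_summary("draft.docx")` on a copy of the file returns a dictionary you can read before deciding whether to clean the document or export it to another format.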

Information to watch in video

With video, you need to pay attention to both metadata and content.

Inside the file, the date and time taken, device information, location information, editing software, and encoding information may remain. In addition, the video itself may show faces, backgrounds, signs, uniforms, vehicles, window reflections, notifications, and account names on screen.

Because video contains more information than still images, checking anonymity is also harder.

What to check | Reason
Capture date and time | Can be linked to when you were active or present at a scene
Location information | Reveals the shooting location
Device information | Shows the device model or editing environment
Background | Buildings, signs, and scenery reveal location
Audio | Voices, ambient sound, and conversations become clues
Notifications | Account names or contacts may be visible

With video, checking the content remains necessary even if metadata is removed. Even in a short video, a few seconds of background or sound can reveal a place or person.
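To make the capture date and time concrete, the sketch below reads the creation time stored in the movie header (`mvhd`) box of an MP4 or MOV file. It is a simplified parser written for illustration, and the function names are hypothetical. It looks only at one timestamp, and device or location tags live in other boxes that it does not read. A value of 1904-01-01 means the field was left at zero.

```python
import struct
from datetime import datetime, timedelta, timezone

# MP4/QuickTime timestamps count seconds from this epoch.
MP4_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)


def iter_boxes(data: bytes, start: int, end: int):
    """Yield (type, body_start, box_end) for each box between start and end."""
    pos = start
    while pos + 8 <= end:
        size, kind = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:  # 64-bit size follows the type field
            if pos + 16 > end:
                break
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif size == 0:  # box runs to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed or truncated box
        yield kind, pos + header, pos + size
        pos += size


def mvhd_creation_time(data: bytes):
    """Return the movie creation time as a UTC datetime, or None if not found."""
    for kind, body, stop in iter_boxes(data, 0, len(data)):
        if kind != b"moov":
            continue
        for child, cbody, cstop in iter_boxes(data, body, stop):
            if child != b"mvhd" or cstop - cbody < 12:
                continue
            version = data[cbody]  # 1 byte version, then 3 bytes of flags
            if version == 1:
                (seconds,) = struct.unpack(">Q", data[cbody + 4:cbody + 12])
            else:
                (seconds,) = struct.unpack(">I", data[cbody + 4:cbody + 8])
            return MP4_EPOCH + timedelta(seconds=seconds)
    return None
```

Reading the whole file into memory is acceptable for a short clip. For large files, a tool such as ExifTool or FFmpeg is more practical.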

Information to watch in audio

Audio files also have metadata.

MP3 and similar files may contain tag information such as title, artist, album, creation software, and comments. Information about the recording app or editing software may also remain.

Also, with audio, the content itself is a strong clue. People or places may be inferred from voice, speaking style, dialect, background sound, notification sounds, station announcements, car sounds, room echo, and similar details.

What to check | Reason
ID3 tags | Title or author information may remain
Recording date and time | Can be matched against when you were active
Creation app | Becomes a clue to the usage environment
Voice | Reveals you or people involved
Background sound | Reveals place or situation

When publishing audio anonymously, metadata removal is not enough. Check voice quality, speaking style, and background sounds too.
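As an illustration of where MP3 tag information sits, the sketch below detects the two common ID3 layouts: an ID3v2 block at the start of the file and a fixed 128-byte ID3v1 block at the end. The function names are hypothetical. The sketch reports only the declared size of the ID3v2 block and does not decode its frames, so treat it as a presence check rather than a full reader.

```python
def id3v2_size(data: bytes):
    """Return the declared ID3v2 tag size in bytes (header excluded), or None."""
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    raw = data[6:10]
    if any(byte & 0x80 for byte in raw):
        return None  # not a valid synchsafe integer
    # Synchsafe integer: four bytes carrying 7 bits each.
    return (raw[0] << 21) | (raw[1] << 14) | (raw[2] << 7) | raw[3]


def read_id3v1(data: bytes):
    """Return the ID3v1 fields from the last 128 bytes, or None if absent."""
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return None
    tag = data[-128:]

    def field(start: int, length: int) -> str:
        # Fields are fixed width and padded with NUL bytes or spaces.
        raw = tag[start:start + length].split(b"\x00", 1)[0]
        return raw.decode("latin-1").strip()

    return {
        "title": field(3, 30),
        "artist": field(33, 30),
        "album": field(63, 30),
        "year": field(93, 4),
        "comment": field(97, 30),
    }
```

If either function returns a value, the file still carries tag data that should be reviewed or removed in the publication copy.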

Information to watch in compressed files

Compressed files such as ZIP files are used when handing over multiple files together.

With compressed files, pay attention to internal filenames, folder names, unneeded files, and in-progress work data.

For example, even if you intended to include only the PDF you want to publish, the same folder may also include the original Office document, notes, screenshots, or hidden files. Folder names may also include a real name, case name, or company name.

Before handing over a compressed file, always open it and check its contents.
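One way to do that check is to list every entry before sending the archive. The sketch below is a minimal example using the standard `zipfile` module. The function name `review_archive`, the default allow-list of `.pdf`, and the reason strings are assumptions chosen for illustration.

```python
import zipfile
from pathlib import PurePosixPath


def review_archive(source, allowed_suffixes=(".pdf",)):
    """Return (entry name, reason) pairs for entries that deserve a second look."""
    findings = []
    with zipfile.ZipFile(source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            path = PurePosixPath(info.filename)
            if any(part.startswith(".") or part == "__MACOSX" for part in path.parts):
                findings.append((info.filename, "hidden or system file"))
            elif path.suffix.lower() not in allowed_suffixes:
                findings.append((info.filename, "unexpected file type"))
            elif len(path.parts) > 1:
                # The folder name travels with the file and may identify a case or person.
                findings.append((info.filename, "folder name is visible"))
    return findings
```

An empty list only means the names look as expected. Each file inside still needs its own metadata and content check, and ZIP entries also carry modification timestamps.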

Basic checking flow

The checking flow is the same regardless of file format.

Stage | What to check
1 | Do not publish the original file directly
2 | Make a publication copy
3 | Check the filename
4 | Check metadata
5 | Check text, images, audio, and backgrounds
6 | Recheck after deletion or conversion
7 | Check how the receiving service will display it to the recipient
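Stages 1 to 3 can be sketched as a small helper that copies the original into a separate folder under a neutral name. The function name `make_publication_copy` and the `file-<random>` naming scheme are hypothetical. The sketch uses `shutil.copyfile`, which copies only the file contents and not filesystem timestamps or permissions. Metadata stored inside the file is copied unchanged, so the later stages still apply.

```python
import secrets
import shutil
from pathlib import Path


def make_publication_copy(original, out_dir) -> Path:
    """Copy the original to out_dir under a neutral, randomly generated name."""
    original = Path(original)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Keep only the extension; drop any case name, organization, or real name.
    neutral = out_dir / f"file-{secrets.token_hex(4)}{original.suffix.lower()}"
    shutil.copyfile(original, neutral)  # contents only, the original stays untouched
    return neutral
```

All further cleaning and rechecking then happens on the returned copy, never on the original.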

ExifTool is a widely used local tool that can read and edit metadata in many formats, including images, videos, and documents. It is also a useful entry point for learning what information each file format actually holds. URL : https://exiftool.org/

However, do not stop at tool results alone. Tools help you check internal information, but you still need to separately check image backgrounds, document text, audio content, and how the upload destination displays the owner or uploader.
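If ExifTool is installed, it can be called from a script so that every publication copy gets the same check. The sketch below is one possible wrapper with hypothetical function names. It assumes the `exiftool` executable is on the PATH, and it uses the `-json`, `-a`, and `-G1` options to request JSON output, keep duplicate tags, and show each tag's group.

```python
import json
import shutil
import subprocess
from pathlib import Path


def exiftool_command(path) -> list[str]:
    """Build the argument list for a read-only ExifTool metadata dump."""
    # An absolute path cannot be mistaken for a command-line option.
    return ["exiftool", "-json", "-a", "-G1", str(Path(path).resolve())]


def read_metadata(path):
    """Return the metadata as a dict, or None if ExifTool is not installed."""
    if shutil.which("exiftool") is None:
        return None
    result = subprocess.run(
        exiftool_command(path), capture_output=True, text=True, check=True
    )
    # ExifTool prints a JSON array with one object per input file.
    return json.loads(result.stdout)[0]


if __name__ == "__main__":
    import sys

    metadata = read_metadata(sys.argv[1])
    if metadata is None:
        print("exiftool not found on PATH")
    else:
        for tag, value in metadata.items():
            print(f"{tag}: {value}")
```

Run the same dump again after removal or conversion and compare the two outputs to confirm that the fields you meant to remove are gone.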

Summary

Metadata is not only an image problem.

PDFs, Office documents, videos, audio files, and compressed files may also retain authors, edit histories, creation software, tags, location information, internal filenames, and similar information.

In anonymity work, check file content and metadata separately. Even if you remove metadata, identity or affiliation may be inferred from text, backgrounds, audio, reflections, filenames, and sharing settings.

Before publication, do not send the original file directly. Make a publication copy, check by format, and recheck after removal. Treat handing over a file as potentially handing over not only the text but also traces of the creation environment and the sharing route.

Related tools

Archive check

Wayback Machine

An external resource related to this article. Open it only when it fits your situation and threat model.

Why it is listed: It can help with the article topic, but it is outside Anonymity Sense and should be checked before use.

URL : https://web.archive.org/

Metadata inspection

ExifTool

URL : https://exiftool.org/

Metadata removal

MAT2

URL : https://0xacab.org/jvoisin/mat2

PDF inspection

qpdf

URL : https://qpdf.readthedocs.io/

Audio and video

FFmpeg

URL : https://ffmpeg.org/
