Checking Metadata in Files You Receive

Do not open or share files received from sources as-is.

Images, PDFs, Office documents, video, audio, and compressed files can carry information beyond their visible content. Author names, organization names, capture locations, capture dates and times, edit history, comments, and internal filenames can all remain in the file. Any of these can become a clue that points to a source.

Metadata checks are necessary not only before publication, but also before sharing inside a newsroom.

What Metadata Is

Metadata is information attached to a file, not the file content itself.

What matters depends on the file type: author and edit history for documents, capture date and time and GPS coordinates for images, device information and audio for video, and internal folder names for compressed files.

File | Information to check
Image | GPS, capture date and time, camera model, thumbnail
PDF | Author, editing software, embedded information
Word/Excel | Author, organization name, change history, comments
Video | Capture date and time, device information, audio, background
Audio | Recording environment, speakers, background sound, creation information
Compressed file | Internal filenames, folder structure, creation dates and times

If you judge a file only by how it looks, you miss this information.

Metadata is descriptive information attached to the file. It includes author, date and time, device, edit history, location information, software, and internal filenames. Even if a source removed their name from the body text, their real name or organization name may remain in metadata.

Metadata is also not the only issue. Image backgrounds, voices and place names in audio, internal terms in document body text, filenames, and folder structures also become clues. Metadata checking is part of wider file checking.
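As one illustration of the "Compressed file" row above, the following is a minimal sketch that lists a ZIP archive's internal filenames, folder names, and stored timestamps without extracting anything. It uses only Python's standard library. The function names `list_archive_entries` and `top_level_folders` are hypothetical names chosen for this example, and the sketch assumes the archive is in ZIP format. Note that ZIP stores a last-modified time per entry rather than a creation time.

```python
import zipfile


def list_archive_entries(archive):
    """Return (internal path, stored timestamp, is_directory) for each entry.

    `archive` is a path or a file-like object. Only the archive's directory
    listing is read; no member is extracted or opened.
    """
    entries = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            # date_time is a (year, month, day, hour, minute, second) tuple
            # with no time zone. It is kept as a raw tuple because malformed
            # archives can hold out-of-range values.
            entries.append((info.filename, info.date_time, info.is_dir()))
    return entries


def top_level_folders(entries):
    """Collect the first path component of every entry that sits in a folder."""
    return sorted({name.split("/", 1)[0] for name, _, _ in entries if "/" in name})
```

A folder name returned by `top_level_folders` is worth reading as carefully as the files themselves, since it may contain a department or project name.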

How It Connects to a Source

Metadata can directly reveal a source's name.

But that is not all. Even if only an organization name, device name, department name, shooting location, or capture time appears, it narrows the candidates.

Information that remains | Connection to the source
Author name | Real name or internal account appears
Organization name | Workplace or department becomes known
Shooting location | Shows where the material was photographed
Capture time | Checked against work records or entry/exit logs
Comment history | Reveals co-editors or internal exchanges

With internal materials, the person suspected may be a viewer or editor rather than the author.

Think about which information connects to whom.

For example, GPS data in an image indicates the shooting location. The "last saved by" field in an Office document indicates an internal account name. A PDF's creation time may coincide with the time the material was viewed. A compressed file may contain a folder named after a department. These details can point not only to the source, but also to recipients of the material or related departments.

In source protection, do not judge something safe because "no name appears." Check whether any information remains that narrows the candidates.
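The "last saved by" example above can be sketched for modern Office files. Files such as .docx and .xlsx are ZIP containers, and author-related fields are normally stored in a part named `docProps/core.xml`. The following minimal sketch reads that part with Python's standard library. The function name `read_core_properties` and the output labels are hypothetical choices for this example. It assumes the newer XML-based formats, not legacy .doc or .xls files, and it should be run only inside the isolated checking environment because it parses XML from an untrusted file.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the OOXML core-properties part (docProps/core.xml).
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Output label (hypothetical) -> element that holds the value.
FIELDS = {
    "author": "dc:creator",
    "last_saved_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(office_file):
    """Return identifying core properties from a .docx/.xlsx/.pptx file."""
    with zipfile.ZipFile(office_file) as zf:
        if "docProps/core.xml" not in zf.namelist():
            return {}
        root = ET.fromstring(zf.read("docProps/core.xml"))
    found = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            found[label] = element.text
    return found
```

An empty result does not mean the file is clean. Organization names are typically stored in a separate part, `docProps/app.xml`, and comments and change history live elsewhere in the container. This sketch reads none of those.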

Order of Checking

Check received files in an isolated checking environment.

If you put them directly into everyday cloud storage or onto a personal device, syncing, previews, history entries, and backups are triggered automatically.

Order | Work
1 | Record the receiving path and the sender's situation
2 | Make a copy for checking without carelessly opening the original file
3 | Check the filename, extension, and creation date and time
4 | Check metadata, comments, and change history
5 | Remove unnecessary information from the publication copy
6 | Have another person recheck before publication

The original file may need to be preserved for evidentiary value.

Even in that case, handle it separately from the publication file.

If you open a received file in your everyday environment, previews, sync, recent-file lists, virus scanning, and cloud backup may run. As a result, copies or traces of the source's file end up in other places. For high-risk materials, separate the checking environment and storage location.

It is also important not to immediately modify the original file. It may need to be preserved as evidence. Separate the original file, checking copy, and publication copy, and modify only the publication copy.
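The separation of original and checking copy described above can be sketched as follows. This is a minimal example using Python's standard library. The function names `sha256_of` and `make_checking_copy` and the directory layout are assumptions made for illustration. The original is only read as raw bytes, never opened in a viewer, and its SHA-256 hash is recorded so that later handling can be compared against it.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks. The bytes are read but never parsed or rendered."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_checking_copy(original, checking_dir):
    """Copy the original into checking_dir and confirm the copy is identical.

    Returns (path of the copy, SHA-256 of the original).
    """
    original = Path(original)
    checking_dir = Path(checking_dir)
    checking_dir.mkdir(parents=True, exist_ok=True)
    original_hash = sha256_of(original)
    copy_path = checking_dir / original.name
    if copy_path.exists():
        raise FileExistsError(f"refusing to overwrite {copy_path}")
    # copy2 copies the contents and also tries to keep filesystem timestamps.
    shutil.copy2(original, copy_path)
    if sha256_of(copy_path) != original_hash:
        raise RuntimeError("checking copy does not match the original")
    return copy_path, original_hash
```

The same approach can be repeated to create the publication copy from the checking copy, so that edits are only ever made to a file whose origin is recorded.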

Do Not Rely Only on Tools

Tools are useful for metadata checks.

ExifTool is a representative tool that can check metadata for many file formats locally. The important point is that you can check locally without uploading received files to an online checking site you do not understand well. Detailed usage is covered in another article.

URL : https://exiftool.org/
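As a rough illustration of a local check, the sketch below calls ExifTool from Python and flags tags that often identify a person, device, or place. It assumes ExifTool is installed locally and available as `exiftool` on the PATH. The options used are ExifTool's `-j` (JSON output), `-G` (prefix each tag with its group name), and `-a` (keep duplicate tags). The function names and the keyword list are hypothetical and deliberately incomplete. This is not a substitute for reading the full output.

```python
import json
import subprocess

# Illustrative keywords only (an assumption of this sketch, not a complete list).
SENSITIVE_KEYWORDS = ("gps", "author", "creator", "lastmodifiedby", "company",
                      "owner", "serial", "make", "model", "software", "date")


def read_metadata(path):
    """Run a locally installed ExifTool and return its tags as a dict.

    With -G, keys look like "EXIF:GPSLatitude" (group, colon, tag name).
    """
    result = subprocess.run(
        ["exiftool", "-j", "-G", "-a", str(path)],
        capture_output=True, encoding="utf-8", check=True,
    )
    return json.loads(result.stdout)[0]  # ExifTool emits one object per file


def flag_sensitive(tags):
    """Return the tags whose name contains one of the keywords above."""
    flagged = {}
    for key, value in tags.items():
        name = key.split(":")[-1].lower()
        if any(word in name for word in SENSITIVE_KEYWORDS):
            flagged[key] = value
    return flagged
```

A call such as `flag_sensitive(read_metadata("checking/photo.jpg"))`, where the path is a placeholder, would return only the flagged subset. Tags outside the keyword list still need a human read-through.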

However, tools alone do not make a file safe.

A source may be inferred from image backgrounds, audio content, proper nouns or specific terms in document body text, or the type of material itself.

Checking method | What is easy to miss
Metadata checking tool | Image backgrounds or body text content
Visual review | Author information inside the file
Audio review | Metadata or recording device information
Automatic removal | Some comments or change history

Combine tool checks with human review.

Tools such as ExifTool help check information that is not visible. However, a file is not safe just because a tool displays nothing. Information such as a station name visible in an image background, a name included in audio, or a department identifiable from document wording must be checked by a person.

Conversely, visual human review alone misses author information and embedded data inside the file. In source protection, combine tool checks with visual and audio-content review.

Check Before Sharing Inside the Newsroom

Metadata checks may come too late if they are done only immediately before publication. Once the original file is placed in a newsroom shared folder, chat, or cloud storage, the number of viewers grows and sharing history accumulates. Do not share a file containing source information as-is with many people.

What to look at before sharing | Reason
Sharing scope | Do not expand beyond the people who need it
Filename | Check whether the source or organization name appears
Metadata | Check author, location information, and edit history
Storage location | Avoid cloud storage tied to a real-name account and broad shared folders
Publication copy | Handle it separately from the original file
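The "Filename" row above can be partly automated. The following minimal sketch checks whether any term from a watch list appears in a file's name or in the names of its parent folders. The function name `find_watch_terms`, the normalization rule, and the example terms are all hypothetical. The watch list itself identifies the source, so it should stay inside the isolated checking environment.

```python
import re
from pathlib import PurePath


def _normalize(text):
    # Lowercase and drop separators so "Example_Corp" matches "example corp".
    return re.sub(r"[\s._\-]+", "", text.lower())


def find_watch_terms(path, watch_terms):
    """Return the watch terms found in the filename or any parent folder name."""
    parts = [_normalize(part) for part in PurePath(path).parts]
    hits = []
    for term in watch_terms:
        needle = _normalize(term)
        if needle and any(needle in part for part in parts):
            hits.append(term)
    return hits
```

For a path such as `incoming/Example_Corp-HR/budget_j.doe.xlsx` and the terms "Example Corp", "J. Doe", and "Finance", the first two would be reported. Simple substring matching misses abbreviations and misspellings, so it supports a human check of the filename rather than replacing it.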

Files received from sources are subject to source protection before they become published material.

For high-risk materials, checking and processing may not be something one person should decide alone. Evidentiary value, legal risk, and source protection are involved at the same time, so consult an editor in charge, lawyer, or trusted specialist when needed. However, even during consultation, do not carelessly spread the original file. Immediately after receiving it, first treat it as something to isolate. Before sending it into everyday cloud storage or chat, decide the checking environment, storage location, and sharing scope.

Summary

Files received from sources retain information other than body content.

Author, organization name, shooting location, capture date and time, edit history, comments, and internal filenames become clues that point to a source.

Check files not only before publication but also before sharing inside the newsroom.

Tools such as ExifTool are useful, but inference from backgrounds, body text, audio, and the type of material must be reviewed separately.

Metadata checking is a basic source-protection task.

Related tools

These are external resources related to this article. They can help with the article topic, but they are outside Anonymity Sense and should be checked before use. Open them only when they fit your situation and threat model.

Category | Tool | URL
Reverse image search | Google Lens | https://lens.google/
Metadata inspection | ExifTool | https://exiftool.org/
Metadata removal | MAT2 | https://0xacab.org/jvoisin/mat2
PDF inspection | qpdf | https://qpdf.readthedocs.io/
Audio and video | FFmpeg | https://ffmpeg.org/
Whistleblower submission | SecureDrop | https://securedrop.org/
Whistleblower platform | GlobaLeaks | https://globaleaks.org/