What Is Metadata?

A file contains more than the body text or image that appears on the surface.

A photo may carry its capture time and camera model. A document may carry the author name and the editing software. A PDF may carry its creation time and document title. A video may carry the capture device and location information.

This kind of "data about data" is called metadata.

Metadata is useful information. It is used for file management, search, editing, and organization.

However, from the perspective of anonymity, metadata becomes a clue that is easy to overlook.

Hidden Information Left in the File

Metadata is not the file content itself, but information about that file.

For example, photo metadata may include capture time, camera model, lens information, and GPS location information. Document files may retain the author name, company name, edit history, comments, and software used.

| File | Information that may remain | Anonymity point to watch |
|---|---|---|
| Photo | Capture time, location information, camera model | Reveals a place or device |
| Video | Capture time, location information, device information | Connects to a filming location or routine places |
| PDF | Author, title, creation software | A personal name or organization name may remain |
| Office document | Author, company name, edit history, comments | Internal information or worker names may remain |
| Audio | Recording time, software information, tags | Becomes a clue to the recording environment or edit history |

Metadata does not always stand out when a file is opened. That is why it is dangerous.

Even if you carefully blur the visible text, if an author name remains inside the file, anonymity weakens quickly.
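
To make the author-name case concrete, here is a minimal Python sketch that lists the core properties stored inside an Office Open XML file (.docx, .xlsx, .pptx). Such files are ZIP packages, and the author and last editor normally sit in `docProps/core.xml`. The function name and the choice of fields are assumptions made for this example. Other parts of the package, such as comments and tracked changes, are not covered here.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Fields chosen for this sketch: those most likely to point to a person or a time.
FIELDS = [
    "dc:creator",
    "cp:lastModifiedBy",
    "dc:title",
    "dcterms:created",
    "dcterms:modified",
]


def read_core_properties(source):
    """Return the non-empty core properties of an Office Open XML file.

    `source` may be a file path or a binary file-like object.
    Returns an empty dict when the package has no docProps/core.xml.
    """
    with zipfile.ZipFile(source) as package:
        if "docProps/core.xml" not in package.namelist():
            return {}
        root = ET.fromstring(package.read("docProps/core.xml"))

    found = {}
    for field in FIELDS:
        element = root.find(field, NS)
        if element is not None and element.text:
            found[field] = element.text
    return found
```

An empty result only means that these particular fields are empty. It does not mean the document is free of identifying information.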

Why Metadata Remains

Metadata is not inherently bad.

Photo apps organize photos by capture time and place. Document creation software manages authors and edit history. PDF creation tools record titles and creation software.

In other words, metadata exists to make work more convenient.

The problem is that this convenient information also remains when publishing or sharing.

Information that is harmless while it stays on your own device can become a clue to your identity or organization once the file is passed to someone else.

Metadata That Matters for Anonymity

The metadata that requires particular care for anonymity is information connected to a person, place, time, device, or organization.

| Type | Example | What it connects to |
|---|---|---|
| Personal information | Author name, username, company name | Real name or affiliation |
| Time information | Creation time, capture time, update time | Activity time, work hours, movement history |
| Location information | GPS, capture location | Routine places, workplace, school, places visited |
| Device information | Camera model, smartphone model, software name | Device or environment used |
| Editing information | Comments, change history, layers | Internal work, people involved, information before revision |

Each of these can look small by itself. However, when combined with post content, image backgrounds, accounts, and publication time, it becomes material that can support identity inference.
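
As a rough illustration of where device and location information lives in a photo, the following Python sketch walks the header segments of a JPEG and reports which metadata containers are present. It only detects the containers (EXIF, XMP, Photoshop/IPTC, comment) and does not decode the fields inside them. The function name is chosen for this example, and the code assumes a well-formed JPEG, so treat it as a quick first look rather than a full inspection.

```python
import struct


def jpeg_metadata_segments(data: bytes) -> list[str]:
    """Return labels for metadata containers found before the image scan data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")

    found = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # unexpected byte: stop instead of guessing
        marker = data[pos + 1]
        if marker in (0xD9, 0xDA):
            break  # end of image, or start of compressed scan data

        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]

        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            found.append("EXIF")  # capture time, camera model, GPS live here
        elif marker == 0xE1 and payload.startswith(b"http://ns.adobe.com/xap/1.0/\x00"):
            found.append("XMP")
        elif marker == 0xED and payload.startswith(b"Photoshop 3.0\x00"):
            found.append("IPTC/Photoshop")
        elif marker == 0xFE:
            found.append("COMMENT")

        pos += 2 + length
    return found
```

For a file on disk, the function could be called as `jpeg_metadata_segments(open(path, "rb").read())`. An empty list means none of these containers was found in the header, which still says nothing about what is visible in the image itself.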

Removing Metadata Is Not the End

Removing metadata is important. However, that alone does not make things safe.

For example, even if you remove GPS information from a photo, the location can still be inferred if a station name, shop name, school name, internal notice, uniform, or car license plate appears in the background.

Even if you remove author information from a PDF, the candidate set can narrow if a department name, internal term, or background known only to a small number of people remains in the body text.

| What to check | Where to look | Example |
|---|---|---|
| Metadata | Internal file information | Author, time, location information |
| Visible content | The image or body text itself | Background, signs, writing style, internal information |
| File name | Name used when sharing | Real name, case name, organization name |
| Sharing path | Where it was sent from | Cloud history, email, account |

For anonymity, check metadata and content separately. Looking at only one side is not enough.

Order for Checking Before Publication

Fixing the order of metadata checks in advance reduces oversights.

  1. Look at the file name
  2. Look at the visible content of the file
  3. Check the metadata
  4. Remove unnecessary metadata
  5. Check the file again after removal
  6. Check the sharing method and upload destination

What is especially important is checking again after removal.

Some software adds new metadata when saving or converting. Information may also change after editing an image, exporting a PDF, or uploading to the cloud.

It is also important not to confuse the checked file with the file you will publish.

If you check the original file and then save it under another name, export a new file after editing an image, convert it to PDF, or package it into an archive, check the final file for publication again.
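
One way to avoid mixing up the checked file and the published file is to compare their contents by hash just before sharing. The sketch below uses SHA-256 from the Python standard library. The function names are chosen for this example. Matching hashes only show that the two files are byte-for-byte identical and say nothing about whether the file is safe.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large videos or archives do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_same_file(checked_path, publish_path):
    """True when the file you checked and the file you will publish are identical."""
    return sha256_of(checked_path) == sha256_of(publish_path)
```

If `is_same_file` returns `False`, the file about to be published is not the one that was checked, so the checks need to be repeated on the final file.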

| Operation | Reason to check again |
|---|---|
| Save as | New author information may be added |
| PDF conversion | Creation software or a title may be added |
| Image editing | Editing software or save time may be added |
| Compression | Internal file names or folder names remain |
| Cloud upload | Owner information or history may be added |
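
For the compression case, the internal names can be listed before sharing. The following Python sketch reads a ZIP archive with the standard `zipfile` module and returns each entry's stored path and modification time, plus any watch terms found in the path. The function name and the `watch_terms` parameter are assumptions made for this example, and other archive formats would need other modules.

```python
import zipfile


def list_archive_entries(source, watch_terms=()):
    """Return (name, modified, hits) for every entry in a ZIP archive.

    `source` may be a file path or a binary file-like object.
    `watch_terms` are strings you do not want to leak, such as a real
    name or a project name; matching is case-insensitive.
    """
    terms = [term.lower() for term in watch_terms]
    report = []
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            lowered = info.filename.lower()
            hits = [term for term in terms if term in lowered]
            # date_time is the stored modification time as a 6-tuple.
            report.append((info.filename, info.date_time, hits))
    return report
```

Folder names count as well: an entry stored as `ProjectName/draft.txt` exposes the folder name even if the archive itself has a neutral file name.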

Mistakes to Avoid When Handling Metadata

When handling metadata, avoid the following mistakes.

  • Assuming it is safe after only changing the file name
  • Removing only image GPS and not looking at the background
  • Looking only at PDF properties and not checking comments or edit history
  • Not checking the converted file again
  • Not checking the history or account of a cloud sharing link
  • Taking a file created in a real-name environment directly into an anonymous environment

Handling metadata is not finished once a single item has been deleted. Check the file format, creation software, editing method, and sharing method as well.

Do Not Leave Everything to Tools

Metadata removal tools are useful. However, it is dangerous to think that a file is safe just because it passed through a tool.

The tool may not support the file format. Some metadata may remain after removal. Information may remain in the visible body text or background. Another application may add new information after removal.
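
The recheck can be made explicit by comparing what was found before and after removal. The sketch below assumes the metadata has already been read into plain Python dictionaries by whatever inspection method you use. The function name and the dictionary layout are hypothetical and are not tied to any particular tool.

```python
def compare_metadata(before, after):
    """Compare metadata fields read before and after a removal step.

    Both arguments are dicts mapping field name to value.
    """
    # Fields the removal step did not clear.
    remaining = {key: after[key] for key in before if key in after}
    # Fields that appeared only after processing, e.g. written on save.
    added = {key: after[key] for key in after if key not in before}
    # Fields that are really gone.
    removed = sorted(key for key in before if key not in after)
    return {"remaining": remaining, "added": added, "removed": removed}
```

A non-empty `remaining` or `added` entry is the signal to look again before publishing.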

For metadata measures, treat removal, rechecking, and content checking as a set. Detailed check methods and how to use ExifTool are covered in another article.

The Weight Changes Depending on the Recipient

Metadata risk changes depending on the recipient.

The rigor of checking needed differs between a photo sent to family and materials sent as an anonymous tip or submission. It also differs between an image posted to a public social network and an image whose original file is shared through the cloud.

In high-risk situations, also consider extracting only the necessary information rather than handing over the original file as-is. Decide how rigorous the metadata checks should be according to the purpose and recipient.

Summary

Metadata is not the file content itself, but information about that file.

Photos, videos, PDFs, Office documents, and audio files may retain authors, times, location information, device information, edit history, and similar data.

Metadata is useful information, but for anonymity it becomes a clue connected to identity, place, time, organization, and device.

However, deleting metadata does not make everything safe. You also need to check image backgrounds, body text content, file names, sharing paths, and cloud history separately.

For anonymity, it is important to look at both "information left behind the file" and "information left on the surface."

Related tools

Metadata inspection

ExifTool

An external resource related to this article. Open it only when it fits your situation and threat model.

Why it is listed: It can help with the article topic, but it is outside Anonymity Sense and should be checked before use.

URL : https://exiftool.org/

Metadata removal

MAT2

URL : https://0xacab.org/jvoisin/mat2
