
Hidden text and XML comments

When you publish a file, checking only what is visible on screen is not enough.

SVG, HTML, XML, and related file types may retain text or comments that are not displayed.

Even if a file looks blank, it may still contain work notes, text you thought you had deleted, names, company names, internal URLs, or project names.

For anonymity, check not only what is visible, but also anything that someone could read by opening the file.

This article explains how hidden text and XML comments create risks and what to check before publication.

What is hidden text?

Hidden text is text that remains inside a file but is invisible or hard to notice on screen.

Examples include transparent text, text moved off screen, hidden layers, extremely small text, and elements hidden with CSS.

Type | Appearance | What remains in the file
Transparent text | Not visible | The string itself
Off-screen text | Outside the display area | The string, with its coordinates
Hidden element | Not displayed | The tag and its content
Text that is too small | Hard to read | Found by zooming in or searching
Draft before deletion | May not be visible | A comment or a separate element
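As a rough aid for the patterns in this table, the Python sketch below uses the standard-library html.parser to flag start tags that look hidden: a hidden attribute, display, visibility, or opacity settings that hide content, a font size below an assumed threshold of 2 units, or a negative x or y coordinate. The threshold, the pattern list, and the names find_hidden and HiddenElementFinder are assumptions for illustration. It does not evaluate external stylesheets, classes, or editor layers, and negative coordinates are often legitimate, so treat the output as a list of places to read by hand, not as a verdict.

```python
import re
from html.parser import HTMLParser

# Inline-style declarations that commonly hide content (illustrative, not exhaustive).
HIDING_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|opacity\s*:\s*0(?:\.0*)?(?![.\d])",
    re.IGNORECASE,
)
# SVG presentation attributes with the same effect.
HIDING_ATTRS = {"display": "none", "visibility": "hidden",
                "opacity": "0", "fill-opacity": "0"}
TINY_FONT = 2.0  # assumption: below 2 units counts as "too small to read"
NUMBER = re.compile(r"\s*(-?\d*\.?\d+)")


def leading_number(value):
    """Return the number at the start of an attribute value, or None."""
    match = NUMBER.match(value or "")
    return float(match.group(1)) if match else None


class HiddenElementFinder(HTMLParser):
    """Record start tags that look hidden, tiny, or pushed off screen."""

    def __init__(self):
        super().__init__()
        self.findings = []  # (line number, tag, reason)

    def handle_starttag(self, tag, attrs):
        line, _ = self.getpos()  # 1-based line where the tag starts
        attr = {name: (value or "") for name, value in attrs}
        style = attr.get("style", "")
        reasons = []
        if "hidden" in attr:
            reasons.append("hidden attribute")
        for name, value in HIDING_ATTRS.items():
            if attr.get(name, "").strip().lower() == value:
                reasons.append(f'{name}="{value}"')
        match = HIDING_STYLE.search(style)
        if match:
            reasons.append(f"style: {match.group(0)}")
        font = re.search(r"font-size\s*:\s*([^;]+)", style, re.IGNORECASE)
        size = leading_number(font.group(1) if font else attr.get("font-size"))
        if size is not None and size < TINY_FONT:
            reasons.append(f"font-size {size}")
        for axis in ("x", "y"):  # negative coordinates are only a hint
            pos = leading_number(attr.get(axis))
            if pos is not None and pos < 0:
                reasons.append(f"negative {axis}={pos}")
        self.findings.extend((line, tag, reason) for reason in reasons)


def find_hidden(markup):
    """Return (line, tag, reason) tuples for elements worth reading by hand."""
    finder = HiddenElementFinder()
    finder.feed(markup)
    finder.close()
    return finder.findings

# Usage (hypothetical filename):
# print(find_hidden(open("figure.svg", encoding="utf-8").read()))
```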

Hidden text can remain even without malicious intent.

Notes added during work, names you thought you had deleted, old titles, and company names left in templates can all remain in published files.

What are XML comments?

XML comments are notes that can be written inside a file.

They are not shown when a browser renders the file. However, anyone who opens the file as text or views its source can read them.

They are written as <!-- comment --> and are used in SVG, HTML, XML, and similar formats.

What remains in comments | Example | Caution for anonymity
Work note | Remove XX company name later | Organization name remains
Person in charge | checked by Tanaka | Personal name appears
Internal URL | staging.example | Development environment is visible
Project name | project-alpha | People involved can recognize it
Text planned for deletion | Original description | Information thought to be deleted remains
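To list every comment in a file before publishing, a minimal Python sketch can search the raw text for <!-- ... --> blocks. Reading the raw text rather than parsing it is a deliberate choice here: XML parsers such as Python's xml.etree.ElementTree drop comments by default. The function name list_comments is hypothetical, and the pattern also matches comment-like text inside CDATA sections or scripts, which errs toward reporting too much rather than too little.

```python
import re

# Non-greedy and DOTALL so multi-line comments are captured one at a time.
COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def list_comments(text):
    """Return (line number, comment text) for every <!-- ... --> block in text."""
    found = []
    for match in COMMENT.finditer(text):
        line = text.count("\n", 0, match.start()) + 1  # 1-based line of "<!--"
        found.append((line, match.group(1).strip()))
    return found

# Usage (hypothetical filename):
# from pathlib import Path
# for line, comment in list_comments(Path("figure.svg").read_text(encoding="utf-8", errors="replace")):
#     print(line, comment)
```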

Comments are useful in development, design, and content creation work.

However, if they remain in published files, third parties can read information that is not displayed.

Why this relates to anonymity

Hidden text and comments may directly indicate a person or affiliation.

In addition, even weak clues can be combined with other information. When they overlap with filenames, the poster, publication times, image content, URLs, or past posts, the set of candidates narrows.

Remaining information | What it connects with | What happens
Company name | Post content, occupation | Affiliation is inferred
Creator name | Account, past files | The file is brought closer to a real name
Internal URL | Organization or project | People involved are narrowed down
Draft text | Original personal story | Information thought to be deleted is restored
Date | Posting time, event | It can be compared on a timeline

Invisible information is exactly the kind of information a poster is likely to overlook.

That is why it should be included in pre-publication checks.

How to check

The basic check is to open the file as text.

SVG, HTML, and XML can be opened in a text editor. Search for names, company names, school names, email addresses, URLs, comment markers (<!--), old titles, and internal terms.
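This search can be scripted as a first pass. The Python sketch below, with an assumed function name scan_text and example terms, reports the line number of each case-insensitive match for terms you supply, anything shaped like an email address or an http(s) URL, and each comment marker. The email and URL patterns are intentionally loose: they will also report harmless values such as an SVG namespace URL, and they will not catch a bare hostname like staging.example unless you add it to the terms yourself.

```python
import re

# Deliberately loose patterns: better to over-report than to miss something.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL = re.compile(r"https?://[^\s\"'<>)]+", re.IGNORECASE)


def scan_text(text, terms):
    """Yield (line number, kind, matched text) for terms, emails, URLs, and comments."""
    lowered = [(term, term.lower()) for term in terms]
    for number, line in enumerate(text.splitlines(), start=1):
        lower_line = line.lower()
        for term, lower_term in lowered:
            if lower_term in lower_line:  # case-insensitive substring match
                yield number, "term", term
        for match in EMAIL.finditer(line):
            yield number, "email", match.group(0)
        for match in URL.finditer(line):
            yield number, "url", match.group(0)
        if "<!--" in line:
            yield number, "comment", "<!--"

# Usage (hypothetical filename and terms):
# from pathlib import Path
# text = Path("figure.svg").read_text(encoding="utf-8", errors="replace")
# for hit in scan_text(text, ["Tanaka", "Acme Corp", "staging.example"]):
#     print(hit)
```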

Check method | What to look at | Caution
Open in a text editor | Comments, tags, strings | Do not judge by the display alone
Search inside the file | Names, organization names, URLs | Search for multiple variants
View source in the browser | HTML or SVG structure | The source differs from the rendered result
Zoom in | Small text, screen edges | Also check for visual leaks

Searching only for your own name is not enough.

Also check company names, department names, school names, project names, old handles, email addresses, place names, and internal terms.
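Variants of the same name are easy to miss: project-alpha may also appear as Project Alpha, project_alpha, or projectalpha. One way to cover separator and case differences, sketched here in Python with hypothetical helper names, is to turn each term into a case-insensitive pattern that allows a space, hyphen, underscore, dot, or nothing between words. It does not cover abbreviations, romanization differences, or old handles; those still need to be listed by hand.

```python
import re

SEPARATORS = r"[\s_.\-]"  # space, underscore, dot, or hyphen


def variant_pattern(term):
    """Compile a case-insensitive regex for term, allowing one separator (or none) between words."""
    words = [re.escape(word) for word in re.split(SEPARATORS + "+", term.strip()) if word]
    if not words:
        raise ValueError("empty search term")
    return re.compile((SEPARATORS + "?").join(words), re.IGNORECASE)


def find_variants(text, terms):
    """Return (term, text as it actually appears) for every variant found."""
    hits = []
    for term in terms:
        for match in variant_pattern(term).finditer(text):
            hits.append((term, match.group(0)))
    return hits
```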

Check manually even after using removal tools

Optimization and removal tools can strip some unnecessary comments and metadata.

If you upload files to web-based optimization or removal services, the file contents and access information, such as your IP address, may be passed to an outside party. For publication that requires anonymity, check locally first.

Even when working locally, however, you cannot leave everything to tools.

Tool behavior changes depending on settings, and tools cannot decide, from an anonymity perspective, which elements should remain and which should be removed.

What tools can reduce | What humans must check | Reason
Comments | Text inside images | Visible information remains
Unnecessary attributes | Proper nouns | Tools do not judge meaning
Metadata | Context | Some information is understandable only to people involved
Empty elements | Filename | Information outside the file remains separately

For publication that requires anonymity, reopen the file after using a tool and check it yourself.
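The Python sketch below illustrates that order of operations offline: a naive local comment removal, followed by a re-check of the result. The function names and terms are hypothetical, and the regex-based removal does not understand CDATA, <script>, or <style> content (a "<!--" inside a script string would be removed too), so it is an illustration rather than a replacement for a maintained tool. The point is the second step: in the test case, the comment is gone, but a company name in the visible text is still reported.

```python
import re

COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(text):
    """Naively delete <!-- ... --> blocks; unaware of CDATA, <script>, or <style>."""
    return COMMENT.sub("", text)


def remaining_issues(text, terms):
    """Re-check cleaned text for leftover comment markers and listed terms (case-insensitive)."""
    issues = []
    if "<!--" in text or "-->" in text:
        issues.append("comment marker")
    lower = text.lower()
    issues.extend(term for term in terms if term.lower() in lower)
    return issues

# Usage (hypothetical filenames): write to a new file so the original is kept.
# from pathlib import Path
# cleaned = strip_comments(Path("figure.svg").read_text(encoding="utf-8"))
# Path("figure.clean.svg").write_text(cleaned, encoding="utf-8")
# print(remaining_issues(cleaned, ["Tanaka", "Acme Corp"]))
```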

Context remains even after comments are removed

Even if XML comments are removed, contextual clues may remain.

id values, class names, filenames, folder names, URLs, text inside images, and body text about personal experiences are all stored outside comments, so removing comments leaves them untouched.

What was removed | What remains | What to check
XML comments | id and class | Whether internal terms remain
Work notes | Filename | Whether project names or dates remain
Draft text | Displayed text | Whether it is visible inside images
Internal URL | Referenced URL | External links and environment names
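To see what stays behind after comments are removed, it can help to list the id values, class names, and referenced URLs in one place. This Python sketch, with a hypothetical ContextCollector class, gathers them with the standard-library html.parser, which tolerates both HTML and SVG markup. The set of URL-bearing attributes is an assumption and not exhaustive (url(...) references inside style values, for example, are not collected), and it cannot see filenames, folder names, or text inside images.

```python
from html.parser import HTMLParser

# Assumption: a common, not exhaustive, set of attributes that hold URLs.
URL_ATTRS = {"href", "xlink:href", "src", "action", "data"}


class ContextCollector(HTMLParser):
    """Collect id values, class names, and referenced URLs from HTML or SVG."""

    def __init__(self):
        super().__init__()
        self.ids, self.classes, self.urls = set(), set(), set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if name == "id":
                self.ids.add(value)
            elif name == "class":
                self.classes.update(value.split())  # one class attribute may hold several names
            elif name in URL_ATTRS:
                self.urls.add(value)


def collect_context(markup):
    """Parse markup and return the collector with its ids, classes, and urls sets."""
    collector = ContextCollector()
    collector.feed(markup)
    collector.close()
    return collector
```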

Removing comments is important, but do not stop there.

For anonymity, check where each piece of information has moved and whether it still remains there.

Be especially careful with collaborative files

In SVG or HTML made by multiple people, comments and hidden text are more likely to remain.

That is because names of people in charge, review comments, revision requests, internal terminology, and client names may be inserted during the work process.

Information remaining from collaboration | Example | Caution
Person in charge | reviewed by A | Personal name remains
Revision note | This is for XX company | Client or organization is visible
Internal abbreviation | first-team-only | Insiders can understand it
Old proposal | Old title, old logo | Information thought to be deleted remains

When publishing a collaborative file, check the whole file, not only the parts you wrote.

What matters is not whether you put something in, but whether it remains in the published file.

If the publication target is a zip file or folder, check every file included inside it.

Even if one SVG is clean, information leaks if comments remain in an included old HTML file or README.
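For an archive, a Python sketch using the standard-library zipfile module can walk every member, check each path (which includes folder names) for the terms you list, and scan each member that decodes as UTF-8 text for comment markers and those terms. Members that do not decode, such as images, are only reported, because they need other tools, for example a metadata inspection tool. The function name scan_zip and its output strings are hypothetical.

```python
import io
import zipfile


def scan_zip(data, terms):
    """Return (member name, finding) pairs for every entry in a zip archive given as bytes."""
    findings = []
    lowered = [(term, term.lower()) for term in terms]
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            # Paths include folder names, which can leak project names or dates.
            for term, lower_term in lowered:
                if lower_term in name.lower():
                    findings.append((name, f"term in path: {term}"))
            if name.endswith("/"):  # directory entry: no content to read
                continue
            try:
                text = archive.read(name).decode("utf-8")
            except UnicodeDecodeError:
                findings.append((name, "binary or non-UTF-8: inspect separately"))
                continue
            if "<!--" in text:
                findings.append((name, "comment marker"))
            lower_text = text.lower()
            for term, lower_term in lowered:
                if lower_term in lower_text:
                    findings.append((name, f"term in content: {term}"))
    return findings

# Usage (hypothetical archive name and terms):
# from pathlib import Path
# for finding in scan_zip(Path("release.zip").read_bytes(), ["Tanaka", "project-alpha"]):
#     print(finding)
```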

Summary

Hidden text and XML comments are information that remains inside a file even if it is not visible on screen.

SVG, HTML, XML, and related file types may retain work notes, names, company names, internal URLs, drafts, and project names.

Before publication, check not only the appearance, but also the contents as text.

Even after optimizing with a tool, do not assume that this alone makes the file safe.

For anonymity, it is important to check both displayed information and information that remains readable inside the file.

Related tools

Metadata inspection

ExifTool

An external resource related to this article. Open it only when it fits your situation and threat model.

Why it is listed: It can help with the article topic, but it is outside Anonymity Sense and should be checked before use.

URL: https://exiftool.org/

Metadata removal

MAT2

An external resource related to this article. Open it only when it fits your situation and threat model.

Why it is listed: It can help with the article topic, but it is outside Anonymity Sense and should be checked before use.

URL: https://0xacab.org/jvoisin/mat2
