Readablewiki

Polyglot (computing)

Content sourced from Wikipedia, licensed under CC BY-SA 3.0.

A polyglot in computing is a file or program that is valid in more than one programming language or file format. It works by mixing syntax from several formats so that different programs can read the same file. To remain valid in every target language, the language-specific parts are usually hidden in constructs that the other languages ignore, such as comments, or written so that each language parses them differently.

Polyglot programs have limited practical use, but they have appeared as puzzles and curiosities since the 1990s. They can also pose security risks when they are used to bypass validation checks or to exploit a vulnerability. From the 2000s onward, polyglots gained attention as a way to secretly deliver malware or to hide payloads inside legitimate formats.

Two common ideas in polyglots are:
- A file that can be interpreted as two or more formats at the same time, each producing a different result. For example, a document could be both a valid PDF and a valid ZIP archive.
- A program written so that each language treats parts of the file differently, often by exploiting differences in comment syntax or by redefining tokens (for example with C preprocessor macros) so that the same text means different things in each language.

There are many examples:
- A polyglot that can be read as ANSI C, PHP, and Bash.
- A file that runs as a Windows batch script, then re-runs itself as Perl.
- HTML5 and XHTML can be written as a single polyglot document that browsers parse as HTML or as XML, producing the same DOM structure.
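The batch-then-Perl trick above has a well-known Unix counterpart, a file that is both a POSIX shell script and a Python 3 program. The sketch below is an illustrative assumption rather than one of the cited examples. The shell reads the quoted words on line 2 as the command exec python3 "$0" "$@" and replaces itself with Python. Python treats line 1 as a comment and line 2 as a harmless expression made of adjacent string literals. The helper run_python_side is a hypothetical name, and it only exercises the Python interpretation. To try the shell side, save the string to a file, mark it executable, and run it.

```python
import contextlib
import io

# A two-language file: valid POSIX sh and valid Python 3.
# sh: line 2 runs `exec python3 <this file> <args>`, handing control to Python.
# Python: line 1 is a comment; line 2 is an expression statement that
# concatenates four string literals and discards the result.
POLYGLOT = '''#!/bin/sh
"exec" "python3" "$0" "$@"
print("hello from Python")
'''


def run_python_side(source):
    """Execute the source as Python and return whatever it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(compile(source, "polyglot.py", "exec"), {})
    return buffer.getvalue()


print(run_python_side(POLYGLOT), end="")
```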

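Polyglot markup has been proposed as a way to combine the benefits of HTML5 and XHTML. An HTML5/XHTML polyglot document must begin with the <!DOCTYPE html> doctype, declare the XHTML namespace on the root html element, and be well-formed, so that it can be parsed as either HTML or XML. Because HTML parsers ignore the self-closing slash on non-void elements (such as script, p, and div), these elements must be written with an explicit end tag even when empty. Only void elements such as br may use the self-closing form.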
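The rules above can be checked informally with Python's standard library. This sketch assumes a small hand-written document, not a complete page. It parses the document with an XML parser (xml.etree.ElementTree) and with the html.parser tokenizer, then compares the element names each one sees. One caveat applies: html.parser is a tokenizer rather than a full HTML5 tree builder, so this compares tag order only, not the complete DOM a browser would build.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Illustrative polyglot document: well-formed XML, XHTML namespace on <html>,
# "/>" used only on the void element <br/>, and the empty <script> written
# as an explicit start/end pair.
POLYGLOT = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Polyglot</title></head>
<body><p>Hello<br/>world</p><script></script></body>
</html>"""


def xml_tags(doc):
    """Element names in document order, as seen by an XML parser."""
    root = ET.fromstring(doc)
    # ElementTree prefixes namespaced tags with "{uri}"; strip that part.
    return [el.tag.split("}")[-1] for el in root.iter()]


class TagCollector(HTMLParser):
    """Records start tags reported by the standard-library HTML tokenizer."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_tags(doc):
    collector = TagCollector()
    collector.feed(doc)
    collector.close()
    return collector.tags


print(xml_tags(POLYGLOT))
print(html_tags(POLYGLOT) == xml_tags(POLYGLOT))
```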

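Some formats and languages lend themselves to polyglots more readily than others. For example:
- A DICOM (medical imaging) file can also be a valid TIFF file, because DICOM reserves a 128-byte preamble at the start of the file that can hold a TIFF header. The same data can then be opened by both DICOM and TIFF viewers.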
- Python 2 and Python 3 share enough similarities that a single script can sometimes run on both, though compatibility isn’t guaranteed.
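- PDF is a common polyglot host because many readers are lenient, for example tolerating extra data before the %PDF header or after the end of the file. Additional content can therefore be hidden in ways that some readers accept.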
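The Python 2/Python 3 case above usually relies on a few standard idioms. This is a minimal sketch, assuming a script that only needs true division and a portable name for text strings. The from __future__ import makes "/" behave the same on both versions, and the try/except picks whichever text type exists. The function names mean and describe are illustrative.

```python
from __future__ import division, print_function

# "unicode" exists only on Python 2; on Python 3 the lookup raises
# NameError and we fall back to str.
try:
    text_type = unicode
except NameError:
    text_type = str


def mean(values):
    # With the __future__ import, "/" is true division on both versions;
    # without it, Python 2 would evaluate 3 / 2 as 1.
    return sum(values) / len(values)


def describe(values):
    return text_type("mean = {0}").format(mean(values))


print(describe([1, 2]))
```

On Python 3 this prints mean = 1.5. On Python 2.7 it should print the same text, with describe returning a unicode string.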

Security examples show the risk:
- A polyglot can embed a hidden payload inside a widely accepted wrapper (such as a JPEG's comment field), potentially tricking software that loads the file in an unexpected context, for example as a script rather than an image, into executing code.
- A PDF or other complex format might accept extra data in ways that allow attackers to run unintended commands.
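- GIFAR is a file that is both a valid GIF image and a valid JAR (Java archive). GIF readers parse from the start of the file, while JAR readers, which use the ZIP format, locate their index at the end. As a result, Java code could be uploaded as an apparently harmless image and later delivered to a browser or server as executable code.
- Polyglots can be hard for malware detection, because the file behaves differently in different interpreters, and security tools that check only one interpretation may miss the other.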
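The GIFAR layout described above can be sketched with Python's zipfile module. This example rests on several assumptions. FAKE_GIF is a hypothetical stand-in containing only the GIF89a signature plus filler, not a complete image. The member name Hello.class and its contents are placeholders, not real Java bytecode. make_gifar_like is an illustrative helper. The point is structural: a check that looks only at the leading magic bytes sees a GIF, while a ZIP reader, which finds the central directory from the end of the file, still sees the archive.

```python
import io
import zipfile

# Hypothetical stand-in for an image: the 6-byte GIF89a signature plus filler.
# A real GIFAR would prepend a complete, valid GIF.
FAKE_GIF = b"GIF89a" + b"\x00" * 26


def make_gifar_like(image_bytes, member_name, member_data):
    """Append a ZIP archive (the container format used by JAR files) to image bytes."""
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(member_name, member_data)
    return image_bytes + archive.getvalue()


blob = make_gifar_like(FAKE_GIF, "Hello.class", b"placeholder, not real bytecode")

# A naive sniffer that checks only the leading magic bytes reports a GIF...
print(blob[:6] == b"GIF89a")

# ...while zipfile locates the end-of-central-directory record at the end of
# the file and compensates for the prepended bytes.
with zipfile.ZipFile(io.BytesIO(blob)) as zf:
    print(zf.namelist())
```

Python's zipfile tolerates prepended data because self-extracting archives use the same layout. That tolerance is what lets one file satisfy both readers.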

In short, polyglots are clever mixes of formats that can work in multiple ways at once. They are interesting and can be useful in some niche contexts, but they also raise security concerns because they can bypass checks and deliver hidden payloads.


This page was last edited on 2 February 2026, at 03:56 (CET).