IETF language tag

Content sourced from Wikipedia, licensed under CC BY-SA 3.0.

An IETF BCP 47 language tag is a short code used to identify a human language (and sometimes its dialect, script, or region) on the Internet. The tag syntax is defined by the IETF, and the subtags it uses are recorded in the IANA Language Subtag Registry. Subtags are drawn from standards such as ISO 639 for languages, ISO 15924 for scripts, and ISO 3166-1 and UN M.49 for regions.

Common examples:
- en — English
- es-419 — Spanish of Latin America and the Caribbean
- rm-sursilv — Romansh in the Sursilvan variant
- sr-Cyrl — Serbian written in Cyrillic script
- nan-Hant-TW — Min Nan Chinese using traditional Han characters in Taiwan
- yue-Hant-HK — Cantonese using traditional Han characters in Hong Kong
- gsw-u-sd-chzh — Zürich German

Language tags are used in many computing standards (for example HTTP, HTML, XML, and PNG) to indicate what language content is in.
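As an illustration of how HTTP carries these tags, the Accept-Language request header lists language ranges, each with an optional q weight (for example `da, en-GB;q=0.8, en;q=0.7`). The sketch below uses a hypothetical helper, `parse_accept_language`, to turn such a header into a preference-ordered list. It is a simplification: it does not validate the tags, and it treats an unparsable weight as 0.

```python
def parse_accept_language(header: str) -> list[str]:
    """Return the language ranges in an Accept-Language value, most preferred first.

    Minimal sketch: a missing q counts as 1.0, an unparsable q counts as 0,
    and ranges with q=0 ("not acceptable") are dropped.
    """
    weighted = []
    for position, item in enumerate(header.split(",")):
        fields = [f.strip() for f in item.split(";")]
        language_range = fields[0]
        if not language_range:
            continue  # skip empty elements, e.g. from a trailing comma
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            weighted.append((q, position, language_range))
    # Highest q first; equal weights keep their original order in the header.
    weighted.sort(key=lambda entry: (-entry[0], entry[1]))
    return [language_range for _, _, language_range in weighted]


print(parse_accept_language("da, en-GB;q=0.8, en;q=0.7"))  # ['da', 'en-GB', 'en']
```

The resulting list is the kind of input a matching procedure, such as the ones in RFC 4647, works from.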

A quick history:
- RFC 1766 (1995) introduced language tags built from two-letter ISO 639 language codes and two-letter ISO 3166 country codes.
- RFC 3066 (2001) added support for three-letter ISO 639-2 language codes.
- RFC 4646 and RFC 4647 (both 2006) replaced RFC 3066. RFC 4646 added ISO 15924 script codes and UN M.49 region codes and replaced the registry of complete tags with a registry of individual subtags; RFC 4647 defined how tags are matched against the language ranges a user asks for.
- RFC 5646 (2009) replaced RFC 4646, adding three-letter codes from ISO 639-3 and ISO 639-5 and defining extended language (extlang) subtags. RFC 5646 and RFC 4647 together make up BCP 47.

How a tag is built:
- A tag is one or more subtags separated by hyphens.
- Subtags use only ASCII letters and digits, and each is one to eight characters long.
- Subtags are not case-sensitive, but the recommended casing is:
  - region: UPPERCASE (e.g., US)
  - script: Title Case (e.g., Latn)
  - other subtags: lowercase (e.g., en, es)
- Subtags appear in a fixed order: language, optional extlang, optional script, optional region, optional variants, optional extensions, optional private use (x-).
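A simplified parser makes the ordering and casing rules concrete. The sketch below uses two hypothetical helpers, `parse_tag` and `format_tag`, which follow the subtag shapes of the RFC 5646 syntax (for example, a script is four letters and a region is two letters or three digits). It only checks the shape of each subtag, not whether the subtag is actually registered, and it does not handle grandfathered tags such as i-klingon.

```python
import re


def _is(pattern: str, subtag: str) -> bool:
    """True if the whole subtag matches the ASCII-only regex pattern."""
    return re.fullmatch(pattern, subtag) is not None


def parse_tag(tag: str) -> dict:
    """Split a tag into its parts, applying the recommended casing.

    Sketch only: checks subtag shapes, not the IANA registry, and does not
    handle grandfathered tags such as i-klingon.
    """
    parts = tag.split("-")
    out = {"language": None, "extlang": [], "script": None, "region": None,
           "variants": [], "extensions": [], "privateuse": []}
    i = 0
    if parts[0].lower() != "x":  # a tag may consist of private use alone
        if not _is(r"[A-Za-z]{2,8}", parts[0]):
            raise ValueError(f"bad language subtag in {tag!r}")
        out["language"] = parts[0].lower()
        i = 1
        # extlang: up to three 3-letter subtags after a 2- or 3-letter language
        while (i < len(parts) and len(parts[0]) <= 3 and len(out["extlang"]) < 3
               and _is(r"[A-Za-z]{3}", parts[i])):
            out["extlang"].append(parts[i].lower())
            i += 1
        if i < len(parts) and _is(r"[A-Za-z]{4}", parts[i]):
            out["script"] = parts[i].title()  # Title Case, e.g. Latn
            i += 1
        if i < len(parts) and _is(r"(?:[A-Za-z]{2}|[0-9]{3})", parts[i]):
            out["region"] = parts[i].upper()  # UPPERCASE, e.g. US or 419
            i += 1
        while i < len(parts) and _is(r"(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})", parts[i]):
            out["variants"].append(parts[i].lower())
            i += 1
        # extensions: a singleton other than x, then one or more 2-8 character subtags
        while i < len(parts) and _is(r"[A-WYZa-wyz0-9]", parts[i]):
            singleton = parts[i].lower()
            i += 1
            values = []
            while i < len(parts) and _is(r"[A-Za-z0-9]{2,8}", parts[i]):
                values.append(parts[i].lower())
                i += 1
            if not values:
                raise ValueError(f"empty extension {singleton!r} in {tag!r}")
            out["extensions"].append((singleton, values))
    if i < len(parts):
        # Anything left must be a private-use sequence: x followed by 1-8 character subtags.
        rest = parts[i + 1:]
        if (parts[i].lower() != "x" or not rest
                or not all(_is(r"[A-Za-z0-9]{1,8}", s) for s in rest)):
            raise ValueError(f"unexpected subtag {parts[i]!r} in {tag!r}")
        out["privateuse"] = [s.lower() for s in rest]
    return out


def format_tag(p: dict) -> str:
    """Reassemble parsed parts in the order the standard requires."""
    parts = [p["language"]] if p["language"] else []
    parts += p["extlang"]
    parts += [s for s in (p["script"], p["region"]) if s]
    parts += p["variants"]
    for singleton, values in p["extensions"]:
        parts += [singleton, *values]
    if p["privateuse"]:
        parts += ["x", *p["privateuse"]]
    return "-".join(parts)


print(format_tag(parse_tag("SR-cyrl")))                  # sr-Cyrl
print(format_tag(parse_tag("nan-hant-tw")))              # nan-Hant-TW
print(format_tag(parse_tag("EN-latn-us-U-CA-gregory")))  # en-Latn-US-u-ca-gregory
```

Because every subtag category has a distinct length or character pattern at its position, a single left-to-right pass is enough to classify the subtags of a well-formed tag.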

Notes on usage:
- It’s often better to omit a script or region if it doesn’t add useful information (for example, es instead of es-Latn, ja instead of ja-JP).
- When content is specific to a regional variety, a region subtag can mark it (en-GB vs en-US). In some cases, a more specific language subtag is preferred over a language-region pair: for Algerian Arabic, arq may be better than ar-DZ.
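- Variant subtags describe dialects and other varieties (such as sursilv in rm-sursilv), and extended language (extlang) subtags name a language within a larger grouping, such as a Chinese language (zh-yue) or a sign language (sgn-ase). The registry shows which subtags are currently valid, and for extlang forms it gives the preferred shorter tag (yue, ase).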
- Script subtags such as Hans (Simplified Chinese characters) and Hant (Traditional Chinese characters) distinguish writing systems used for the same language, as in zh-Hans and zh-Hant.
- Extension subtags add further information. Each extension starts with a single-character subtag (a singleton) followed by one or more subtags of data. Common extensions include:
  - Extension T (RFC 6497) for transformed content, such as transliterated, transcribed, or translated text (e.g., en-t-ja for English translated from Japanese)
  - Extension U (RFC 6067) for Unicode locale data such as calendar, time zone, and currency (e.g., en-Latn-US-u-ca-gregory, where ca-gregory selects the Gregorian calendar)
- Private-use subtags begin with x- and are not registered; their meaning is set by private agreement between the parties that use them.
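Because every extension starts with a singleton, extensions can be pulled out of a tag by scanning for one-character subtags and stopping at x, where private use begins. The sketch below uses two hypothetical helpers: `extension_sequences` groups the subtags after each singleton, and `unicode_keywords` reads the key/type pairs of the U extension, where two-character subtags are keys and the longer subtags that follow form their values. Note that U-extension values are short codes such as gregory, since no subtag may be longer than eight characters. The sketch assumes the tag is already well formed and ignores U-extension attributes.

```python
def extension_sequences(tag: str) -> dict[str, list[str]]:
    """Map each extension singleton in a tag to the subtags that follow it."""
    parts = tag.lower().split("-")
    if parts[0] == "x":
        return {}  # a private-use-only tag has no extensions
    extensions: dict[str, list[str]] = {}
    current = None
    for subtag in parts[1:]:  # skip the primary language subtag
        if len(subtag) == 1:
            if subtag == "x":
                break  # everything after x is private use
            current = subtag
            extensions[current] = []
        elif current is not None:
            extensions[current].append(subtag)
    return extensions


def unicode_keywords(tag: str) -> dict[str, str]:
    """Return the key/type pairs of a tag's U extension.

    Two-character subtags are keys; the 3-8 character subtags after a key
    form its type. Attributes (subtags before the first key) are ignored,
    and a key with no type is reported as the placeholder "true".
    """
    keywords: dict[str, list[str]] = {}
    key = None
    for subtag in extension_sequences(tag).get("u", []):
        if len(subtag) == 2:
            key = subtag
            keywords[key] = []
        elif key is not None:
            keywords[key].append(subtag)
    return {k: "-".join(v) if v else "true" for k, v in keywords.items()}


print(extension_sequences("en-t-ja"))               # {'t': ['ja']}
print(unicode_keywords("en-Latn-US-u-ca-gregory"))  # {'ca': 'gregory'}
print(unicode_keywords("gsw-u-sd-chzh"))            # {'sd': 'chzh'}
```

The same scan works for any singleton, so supporting another extension only requires interpreting its list of subtags.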

Where to look up subtags:
- The Language Subtag Registry lists all currently valid public subtags. Private-use subtags are not in this registry.
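The registry is a plain-text file in "record-jar" format: records are separated by lines containing `%%`, and each field is written as `Name: value`, with some fields (such as Description) repeating. Language records may carry a Suppress-Script field naming a script that should normally be left out of tags, which is the rule behind using es rather than es-Latn. The sketch below parses a short excerpt written for illustration (downloading the real file is not shown) with a hypothetical `parse_registry` helper, then uses the hypothetical `drop_suppressed_script` to remove a redundant script subtag.

```python
# Abbreviated excerpt in the registry's format, written for illustration.
SAMPLE_REGISTRY = """\
Type: language
Subtag: en
Description: English
Suppress-Script: Latn
%%
Type: language
Subtag: es
Description: Spanish
Description: Castilian
Suppress-Script: Latn
%%
Type: language
Subtag: ja
Description: Japanese
Suppress-Script: Jpan
%%
Type: script
Subtag: Latn
Description: Latin
"""


def parse_registry(text: str) -> list[dict[str, list[str]]]:
    """Parse record-jar text: records split by %% lines, fields as Name: value.

    Fields can repeat, so each maps to a list of values; a line starting
    with whitespace continues the previous field's value.
    """
    records: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] = {}
    last_field = None
    for line in text.splitlines():
        if line.strip() == "%%":
            if current:
                records.append(current)
            current, last_field = {}, None
        elif line[:1] in (" ", "\t") and last_field is not None:
            current[last_field][-1] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            last_field = name.strip()
            current.setdefault(last_field, []).append(value.strip())
    if current:
        records.append(current)
    return records


def drop_suppressed_script(tag: str, records: list[dict[str, list[str]]]) -> str:
    """Remove a script subtag that equals the language's Suppress-Script."""
    suppress = {
        r["Subtag"][0].lower(): r["Suppress-Script"][0].lower()
        for r in records
        if r.get("Type") == ["language"] and "Suppress-Script" in r
    }
    parts = tag.split("-")
    if len(parts) > 1 and suppress.get(parts[0].lower()) == parts[1].lower():
        del parts[1]
    return "-".join(parts)


records = parse_registry(SAMPLE_REGISTRY)
print(drop_suppressed_script("es-Latn", records))     # es
print(drop_suppressed_script("ja-Jpan-JP", records))  # ja-JP
```

Storing every field as a list keeps repeated fields such as Description intact without special cases.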

Why this matters:
- Language tags help software choose the right language resources and format data correctly for users around the world. They support localization, translation, and proper display of text and data across languages and regions.
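One way software uses tags to choose resources is the "Lookup" scheme of RFC 4647: each requested language range is tried in priority order and shortened from the end until it equals an available tag, and a singleton left dangling at the end is removed along with the subtag after it. The sketch below is a minimal version of that idea in a hypothetical `lookup` function. It simplifies wildcard handling by skipping a bare `*` range and does not treat `*` subtags elsewhere.

```python
from typing import Optional


def lookup(priority_list: list[str], available: list[str],
           default: Optional[str] = None) -> Optional[str]:
    """Return the best available tag using RFC 4647-style Lookup.

    Ranges are tried in priority order; each is shortened from the end
    until it equals an available tag (compared case-insensitively).
    Simplification: a bare "*" range is skipped; other wildcards are not handled.
    """
    by_lower = {tag.lower(): tag for tag in available}
    for language_range in priority_list:
        if language_range == "*":
            continue
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()
            # Don't leave a dangling singleton such as "x" or "u" at the end.
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


print(lookup(["fr-CH", "en"], ["en", "fr"]))                       # fr
print(lookup(["zh-Hant-CN-x-private1-private2"], ["zh-Hant-CN"]))  # zh-Hant-CN
print(lookup(["de"], ["en", "fr"], default="en"))                  # en
```

Lookup returns at most one tag, which suits picking a single resource such as a translation file; RFC 4647 also defines filtering schemes that return every matching tag instead.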

In short, IETF language tags provide a flexible, standardized way to name languages and their variants, so software can handle content in the right language and format for users everywhere.

This page was last edited on 2 February 2026, at 09:00 (CET).