Wide character
A wide character is a character datatype whose size is larger than the traditional 8-bit character. The extra width lets programs represent a much larger repertoire of characters drawn from many writing systems.
Historically, computers used 7-bit ASCII, often stored in an 8-bit byte whose spare bit served for parity checking. Eight-bit extensions (such as IBM code pages, PETSCII, and the ISO 8859 series) added more characters, but they were region-specific and mutually incompatible: converting text between different sets could be lossy or complicated.
Work on the Universal Character Set (UCS, ISO/IEC 10646) began in 1989, and the closely aligned Unicode standard followed in 1991. Both encode a far larger repertoire using 16-bit (2-byte) or 32-bit (4-byte) values. With them the idea of a wide character emerged: a datatype large enough to hold these larger character values, independent of any particular character encoding.
A wide character’s size refers to how much memory it uses, not to a specific encoding. Common encodings include UTF-8, UTF-16, and UTF-32, and they map characters to bytes in different ways. For example, UTF-8 uses multiple bytes for some characters, even though the program might store each character as a single wide value.
Different programming languages and systems handle wide characters in different ways. In C and C++, wchar_t is a wide character type, but its size is implementation-defined (often 16 bits on Windows and 32 bits on many Unix-like systems). C11 and C++11 added fixed-width types, char16_t and char32_t, for explicit Unicode support. Because characters outside the Basic Multilingual Plane are represented in 16-bit encodings as surrogate pairs, a single 16-bit wide character does not always correspond to one Unicode code point.
Historically, Windows APIs have favored 16-bit "wide strings" for text, while many Unix-like systems use 8-bit narrow strings with UTF-8 as the de facto encoding, converting to wide characters only where an interface requires them. Standard libraries provide functions for working with wide characters and strings, but their exact behavior is platform- and locale-dependent.
Examples in modern languages include Python, where the approach to wide characters has evolved over time; Python 3 uses a flexible internal Unicode representation (PEP 393) and no longer relies on wchar_t as its core character type. In Rust, a char is a 32-bit value holding a Unicode scalar value.
In short, a wide character is about how much memory a single character needs, enabling support for many languages. The actual text encodings (like UTF-8, UTF-16, or UTF-32) determine how those characters are stored and transferred.
This page was last edited on 2 February 2026, at 16:48 (CET).