A character in computing is an abstract unit of text: a letter, digit, punctuation mark, symbol or an invisible control element that software recognizes and manipulates. Characters themselves are conceptual items that represent meaning in text, while their visual appearance is produced by a glyph. Computers store and process characters as numeric values, and those numeric values are interpreted according to a text encoding.
Core concepts and terms
- Character: the abstract element of written language (for example, the letter A or the digit 5).
- Glyph: the particular visual shape used to display a character on screen or paper; glyphs can vary with font or style.
- Code point / numeric value: the number assigned to a character in an encoding system; computers operate on these numbers rather than on visible shapes.
- Encoding: the set of rules that map characters to numeric values and to bytes for storage and transmission; common examples are ASCII and Unicode encodings such as UTF-8.
- Grapheme cluster: a user-perceived character that may consist of a base character plus combining marks (for example, a letter plus an accent).
For example, in the ASCII encoding the number 65 corresponds to the capital letter 'A'. When text is rendered, the system looks up a glyph for that character in a font and draws it; the same code point can appear differently in a serif or sans-serif face.
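The mapping between characters and numbers can be sketched in Python, whose built-in `ord` and `chr` functions convert between a one-character string and its code point. The variable names are illustrative only, and the accented letter is just one possible example of a base character plus a combining mark.

```python
# Convert between a character and its numeric code point.
code = ord("A")   # character -> number
char = chr(65)    # number -> character

# A user-perceived character built from two code points:
# the letter 'e' followed by U+0301 COMBINING ACUTE ACCENT.
accented = "e\u0301"
code_points = [ord(c) for c in accented]

print(code, char)                   # 65 A
print(len(accented), code_points)   # 2 [101, 769]
```

Note that `len` counts code points here, not user-perceived characters, which is why the single accented letter reports a length of 2.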
Control characters and invisible symbols
Not all characters are visible. Control characters are special codes that request actions from software or devices rather than representing printable marks. Familiar control characters include carriage return, which returns the cursor to the start of the line; line feed, which advances it to the next line; and tab, which advances it to the next column stop. Software and protocols interpret these codes to manage layout and data flow.
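As a rough illustration, the control characters named above can be inspected with Python's standard `unicodedata` module, which reports the Unicode general category `Cc` (control) for each of them. The `controls` dictionary and the `describe` helper are hypothetical names chosen for this sketch.

```python
import unicodedata

# The three control characters discussed above, written as escape sequences.
controls = {"carriage return": "\r", "line feed": "\n", "tab": "\t"}

def describe(ch):
    """Return (code point, Unicode general category) for one character."""
    return ord(ch), unicodedata.category(ch)

table = {name: describe(ch) for name, ch in controls.items()}
for name, (code, category) in table.items():
    print(f"{name}: code {code}, category {category}")

# Software interprets these codes rather than drawing them.
lines = "one\r\ntwo\nthree".splitlines()   # split on line endings
expanded = "a\tb".expandtabs(4)            # advance to the next 4-column stop
```

Here `splitlines` treats both the carriage return plus line feed pair and a lone line feed as line boundaries, and `expandtabs` replaces the tab with enough spaces to reach the next stop.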
History, development and encodings
Early computer systems used limited sets such as ASCII to represent common English letters, digits and punctuation. As computing became global, Unicode was developed to assign code points for virtually every writing system and many symbols. Unicode separates the abstract code point (the character identifier) from its encoded byte representation (UTF-8, UTF-16, etc.), enabling consistent interchange among systems. UTF-8 in particular encodes each Unicode code point as a sequence of one to four bytes; ASCII characters occupy a single byte, so ASCII text is already valid UTF-8.
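The separation between code points and their byte representation can be seen by encoding the same characters in more than one way. The following is a minimal Python sketch; the four sample characters (a Latin letter, an accented letter, the euro sign and an emoji) are arbitrary choices made for illustration.

```python
# Sample code points, written as escapes so the source stays pure ASCII:
# U+0041 (A), U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (emoji).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Number of bytes each code point occupies in two Unicode encodings.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
utf16_lengths = [len(ch.encode("utf-16-le")) for ch in samples]

for ch, n8, n16 in zip(samples, utf8_lengths, utf16_lengths):
    print(f"U+{ord(ch):04X}: {n8} byte(s) in UTF-8, {n16} byte(s) in UTF-16")

# Decoding with the same encoding recovers the original code point.
euro_bytes = "\u20ac".encode("utf-8")
round_trip = euro_bytes.decode("utf-8")
```

The code point stays the same in every case; only the byte sequence changes with the encoding, which is why the receiver must know which encoding was used.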
Practical implications and distinctions
- Bytes versus characters: a character encoding determines how many bytes represent one character; some encodings (such as ASCII) use one byte per character, others (such as UTF-8) use variable-length sequences.
- Normalization: the same visual text can be encoded in different ways (precomposed characters vs. base+combining marks), so software often normalizes text before comparing or storing it.
- Rendering pipeline: text processing typically goes from code points to glyph selection to layout to rasterization, with each step influenced by locale, font, and rendering engine.
- Interoperability: choosing the right encoding and handling control characters appropriately are essential for reliable file formats, network protocols and user interfaces.
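The normalization point above can be illustrated with a short Python sketch using the standard `unicodedata.normalize` function. It assumes the NFC and NFD normalization forms are suitable for the comparison; other applications may prefer a different form.

```python
import unicodedata

# Two encodings of the same visible text: an accented letter 'e'.
precomposed = "\u00e9"    # single code point U+00E9
decomposed = "e\u0301"    # 'e' plus U+0301 COMBINING ACUTE ACCENT

# A naive comparison sees different code point sequences.
raw_equal = precomposed == decomposed

# Normalizing both sides to the same form makes them comparable.
nfc_equal = (unicodedata.normalize("NFC", precomposed)
             == unicodedata.normalize("NFC", decomposed))

# NFD goes the other way and splits the precomposed character.
nfd_length = len(unicodedata.normalize("NFD", precomposed))

# The two forms also differ in stored size.
byte_sizes = (len(precomposed.encode("utf-8")),
              len(decomposed.encode("utf-8")))

print(raw_equal, nfc_equal, nfd_length, byte_sizes)
```

Normalizing once at the point where text enters a system, rather than at every comparison, is one common design choice, though which form to use depends on the application.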
Understanding the distinction among characters (the abstract unit), code points (their numeric identifiers), glyphs (the visual forms), and encodings (how code points become bytes) is fundamental to text processing, localization, and software that works with human language.