Overview
A formal language is a collection of finite sequences (called strings) formed from a specified finite set of symbols called an alphabet. In mathematics, computer science and linguistics the concept provides a rigorous way to describe expressions, programs, proofs and simplified models of natural languages. Notation often treats a language as a variable such as L or \u2112, emphasizing that it is a set of strings rather than a single formula; see also the treatment of variables in formal systems.
Structure and key characteristics
Every formal language is built on an alphabet and formation rules. Important components and properties include:
- Alphabet: a finite set of symbols from which strings are formed.
- Strings: finite sequences of symbols; the empty string is usually included.
- Grammar or generation rules: syntactic rules that determine which strings belong to the language.
- Semantics: an optional mapping that gives meaning to strings when the language models a system (for example, a programming language or a logical calculus).
Types and classification
Formal languages are organized by the kinds of grammars or machines that generate or recognize them. Classical categories include regular, context-free, context-sensitive and recursively enumerable languages. This classification underlies automata theory and the Chomsky hierarchy, which links grammars to computational models such as finite automata, pushdown automata and Turing machines.
History and development
The idea of formal languages emerged as mathematicians and logicians sought precise symbolic systems to avoid ambiguity and paradox. Developments in the mid‑20th century connected grammar-based descriptions with abstract machines, giving rise to modern formal language theory and its use in programming language design and compiler construction. Formal languages have roots in work on logic, computability and the mathematical study of syntax.
Uses, examples and importance
Formal languages are central to many fields. Examples and applications include:
- Specification and implementation of programming languages and compilers.
- Modeling syntactic structures in natural language research and computational linguistics.
- Defining formal logics and proof systems used in logical reasoning and verification.
- Describing searchable or decidable pattern sets in tools such as regular expressions.
Distinctions and notable facts
Formal languages differ from natural languages chiefly in precision: symbols and formation rules are explicitly specified so that strings have unambiguous syntactic status and, when provided, clear semantics. This precision reduces or eliminates the kinds of ambiguity common in everyday speech, though some formal grammars can still admit ambiguous derivations. The study of which language properties are decidable or efficiently decidable is a major thread in theoretical computer science and logic.