Overview
The Extensible Markup Language (XML) is a text-based format for encoding structured information in a way that both humans and programs can read. Developed by the World Wide Web Consortium (W3C) and first published as a Recommendation in 1998, XML defines a set of rules for creating custom markup vocabularies built from tags, attributes and a hierarchical element structure. Unlike presentation languages, XML describes the meaning and organization of data rather than how it should appear.
Structure and core characteristics
An XML document consists of an optional prolog (the XML declaration and, if present, a document type declaration) followed by a tree of elements with exactly one root. Tag names are case-sensitive, elements must be properly nested, and a document may also contain attributes, comments and processing instructions. A minimal document is a single root element containing nested child elements and text. XML is self-describing: element names convey context about the enclosed data rather than styling it.
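The structure described above can be sketched with Python's standard xml.etree.ElementTree module. The document and its element and attribute names (library, book, id, title, year) are made up for illustration, and the helper names summarize and is_well_formed are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a prolog, a comment, one root element, nested
# children, an attribute and text content.
DOC = """<?xml version="1.0"?>
<!-- Comments are allowed almost anywhere outside of tags -->
<library>
  <book id="b1">
    <title>Example Title</title>
    <year>2001</year>
  </book>
</library>
"""

def summarize(xml_text):
    """Return the root tag and a list of dicts, one per <book> child."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):
        books.append({
            "id": book.get("id"),                # attribute value
            "title": book.findtext("title"),     # text of a child element
            "year": int(book.findtext("year")),
        })
    return root.tag, books

def is_well_formed(xml_text):
    """True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

if __name__ == "__main__":
    print(summarize(DOC))
    # Tag names are case-sensitive, so <B> cannot be closed by </b>.
    print(is_well_formed("<a><B></b></a>"))
```

Because tag names are case-sensitive, the second call reports that the mismatched fragment is not well-formed, while the sample document parses cleanly.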
Validation and related languages
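To express constraints on acceptable documents, XML uses schema languages. A document that satisfies such a schema is called valid, as distinct from merely well-formed. The original schema language was the Document Type Definition (DTD). More expressive systems appeared later, such as XML Schema (XSD) and RELAX NG. XPath selects nodes within a document, XQuery queries XML data, and XSLT transforms XML into other documents. Namespaces qualify element and attribute names with a URI so that different vocabularies can coexist without name collisions.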
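As a rough illustration of namespaces and path-based selection, the sketch below uses ElementTree, which supports only a limited subset of XPath and does not validate against DTD, XSD or RELAX NG (schema validation would need a third-party library). The namespace URIs, element names and helper functions are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing two vocabularies. Both define "item",
# but the namespaces keep the two names distinct.
DOC = """<order xmlns="http://example.com/ns/order"
       xmlns:inv="http://example.com/ns/inventory">
  <item inv:sku="A-1"><name>Widget</name><qty>2</qty></item>
  <item inv:sku="B-7"><name>Gadget</name><qty>5</qty></item>
  <inv:item>warehouse record</inv:item>
</order>"""

# Prefixes used in queries; they need not match the prefixes in the document.
NS = {
    "o": "http://example.com/ns/order",
    "inv": "http://example.com/ns/inventory",
}

def order_item_names(xml_text):
    """Select the <name> of every order item with a path expression."""
    root = ET.fromstring(xml_text)
    return [n.text for n in root.findall("o:item/o:name", NS)]

def count_items(xml_text):
    """Count 'item' elements separately in each namespace."""
    root = ET.fromstring(xml_text)
    return len(root.findall("o:item", NS)), len(root.findall("inv:item", NS))

def skus_by_name(xml_text):
    """Map item name to its namespaced sku attribute."""
    root = ET.fromstring(xml_text)
    sku_attr = "{%s}sku" % NS["inv"]  # ElementTree spells qualified names as {uri}local
    return {
        item.findtext("o:name", namespaces=NS): item.get(sku_attr)
        for item in root.findall("o:item", NS)
    }

if __name__ == "__main__":
    print(order_item_names(DOC))
    print(count_items(DOC))
    print(skus_by_name(DOC))
```

The two item elements in the order namespace and the single item in the inventory namespace are selected independently, which is the collision-avoidance that namespaces provide.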
Processing and APIs
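Programs read and write XML using parsers and APIs. Common processing models include tree-based APIs such as DOM, which load the whole document into memory; event-driven push parsers such as SAX, which call back into the application as the input is read; and pull-based streaming interfaces such as StAX, where the application requests the next event itself. These tools let applications extract values, validate structure against a schema, or transform data for other systems. Many programming ecosystems provide built-in XML support and libraries to simplify these tasks.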
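The three processing models can be compared on the same input using Python's standard library. This is only a sketch: the sample data and function names are hypothetical, and ElementTree's iterparse is used here as a rough stand-in for StAX-style pull processing, since StAX itself is a Java API.

```python
import io
import xml.dom.minidom
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical input: sum the "v" attribute of every <r> element.
DOC = "<readings><r v='3'/><r v='4'/><r v='5'/></readings>"

def total_dom(xml_text):
    """Tree-based: build the whole document in memory, then walk it."""
    dom = xml.dom.minidom.parseString(xml_text)
    return sum(int(el.getAttribute("v")) for el in dom.getElementsByTagName("r"))

class SumHandler(xml.sax.ContentHandler):
    """Event-driven: the parser calls startElement for each opening tag."""
    def __init__(self):
        super().__init__()
        self.total = 0

    def startElement(self, name, attrs):
        if name == "r":
            self.total += int(attrs["v"])

def total_sax(xml_text):
    handler = SumHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.total

def total_pull(xml_text):
    """Pull-style: the application iterates over events at its own pace."""
    total = 0
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "r":
            total += int(elem.get("v"))
            elem.clear()  # release the element's contents once processed
    return total

if __name__ == "__main__":
    print(total_dom(DOC), total_sax(DOC), total_pull(DOC))
```

All three functions compute the same total. The tree-based version is the simplest to write, while the event-driven and pull-style versions avoid holding the full document in memory, which matters for large inputs.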
Uses and examples
- Configuration files and settings for desktop or server software.
- Document formats such as office file containers and vector graphics (e.g., SVG).
- Data interchange in web services (SOAP) and syndication formats (RSS, Atom).
- Intermediate formats for transformation pipelines using XSLT and XPath.
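As one concrete case from the list above, a syndication feed can be read with a few lines of standard-library Python. The feed below is a hand-written, minimal RSS 2.0-style sample with made-up titles and links, and feed_entries is a hypothetical helper; real feeds carry more fields and often use namespaced extensions.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal feed: an <rss> root, one <channel>, two <item> entries.
FEED = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""

def feed_entries(xml_text):
    """Return (title, link) pairs for every item in the channel."""
    channel = ET.fromstring(xml_text).find("channel")
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]

if __name__ == "__main__":
    for title, link in feed_entries(FEED):
        print(title, link)
```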
Distinctions and notable facts
XML is often compared with HTML, which is a fixed vocabulary focused on web presentation; XML is extensible and focuses on semantics. It is widely supported, platform-independent and human-readable, though more verbose than binary formats. For authoritative specifications and guidance, see the standards maintained by the W3C.