Overview
Data are individual facts, measurements or symbols collected for reference, analysis or processing. In everyday use the term covers numeric values, written text, images, sounds and other representations that can be stored and examined. Data can be raw (unprocessed) or processed into summaries and insights. Metadata — information that describes other data — helps locate, interpret and manage data resources.
Types and formats
Data come in many forms and can be grouped by type and structure. Common categories include:
- Numeric data: numbers used for counting, measuring or computing, for example statistics and sensor readings.
- Textual data: words and characters found in documents, logs and messages.
- Audio and multimedia: recorded sound, photographs and video stored as files or streams.
- Metadata: descriptors such as creation date, author, file format and keywords that explain or index other data.
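As a rough illustration of how metadata sits alongside the data it describes, the following Python sketch pairs a short text payload with a descriptor record. The field names and the helper build_metadata are hypothetical, chosen to mirror the examples in the list above; real systems define their own schemas.

```python
from datetime import date


def build_metadata(payload: str, author: str, created: date, keywords: list[str]) -> dict:
    """Return a descriptor for a text payload (hypothetical schema)."""
    return {
        "author": author,
        "created": created.isoformat(),              # creation date as ISO 8601 text
        "format": "text/plain",                      # assumed format label
        "size_bytes": len(payload.encode("utf-8")),  # size of the encoded payload
        "keywords": sorted(keywords),                # index terms used for lookup
    }


# The data itself: a short piece of textual data (invented sample).
note = "Sensor 7 reported 21.5 degrees at noon."

# The metadata: information about the note, not part of its content.
meta = build_metadata(note, "A. Example", date(2024, 1, 15), ["temperature", "sensor"])
```

The descriptor can be stored, searched and exchanged without opening the payload, which is what makes metadata useful for locating and managing data.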
Structure and characteristics
Data may be structured (organized into fields and records, as in a spreadsheet or database), semi-structured (marked up with tags or keys) or unstructured (free text, images, audio). Important attributes include accuracy, precision, completeness, timeliness and consistency. Data quality affects the reliability of any conclusions drawn from analysis.
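The three structural categories can be made concrete with a small Python sketch. The sample values are invented for illustration, and the completeness helper is a hypothetical measure (the share of non-empty fields), only one of several ways to quantify that attribute.

```python
import csv
import io
import json

# Structured: fixed fields and records, as in a spreadsheet or database table.
csv_text = "id,name,temp_c\n1,north,21.5\n2,south,\n"
records = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: keys mark up the content, but the shape may vary per item.
event = json.loads('{"id": 3, "tags": ["sensor"], "reading": {"temp_c": 19.0}}')

# Unstructured: free text with no declared fields.
note = "South station offline since morning; reading missing."


def completeness(rows: list[dict]) -> float:
    """Fraction of field values that are non-empty (hypothetical metric)."""
    values = [v for row in rows for v in row.values()]
    filled = [v for v in values if v not in ("", None)]
    return len(filled) / len(values) if values else 1.0


# Five of the six fields in the structured sample are filled.
score = completeness(records)
```

A low score here would signal that conclusions drawn from the table rest on partial information, which is the sense in which data quality limits the reliability of analysis.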
History and usage of the word
The term has Latin roots: data is the plural of datum ("something given"), from the verb dare, "to give." In modern English usage, however, data is commonly treated as an uncountable mass noun ("the data is") as well as a plural noun ("the data are"). The original form datum is now used mainly in technical contexts to mean a single piece of information or a reference point.
Uses, importance and examples
Data underpin scientific research, business intelligence, engineering, public policy and everyday technologies such as search engines and navigation systems. Examples include medical measurements used for diagnosis, sales figures used for forecasting, and sensor logs that enable monitoring and automation. Large collections of data, called datasets or, at very large scale, "big data," enable pattern discovery and machine learning but also require specialized tools for storage and analysis.
Management, privacy and notable distinctions
Managing data involves collection, validation, storage, retrieval, sharing and disposal. Databases and file systems provide structured access, while formats and standards support interoperability. Data governance addresses ownership, access rules, privacy and security. Distinctions often made in practice include raw versus processed data, qualitative versus quantitative data, and public versus sensitive data, each requiring different handling and safeguards.
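A minimal Python sketch of two of these ideas: validation, and the move from raw to processed data, with a simple safeguard for a sensitive field. The record layout, the plausible-range limits and the function names are assumptions made for illustration, not a prescribed method.

```python
from statistics import mean

# Raw data: readings as collected, including an implausible value and a gap.
raw = [
    {"patient": "P-001", "temp_c": 36.8},
    {"patient": "P-002", "temp_c": 98.6},   # outside the assumed range
    {"patient": "P-003", "temp_c": None},   # missing measurement
    {"patient": "P-004", "temp_c": 37.4},
]


def is_valid(record: dict, low: float = 30.0, high: float = 45.0) -> bool:
    """Validation: keep readings that exist and fall in an assumed range."""
    value = record["temp_c"]
    return value is not None and low <= value <= high


def redact(record: dict) -> dict:
    """Safeguard: drop the identifying field before sharing."""
    return {k: v for k, v in record.items() if k != "patient"}


valid = [r for r in raw if is_valid(r)]
shareable = [redact(r) for r in valid]

# Processed data: a summary derived from the validated readings.
summary = {
    "count": len(valid),
    "mean_temp_c": round(mean(r["temp_c"] for r in valid), 2),
}
```

Keeping the raw list unchanged and deriving the validated, redacted and summarized forms from it reflects the raw-versus-processed and public-versus-sensitive distinctions: each form can then be stored and shared under its own rules.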