Measurements are never perfectly exact, and the gap between what we observe and what is true is central to statistics. Observed data vary for many reasons, including natural variation among units, the limitations of instruments, and the way samples are drawn. Understanding that variability, expressed as errors and residuals, lets analysts estimate uncertainty, check models, and make informed decisions.
Core definitions and the essential distinction
In statistical usage the term "error" commonly denotes the unobservable difference between an individual observation and the unknown true quantity or value implied by a population model. A "residual" is the corresponding observable quantity computed from a fitted model or a sample: it is the difference between the observed value and an estimated or predicted value. Thus errors are theoretical and generally unobservable; residuals are empirical and used to assess models.
Types and characteristic properties
- Sampling error: variation that arises because a sample is only a subset of a population. For example, the deviation of a sample mean from the population mean is a sampling error.
- Measurement error: inaccuracies introduced by instruments or protocols during measurement.
- Model error: discrepancy between a model's predictions and the true data-generating process; this is the discrepancy that model specification, estimation, and model checking try to keep small.
- Systematic vs. random: systematic errors bias results in one direction; random errors have no fixed sign and tend to cancel when many of them are averaged.
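The contrast between systematic and random error can be seen in a small simulation. This is a sketch under invented assumptions: the true value (100.0), the noise standard deviation (2.0), and the instrument bias (1.5) are hypothetical numbers, and `measure` is a helper defined only for this illustration.

```python
import random
import statistics

# Hypothetical setup (not from the text): a true value of 100.0 units,
# random instrument noise with standard deviation 2.0, and a miscalibrated
# instrument that reads 1.5 units too high.
TRUE_VALUE = 100.0
NOISE_SD = 2.0
BIAS = 1.5


def measure(n, bias, rng):
    """Simulate n readings: true value + fixed bias + zero-mean random noise."""
    return [TRUE_VALUE + bias + rng.gauss(0.0, NOISE_SD) for _ in range(n)]


rng = random.Random(42)  # fixed seed so the run is reproducible
unbiased = measure(10_000, 0.0, rng)
biased = measure(10_000, BIAS, rng)

# Averaging many readings shrinks the random part toward zero,
# but the systematic offset survives in the average unchanged.
mean_error_unbiased = statistics.fmean(unbiased) - TRUE_VALUE
mean_error_biased = statistics.fmean(biased) - TRUE_VALUE
print(f"average error, random only:   {mean_error_unbiased:+.3f}")
print(f"average error, with bias 1.5: {mean_error_biased:+.3f}")
```

Collecting more readings makes the first average approach zero, while the second approaches the bias itself; only recalibration, not repetition, removes a systematic error.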
When observations are modelled as realizations of a random variable, the population mean is the theoretical benchmark and the sample mean is its estimator. The difference between an individual observation and the population mean is an error; the difference between the observation and the sample mean is a residual.
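The same definitions can be written symbolically. The notation is introduced here for illustration and is not from the original text: $X_1, \dots, X_n$ are the observations, $\mu$ is the population mean, $\varepsilon_i$ are the errors, and $r_i$ are the residuals about the sample mean $\bar{X}$.

```latex
\begin{aligned}
\varepsilon_i &= X_i - \mu && \text{(error; unobservable because } \mu \text{ is unknown)} \\
r_i &= X_i - \bar{X}, \qquad \bar{X} = \frac{1}{n}\sum_{j=1}^{n} X_j && \text{(residual; computable from the sample)} \\
r_i &= (X_i - \mu) - (\bar{X} - \mu) = \varepsilon_i - \bar{\varepsilon}, \qquad \bar{\varepsilon} = \frac{1}{n}\sum_{j=1}^{n} \varepsilon_j \\
\sum_{i=1}^{n} r_i &= \sum_{i=1}^{n} X_i - n\bar{X} = 0
\end{aligned}
```

The third line shows that every residual is its error shifted by the same unknown amount $\bar{\varepsilon}$, and the last line is the constraint that this shared shift imposes.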
Simple example
Imagine an experiment measuring the heights of adults drawn from a local population. The true population average height is unknown, although the model posits that such a central value exists under the assumed distribution. Any single person's measured height minus that true average is a statistical error. If we instead compute the average of a sample of people and subtract that sample mean from each measured height, the resulting differences are residuals. Residuals taken about the sample mean sum to zero by construction, whereas the errors in the same sample generally do not, because they are measured from the population mean rather than from the sample's own mean. This contrast explains why residuals can be dependent, through the shared estimate, even when the underlying errors are independent.
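A simulation makes the height example concrete, because only in a simulation can we pretend to know the population mean. A minimal sketch: the population mean of 170 cm, the standard deviation of 7 cm, and the sample size of 8 are hypothetical values, not data.

```python
import random
import statistics

# Hypothetical numbers for illustration only: in a simulation we can pretend
# to know the population mean, which is impossible with real data.
POP_MEAN_CM = 170.0
POP_SD_CM = 7.0

rng = random.Random(1)  # fixed seed for reproducibility
heights = [rng.gauss(POP_MEAN_CM, POP_SD_CM) for _ in range(8)]
sample_mean = statistics.fmean(heights)

# Errors: deviations from the (normally unknown) population mean.
errors = [h - POP_MEAN_CM for h in heights]
# Residuals: deviations from the sample mean, computable from the data alone.
residuals = [h - sample_mean for h in heights]

print(f"sum of errors:    {sum(errors):+.4f}")     # generally nonzero
print(f"sum of residuals: {sum(residuals):+.4f}")  # zero up to rounding

# Each residual equals its error minus the same shift (sample_mean - POP_MEAN_CM),
# which is why residuals are tied together even when errors are independent.
shift = sample_mean - POP_MEAN_CM
assert all(abs((e - r) - shift) < 1e-9 for e, r in zip(errors, residuals))
```

With real data only `heights` and `sample_mean` would be available, so only the residuals could be computed.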
Why the distinction matters in practice
Residuals are the practical tool for diagnostics: plotting residuals against fitted values, looking for systematic patterns, and testing for unequal variance (heteroskedasticity) or serial correlation. Analysts compute residual-based measures, such as the residual sum of squares and the residual standard error, to compare models and quantify fit. Because residuals are computed from estimated parameters, they inherit constraints (for example, in ordinary least squares with an intercept term the residuals sum to zero and are orthogonal to every regressor), and these constraints affect their statistical properties and the interpretation of tests.
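The residual-based quantities mentioned here can be computed by hand for a straight-line fit. This is a sketch, not a definitive implementation: the data are invented, `fit_simple_ols` is a hypothetical helper, and the Durbin–Watson statistic is included as one common serial-correlation check chosen for illustration.

```python
import math


def fit_simple_ols(x, y):
    """Fit y = b0 + b1*x by ordinary least squares (intercept included)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

b0, b1 = fit_simple_ols(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

n, p = len(x), 2  # p = number of estimated coefficients (intercept and slope)
rss = sum(r ** 2 for r in residuals)  # sum of squared residuals
rse = math.sqrt(rss / (n - p))        # residual standard error

# Constraints inherited from estimation: with an intercept, OLS residuals sum
# to zero and are orthogonal to the regressor.
sum_resid = sum(residuals)
sum_x_resid = sum(xi * r for xi, r in zip(x, residuals))
print(f"sum of residuals:     {sum_resid:.2e}")
print(f"sum of x * residuals: {sum_x_resid:.2e}")
print(f"RSS = {rss:.4f}, residual standard error = {rse:.4f}")

# Durbin-Watson statistic on the ordered residuals: values near 2 suggest
# little first-order autocorrelation; it always lies between 0 and 4.
dw = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, n)) / rss
print(f"Durbin-Watson = {dw:.3f}")
```

Dropping the intercept from the model would remove the guarantee that the residuals sum to zero, which is why that condition matters when reading residual plots.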
Theoretical context and historical notes
Estimators and their residuals are central to regression methods that date back to the early development of least squares. Under the assumptions of the Gauss–Markov theorem (a linear model whose errors have mean zero, equal variance, and no mutual correlation), the ordinary least squares estimator is the best linear unbiased estimator of the coefficients, meaning it has the smallest variance among linear unbiased estimators, and its residuals carry the information about the remaining unexplained variation. Because errors are conceptual and residuals are observable, many inferential procedures rely on residuals to assess whether modelling assumptions appear reasonable given the collected data.
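How residuals carry information about the unexplained variation can be shown in matrix form. The notation is an assumption introduced for this sketch: $y$ is the $n$-vector of responses, $X$ is an $n \times p$ design matrix of full column rank (including the intercept column), and the errors $\varepsilon$ satisfy the Gauss–Markov conditions of zero mean, common variance $\sigma^2$, and no correlation.

```latex
\begin{aligned}
y &= X\beta + \varepsilon, \qquad \mathbb{E}[\varepsilon] = 0, \qquad \operatorname{Var}(\varepsilon) = \sigma^2 I_n \\
\hat{\beta} &= (X^\top X)^{-1} X^\top y, \qquad H = X (X^\top X)^{-1} X^\top \\
r &= y - X\hat{\beta} = (I_n - H)\,y = (I_n - H)\,\varepsilon \\
\operatorname{Var}(r) &= \sigma^2 (I_n - H) \\
\mathbb{E}[r^\top r] &= \sigma^2 \operatorname{tr}(I_n - H) = \sigma^2 (n - p) \\
s^2 &= \frac{r^\top r}{n - p}, \qquad \mathbb{E}[s^2] = \sigma^2
\end{aligned}
```

The step from $y$ to $\varepsilon$ in the third line uses $(I_n - H)X = 0$. Because $I_n - H$ generally has nonzero off-diagonal entries, residuals are correlated even when errors are not, and dividing by $n - p$ rather than $n$ compensates for the $p$ estimated coefficients.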
Practical reminders and common pitfalls
- Do not treat residuals as if they were independent draws of the true errors; estimation constraints can induce dependence.
- Differentiate between bias (systematic error) and random variability; both affect conclusions but require different remedies.
- Use graphical checks and formal tests on residuals before trusting standard errors and confidence intervals.
In summary, the statistical "error" is the theoretical departure from an unknown truth, while the "residual" is what we compute from observed values and fitted models to assess that departure. Knowing the distinction helps analysts design better studies, choose appropriate estimators, and use residual diagnostics to improve models. For further reading on these concepts, consult introductory texts on estimation and regression.

