Overview

In statistics, a prediction interval (PI) is an interval estimate that expresses uncertainty about where a single future observation or a small set of new observations is likely to lie, given data already observed. Unlike intervals that quantify uncertainty in parameters, such as confidence intervals, prediction intervals quantify uncertainty in outcomes. They are widely used in forecasting, regression modeling and any application where one must anticipate individual future values.

Characteristics and basic idea

A prediction interval is usually reported as a lower and an upper bound together with a nominal coverage probability 1−α (for example, 95%). Conceptually it combines two sources of uncertainty: uncertainty about the estimated model (parameter uncertainty) and the natural random variation of individual observations (irreducible noise). Because it adds observational variability on top of parameter uncertainty, a PI under the classical construction is wider than the confidence interval for the mean response at the same point and level.

Calculation and common forms

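Under simple parametric assumptions (for example, a linear regression model with independent, normally distributed errors and constant variance), the prediction interval around a predicted value ŷ at a predictor value x0 has the form ŷ ± t_{α/2, df} × s_pred, where t_{α/2, df} is the upper α/2 quantile of the t distribution with df residual degrees of freedom. The prediction standard error s_pred accounts for both the uncertainty in the estimated regression and the residual variability of a new observation, and therefore depends on the sample size and on the location of x0 relative to the original data. In simple linear regression with n observations, s_pred = s × sqrt(1 + 1/n + (x0 − x̄)² / S_xx), where s is the residual standard deviation and S_xx is the sum of squared deviations of the predictor from its mean x̄. When the distributional assumptions are doubtful, practitioners often use resampling approaches such as bootstrap prediction intervals, which resample residuals or cases to approximate the distribution of prediction errors for a future observation.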
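The classical formula can be sketched for simple linear regression as follows. This is a minimal illustration using only the Python standard library, not a reference implementation: the function name, the data values and the choice to pass the t critical value in as a parameter are all assumptions made for the example (a routine such as SciPy's scipy.stats.t.ppf could supply the quantile instead).

```python
import math


def simple_regression_intervals(x, y, x0, t_crit):
    """Return (y_hat, ci, pi) at x0 for a least-squares line fitted to (x, y).

    t_crit is the two-sided t critical value for n - 2 degrees of freedom,
    supplied by the caller (an assumption of this sketch).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    y_hat = intercept + slope * x0

    # Residual standard deviation s with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))

    # This term grows as x0 moves away from the mean of the observed x values.
    h0 = 1 / n + (x0 - x_bar) ** 2 / sxx

    s_mean = s * math.sqrt(h0)      # standard error of the fitted mean
    s_pred = s * math.sqrt(1 + h0)  # standard error for a new observation

    ci = (y_hat - t_crit * s_mean, y_hat + t_crit * s_mean)
    pi = (y_hat - t_crit * s_pred, y_hat + t_crit * s_pred)
    return y_hat, ci, pi


# Made-up illustrative data: n = 10, so df = 8 and t_{0.025, 8} is about 2.306.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
T_CRIT = 2.306

y_hat_mid, ci_mid, pi_mid = simple_regression_intervals(x, y, 5.5, T_CRIT)
y_hat_far, ci_far, pi_far = simple_regression_intervals(x, y, 15.0, T_CRIT)
```

In this sketch the prediction interval at x0 = 5.5 contains the confidence interval for the mean at the same point, and the interval at x0 = 15.0, outside the observed range, is wider than the one at the centre of the data.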

Assumptions and methods

  • Typical assumptions: independence of observations, correct model specification, homoscedasticity (constant variance) and, for classical formulas, normally distributed errors.
  • When assumptions are doubtful: use robust regression variants, prediction bands that allow heteroscedasticity, or bootstrap / simulation methods to construct intervals.
  • For time series, prediction intervals must account for serial dependence; methods include ARIMA prediction intervals and simulation-based approaches.
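As one illustration of the bootstrap option listed above, the following sketch builds a residual-bootstrap prediction interval for a straight-line fit. It is a simplified version written for this article: the function names, data and defaults are hypothetical, residuals are resampled without leverage adjustment, and resampling residuals still assumes independent errors with constant variance (it relaxes normality only).

```python
import random


def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def bootstrap_prediction_interval(x, y, x0, level=0.95, n_boot=2000, seed=0):
    """Residual-bootstrap prediction interval at x0 (illustrative sketch)."""
    rng = random.Random(seed)
    intercept, slope = fit_line(x, y)
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    y_hat = intercept + slope * x0

    errors = []
    for _ in range(n_boot):
        # Rebuild a synthetic response by resampling residuals, then refit.
        y_star = [fi + rng.choice(residuals) for fi in fitted]
        b0, b1 = fit_line(x, y_star)
        # Simulated future observation minus the refitted prediction.
        y_new = y_hat + rng.choice(residuals)
        errors.append(y_new - (b0 + b1 * x0))

    # Empirical quantiles of the simulated prediction errors.
    errors.sort()
    alpha = 1 - level
    lower_err = errors[int((alpha / 2) * (n_boot - 1))]
    upper_err = errors[int((1 - alpha / 2) * (n_boot - 1))]
    return y_hat + lower_err, y_hat + upper_err


# Made-up illustrative data.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
interval = bootstrap_prediction_interval(x, y, x0=5.5)
```

Each replicate captures both sources of uncertainty: refitting on the resampled responses reflects parameter uncertainty, and the extra resampled residual stands in for the noise of the new observation. Fixing the seed makes the result reproducible.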

Uses, examples and importance

Prediction intervals are used wherever forecasts for individual outcomes matter: estimating the range for next month's sales, predicting a single patient's response in a clinical setting, or reporting uncertainty around a single forecast in weather and economic models. They communicate practical uncertainty to decision makers — for instance, inventory planners need a PI for individual demand, not just the mean forecast.

Distinctions and common pitfalls

Key distinctions: confidence intervals target parameter uncertainty (e.g., the mean), while prediction intervals target future observations. Misinterpretation is common: a 95% PI does not guarantee that, for any particular dataset, 95% of future individual observations will fall inside the computed interval. The 95% describes the procedure: if the sample and the new observation are both drawn repeatedly under the same model, the interval covers the new observation about 95% of the time. Users should also be cautious about extrapolating PIs far beyond the range of the original data, where model misspecification and bias can dominate the stated uncertainty.
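The coverage interpretation can be checked with a small simulation. The sketch below assumes the simplest setting, independent draws from one normal population, where the classical interval is x̄ ± t × s × sqrt(1 + 1/n). The population values (mean 50, standard deviation 8), the function names and the hard-coded critical value t_{0.025, 9} of about 2.262 are assumptions chosen for illustration.

```python
import math
import random
import statistics


def normal_sample_pi(sample, t_crit):
    """PI for one new draw from the same normal population as `sample`."""
    n = len(sample)
    centre = statistics.mean(sample)
    half_width = t_crit * statistics.stdev(sample) * math.sqrt(1 + 1 / n)
    return centre - half_width, centre + half_width


def simulate_coverage(n_reps=5000, n=10, t_crit=2.262, seed=1,
                      mu=50.0, sigma=8.0):
    """Return (hit rate over repetitions, per-dataset true coverages)."""
    rng = random.Random(seed)
    population = statistics.NormalDist(mu, sigma)
    hits = 0
    conditional = []
    for _ in range(n_reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        new_obs = rng.gauss(mu, sigma)
        lo, hi = normal_sample_pi(sample, t_crit)
        hits += lo <= new_obs <= hi
        # True probability that THIS interval captures a future draw.
        conditional.append(population.cdf(hi) - population.cdf(lo))
    return hits / n_reps, conditional


coverage, conditional = simulate_coverage()
```

Across repetitions the hit rate should sit close to the nominal 95%, while the entries of conditional, the true coverage of each individual interval, vary from dataset to dataset: some are above 95% and some below.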
