Confidence interval

Author: Leandro Alegsa

06-09-2021 16:21

A confidence interval, or CI (also called expected range), is, in statistics, an interval intended to indicate the precision of the location estimate of a parameter (e.g., a mean). The confidence interval indicates the range that, with a certain probability (the confidence level), includes the parameter of the distribution of a random variable.

If a random experiment is repeated many times in an identical manner, the frequently chosen 95% confidence interval will contain the fixed, unknown, "true" parameter 95% of the time.

The frequently encountered formulation that the true value lies with 95% probability in the currently available confidence interval is, strictly speaking, incorrect, since the true value is fixed and does not behave stochastically. The correct formulation would be: for 95% of all samples, the true value lies in the confidence interval calculated from that sample, because the upper and lower limits of the confidence interval depend on the sample at hand and are therefore random variables, i.e. stochastic. An alternative correct formulation is: when a confidence interval is calculated, its boundaries enclose the true parameter 95% of the time and fail to enclose it 5% of the time. The confidence interval is constructed such that the true parameter $\beta_j$ is covered with probability $1-\alpha$ when the estimation procedure is repeated for many samples, where $\alpha$ is the so-called level of error.
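The repeated-sampling reading can be checked with a small simulation. The following is a minimal sketch, not a definitive implementation: it assumes a normally distributed population with known standard deviation, so that a simple z-interval for the mean can be used. The function names `z_interval` and `coverage`, the value `sigma=2.0`, and the number of repetitions are illustrative assumptions that do not come from the text.

```python
import random
from statistics import NormalDist, fmean


def z_interval(sample, sigma, level=0.95):
    """Two-sided interval for the mean of a normal population.

    Assumption: the population standard deviation sigma is known.
    """
    n = len(sample)
    # Quantile of the standard normal distribution, about 1.96 for level=0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    center = fmean(sample)
    return center - half_width, center + half_width


def coverage(mu=5.0, sigma=2.0, n=30, repetitions=5000, level=0.95, seed=1):
    """Fraction of repeated samples whose interval contains the fixed mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        # The parameter mu stays fixed; only the sample (and so the interval) varies.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower, upper = z_interval(sample, sigma, level)
        hits += lower <= mu <= upper
    return hits / repetitions


print(coverage())
```

The printed relative frequency should be close to the chosen level of 0.95. Each individual interval either contains the fixed value `mu` or it does not; only the proportion over many samples is governed by the confidence level.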

Estimating parameters using confidence intervals is called interval estimation, and the corresponding estimator is called a range or interval estimator. One advantage over point estimators is that significance can be read directly from a confidence interval: an interval that is wide for a given confidence level indicates a small sample size or strong variability in the population.

Confidence intervals must be distinguished from prediction intervals and from confidence and prediction bands.

Confidence intervals at the 95% level for 100 samples of size 30 from a normally distributed population. Of these, 94 intervals cover the exact expected value μ = 5; the remaining 6 do not.

Definition

For a fixed $\gamma \in (0,1)$, a $\gamma \cdot 100\,\%$ confidence interval for $\vartheta$ at the confidence level $\gamma$ (also: a $\gamma$-confidence interval) is defined by two statistics $T_{u}=h_{u}(X_{1:n})$ and $T_{v}=h_{v}(X_{1:n})$, both based on a random sample $X_{1:n}$, which satisfy

$P\left(T_{u}\leq \vartheta \leq T_{v}\right)=\gamma \quad {\text{for all}}\; \vartheta \in \Theta$ .

The statistics $T_{u}$ and $T_{v}$ are the bounds of the confidence interval, for which $T_{u}<T_{v}$ is always assumed. The confidence level $\gamma$ is also called the coverage probability. The realizations $t_{u}$ and $t_{v}$ of $T_{u}$ and $T_{v}$, respectively, form the estimation interval $[t_{u},t_{v}]$. The bounds of the confidence interval are functions of the random sample $X_{1:n}$ and therefore also random. In contrast, the unknown parameter $\vartheta$ is fixed. If the random experiment is repeated in an identical way, a $\gamma \cdot 100\,\%$ confidence interval will cover the unknown parameter $\vartheta$ in $\gamma \cdot 100\,\%$ of all cases. However, since the unknown parameter $\vartheta$ is not a random variable, one cannot say that $\vartheta$ lies in a $\gamma \cdot 100\,\%$ confidence interval with probability $\gamma$. Such an interpretation is reserved for the Bayesian counterpart of the confidence interval, the credible interval. Often one sets $\gamma =1-\alpha$. The probability $1-\alpha$ can be interpreted as a relative frequency: if intervals that each have level $1-\alpha$ are used for a large number of confidence estimates, the relative frequency with which the concrete intervals cover the parameter approaches the value $1-\alpha$.
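As a hedged illustration of this definition, consider one standard special case. The normal model with known standard deviation $\sigma$ is an assumption chosen here for the example and is not part of the definition itself. Let $X_{1},\dots ,X_{n}$ be independent and normally distributed with unknown mean $\mu$ and known $\sigma$, let $\bar{X}$ be the sample mean, and let $z$ denote the $(1+\gamma)/2$ quantile of the standard normal distribution. Then the statistics

$T_{u}=\bar{X}-z\,\frac{\sigma}{\sqrt{n}},\quad T_{v}=\bar{X}+z\,\frac{\sigma}{\sqrt{n}}$

satisfy

$P\left(T_{u}\leq \mu \leq T_{v}\right)=P\left(-z\leq \frac{\bar{X}-\mu}{\sigma /\sqrt{n}}\leq z\right)=\gamma \quad {\text{for all}}\; \mu \in \mathbb{R}$ ,

because the standardized sample mean follows a standard normal distribution whatever the value of $\mu$. For $\gamma =0.95$ one has $z\approx 1.96$.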

Formal definition

General conditions

Given a statistical model $(X,{\mathcal A},(P_{\vartheta })_{{\vartheta \in \Theta }})$ and a function

$g\colon \Theta \to \Gamma$ ,

which in the parametric case is also called the parameter function. The set $\Gamma$ contains the values that can be the result of an estimation. Usually, $\Gamma \subset \mathbb {R} ^{n}$.

Confidence region

A mapping

$C\colon X\to {\mathcal {P}}(\Gamma )$

is called a confidence region or range estimator if it satisfies the following condition:

For all $\gamma \in \Gamma$, the set $A(\gamma ):=\{x\in X\mid \gamma \in C(x)\}$ is contained in $\mathcal A$. (M)

Thus, a confidence region is a mapping that assigns to each observation $x\in X$ an initially arbitrary subset of $\Gamma$ (here ${\mathcal {P}}(\Gamma )$ is the power set of $\Gamma$, that is, the set of all subsets of $\Gamma$).

Condition (M) ensures that every set $A(\gamma )$ can be assigned a probability. This is needed to define the confidence level.

Confidence interval

If $\Gamma \subset \mathbb {R}$ and if $C(x)$ is an interval for every $x\in X$, then $C$ is also called a confidence interval.

If confidence intervals are defined in the form

$C_{1}(x)=(-\infty ,b^{+}(x)],\;C_{2}(x)=[b^{-}(x),b^{+}(x)]\;\;{\text{or}}\;\;C_{3}(x)=[b^{-}(x),+\infty )$ ,

then $b^{+}(x)$ is also called the upper confidence bound and $b^{-}(x)$ the lower confidence bound.

Confidence level and level of error

Let $C$ be a confidence region. Then $C$ is called a confidence region at the confidence level (or certainty level) $1-\alpha$ if

$P_{\vartheta }(\{x\in X\mid g(\vartheta )\in C(x)\})\geq 1-\alpha \quad {\text{for all}}\; \vartheta \in \Theta$ .

The value $\alpha$ is then also called the level of error. A more general formulation is possible with shape hypotheses (see Shape hypotheses, section on confidence ranges for shape hypotheses).
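The inequality in this definition means that the coverage probability may exceed $1-\alpha$ for some parameter values, as long as it never falls below it. The following is a minimal sketch that checks this numerically for a binomial model. The choice of the Clopper-Pearson interval, the sample size `n = 20`, the parameter grid, and all function names are assumptions made for illustration and do not come from the text.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def solve_increasing(g, target, iterations=60):
    """Bisection on [0, 1] for an increasing function g with g(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Interval C(x) = [lower, upper] for the success probability."""
    # Lower bound: the p with P(X >= x) = alpha / 2 (0 if x == 0).
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound: the p with P(X <= x) = alpha / 2 (1 if x == n).
    upper = 1.0 if x == n else solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


def coverage(theta, intervals):
    """P_theta({x : theta in C(x)}), summed exactly over all outcomes x."""
    n = len(intervals) - 1
    return sum(binom_pmf(x, n, theta)
               for x, (lower, upper) in enumerate(intervals)
               if lower <= theta <= upper)


n, alpha = 20, 0.05
intervals = [clopper_pearson(x, n, alpha) for x in range(n + 1)]
grid = [i / 100 for i in range(1, 100)]
worst = min(coverage(theta, intervals) for theta in grid)
print(worst >= 1 - alpha)
```

On this grid the smallest coverage stays at or above $1-\alpha$, which is the sense in which the interval holds its confidence level for every parameter value checked, not only on average.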

For the special cases mentioned above, with upper and lower confidence bounds, this yields

$P_{\vartheta }(g(\vartheta )\leq b^{+}(x))\geq 1-\alpha$

respectively

$P_{\vartheta }(b^{-}(x)\leq g(\vartheta )\leq b^{+}(x))\geq 1-\alpha$

and

$P_{\vartheta }(b^{-}(x)\leq g(\vartheta ))\geq 1-\alpha \quad {\text{for all}}\; \vartheta \in \Theta .$
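The three forms $C_{1}$, $C_{2}$ and $C_{3}$ can be sketched for one concrete case. The example below assumes a normally distributed population with known standard deviation and takes the mean as the parameter. This model, the sample values, and the function names are illustrative assumptions and are not prescribed by the definitions above.

```python
from statistics import NormalDist, fmean


def upper_bound(sample, sigma, alpha=0.05):
    """b^+(x) for the one-sided interval C_1(x) = (-inf, b^+(x)]."""
    z = NormalDist().inv_cdf(1 - alpha)
    return fmean(sample) + z * sigma / len(sample) ** 0.5


def lower_bound(sample, sigma, alpha=0.05):
    """b^-(x) for the one-sided interval C_3(x) = [b^-(x), +inf)."""
    z = NormalDist().inv_cdf(1 - alpha)
    return fmean(sample) - z * sigma / len(sample) ** 0.5


def two_sided_bounds(sample, sigma, alpha=0.05):
    """(b^-(x), b^+(x)) for the two-sided interval C_2(x)."""
    # The error level alpha is split evenly between both tails.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / len(sample) ** 0.5
    center = fmean(sample)
    return center - half_width, center + half_width


sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6]  # hypothetical observations
print(upper_bound(sample, sigma=1.0))
print(lower_bound(sample, sigma=1.0))
print(two_sided_bounds(sample, sigma=1.0))
```

At the same error level, the one-sided bounds lie closer to the sample mean than the two-sided ones, because the whole of $\alpha$ is spent on a single tail.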

Questions and Answers

Q: What is a confidence interval in statistics?

A: A confidence interval is a special interval used to estimate a parameter, such as the population mean, giving a range of acceptable values for the parameter instead of a single value.

Q: Why is a confidence interval used instead of a single value?

A: A confidence interval is used instead of a single value to account for the uncertainty of estimating a parameter from a sample. The confidence level attached to it states how often intervals constructed in this way cover the real value of the parameter.

Q: What is a confidence level?

A: A confidence level is the proportion of repeated samples for which the calculated confidence interval covers the parameter being estimated. It is often given as a percentage (e.g. 95% confidence interval).

Q: What are confidence limits?

A: Confidence limits are the end points of a confidence interval, which define the range of acceptable values for the parameter being estimated.

Q: How does the confidence level affect the confidence interval?

A: In a given estimation procedure, the higher the confidence level, the wider the confidence interval will be.
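This relationship can be made concrete with a minimal sketch. It assumes a z-interval for the mean of a normal population with known standard deviation; the function name `interval_width` and the default values `sigma=1.0` and `n=30` are illustrative assumptions.

```python
from statistics import NormalDist


def interval_width(level, sigma=1.0, n=30):
    """Width of the two-sided z-interval for the mean at the given level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z * sigma / n ** 0.5


widths = {level: interval_width(level) for level in (0.90, 0.95, 0.99)}
for level, width in widths.items():
    print(level, round(width, 3))
```

With the sample size and standard deviation held fixed, the width grows with the level because a larger quantile of the standard normal distribution is needed.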

Q: What assumptions are required to calculate a confidence interval?

A: The calculation of a confidence interval generally requires assumptions about the nature of the estimation process, such as the assumption that the distribution of the population from which the sample came is normal.

Q: Are confidence intervals robust statistics?

A: Confidence intervals are not robust statistics, though adjustments can be made to add robustness.
