Percentile
This article deals with quantiles of samples. For quantiles of probability distributions or random variables, see Quantile (probability theory).
An empirical ($p$-)quantile, also called simply a quantile for short, is a summary statistic of a sample. Roughly speaking, for any number $p$ between 0 and 1, the empirical $p$-quantile divides the sample such that a proportion $p$ of the sample is smaller than the empirical $p$-quantile and a proportion $1-p$ of the sample is larger. For example, given a sample of shoe sizes, the empirical 0.35-quantile is the shoe size $s$ such that 35% of the shoe sizes in the sample are smaller than $s$ and 65% are larger than $s$.
Some empirical quantiles have proper names. They include the median ($p = 0.5$), the upper quartile and the lower quartile, as well as the terciles, quintiles, deciles and percentiles.
Quantiles in the sense of probability theory are to be distinguished from the empirical quantiles discussed here. The former are characteristics of a probability distribution and thus of an abstract (set) function, similar to the expected value, whereas empirical quantiles are characteristics of a sample, similar to the arithmetic mean.
Definition
Let $\lfloor \cdot \rfloor$ denote the floor function. It rounds each number down to the nearest integer that is not larger than it. For example, $\lfloor 1.2 \rfloor = 1$ and $\lfloor 3.99 \rfloor = 3$.
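Let $(x_1, x_2, \dots, x_n)$ be a sample of size $n$ whose elements are ordered by size. This means that
$x_1 \leq x_2 \leq \dots \leq x_n$.
Then, for a number $p \in (0, 1)$,
$x_p = \begin{cases} \tfrac{1}{2}\left(x_{np} + x_{np+1}\right) & \text{if } np \text{ is an integer,} \\ x_{\lfloor np + 1 \rfloor} & \text{if } np \text{ is not an integer} \end{cases}$
is called the empirical $p$-quantile of $(x_1, x_2, \dots, x_n)$.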
Some definitions exist that differ from the definition given here.
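The definition can be sketched in Python: the mean of $x_{np}$ and $x_{np+1}$ is taken when $np$ is an integer, and $x_{\lfloor np + 1 \rfloor}$ otherwise. The function name `empirical_quantile` and the use of `fractions.Fraction` to test exactly whether $np$ is an integer are assumptions made for this illustration. Other conventions for empirical quantiles would give different results.

```python
from fractions import Fraction
from math import floor


def empirical_quantile(sample, p):
    """Empirical p-quantile of a sample (hypothetical helper for illustration).

    p may be a Fraction, a string such as "0.3", or a float. Floats are
    converted exactly, so a level like 0.3 should be passed as "0.3" or
    Fraction(3, 10) to avoid binary rounding effects.
    """
    p = Fraction(p)
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    xs = sorted(sample)              # x_1 <= x_2 <= ... <= x_n
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    k = n * p                        # exact value of n*p
    if k.denominator == 1:           # n*p is an integer
        i = int(k)                   # 1-based index, 1 <= i <= n - 1
        return (xs[i - 1] + xs[i]) / 2
    return xs[floor(k + 1) - 1]      # x_{floor(np + 1)}, shifted to 0-based


# Usage with the integers 1..10 as a stand-in sample
print(empirical_quantile(range(1, 11), 0.5))
print(empirical_quantile(range(1, 11), 0.25))
```

Python lists are 0-based, so the 1-based index from the definition is shifted by one in both branches.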
Example
The following sample consists of ten random integers (drawn from the integers between zero and one hundred according to the discrete uniform distribution):
Sorting yields the sample
.
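Here $n = 10$.
For $p = 0.5$ we get $np = 5$. Since this is an integer, the definition gives
$x_{0.5} = \tfrac{1}{2}(x_5 + x_6)$.
For $p = 0.25$ we get $np = 2.5$. The floor function then yields $\lfloor 2.5 + 1 \rfloor = 3$ and thus
$x_{0.25} = x_3$.
Similarly, for $p = 0.75$ we obtain $np = 7.5$ and thus $\lfloor 7.5 + 1 \rfloor = 8$, so
$x_{0.75} = x_8$.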
In contrast to the arithmetic mean, the empirical quantile is robust to outliers. This means that if values of a sample above (or below) a certain quantile are replaced by other values above (or below) that quantile, the quantile itself does not change. The reason is that quantiles are determined only by the order of the sample values, that is, by their position relative to each other, and not by their concrete numerical values. For the sample above, for example, replacing the largest value $x_{10}$ by a much larger number changes the arithmetic mean, whereas the median, lower quartile, and upper quartile remain unchanged because the order of the sample has not changed.
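The effect of an outlier can be illustrated with a short Python sketch. The sample values below are hypothetical and are not the sample from the example. The helper `quantile` is an assumed name that takes the mean of $x_{np}$ and $x_{np+1}$ when $np$ is an integer and $x_{\lfloor np + 1 \rfloor}$ otherwise.

```python
from fractions import Fraction
from math import floor
from statistics import mean


def quantile(xs, p):
    """Empirical p-quantile (hypothetical helper); p is a Fraction or a string."""
    xs, k = sorted(xs), len(xs) * Fraction(p)
    if k.denominator == 1:                    # n*p is an integer
        return (xs[int(k) - 1] + xs[int(k)]) / 2
    return xs[floor(k + 1) - 1]


# Hypothetical sorted sample of size 10 (illustrative values only)
sample = [3, 8, 12, 19, 25, 31, 40, 47, 58, 77]
# Same sample with the largest value replaced by an extreme outlier
with_outlier = sample[:-1] + [1077]

for label, data in (("original", sample), ("with outlier", with_outlier)):
    quartiles = [quantile(data, p) for p in ("0.25", "0.5", "0.75")]
    print(label, "mean:", mean(data), "quartiles and median:", quartiles)
```

For these values the mean rises from 32 to 132, while the lower quartile, median and upper quartile stay at 12, 28 and 47.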
Special quantiles
For certain values of $p$, the associated quantiles have proper names. They are briefly introduced in the following. Note that the corresponding quantiles of probability distributions sometimes carry the same names.
Median
→ Main article: Median
The median is the $0.5$-quantile and thus divides the sample into two halves: one half is smaller than the median, the other larger. Together with the mode and the arithmetic mean, it is an important location parameter in descriptive statistics.
Tercile
The two $p$-quantiles for $p = \tfrac{1}{3}$ and $p = \tfrac{2}{3}$ are called terciles. They divide the sample into three equal parts: one part is smaller than the lower tercile (the $\tfrac{1}{3}$-quantile), one part is larger than the upper tercile (the $\tfrac{2}{3}$-quantile), and one part lies between the terciles.
Quartile
The two quantiles with $p = 0.25$ and $p = 0.75$ are called quartiles. Here, the $0.25$-quantile is called the lower quartile and the $0.75$-quantile is called the upper quartile. Half of the sample lies between the lower and upper quartiles, a quarter lies below the lower quartile, and a quarter lies above the upper quartile. The interquartile range, a measure of dispersion, is defined on the basis of the quartiles.
Quintile
Quintiles are the four quantiles with $p = 0.2, 0.4, 0.6, 0.8$. Accordingly, 20% of the sample is below the first quintile and 80% above it, 40% of the sample is below the second quintile and 60% above it, and so on.
Decile
The quantiles for multiples of $0.1$, i.e. for $p = 0.1, 0.2, \dots, 0.9$, are called deciles. Here, the $0.1$-quantile is called the first decile, the $0.2$-quantile the second decile, etc. 10% of the sample lies below the first decile and correspondingly 90% above it. Similarly, 40% of the sample lies below the fourth decile and 60% above.
Percentile
Percentiles are the quantiles from $p = 0.01$ to $p = 0.99$ in steps of $0.01$.
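The named quantiles all correspond to levels of the form $p = k/m$ for a fixed number of parts $m$. The following Python sketch lists these levels; the function name `levels` and the dictionary `NAMED` are assumptions made for this illustration. Note that for $m = 4$ the list also contains $p = 1/2$, which the text above treats separately as the median rather than as a quartile.

```python
from fractions import Fraction


def levels(parts):
    """Levels k/parts for k = 1, ..., parts - 1 (hypothetical helper)."""
    return [Fraction(k, parts) for k in range(1, parts)]


# Number of equal parts associated with each name
NAMED = {
    "median": 2,
    "terciles": 3,
    "quartiles": 4,   # includes 1/2, i.e. the median
    "quintiles": 5,
    "deciles": 10,
    "percentiles": 100,
}

for name, parts in NAMED.items():
    ps = levels(parts)
    print(f"{name}: {len(ps)} level(s), from {ps[0]} to {ps[-1]}")
```

Exact fractions are used so that levels such as $1/3$ are not affected by floating-point rounding.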
Derived terms
From the quantiles, certain measures of dispersion can be derived. The most important is the interquartile range
$\mathrm{IQR} = x_{0.75} - x_{0.25}$.
It indicates how far apart the upper and lower quartiles are and thus how wide the range is in which the middle 50% of the sample lies. Somewhat more generally, the (inter)quantile distance can be defined as $x_{1-p} - x_p$ for $p \in (0, \tfrac{1}{2})$. It indicates how wide the range is in which the middle $100 \cdot (1 - 2p)\,\%$ of the sample lies. For $p = 0.25$ it corresponds to the interquartile range.
Another derived measure of dispersion is the mean absolute deviation from the median.
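Both dispersion measures can be sketched in Python. The interquantile distance is computed as $x_{1-p} - x_p$ and the mean absolute deviation as the average of $|x_i - x_{0.5}|$. The sample values and the function names are hypothetical. The helper `quantile` takes the mean of $x_{np}$ and $x_{np+1}$ when $np$ is an integer and $x_{\lfloor np + 1 \rfloor}$ otherwise.

```python
from fractions import Fraction
from math import floor


def quantile(xs, p):
    """Empirical p-quantile (hypothetical helper); p is a Fraction or a string."""
    xs, k = sorted(xs), len(xs) * Fraction(p)
    if k.denominator == 1:                    # n*p is an integer
        return (xs[int(k) - 1] + xs[int(k)]) / 2
    return xs[floor(k + 1) - 1]


def interquantile_distance(xs, p):
    """x_{1-p} - x_p for 0 < p < 1/2; p = 1/4 gives the interquartile range."""
    p = Fraction(p)
    if not 0 < p < Fraction(1, 2):
        raise ValueError("p must lie strictly between 0 and 1/2")
    return quantile(xs, 1 - p) - quantile(xs, p)


def mean_abs_deviation_from_median(xs):
    """Average absolute distance of the sample values from the median."""
    med = quantile(xs, Fraction(1, 2))
    return sum(abs(x - med) for x in xs) / len(xs)


# Hypothetical sample of size 10 (illustrative values only)
sample = [3, 8, 12, 19, 25, 31, 40, 47, 58, 77]
print("IQR:", interquantile_distance(sample, "0.25"))
print("mean absolute deviation from median:", mean_abs_deviation_from_median(sample))
```

For this sample the interquartile range is 47 - 12 = 35.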
Visualization
One way of displaying quantiles is the box plot. Here, the entire sample is represented by a box with two whiskers. The outer edges of the box are the lower and upper quartiles, so half of the sample lies inside the box. The box is further subdivided by a line at the median of the sample. The whiskers are not uniformly defined. One possibility is to use the first and the ninth decile as their end points.
Box plot of a sample
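The numbers that such a box plot displays can be collected with a small Python sketch. It assumes the whisker convention mentioned above (first and ninth decile) and uses a hypothetical sample. The names `box_plot_stats` and `quantile` are illustrative, and no plotting library is involved.

```python
from fractions import Fraction
from math import floor


def quantile(xs, p):
    """Empirical p-quantile (hypothetical helper); p is a Fraction or a string."""
    xs, k = sorted(xs), len(xs) * Fraction(p)
    if k.denominator == 1:                    # n*p is an integer
        return (xs[int(k) - 1] + xs[int(k)]) / 2
    return xs[floor(k + 1) - 1]


def box_plot_stats(xs):
    """Five values describing a box plot with decile whiskers."""
    return {
        "whisker_low": quantile(xs, "0.1"),    # first decile
        "box_low": quantile(xs, "0.25"),       # lower quartile
        "median": quantile(xs, "0.5"),         # line inside the box
        "box_high": quantile(xs, "0.75"),      # upper quartile
        "whisker_high": quantile(xs, "0.9"),   # ninth decile
    }


# Hypothetical sample of size 10 (illustrative values only)
sample = [3, 8, 12, 19, 25, 31, 40, 47, 58, 77]
for name, value in box_plot_stats(sample).items():
    print(name, value)
```

Plotting libraries often use other whisker rules by default, so these values would typically have to be passed in explicitly to reproduce this convention.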