Zipf's law is an empirical law, formulated using mathematical statistics, named after the linguist George Kingsley Zipf, who first proposed it. Zipf's law states that, given a large sample of words used, the frequency of any word is inversely proportional to its rank in the frequency table. So the nth most frequent word has a frequency proportional to 1/n.
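The statement can be written as a short formula. The symbols are introduced here only for illustration: f(n) stands for the frequency of the word with rank n, and C for a constant that depends on the sample.

```latex
f(n) = \frac{C}{n}, \qquad \frac{f(1)}{f(n)} = n
```

Setting n = 1 shows that C is simply the frequency of the most frequent word.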
Thus the most frequent word will occur about twice as often as the second most frequent word, three times as often as the third most frequent word, etc. For example, in one sample of words in the English language, the most frequently occurring word, "the", accounts for nearly 7% of all the words (69,971 out of slightly over 1 million). True to Zipf's law, the second-place word "of" accounts for slightly over 3.5% of words (36,411 occurrences), followed by "and" (28,852). In that sample, only about 135 different words are needed to account for half of all the words.
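The counts quoted above can be compared with what an exact 1/n rule would predict. The following is a minimal sketch in Python. The helper name `zipf_prediction` and the variable names are made up for this illustration, and the only data used are the three counts given in the text.

```python
# Word counts quoted in the text for one sample of English (ranks 1, 2, 3).
observed = {"the": 69_971, "of": 36_411, "and": 28_852}


def zipf_prediction(top_count, rank):
    """Count expected at `rank` if frequency were exactly proportional to 1/rank."""
    return top_count / rank


top_count = observed["the"]

# Dictionaries keep insertion order, so enumerate() yields ranks 1, 2, 3.
for rank, (word, count) in enumerate(observed.items(), start=1):
    predicted = zipf_prediction(top_count, rank)
    ratio = top_count / count  # how many times more frequent "the" is
    print(
        f"rank {rank} {word!r}: observed {count}, "
        f"predicted {predicted:.0f}, ratio to top word {ratio:.2f}"
    )
```

The ratios come out at about 1.92 for "of" and about 2.43 for "and", against the ideal values of 2 and 3. The law therefore describes the overall trend rather than giving exact counts.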
The same relationship occurs in many other rankings, unrelated to language, such as the population ranks of cities in various countries, corporation sizes, income rankings, etc. The appearance of the distribution in rankings of cities by population was first noticed by Felix Auerbach in 1913.
It is not known why Zipf's law holds for most languages.
Questions and Answers
Q: What is Zipf's law?
A: Zipf's law is an empirical law that states that the frequency of a word in a large sample is inversely proportional to its rank in the frequency table.
Q: Who proposed Zipf's law?
A: Zipf's law was first proposed by George Kingsley Zipf, a linguist.
Q: How does Zipf's law explain word frequency in a sample of English words?
A: According to Zipf's law, the most frequent word in a sample of English words occurs about twice as often as the second most frequent word, three times as often as the third most frequent word, etc. The same pattern continues further down the ranking.
Q: What percentage of all words does the most frequently occurring word account for in one sample of English words?
A: In one sample of English words, the most frequently occurring word ("the") accounts for nearly 7% of all the words.
Q: How many words are needed to account for half of a large sample of words?
A: Only about 135 different words are needed to account for half of all the words in a large sample.
Q: What other rankings exhibit Zipf's law?
A: The same relationship that Zipf's law describes for word frequencies also occurs in other rankings unrelated to language, such as the population ranks of cities in various countries, corporation sizes, and income rankings.
Q: Who noticed the appearance of the distribution in rankings of cities by population?
A: The appearance of the distribution in rankings of cities by population was first noticed by Felix Auerbach in 1913.