If you test this hypothesis on real-world data, though, you find that the probability that the first digit is a 1 is actually about 30.1%, the probability that it is a 2 is about 17.6%, and the probability that it is a 3 is about 12.5%. The probabilities keep dropping, so the probability that the leading digit is a 9 is only about 4.6%, as the following graph illustrates.
This distribution fits the rule that the probability that the first digit is d is
$$p_d = \log_{10}\left(1 + \frac{1}{d}\right)$$
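As a quick check on the formula, the following minimal Python sketch tabulates the probability for each leading digit. The function name `benford_probability` is only illustrative and does not come from any library.

```python
import math


def benford_probability(d: int) -> float:
    """Probability that the leading decimal digit is d (1 through 9)."""
    if not 1 <= d <= 9:
        raise ValueError("d must be a digit from 1 to 9")
    return math.log10(1 + 1 / d)


# Tabulate the distribution for all nine possible leading digits.
probabilities = {d: benford_probability(d) for d in range(1, 10)}

for d, p in probabilities.items():
    print(f"{d}: {p:.1%}")
```

The nine values sum to 1 because the product of the terms (1 + 1/d) = (d + 1)/d for d = 1 to 9 telescopes to 10, and log10(10) = 1.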
This distribution is called Benford's Law after physicist Frank Benford, who described it in 1938. Benford wasn't the first to notice the distribution, however. Astronomer and mathematician Simon Newcomb made the same discovery 57 years earlier, when he noticed that the earlier pages of logarithm tables, which cover numbers beginning with low digits, were dirtier and more worn than the later pages.
Benford tested more than 20,000 values drawn from 20 data sets for the distribution of leading digits, including geographic data, physical properties of chemicals, baseball statistics, and street addresses. He found the same pattern repeated across these seemingly unrelated sets of data.
The following graph shows that the first digits of recent stock prices also closely resemble Benford's distribution.
Distributions that cover several orders of magnitude are likely to satisfy Benford's Law very closely. This is the case for sets of numbers that are allowed to grow exponentially, such as individual incomes and stock prices.
Another property of Benford's Law is that it is scale invariant. This means that the leading digits of the stock data above, for example, would still follow Benford's Law if the prices were converted to another currency, such as euros or Japanese yen.
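Scale invariance can be checked numerically. The sketch below does not use real stock prices. It assumes a synthetic data set spread evenly on a logarithmic scale over six orders of magnitude, and a made-up conversion rate of 0.92; both are illustrative choices only.

```python
import math
from collections import Counter


def leading_digit(x: float) -> int:
    """First significant digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)


def digit_shares(values):
    """Fraction of values that have each leading digit 1..9."""
    counts = Counter(leading_digit(v) for v in values)
    return [counts[d] / len(values) for d in range(1, 10)]


# Synthetic "prices": evenly spaced on a log scale from 1 up to (not including) 1,000,000.
POINTS_PER_DECADE = 10_000
prices = [10 ** (k / POINTS_PER_DECADE) for k in range(6 * POINTS_PER_DECADE)]

RATE = 0.92  # hypothetical exchange rate, chosen only for illustration
converted = [p * RATE for p in prices]

benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
original_shares = digit_shares(prices)
converted_shares = digit_shares(converted)

print("digit  Benford  original  converted")
for d in range(1, 10):
    print(f"{d:>5}  {benford[d - 1]:.3f}    {original_shares[d - 1]:.3f}     {converted_shares[d - 1]:.3f}")
```

Multiplying every value by a constant shifts the data along the logarithmic scale without changing its shape, which is why the leading-digit shares stay essentially the same.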
In addition to being scale invariant, Benford's Law can be shown to be base invariant as well. If you convert the values from a set of data that adheres to Benford's Law to any other base system, the new data set will continue to adhere to the law, with one slight modification. The probability distribution of the digits of the new base can be calculated with the following adjusted formula:
$$p_d = \log_{b}\left(1 + \frac{1}{d}\right)$$
where b is the new base and d takes the value of each non-zero digit of that base, from 1 to b - 1.
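To illustrate the adjusted formula, here is a small Python sketch that evaluates it for hexadecimal (base 16). The function name `benford_probability_in_base` is an illustrative assumption, not an established API.

```python
import math


def benford_probability_in_base(d: int, base: int) -> float:
    """Probability that the leading digit is d when numbers are written in the given base."""
    if base < 2 or not 1 <= d < base:
        raise ValueError("need base >= 2 and 1 <= d < base")
    # math.log(x, base) returns the logarithm of x to the given base.
    return math.log(1 + 1 / d, base)


# Leading-digit probabilities for hexadecimal digits 1 through F (15).
hex_probabilities = [benford_probability_in_base(d, 16) for d in range(1, 16)]

for d, p in enumerate(hex_probabilities, start=1):
    print(f"{d:X}: {p:.1%}")
```

In base 16 the leading digit is a 1 exactly one quarter of the time, since the logarithm of 2 to base 16 is 1/4.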
Distributions that have a built-in minimum or maximum value are unlikely to satisfy Benford's Law. For example, one might expect a set of numbers representing "small insurance claims" to obey the law. However, if "small" in this instance is defined to be between $50 and $100, then some leading digits are excluded from the range by definition: 2, 3, and 4 can never occur, and a 1 can appear only for a claim of exactly $100.
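The effect of such a cutoff is easy to see by counting. The sketch below assumes, purely for illustration, one claim at every whole-dollar amount from $50 to $100; real claims data would of course look different.

```python
from collections import Counter

# Hypothetical "small" claims: one claim at each whole-dollar amount from $50 to $100.
claims = range(50, 101)

# The leading digit of a positive integer is the first character of its decimal form.
leading_digit_counts = Counter(int(str(amount)[0]) for amount in claims)

for d in range(1, 10):
    print(d, leading_digit_counts[d])
```

No choice of amounts inside this range can produce the 30.1% share of leading 1s that Benford's Law predicts.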
Distributions that span no more than one or two orders of magnitude are also unlikely to satisfy Benford's Law. Adult IQ scores are an example of a set of data that covers a relatively narrow range, despite having no theoretical maximum.
A simple explanation of Benford's Law can be found in inflationary price increases. The price of an item that starts out at a cost of $1.00 with a steady 3% rate of annual inflation will have a leading digit of 1 for 24 years, until the price reaches $2.03 in the 25th year. The leading digit will be 2 for the following 14 years, 3 for the next 9 years, 4 for 8 years, 5 for 6 years, 6 and 7 for 5 years each, 8 for 4 years, and 9 for only 3 years before the price reaches $10.03 in the 79th year. After that, the leading digit will be a 1 again for another 24 years.
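The year counts in this example can be reproduced with a short simulation. This is a minimal sketch that assumes the price is raised by exactly 3% once per year and is not rounded to whole cents between years.

```python
from collections import Counter


def leading_digit(x: float) -> int:
    """First significant digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)


START_PRICE = 1.00
INFLATION = 0.03
YEARS = 78  # one full cycle: the price first passes $10 after 78 annual increases

years_with_digit = Counter()
price = START_PRICE
for _ in range(YEARS):
    years_with_digit[leading_digit(price)] += 1
    price *= 1 + INFLATION  # apply one year of inflation

for d in range(1, 10):
    print(f"leading digit {d}: {years_with_digit[d]} years")
print(f"price in year {YEARS + 1}: ${price:.2f}")
```

The price spends 24 of the 78 years in the cycle, or about 31% of the time, with a leading digit of 1, which is already close to the 30.1% that Benford's Law predicts.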
When you take into account the fact that inflation affects a wide range of consumer goods, you can see why, if you take a sample of all of the prices in your local grocery store at one specific moment in time, the prices as a group adhere to Benford's Law. Every item in the store has been going through its own exponential price increase over time, so the probability of a randomly selected item's price having a leading digit of 1 at the particular moment that you sample those prices will be roughly 30.1%.
Benford's Law may seem like a mere mathematical curiosity, but it has some surprisingly practical applications. Based on the assumption that people who attempt to falsify data will tend to distribute digits uniformly, Benford's Law can be used to expose potential fraud in accounting, insurance claims, and tax forms. Other uses, such as analyzing clinical trial results and election results, have also been proposed.
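One common way to screen a data set against Benford's Law is a Pearson chi-square goodness-of-fit test on the leading-digit counts. The text above does not prescribe a particular test, so the sketch below is only one possible approach, and the two sets of counts are made up for illustration. A large statistic flags data for closer review; it does not by itself demonstrate fraud.

```python
import math

# Chi-square critical value for 8 degrees of freedom (9 digits minus 1) at the 5% level.
CRITICAL_VALUE_5_PERCENT = 15.507


def benford_chi_square(counts):
    """Pearson chi-square statistic for observed counts of leading digits 1..9."""
    if len(counts) != 9:
        raise ValueError("expected nine counts, one per leading digit")
    total = sum(counts)
    statistic = 0.0
    for d, observed in enumerate(counts, start=1):
        expected = total * math.log10(1 + 1 / d)
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Made-up example counts, not real data.
uniform_counts = [100] * 9  # digits spread evenly, as a naive fabricator might do
benford_like_counts = [271, 158, 112, 87, 71, 60, 52, 46, 41]

for label, counts in [("uniform", uniform_counts), ("Benford-like", benford_like_counts)]:
    statistic = benford_chi_square(counts)
    verdict = "review" if statistic > CRITICAL_VALUE_5_PERCENT else "consistent"
    print(f"{label}: chi-square = {statistic:.1f} ({verdict})")
```

The test only makes sense for data that would be expected to follow Benford's Law in the first place, so bounded or narrow-range data should not be screened this way.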
For more information on applications of Benford's Law, see "I've Got Your Number: How a mathematical phenomenon can help CPAs uncover fraud and other irregularities."