STAT 20: Introduction to Probability and Statistics

What is the distribution of city/town populations in all cities and towns in California?

What is the distribution of the **first digit** of city/town populations in all cities and towns in California?

**Observation**: many naturally occurring numerical variables have a recurring pattern in the distribution of the first digit.

Benford’s law states that

- The first digit of the measurement of a naturally occurring phenomenon follows a **decreasing** logarithmic distribution.

- Thus the digits 1-9 are not uniformly distributed. Instead, 1 has the highest proportion and 9 has the lowest proportion.

- For example, the first digits of stock prices, populations of cities, and election results are observed to follow Benford’s Law.

Let \(X\) be the first digit of a randomly selected number. \(X \sim Benfords()\) if

\[P(X = x) = \log_{10}\left(1 + \frac{1}{x}\right), \quad x = 1, 2, \ldots, 9\]
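As a quick check on the formula, a minimal R sketch can tabulate the nine probabilities. The names `digits` and `benford` are chosen here for illustration and do not come from the course materials. Digit 1 comes out at about 0.301 and digit 9 at about 0.046, and the nine values sum to 1 because the sum telescopes to \(\log_{10}(10)\).

```r
# Benford probabilities for the first digits 1 through 9
digits <- 1:9
benford <- log10(1 + 1 / digits)

# Pair each digit with its probability, rounded for display
round(setNames(benford, digits), 3)

# The probabilities sum to 1 (up to floating-point error)
sum(benford)
```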

- Ongoing public sentiment that previous election was fraudulent
- The highest voter turnout in Iran’s history

- Mahmoud Ahmadinejad: Leader of conservatives and incumbent president.
- Mir-Hossein Mousavi: Reformist and former prime minister. Seeking rapid political evolution.

Ahmadinejad won the election with 62.6% of the votes cast, while Mousavi received 33.75% of the votes cast.

- Allegations of fraud
- Public protests and unrest
- The green wave movement, led by Mousavi, against the allegedly fraudulent election and Ahmadinejad’s regime

Was the election fraudulent?

In a normally occurring, fair election, the first digits of the county-by-county vote counts should follow Benford’s Law. If they do not, that might suggest that the vote counts have been manually altered.

This theory was brought to bear to determine whether the 2009 presidential election in Iran showed irregularities^{1}.
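One common way to compare observed first digits with Benford’s Law is a chi-squared goodness-of-fit test. The sketch below is only an illustration of that idea, not necessarily the analysis used in the cited study. It runs on simulated numbers, not real vote counts, and `first_digit()` is a hypothetical helper defined here.

```r
set.seed(20)  # arbitrary seed so the simulated example is reproducible

# SIMULATED data, not real vote counts: 500 made-up "counts" between
# 100 and 100,000, drawn so that log10(count) is uniform on [2, 5]
counts <- floor(10^runif(500, min = 2, max = 5))

# Hypothetical helper: first digit of a positive whole number
first_digit <- function(x) {
  as.integer(substr(as.character(x), 1, 1))
}

# Tally first digits, keeping all nine categories even if one is empty
observed <- table(factor(first_digit(counts), levels = 1:9))
expected <- log10(1 + 1 / (1:9))

# Observed proportions next to the Benford probabilities
round(rbind(observed = as.vector(prop.table(observed)), benford = expected), 3)

# Chi-squared goodness-of-fit test against the Benford probabilities
chisq.test(observed, p = expected)
```

A small p-value would indicate first digits that are unlikely under Benford’s Law. On its own, that is a reason to look more closely at the data, not proof of manipulation.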

`get_first()`

`slice_sample()`

`pull()`
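A minimal sketch of how these pieces might fit together, assuming the `dplyr` package is installed. The `towns` data frame is made up for illustration. `first_digit()` is a hypothetical base-R stand-in, since the interface of the course's `get_first()` is not shown here and may differ.

```r
library(dplyr)

# Made-up toy data frame (hypothetical names and values)
towns <- tibble(
  town = c("A", "B", "C", "D", "E"),
  population = c(1520, 23400, 870, 112000, 3100)
)

# slice_sample() draws rows at random; pull() extracts one column as a vector
set.seed(20)
sampled_pop <- towns %>%
  slice_sample(n = 3) %>%
  pull(population)

# Hypothetical stand-in for a first-digit helper (not the course's get_first())
first_digit <- function(x) as.integer(substr(as.character(x), 1, 1))

first_digit(sampled_pop)
```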

Statisticians, scientists, and engineers work on projects that include code, data, figures, and text. For large-scale or long-running projects, we need a system to track and share everything.

- A repository is like an online folder containing code, data, figures, presentations, papers, etc.
- A public repository allows everyone to access and download its content.

- The `stat20data` package has its code and data stored on GitHub.
- The OpenElections project
  - Tracks official election results in every state of the US.
  - Shares the data via GitHub.
  - Offers the data for download as CSV files.

Data from GitHub or other websites can be loaded into R like this:
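A minimal sketch of the idea, using base R's `read.csv()`, which accepts a URL in place of a file path (`readr::read_csv()` can be given a URL in the same way). To keep the example runnable without a network connection, it writes a tiny made-up CSV to a temporary file and reads it back through a local `file://` URL. The names `tmp`, `csv_url`, and `results` and the data values are all hypothetical, and the path handling assumes a Unix-like system.

```r
# Stand-in for a remote file: write a tiny made-up CSV to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("county,votes", "Alpha,1520", "Beta,23400"), tmp)

# In practice this string would be an https:// link to a raw CSV file;
# a local file:// URL is used here so the sketch runs offline
csv_url <- paste0("file://", normalizePath(tmp, winslash = "/"))

# Read the CSV at that URL into a data frame
results <- read.csv(csv_url)
results
```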

So where do we find the link to the raw data?
