STAT 20: Introduction to Probability and Statistics

What is the distribution of city/town populations in all cities and towns in California?

What is the distribution of the **first digit** of city/town populations in all cities and towns in California?

**Observation**: many naturally occurring numerical variables have a recurring pattern in the distribution of the first digit.

Benford’s Law states that

- The first digits of measurements of many naturally occurring phenomena follow a **decreasing** logarithmic distribution.

- Thus the digits 1 through 9 are not equally likely. Instead, 1 has the highest proportion and 9 has the lowest.

- For example, the first digits of stock prices, city populations, and election results are often observed to follow Benford’s Law approximately.

Let \(X\) be the first digit of a randomly selected number. \(X \sim \text{Benford}()\) if

\[P(X = x) = \log_{10}\left(1 + \frac{1}{x}\right), \quad x = 1, 2, \ldots, 9.\]
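A quick way to see the pattern is to evaluate the formula for each digit. This is a minimal base-R sketch; the object names `p_benford` and `benford_table` are illustrative choices, not part of any package.

```r
# Benford's Law: P(X = x) = log10(1 + 1/x) for x = 1, ..., 9
x <- 1:9
p_benford <- log10(1 + 1 / x)

# Show each digit next to its probability, rounded for readability
benford_table <- data.frame(digit = x, prob = round(p_benford, 3))
print(benford_table)

# The probabilities sum to 1 (see the note below)
sum(p_benford)
```

Rounded to three decimals, the probabilities run from about 0.301 for a leading 1 down to about 0.046 for a leading 9. They sum to exactly 1 because the terms telescope: \(\sum_{x=1}^{9} \log_{10}\frac{x+1}{x} = \log_{10} 10 = 1\).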

Background to the 2009 Iranian presidential election:

- Ongoing public sentiment that the previous election had been fraudulent
- The highest voter turnout in Iran’s history

The main candidates:

- Mahmoud Ahmadinejad: Leader of the conservatives and incumbent president.
- Mir-Hossein Mousavi: Reformist and former prime minister, seeking rapid political change.

Ahmadinejad won the election with 62.6% of the votes cast, while Mousavi received 33.75% of the votes cast.

The aftermath:

- Allegations of fraud
- Public protests and unrest
- The Green Movement (the “green wave”), led by Mousavi, against the allegedly fraudulent election and Ahmadinejad’s regime

Was the election fraudulent?

In a fair election with naturally occurring vote counts, the first digits of the county-by-county vote counts should roughly follow Benford’s Law. If they do not, that might suggest the vote counts were manually altered.
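A minimal sketch of that comparison in base R, assuming a short vector of made-up county vote counts (hypothetical numbers, not the Iranian results). The digit extraction divides each count by the largest power of 10 not exceeding it, and `factor(..., levels = 1:9)` keeps digits that never occur in the tally.

```r
# Hypothetical county-level vote counts, made up for illustration only
votes <- c(1520, 238, 19840, 3105, 1177, 46200, 512, 1893, 27450, 860,
           1304, 7721, 2290, 15630, 98, 1045, 3388, 640, 12010, 1765)

# First digit of each positive count: scale into [1, 10), then truncate
first_digit <- floor(votes / 10^floor(log10(votes)))

# Observed proportion of each digit 1-9 (zero counts are kept)
observed <- as.vector(table(factor(first_digit, levels = 1:9))) / length(votes)

# Proportions expected under Benford's Law
expected <- log10(1 + 1 / (1:9))

comparison <- data.frame(digit = 1:9,
                         observed = round(observed, 3),
                         expected = round(expected, 3))
print(comparison)
```

With only 20 counties the observed proportions are noisy, so a mismatch in a sample this small would be weak evidence on its own.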


Benford’s Law has been used to examine whether the 2009 presidential election in Iran showed irregularities^{1}.

- `get_first()`
- `slice_sample()`
- `pull()`
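`slice_sample()` and `pull()` are dplyr verbs: the first draws a random sample of rows from a data frame, and the second extracts a single column as a vector. The source of `get_first()` is not shown here, so this sketch uses a hypothetical base-R helper, `first_digit_of()`, to extract first digits, plus base-R equivalents of the two dplyr verbs so it runs without extra packages. The `towns` data frame is made up for illustration.

```r
# Hypothetical helper (not get_first() itself): first digit of positive numbers
first_digit_of <- function(x) {
  stopifnot(all(x > 0))  # log10() is undefined for zero and negative values
  as.integer(floor(x / 10^floor(log10(x))))
}

set.seed(20)  # make the random sample reproducible

# Toy data frame of hypothetical town populations
towns <- data.frame(
  town = c("A", "B", "C", "D", "E", "F"),
  population = c(1834, 27561, 412, 9120, 153004, 2210)
)

# Base-R equivalent of slice_sample(towns, n = 3): 3 random rows
sampled <- towns[sample(nrow(towns), 3), ]

# Base-R equivalent of pull(sampled, population): the column as a vector
pops <- sampled$population

# With dplyr loaded, the two steps above could be written as:
# towns |> slice_sample(n = 3) |> pull(population)

first_digit_of(pops)
```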

Statisticians, scientists, and engineers work on projects that include code, data, figures, and texts. For large-scale or long-run projects, we need a system to track and share everything.

- A repository is like an online folder containing code, data, figures, presentations, papers, etc.
- A public repository allows everyone to access and download its content.


- The `stat20data` package has its code and data stored on GitHub.
- The OpenElections project:
    - Tracks official election results in every state of the US.
    - Shares the data via GitHub.
    - Lets users download the data as CSV files.

Data from GitHub or other websites can be loaded into R like this:
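A minimal sketch, assuming a CSV file reachable at a URL. The address below is a hypothetical placeholder, and the wrapper `load_csv()` is an illustrative name; base R's `read.csv()` accepts a URL as well as a local file path (in the tidyverse, `readr::read_csv()` does too).

```r
# Placeholder URL: substitute the raw link to a real CSV file
# (this address is hypothetical and will not resolve)
csv_url <- "https://raw.githubusercontent.com/OWNER/REPO/BRANCH/path/to/file.csv"

# Hypothetical wrapper: read.csv() works on URLs and on local file paths
load_csv <- function(url) {
  read.csv(url)
}

# votes_df <- load_csv(csv_url)
```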

So where do we find the link to the raw data?
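On GitHub, opening a file and clicking its **Raw** button shows the file’s plain contents, and that page’s address is the link to pass to R. For a file page of the form `https://github.com/<owner>/<repo>/blob/<branch>/<path>`, the raw address typically has the form `https://raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>`. A minimal sketch of that rewrite, with a hypothetical helper name and example URL:

```r
# Hypothetical helper: turn a GitHub file-page URL into its raw-content URL
to_raw_url <- function(url) {
  # Swap the host name
  url <- sub("^https://github\\.com/", "https://raw.githubusercontent.com/", url)
  # Drop the "/blob" path segment (first occurrence only)
  sub("/blob/", "/", url, fixed = TRUE)
}

to_raw_url("https://github.com/owner/repo/blob/main/data/results.csv")
# returns "https://raw.githubusercontent.com/owner/repo/main/data/results.csv"
```

Clicking **Raw** and copying the address is the more reliable route, since it does not depend on this URL pattern.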
