Lab 4: Elections

STAT 20: Introduction to Probability and Statistics

Benford’s Law

What is the distribution of city/town populations in all cities and towns in California?

What is the distribution of the first digit of city/town populations in all cities and towns in California?
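One way to approach the second question is to pull the leading digit out of each population count and tabulate the results. The sketch below uses base R; the `populations` vector is made up for illustration and is not real California data, and the counts are assumed to be positive whole numbers.

```r
# Hypothetical population counts, invented for illustration only
populations <- c(3898747, 1386932, 105328, 9512, 873, 24150, 1734)

# Write each number out in full (no scientific notation) and keep its first character
first_digit <- as.integer(
  substr(format(populations, scientific = FALSE, trim = TRUE), 1, 1)
)

# Count each digit 1-9; digits that never appear are kept with a count of zero
digit_counts <- table(factor(first_digit, levels = 1:9))

# Convert counts to proportions to get the first-digit distribution
digit_props <- prop.table(digit_counts)
digit_props
```

Using `factor(..., levels = 1:9)` keeps all nine digits in the table, which makes the result easier to compare against a reference distribution later.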

Benford’s Law

Observation: many naturally occurring numerical variables have a recurring pattern in the distribution of the first digit.

Benford’s law states that

  • The first digit of a measurement of a naturally occurring phenomenon follows a logarithmically decreasing distribution: each larger digit is less likely than the one before it.
  • Thus the digits 1 through 9 are not uniformly distributed. Instead, 1 has the highest proportion and 9 has the lowest.
  • For example, the first digits of stock prices, city populations, and election results have been observed to follow Benford’s Law.

Benford’s Law

Let \(X\) be the first digit of a randomly selected number. \(X\) follows Benford’s Law, written \(X \sim \text{Benford}\), if for \(x \in \{1, 2, \dots, 9\}\)

\[P(X = x) = \log_{10}\left(1 + 1/x \right)\]
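The formula can be evaluated directly in R. This is a minimal sketch; the name `benford_probs` is chosen here for illustration and does not come from any package.

```r
# Possible first digits
digits <- 1:9

# Benford probabilities: P(X = x) = log10(1 + 1/x)
benford_probs <- log10(1 + 1 / digits)
names(benford_probs) <- digits

round(benford_probs, 3)
#     1     2     3     4     5     6     7     8     9
# 0.301 0.176 0.125 0.097 0.079 0.067 0.058 0.051 0.046

# The probabilities sum to 1, since the product of (x + 1)/x over x = 1..9 is 10
sum(benford_probs)
```

About 30% of first digits are expected to be 1, while fewer than 5% are expected to be 9.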

2009 Iran Election


  • Ongoing public sentiment that the previous election was fraudulent
  • The highest voter turnout in Iran’s history

Leading candidates

  • Mahmoud Ahmadinejad: Leader of conservatives and incumbent president.
  • Mir-Hossein Mousavi: Reformist and former prime minister. Seeking rapid political evolution.


Ahmadinejad won the election with 62.6% of the votes cast, while Mousavi received 33.75% of the votes cast.

Post-election controversies and unrest

  • Allegations of fraud
  • Public protests and unrest
  • The green wave movement, led by Mousavi, against the allegedly fraudulent election and Ahmadinejad’s regime

Was the election fraudulent?

Benford’s Law and Elections

Fraud detection using Benford’s Law

Common Theory

In a normally occurring, fair election, the first digits of the county-by-county vote counts should follow Benford’s Law. If they do not, that might suggest that the vote counts have been manually altered.
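A comparison along these lines could be sketched as follows. The helper `compare_to_benford()` and the `vote_counts` vector are hypothetical, written for illustration only: the counts are invented, not real election returns, and are assumed to be whole numbers.

```r
# Hypothetical helper: observed first-digit proportions next to Benford's Law
compare_to_benford <- function(counts) {
  counts <- counts[counts > 0]  # a count of zero has no leading digit from 1 to 9
  first_digit <- as.integer(
    substr(format(counts, scientific = FALSE, trim = TRUE), 1, 1)
  )
  observed <- as.numeric(prop.table(table(factor(first_digit, levels = 1:9))))
  benford <- log10(1 + 1 / (1:9))
  data.frame(
    digit = 1:9,
    observed = observed,
    benford = benford,
    difference = observed - benford
  )
}

# Invented county-level vote counts for one candidate
vote_counts <- c(1203, 187, 2450, 96, 1534, 311, 128, 4720, 1019, 765, 143, 2288)

comparison <- compare_to_benford(vote_counts)
comparison
```

With only a dozen counts the observed proportions are noisy, so large differences here would not mean much on their own; the comparison becomes more informative with many counties.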

This theory was brought to bear to determine whether the 2009 presidential election in Iran showed irregularities¹.





US Elections Data


Statisticians, scientists, and engineers work on projects that include code, data, figures, and text. For large-scale or long-running projects, we need a system to track and share everything.

What is GitHub?

  • A repository is like an online folder containing code, data, figures, presentations, papers, etc.
  • A public repository allows everyone to access and download its contents.


  • The stat20data package stores its code and data on GitHub.
  • The OpenElections project

The OpenElections Project

  • Tracks official election results in every state of the US.
  • Shares the data via GitHub.
  • Makes the data available to download as CSV files.

Access OpenElections data

Data from GitHub or other websites can be loaded into R like this:

library(readr)  # provides read_csv()
data_frame <- read_csv("web link to raw data")  # replace the placeholder with the raw-data URL

So where do we find the link to the raw data?