# Lab 4: Elections

STAT 20: Introduction to Probability and Statistics

# Benford’s Law

What is the distribution of the populations of all cities and towns in California?

What is the distribution of the first digit of those populations?

## Benford’s Law

Observation: many naturally occurring numerical variables have a recurring pattern in the distribution of the first digit.

Benford’s law states that

• The first digit of a measurement of a naturally occurring phenomenon follows a logarithmically decreasing distribution.
• Thus the digits 1 through 9 are not uniformly distributed. Instead, 1 occurs with the highest proportion and 9 with the lowest.
• For example, the first digits of stock prices, city populations, and election results have been observed to follow Benford’s Law.

## Benford’s Law

Let $X$ be the first digit of a randomly selected number. $X \sim \text{Benford}()$ if

$P(X = x) = \log_{10}\left(1 + 1/x \right)$ for $x = 1, 2, \dots, 9$.
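
As a quick check on the formula above, the nine probabilities can be computed directly in R. This is a minimal sketch; the names `digits` and `benford_probs` are chosen here for illustration and are not part of the lab materials.

```r
# Benford probabilities for the first digits 1 through 9
digits <- 1:9
benford_probs <- log10(1 + 1 / digits)

# Show the probabilities labeled by digit, rounded for readability
round(setNames(benford_probs, digits), 3)

# The probabilities add up to 1, so this is a valid distribution
sum(benford_probs)
```

The digit 1 gets a probability of about 0.301 and the digit 9 about 0.046, which matches the decreasing pattern described earlier.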

# 2009 Iran Election

## 2009 Iran Election

#### Background

• Ongoing public sentiment that the previous election was fraudulent
• The highest voter turnout in Iran’s history

• Mir-Hossein Mousavi: Reformist and former prime minister. Seeking rapid political evolution.

## Post-election controversies and unrest

• Allegations of fraud
• Public protests and unrest
• The green wave movement, led by Mousavi, against the allegedly fraudulent election and Ahmadinejad’s regime

Was the election fraudulent?

# Benford’s Law and Elections

## Fraud detection using Benford’s Law

#### Common Theory

In a normally occurring, fair election, the first digits of the county-by-county vote counts should follow Benford’s Law. If they do not, that might suggest that the vote counts have been manually altered.

. . .

This theory was brought to bear to determine whether the 2009 presidential election in Iran showed irregularities.
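
The comparison behind this theory can be sketched in R: extract the first digit of each vote count, tabulate the observed proportions, and set them next to the Benford proportions. The vote counts below are made up purely for illustration, and the helper `first_digit()` is a hypothetical name, not a function from the lab or from any package.

```r
# Extract the first digit of each positive whole-number count
first_digit <- function(x) {
  as.integer(substr(as.character(x), 1, 1))
}

# Hypothetical county-level vote counts (invented for illustration only)
votes <- c(1203, 845, 19340, 2710, 133, 1587, 402, 9120, 1046, 3675,
           218, 1490, 560, 7301, 1122, 264, 1875, 33, 6150, 1018)

# Observed proportion of each first digit, keeping digits that never occur
observed <- table(factor(first_digit(votes), levels = 1:9)) / length(votes)

# Expected proportions under Benford's Law
expected <- log10(1 + 1 / (1:9))

# Side-by-side comparison of observed and expected proportions
comparison <- data.frame(
  digit = 1:9,
  observed = as.numeric(observed),
  expected = round(expected, 3)
)
comparison
```

With only 20 invented counts, the observed proportions will be noisy. A real analysis would use many more counts, and a mismatch on its own would only suggest, not prove, that something is irregular.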

# US Elections Data

## GitHub

Statisticians, scientists, and engineers work on projects that include code, data, figures, and text. For large-scale or long-running projects, we need a system to track and share everything.

#### What is GitHub?

• A repository is like an online folder containing code, data, figures, presentations, papers, etc.
• A public repository allows everyone to access and download its contents.

. . .

#### Examples

• The stat20data package has its code and data stored on GitHub.
• The OpenElections project

## The OpenElections Project

• Tracks official election results in every state of the US.
• Shares the data via GitHub.

```r
library(readr)  # provides read_csv()
data_frame <- read_csv("web link to raw data")  # replace the placeholder with the raw-data URL
```