Understanding the World with Data

STAT 20: Introduction to Probability and Statistics

Agenda

  1. Introductions
  2. Data Science Lifecycle
    • What’s going on here?
  3. Why we’re here
    • Types of Claims
    • Data First vs Question First
  4. Course Structure
    • Flipped Classroom
    • Ed Discussion Forum
    • Syllabus

Introductions

  • In groups of 3, take turns introducing yourselves to one another by sharing the info listed on the handout (your name, hometown, etc.).

  • Each person should finish with a handout filled in with info on their groupmates.

Time: 5 minutes

The Data Science Lifecycle

What’s going on here?

  • As a group, formulate at least three possible explanations for what’s going on in the picture.
  • Enter them at pollev.com or upvote existing explanations if they are very similar to your own.
Time: 5 minutes

Photo of shoebill with duck in its mouth, at an angle.

Does this image change which claims are more or less likely?

Upvote and downvote explanations at pollev.com.

Understand the World

Data

Why we’re here

To learn to critique and construct claims made using data.

From Questions to Data

Is the incidence of COVID on campus going up or down?

Will this question be answered by a summary, a prediction, a generalization, or a causal claim?


Also discuss: what type of data can help answer this question? Consider:

  • Which people or institutions collect relevant data?
  • Is certain data not available? Why not?
Time: 6 minutes

From Data to Claims

One source of data:


“The following dashboard provides information on COVID-19 testing performed at University Health Services or through the PCR Home Test Vending Machines on campus. It does not capture self-reported positive tests. It provides a look at new cases and trends, at a glance.”

Formulate one claim that is supported by this data.

Time: 3 minutes

Course Structure



  • Read lecture notes
  • Work through reading questions
  • Work through questions solo / in groups / as a class
  • Make progress on assignments

All of the materials and links for the course can be found at:

www.stat20.org

Syllabus

Take 4 minutes to read through the syllabus and jot down at least one question that you have.

www.stat20.org/syllabus.html


Ed Discussion Forum

  • Forum for asking questions, answering questions, and course announcements
  • Please answer each other’s questions!

Practice by asking/answering a question on the “Syllabus Discussion” thread on Ed via the link at the top right of https://www.stat20.org.

Looking forward

  • Read the lecture notes for “Taxonomy of Data”, posted Friday 5pm.
  • Leave a comment/question on the “Taxonomy of Data” thread on Ed.
  • Answer the Reading Questions on Gradescope by 11:59 pm Monday
  • Problem Set 1, posted Friday 5pm, due next Tuesday 9 am
    • Stay tuned for an Ed post with links to the survey

Animated gif of a shoebill bird.

Making Claims with Data


Summary

A numerical, graphical, or verbal description of an aspect of data that is on hand.



Example
Using data from the Stat 20 class survey, the proportion of respondents to the survey who reported having no experience writing computer code is 70%.
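As a minimal sketch of how a proportion like this could be computed, assuming made-up responses: the list `coding_experience` and the helper `proportion` below are hypothetical illustrations, not the actual survey data or course code.

```python
def proportion(values, target):
    """Return the fraction of entries in `values` equal to `target`."""
    if not values:
        raise ValueError("cannot compute a proportion of an empty list")
    return sum(v == target for v in values) / len(values)


# Hypothetical responses (NOT the real Stat 20 survey data):
# 7 respondents with no coding experience, 3 with some.
coding_experience = ["none"] * 7 + ["some"] * 3

share_none = proportion(coding_experience, "none")
print(f"Proportion with no coding experience: {share_none:.0%}")
```

The result describes only the respondents in the list, which is what makes it a description of the data on hand and nothing broader.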


Generalization

A numerical, graphical, or verbal description of a broader set of units than those on which data has been recorded.



Example
Using data from the Stat 20 class survey, the proportion of Berkeley students who have no experience writing computer code is 70%.


Causal claim

A claim that changing the value of one variable will influence the value of another variable.



Example
Data from a randomized controlled experiment shows that taking a new antibiotic eliminates more than 99% of bacterial infections.


Prediction

A guess about the value of an unknown variable, based on other known variables.



Example
Based on reading the news and the price of Uber’s stock today, I predict that Uber’s stock price will go up 1.2% tomorrow.