Stat 20: Introduction to Probability and Statistics


Andrew Bray

Silas Gifford

Jeremy Sanchez

Shobhana Murali Stoyanov

Welcome to the Age of Data, where claims made using data are all around us: in the news, in the pages of scientific journals, in the policies of government, and in the board rooms of companies across the world. In this course you will explore the forms of claims that are made using data. Some of these are subtle claims about the structure of the data at hand. Others are grand claims about scientific truths or predictions of what will happen in the future. This course will train your ability to critique and construct such arguments made using data.

Course Culture

Students taking Stat 20 come from a wide range of backgrounds. We hope to foster an inclusive and supportive learning environment based on curiosity rather than competition. All members of the course community—the instructor, students, tutors, and readers—are expected to treat each other with courtesy and respect.

You will be interacting with course staff and fellow students in several different environments: in class, over the discussion forum, and in office hours. Some of these will be in person, some of them will be online, but the same expectations hold: be kind, be respectful, be professional.

If you are concerned about classroom environment issues created by other students or course staff, please come talk to us about it.

Mode of Instruction

This course is structured as a flipped class, meaning that you’ll first encounter new concepts in statistics and data science outside of class. Class time is dedicated to expanding on the work you’ve done outside of class by working through questions solo, in groups, and as a class.

The reason that this course is structured in this manner is that applied data science is a complex field that unites scientific thinking, computing, mathematics, and an understanding of the context of the data. We will be able to tackle more interesting and challenging questions if we make room during class time to work on them while we’re all in the same place.

Before class

It is your responsibility to become familiar with the topics that appear in the course notes and to work through the reading questions by 11:59 pm on the night before Wednesday’s and Friday’s class sessions. You’re encouraged to experiment to find the method that works best for you: downloading the notes as a pdf and making notes on them, asking and answering questions over the class forum, reading through the website and discussing with friends, etc.

During class on Wednesday and Friday

Class time (2 hrs) will be spent on a range of activities, but the most common will be concept questions (using Poll Everywhere) and working through components of your Problem Sets and Labs. Therefore, the most efficient way to complete your assignments is to be an active participant in class. Attendance is expected on Wednesday and Friday.

During class on Monday

Monday is a shorter (1 hr) and less-structured class dedicated to finishing up work on your assignments.

Office Hours

Tutors and the instructor will offer office hours each week across a range of times. We ask that you visit the office hours of the instructor of your lab section but you’re welcome to visit the office hours of any tutor, not only the one who works in your section. We may adjust the office hour schedule throughout the semester as we understand student needs and preferences.

Please come to office hours! Coming to office hours does not send a signal that you are behind or need extra help. On the contrary, coming to office hours early and often tends to co-occur with success in the course.

Feel free to come to office hours even if you don’t have a specific question about an assignment. The course staff is happy to chat about the course material, statistics in general, careers in statistics, or whatever other statistics or data science topics are on your mind!

COVID policy

Maintaining your health and that of the Berkeley community is of primary importance to course staff, so if you are feeling ill or have been exposed to illness, please do not come to class. All of the materials used in class will be posted to the course website. You’re encouraged to reach out to fellow students to discuss the class materials or stop by office hours to chat with a tutor or the instructor.



The primary materials for the course are the lecture notes, which will be posted to the course website in advance of class. The following textbooks are useful supplementary texts but there is no need to purchase them:


The software that we’ll be using for our data analysis is the free and open-source language R, which we’ll interact with through software called RStudio. As a Berkeley student, you have your own version of RStudio waiting for you at: Most students taking Stat 20 have no experience programming; we’ll teach you everything you need to know!

Course communication

Discussion forum

Out-of-class communication for this course will be held on Ed. This forum is a community space to ask and answer questions with your fellow students and course staff. It’s an indispensable resource for staying up-to-date with the course and learning from your peers. It’s also the primary method of communicating with staff. To ask a question of staff, create a new post and mark it as “private” and it will go only to course staff.

In a course this large, the instructors have a difficult time responding to individual emails, so please use the class forum or visit office hours.

Course website

All of the assignments will be posted to the course website. The website also holds the course notes, the syllabus, and links to Gradescope, Ed, and RStudio.

Assignments, Exams, and Grading

Turning in assignments

You will be turning in your assignments on a platform called Gradescope. This is also the platform where your assignments will be graded, so you can return there to get feedback on your work. You are also welcome to file a regrade request if you notice that we made an error in applying the rubric to your work, but be sure to do so within a week of the grades being posted.


Labs

Labs are long-form assignments designed to apply the concepts from the lecture notes in the course of analyzing real data. This will involve both writing code and communicating your thoughts and findings in English. We’ll be working through the most challenging problems from the labs in class, but you may have to complete them on your own outside of class time. Most will be written up and submitted individually, but some will be group submissions.

Labs are to be submitted as PDF files. These PDFs will be generated by rendering Quarto Documents (.qmd files) to HTML and then exporting the HTML into a PDF. Don’t worry if you’re not familiar with the Quarto Document as we will teach you about it!

We will be assessing most of the questions on the lab for correctness but for others we will be giving credit based on completion.

Problem Sets

During class, we will give you a second engagement with the day’s material in the form of a worksheet. These worksheets will run like traditional homework problems and drill the concepts in the reading notes rather than asking you to apply the concepts with a data set, like the lab does.

The problem sets will be drawn entirely from these in-class worksheets and released at the end of the week, at one- or two-week intervals.

Reading Questions

Reading questions serve to check your understanding and engagement while going through the lecture notes prior to class. There will be a handful of questions per “lecture”, and they can be a mix of multiple choice, short answer, and coding questions. You can answer them directly on Gradescope. Note that the reading questions are due at 11:59 pm the night before class: at 11:59 pm on Tuesdays for the Wednesday lecture, and at 11:59 pm on Thursdays for the Friday lecture.


Quizzes

Quizzes reinforce the most important concepts from the lecture notes and provide you the opportunity to work through misunderstandings of concepts with peers and the instructor.

There is both an individual and group component to the quiz.

The individual component will last ~25 minutes. You are allowed one, A4, two-sided sheet one-sided handwritten sheet of notes. The group component will take place immediately after the individual component has been completed and will last ~15 minutes. Your final (composite) quiz grade will be the average of your group and individual quiz scores.


Final Exam

The final exam will be held in person during finals week. The Stat 20 Final is scheduled to be held in Exam Group 14: Thursday 05/11/23 in person from 11:30am-2:30pm.


Grading

Your final grade in the course will be computed based on:

  • Labs: 50%
  • Reading Questions: 5%
  • Quizzes: 15%
  • Problem Sets: 10%
  • Final: 20%

The goal of your course staff is to help you develop and demonstrate your mastery of the material. As such, the course will not be curved and standard cutoffs will be used to determine letter grades (> 90% is some kind of A, etc.). To provide flexibility around emergencies that might arise for you throughout the semester (for example, missing a quiz due to COVID), everyone will receive one emergency drop for quizzes and two emergency drops for reading questions. This means that we will drop your lowest quiz score (which would be a 0 if you were absent) before computing your quiz average. For reading questions, we will drop your two lowest scores.
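
To make the arithmetic concrete, here is a minimal sketch in Python of how the weights and emergency drops described above could combine into a final percentage. It is an illustration, not the official gradebook calculation: the function and variable names are hypothetical, and it assumes that scores are percentages from 0 to 100 and that every assignment within a category counts equally, which the syllabus does not state.

```python
# Weights and drop counts are taken from the syllabus; everything else
# (names, equal weighting within a category) is an assumption for illustration.
WEIGHTS = {
    "labs": 0.50,
    "reading_questions": 0.05,
    "quizzes": 0.15,
    "problem_sets": 0.10,
    "final": 0.20,
}
EMERGENCY_DROPS = {"quizzes": 1, "reading_questions": 2}


def composite_quiz(individual, group):
    """Composite quiz score: the average of the individual and group scores."""
    return (individual + group) / 2


def average_after_drops(scores, n_drop=0):
    """Mean of the scores that remain after removing the n_drop lowest."""
    kept = sorted(scores)[n_drop:]
    if not kept:
        raise ValueError("no scores left after applying drops")
    return sum(kept) / len(kept)


def final_grade(scores_by_category):
    """Weighted sum of category averages, as a percentage."""
    return sum(
        weight * average_after_drops(
            scores_by_category[category], EMERGENCY_DROPS.get(category, 0)
        )
        for category, weight in WEIGHTS.items()
    )


# Made-up scores for a hypothetical student who missed one quiz
# and two sets of reading questions.
example = {
    "labs": [90, 80],
    "reading_questions": [100, 0, 0, 100],
    "quizzes": [composite_quiz(80, 100), 0],
    "problem_sets": [70, 90],
    "final": [85],
}
print(round(final_grade(example), 2))  # 86.0
```

In this made-up example the missed quiz and the two missed reading questions are removed by the drops, so they do not lower the category averages.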


Accommodations for students with disabilities

Stat 20 is a course that is designed to allow all students to succeed. If you have letters of accommodations from the Disabled Students’ Program, please share them with your instructor as soon as possible, and we will work out the necessary arrangements.

Late Work

Unfortunately, with a class of this size, we are unable to keep track of and grade submissions of labs, reading questions, and problem sets that come in very late. If you narrowly miss the standard deadline, you will still be able to submit within an hour for a small penalty (5% reduction in score). If you don’t submit within an hour, you can still submit within a day for a larger penalty (30% reduction).
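
The late-work tiers above can be sketched as a small Python function. This is an illustration under stated assumptions, not the course's actual grading code: the function names are hypothetical, "within a day" is read as 24 hours after the deadline, the tier boundaries are treated as inclusive, the reduction is applied as a fraction of the earned score, and work submitted more than a day late is assumed to earn no credit.

```python
from datetime import datetime, timedelta


def late_multiplier(deadline, submitted):
    """Fraction of the earned score kept, given how late the work arrived."""
    lateness = submitted - deadline
    if lateness <= timedelta(0):
        return 1.0   # on time: no penalty
    if lateness <= timedelta(hours=1):
        return 0.95  # within an hour: 5% reduction
    if lateness <= timedelta(days=1):
        return 0.70  # within a day: 30% reduction
    return 0.0       # assumption: later submissions are not graded


def penalized_score(score, deadline, submitted):
    """Earned score after the late-work reduction."""
    return score * late_multiplier(deadline, submitted)


# Hypothetical deadline and a submission 30 minutes late.
deadline = datetime(2023, 2, 1, 23, 59)
print(penalized_score(80, deadline, deadline + timedelta(minutes=30)))
```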

Collaboration policy

You are encouraged to collaborate with your fellow students on problem sets and labs, but the work you turn in should reflect your own understanding and all of your collaborators must be cited. The individual component of quizzes, reading questions, and exams must reflect only your work.

Researchers don’t use one another’s research without permission; scholars and students always use proper citations in papers; professors may not circulate or publish student papers without the writer’s permission; and students may not circulate or post non-public materials (quizzes, exams, rubrics, or any other private class materials) from their class without the written permission of the instructor.

The general rule: you must not submit assignments that reflect the work of others unless those others are cited collaborators.

The following examples of collaboration are allowed and in fact encouraged!

  • Discussing how to solve a problem with a classmate.
  • Showing your code to a classmate along with an error message or confusing output.
  • Posting snippets of your code to the discussion forum when seeking help.
  • Helping other students solve questions on the discussion forum with conceptual pointers or snippets of code that don’t give away the whole answer.
  • Googling the text of an error message.
  • Copying small snippets of code from answers on Stack Overflow.

The following examples are not allowed:

  • Leaving a representation of your assignment (the text, a screenshot) where students (current and future) can access it. Examples include posting it to websites like Course Hero, sharing it on a group text chain or over Discord/Slack, or passing it on in a file to future students.
  • Accessing and submitting solutions to assignments from other students distributed as above. This includes copying written answers from other students and slightly modifying the language to differentiate it.
  • Googling for complete problem solutions.
  • Working on the reading questions or individual quizzes in collaboration with other people or resources. These assignments must reflect individual work.
  • Submitting work on an exam that reflects consultation with outside resources or other people. Exams must reflect individual work.

If you have questions about the boundaries of the policy, please ask. We’re always happy to clarify.

Violations of the collaboration policy

The integrity of our course depends on our ability to ensure that students do not violate the collaboration policy. We take this responsibility seriously and forward cases of academic misconduct to the Center for Student Conduct.

Students determined to have violated the academic misconduct policy by the Center for Student Conduct will receive a grade penalty in the course and a sanction from the university which is generally: (i) First violation: Non-Reportable Warning and educational intervention, (ii) Second violation: Suspension/Disciplinary Probation and educational interventions, (iii) Third violation: Dismissal.

And again, if you have questions about the boundaries of the collaboration policy, please ask!

Frequently Asked Questions

  1. What should I do if I’m on the waitlist?

    Attend lecture and section (remember, we are teaching them as one two-hour class), and submit all assignments on time. After 3 class meetings, we will take a look at the waitlist.

  2. Are class sessions recorded?

    No. Class sessions feature a mix of group problem solving, activities, and discussion and don’t lend themselves to recording. The course notes are the main reference source for the course. Any materials used during the class session will be posted to the course website.

  3. Is attendance required?

    No, but it is difficult to succeed in this course if you are not regularly attending class. Class sessions are designed to be an effective and efficient format to make progress on important assignments. Plus, it’s a great way to meet your fellow students and learn from one another!

  4. What if I join the class late?

    If you join the class within the first two weeks, read this course syllabus very carefully, read the lecture notes, take a look at Gradescope to get a sense of any assignments that may have already passed, and visit office hours to check that you’re up to date with things. The first two weeks of material are very important, so you will be able to make up the assignments you missed.

    More than two weeks into the semester, you’ll have too much material to make up, so you will have to wait until a subsequent semester to take Stat 20.