Stat 20: Introduction to Probability and Statistics


Andrew Bray

Iain Carmichael

Silas Gifford

Jeremy Sanchez

Shobhana Murali Stoyanov

Welcome to the Age of Data, where claims made using data are all around us: in the news, in the pages of scientific journals, in the policies of government, and in the board rooms of companies across the world. In this course you will explore the forms of claims that are made using data. Some of these are subtle claims about the structure of the data at hand. Others are grand claims about scientific truths or predictions of what will happen in the future. This course will train your ability to critique and construct such arguments made using data.

Course Culture

Students taking Stat 20 come from a wide range of backgrounds. We hope to foster an inclusive and supportive learning environment based on curiosity rather than competition. All members of the course community—the instructor, students, tutors, and readers—are expected to treat each other with courtesy and respect.

You will be interacting with course staff and fellow students in several different environments: in class, over the discussion forum, and in office hours. Some of these will be in person, some of them will be online, but the same expectations hold: be kind, be respectful, be professional.

If you are concerned about classroom environment issues created by other students or course staff, please come talk to us about it.

Mode of Instruction

This course is structured as a flipped class, meaning that you’ll first be encountering new concepts in statistics and data science outside of class. Class time is dedicated to expanding on the work you’ve done outside of class by working through questions solo, in groups, and as a class.

The reason that this course is structured in this manner is that applied data science is a complex field that unites scientific thinking, computing, mathematics, and an understanding of the context of the data. We will be able to tackle more interesting and challenging questions if we make room during class time to work on them while we’re all in the same place.

Before class

It is your responsibility to become familiar with the topics that appear in the course notes and to work through the reading questions on Gradescope by 11:59 pm Monday (for Tues/Weds class) and 11:59 pm Wednesday (for Thurs/Fri class). You’re encouraged to experiment to find the method that works best for you: downloading the notes as a PDF and annotating them, asking and answering questions over the class forum, etc.

During class on Tue/Wed and Thu/Fri

Class time (2 hrs) will be spent on a range of activities, but the most common will be concept questions (using Poll Everywhere) and working through components of your Problem Sets and Labs. Therefore, the most efficient way to complete your assignments is to be an active participant in class. Attendance is expected on these class days.

During class on Monday

Monday is a shorter (1 hr) class that alternates between:

  • Workshops: less-structured class dedicated to finishing up work on your assignments
  • Quizzes: see information on quizzes below.

Plenary lectures

Several times during the semester, the course will be hosting a special guest to give a lecture to all sections of Stat 20. These speakers are preeminent scientists and data scientists from academia or industry who will speak about how the ideas and tools from Stat 20 are used in and impact the real world. These plenary lectures will be held outside of class time in the evening and the schedule will be announced during the semester.

As a part of the graded assignments for the class you will be asked to attend three lectures outside of class and hand in a small written assignment about each. These lectures could be the Stat 20 plenary lectures or they could be from another event; the only restriction is that they have to be related to data science.

Group tutoring

Tutors will offer group tutoring sessions several times each week. This is an opportunity to finish up any assignments that you’ve started in class or review any topics that are muddy. Each group tutoring session will be staffed by 2-4 tutors. You’re welcome to attend any session that works well for your schedule.

Group tutoring is a great place to go to meet other students and collaborate on assignments with tutors on hand to help you get unstuck.

Instructor Office Hours

The instructors will offer office hours each week across a range of times. We ask that you only visit the office hours of your instructor, but you are welcome to visit the tutoring sessions of any tutors, not just the ones who work in your section. We may adjust the office hour and group tutoring sessions schedule throughout the semester as we understand student needs and preferences. Please check the office hours tab on the syllabus page to see the times of the various office hour/group tutoring sessions.

Office hours are an opportunity to chat one-on-one with your instructor. Please come to office hours! Coming to office hours does not send a signal that you are behind or need extra help. On the contrary, coming to office hours early and often tends to co-occur with success in the course. Instructors are happy to chat about the course material, statistics in general, careers in statistics, and whatever other statistics or data science topics are on your mind!

COVID policy

Maintaining your health and that of the Berkeley community is of primary importance to course staff, so if you are feeling ill or have been exposed to illness, please do not come to class. All of the materials used in class will be posted to the course website. You’re encouraged to reach out to fellow students to discuss the class materials or stop by group tutoring or office hours to chat with a tutor or the instructor.



The primary materials for the course are the lecture notes, which will be posted to the course website in advance of class. The following textbooks are useful supplementary texts but there is no need to purchase them:


The software that we’ll be using for our data analysis is the free and open-source language called R, which we’ll be interacting with via software called RStudio. As a Berkeley student, you have your own version of RStudio waiting for you at: Most students taking Stat 20 have no experience programming; we’ll teach you everything you need to know!

Course communication

Discussion forum

Out-of-class communication for this course will be held on Ed. This forum is a community space to ask and answer questions with your fellow students and course staff. It’s an indispensable resource for staying up-to-date with the course and learning from your peers. It’s also the primary method for communicating with staff. To ask a question for staff, create a new post and mark it as “private” and it will go only to course staff. If your question does not include personal information and can be answered by other students, please make your question public.

In a course this large, the instructors have a difficult time responding to individual emails, so please use the class forum or visit office hours.

Course website

All of the assignments will be posted to the course website at This also holds the course notes, the syllabus, and links to Gradescope, Ed, and RStudio. Note that we will not be using bCourses except perhaps to occasionally post videos.

Assignments, Exams, and Grading

Turning in assignments

You will be turning in your assignments on a platform called Gradescope. This is also the platform where your assignments will be graded, so you can return there to get feedback on your work. You are welcome to file a regrade request if you notice that we made an error in applying the rubric to your work, but be sure to do so within a week of the grades being posted. We will not accept regrade requests past that point.


Labs

Labs are long-form assignments designed to apply the concepts from the lecture notes by carrying out an analysis of real data. This will involve both writing code and communicating your thoughts and findings in English. We’ll be working through the most challenging problems from the labs in class, but you may have to complete them on your own outside of class time. Most labs will be written up and submitted individually but some will be group submissions.

Labs are to be submitted as PDF files. These PDFs will be generated by rendering Quarto Documents (.qmd files) to HTML and then exporting the HTML into a PDF. Don’t worry if you’re not familiar with the Quarto Document as we will teach you about it!

We will be assessing most questions on labs for correctness but for others we will be giving credit based on completion.

Problem Sets

During class, we will give you a second engagement with the day’s material in the form of a worksheet. These worksheets will run like traditional homework problems and drill the concepts in the reading notes rather than asking you to apply the concepts with a data set, like the lab does.

The problem sets will be drawn entirely from these in-class worksheets and released at one- or two-week intervals.

Reading Questions

Reading questions serve to check your understanding and engagement while going through the lecture notes prior to class. There will be a handful of questions per lecture note. These questions will be a mix of multiple choice, short answer, and coding questions. You can answer them directly on Gradescope. Note that the reading questions will be due at 11:59 pm on Monday (for the Tuesday/Wednesday lectures) and Wednesday (for the Thursday/Friday lectures).

Plenary Lecture Reviews

An important way to engage with new ideas and people at a university is to attend guest lectures. Stat 20 will host 3-5 such lectures over the course of the semester and will also provide a list of guest lectures in other departments that focus on topics related to probability, statistics, and data science.

A plenary lecture review is a written assignment that details the background of the speaker, the question they address, the tools, data, and methods they use, and their key findings. You will be asked to submit three such lecture reviews over the course of the semester based on the lectures that you have chosen to attend. These lecture reviews will be due at the end of the semester.


Quizzes

Quizzes reinforce the most important concepts from the lecture notes and provide you the opportunity to work through misunderstandings of concepts with peers and the instructor.

There is both an individual and group component to the quiz.

The individual component will last ~25 minutes. You are allowed one one-sided, handwritten A4 sheet of notes. The group component will take place immediately after the individual component has been completed and will last ~15 minutes. Your final (composite) quiz grade will be the average of your group and individual quiz scores.
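To make the averaging concrete, here is a minimal Python sketch of the composite quiz score. The function name and the sample scores are hypothetical, and the sketch assumes both components are expressed as percentages; it is an illustration rather than the course's actual grading code.

```python
def composite_quiz_score(individual, group):
    """Average the individual and group components of a single quiz."""
    return (individual + group) / 2

# Hypothetical example: 70% on the individual part, 90% on the group part.
example_composite = composite_quiz_score(70, 90)  # 80.0
```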


The final exam will be held in person during finals week. It is currently scheduled for Thursday, December 14 from 8am to 11am; this is not the time you see on CalCentral.


Your final grade in the course will be computed based on:

  • Labs: 35%
  • Plenary Lecture Reviews: 3%
  • Reading Questions: 3%
  • Quizzes: 25%
  • Problem Sets: 7%
  • Final: 27%

The goal of this course is to help you develop and demonstrate your mastery of the material. As such, the course will not be curved and standard cutoffs will be used to determine letter grades (> 90% is some kind of A, etc.).
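As a rough illustration of how the weights above combine, the following Python sketch computes a weighted course percentage. Only the weights come from this syllabus; the dictionary keys, the `final_percentage` helper, and the sample scores are hypothetical, and this is not the course's actual grading script.

```python
# Weights (in percent) from the grading breakdown in this syllabus.
WEIGHTS = {
    "labs": 35,
    "plenary_lecture_reviews": 3,
    "reading_questions": 3,
    "quizzes": 25,
    "problem_sets": 7,
    "final": 27,
}

def final_percentage(scores):
    """Combine per-category percentages (0-100) into one course percentage."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100

# Hypothetical student, for illustration only.
example_scores = {
    "labs": 90,
    "plenary_lecture_reviews": 100,
    "reading_questions": 100,
    "quizzes": 80,
    "problem_sets": 95,
    "final": 85,
}
example_total = final_percentage(example_scores)  # 87.1
```

Under the standard cutoffs mentioned above, this hypothetical student would land just below the 90% line.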

In order to provide flexibility around emergencies that might arise for you throughout the semester (for example, missing a quiz due to COVID), we will apply the following for everyone:

  • one emergency drop for quizzes
  • two emergency drops for reading questions.

This means that we will drop your lowest quiz score (which would be a 0 if you were absent) before computing your quiz average. For reading questions, we will drop your two lowest scores.
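The effect of an emergency drop can be sketched in a few lines of Python. The `average_with_drops` helper and the score lists are hypothetical and serve only to show the arithmetic; they are not the course's actual grading code.

```python
def average_with_drops(scores, n_drops):
    """Average a list of scores after discarding the n_drops lowest ones."""
    if n_drops < 0:
        raise ValueError("n_drops must be non-negative")
    if len(scores) <= n_drops:
        raise ValueError("need more scores than drops")
    kept = sorted(scores)[n_drops:]  # the lowest scores come first and are cut
    return sum(kept) / len(kept)

# Hypothetical quiz scores; the 0 stands for a quiz missed due to illness.
quiz_average = average_with_drops([0, 80, 90, 100], n_drops=1)  # 90.0

# Hypothetical reading question scores with the two lowest dropped.
reading_average = average_with_drops([100, 0, 50, 100, 100], n_drops=2)  # 100.0
```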


Accommodations for students with disabilities

Stat 20 is a course that is designed to allow all students to succeed. If you have letters of accommodations from the Disabled Students’ Program, please share them with your instructor as soon as possible, and we will work out the necessary arrangements.

Late Work

Unfortunately, with a class of this size, we are unable to keep track of and grade submissions of labs, reading questions, and problem sets that come in very late. If you narrowly miss the standard deadline, you will still be able to submit within an hour for a small penalty (5% reduction in score). If you don’t submit within an hour you can still submit within a day for a larger penalty (30% reduction).
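One way to read the late policy is as a multiplier applied to your score, sketched below in Python. Several details here are assumptions rather than stated policy: that the reduction is multiplicative (a 5% reduction keeps 95% of the score), that a submission exactly one hour or one day late still falls in the earlier window, and that work more than a day late earns no credit. The function name and the sample deadline are hypothetical.

```python
from datetime import datetime, timedelta

def late_multiplier(deadline, submitted):
    """Return the fraction of the score kept, given how late the work is."""
    lateness = submitted - deadline
    if lateness <= timedelta(0):
        return 1.0   # on time: no penalty
    if lateness <= timedelta(hours=1):
        return 0.95  # within an hour: 5% reduction
    if lateness <= timedelta(days=1):
        return 0.70  # within a day: 30% reduction
    return 0.0       # assumption: later submissions are not graded

# Hypothetical deadline and submission time, for illustration only.
deadline = datetime(2023, 9, 4, 23, 59)
multiplier = late_multiplier(deadline, deadline + timedelta(minutes=20))  # 0.95
```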

Collaboration policy

You are encouraged to collaborate with your fellow students on problem sets and labs, but the work you turn in should reflect your own understanding and all of your collaborators must be cited. The individual component of quizzes, reading questions, and exams must reflect only your work.

Researchers don’t use one another’s research without permission; scholars and students always use proper citations in papers; professors may not circulate or publish student papers without the writer’s permission; and students may not circulate or post non-public materials (quizzes, exams, rubrics, or any other private class materials) from their class without the written permission of the instructor.

The general rule: you must not submit assignments that reflect the work of others unless they are a cited collaborator.

The following examples of collaboration are allowed and in fact encouraged!

  • Discussing how to solve a problem with a classmate.
  • Showing your code to a classmate along with an error message or confusing output.
  • Posting snippets of your code to the discussion forum when seeking help.
  • Helping other students solve questions on the discussion forum with conceptual pointers or snippets of code that don’t give away the whole answer.
  • Googling the text of an error message.
  • Copying small snippets of code from answers on Stack Overflow.

The following examples are not allowed:

  • Leaving a representation of your assignment (the text, a screenshot) where students (current and future) can access it. Examples include posting it to websites like Course Hero, sharing it on a group text chain or over Discord/Slack, or putting it in a file passed on to future students.
  • Accessing and submitting solutions to assignments from other students distributed as above. This includes copying written answers from other students and slightly modifying the language to differentiate it.
  • Googling for complete problem solutions.
  • Working on the reading questions or individual quizzes in collaboration with other people or resources. These assignments must reflect individual work.
  • Submitting work on an exam that reflects consultation with outside resources or other people. Exams must reflect individual work.

If you have questions about the boundaries of the policy, please ask. We’re always happy to clarify.

Violations of the collaboration policy

The integrity of our course depends on our ability to ensure that students do not violate the collaboration policy. We take this responsibility seriously and forward cases of academic misconduct to the Center for Student Conduct.

Students determined to have violated the academic misconduct policy by the Center for Student Conduct will receive a grade penalty in the course and a sanction from the university which is generally: (i) First violation: Non-Reportable Warning and educational intervention, (ii) Second violation: Suspension/Disciplinary Probation and educational interventions, (iii) Third violation: Dismissal.

And again, if you have questions about the boundaries of the collaboration policy, please ask!

Frequently Asked Questions

  1. What should I do if I’m on the waitlist?

    Attend both lecture and section (remember, we are teaching it as one two-hour class), and submit all assignments on time. Visit your instructor on the first day of class so you can be added to the course Ed and Gradescope.

  2. Are class sessions recorded?

    No. Class sessions feature a mix of group problem solving, activities, and discussion and don’t lend themselves to recording. The course notes are the main reference source for the course. Any materials used during the class session will be posted to the course website.

  3. Is attendance required?

    No, but it is difficult to succeed in this course if you are not regularly attending class. Class sessions are designed to be an effective and efficient format to make progress on important assignments. Plus, it’s a great way to meet your fellow students and learn from one another!

  4. What if I join the class late?

    If you join the class within the first two weeks, read this course syllabus very carefully, read the lecture notes, take a look at Gradescope to get a sense of any assignments that may have already passed, and visit office hours to check that you’re up to date with things. The first two weeks of material are very important so you must be able to make up assignments.

    After the first two weeks of the semester, there will be too much material for you to make up, so you will have to wait until a subsequent semester to take Stat 20.