Lab 4: People’s Park

Slides

In answering the following questions, it will be helpful to consult

  1. the email from Chancellor Christ,
  2. slide describing the background and methodology, and
  3. the original questionnaire.

Part I: Understanding the Context of the Data

Part II: Computing on the Data

The data collected by the Chancellor’s Office on Cal students can be found as ppk in the stat20data package.

The ppk data set represents a subset of questions that were asked in the questionnaire and have had random noise added to them. The results, in aggregate, share similar statistical properties to the raw data, but a given row no longer reflects an individual student’s response completely.

Question 1

Print the first few rows with the columns that correspond to the responses to survey questions 1, 7, and 8 (Note: we have changed the data back from all numerical data, as suggested by part 1 question 8, to a mix of numerical and categorical data). Please comment on whether your encoding of the data from Q7 on the questionnaire matches the encoding in ppk.

Question 2

Return to your sketches from lab part 1, question 9. Create those visualizations (or more appropriate analogues) using the questionnaire data. For each, add a title and axis labels to make it clear what they are showing, and describe the distribution in words. If your visualization is of ordinal data, the bars should be ordered accordingly. For part a here, you’re welcome to select just one of the priorities to visualize.

  1. Question 9
  2. Question 10
  3. Question 18 and 21 (showing the change from before and after the information in one plot)

Question 3

Create a new column called support_before that takes the response data from question 18 and returns TRUE for answers of “Very strongly support”, “Strongly support”, and “Somewhat support” and FALSE otherwise. What proportion of the survey participants in each class (freshman, sophomore, etc) supported the People’s Park Project before being presented with the information on the bottom of page 14?

Question 4

Repeat the previous question but this time drop any rows with NA (missing data) before starting your calculation. This can be done by using the drop_na() function, which scans any columns that you provide and drop any rows with NA in that column.

ppk |>
  drop_na(Q18_words)

Does the proportion change? If it does, which proportion do you think most consumers of this data analysis will assume they’re looking at?

For the remaining questions in this lab, drop rows with NAs in the columns that you’re using for that question.

Question 5

What is the mean and median rating of the condition of People’s Park (question 15 on the survey)?

Question 6

Create a new column called change_in_support that measures the change in support from question 18 to 21 of the survey. What is the mean change in support of the survey participants in each class (freshman, sophomore, etc) for the People’s Park Project after reading the information? What assumption must you make about the values of the Likert scale in order for these statistics to be informative?

Question 7

Construct one addition visualization that captures a variable or relationship between two variables that you are interested in. Describe the structure that you see in the plot.

Question 8

Create a 95% confidence intervals for the mean rating of the condition of People’s Park using the normal curve. Interpret the interval in the context of the problem in a clear sentence.

Question 9

Create a 95% confidence interval using the normal curve for the overall proportion of students who support the People’s Park Project without having been exposed to the information on page 14. Interpret the interval in the context of the problem in a clear sentence. Does your point estimate approximately match that reported in the Chancellor’s email?

Question 10

Select a confidence level lower than 95% and recalculate the interval from the previous question to express this confidence level. In general, are intervals with higher confidence wider or narrower than those with lower confidence?

Question 11

Using the normal curve, create a 95% confidence interval for the mean change in support for the Project across the entire population after being exposed to the information on page 14.

Question 12

Does your interval from the previous question contain 0? What are the implications of that for those working in the Chancellor’s Office on the People’s Park Project?