# Taxonomy of Data

STAT 20: Introduction to Probability and Statistics

## Agenda

• Announcements
• Concept Questions: Conceptual
• Problem Set 1: Taxonomy of Data
• Break
• Concept Questions: Coding
• Lab 0 work time

## Announcements

• Problem Set 1 and Lab 0 are both due Tuesday, January 23rd at 9am on Gradescope
• RQ: Summarizing Categorical Data due Monday, January 22nd at 11:59pm on Gradescope
• Please read the Ed expectations post before posting on Ed!

# Concept Questions: Conceptual

## Concept Question 1 - Quick Refresher

• Head to PollEverywhere for a quick set of questions regarding Taxonomy of Data!

# Concept Question 2

There’s no escape from the bird…

## Images as data

• Images are composed of pixels (this image is 1520 by 1012 pixels)

• The color of each pixel is stored in RGB: a red, a green, and a blue band

• Each band takes an integer value from 0 to 255

• This image is data with 1520 x 1012 x 3 values.
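To make the count concrete, here is a minimal R sketch of an RGB image stored as a three-dimensional array. The pixel values are random stand-ins rather than the actual photo, and treating 1520 as the width and 1012 as the height is an assumption.

```r
# Hypothetical stand-in for the photo: random integers in 0-255,
# not the real image. Width and height assignment is an assumption.
width  <- 1520
height <- 1012

set.seed(20)
img <- array(
  sample(0:255, size = height * width * 3, replace = TRUE),
  dim = c(height, width, 3)   # rows, columns, bands (R, G, B)
)

dim(img)      # rows (height), columns (width), number of bands
length(img)   # total number of values: 1520 * 1012 * 3
img[1, 1, ]   # the red, green, and blue values of the top-left pixel
```

Indexing with `img[row, column, ]` pulls out the three band values of a single pixel.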

## Grayscale

• Grayscale images have only one band
• 0 is black, 255 is white
• This image is data with 1520 x 1012 x 1 values.
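As a rough illustration of going from three bands to one, the sketch below averages the red, green, and blue values of each pixel in a tiny made-up image. Averaging is only one simple choice and is an assumption here; the slides do not say how the grayscale version was produced.

```r
# Hypothetical 4 x 6 RGB image with random values in 0-255.
set.seed(20)
rgb_img <- array(
  sample(0:255, size = 4 * 6 * 3, replace = TRUE),
  dim = c(4, 6, 3)
)

# Collapse the three bands into one by averaging per pixel
# (an assumed conversion, chosen for simplicity).
gray_img <- apply(rgb_img, c(1, 2), mean)

dim(gray_img)     # one value per pixel: 4 rows by 6 columns
length(gray_img)  # 4 * 6 * 1 values
```

The result is an ordinary matrix with one number per pixel, still on the 0 (black) to 255 (white) scale.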

## Grayscale

• To simplify, assume our photos are 8 x 8 grayscale images.

## Images in a Data Frame

Consider the following images which are our data:

• Let’s simplify them to 8 x 8 grayscale images

## Images in a Data Frame

If you were to put the data from these (8 x 8 grayscale) images into a data frame, what would the dimensions of that data frame be in rows x columns? Answer at pollev.com.


# Concept Question 3

## A note on variables

There are three things that “variable” could be referring to:

1. A phenomenon
2. How the phenomenon is recorded or measured as data
    • What values can it take? (This is often an intent- or value-laden exercise!)
    • For numerical variables, what units should we express it in?
3. How the recorded data is analyzed
    • Might you bin (discretize) income data? What are the consequences of this?

For the following question, you may work under the second definition.
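The binning mentioned under the third meaning can be sketched in R with `cut()`. The incomes, cut points, and labels below are made up for illustration only.

```r
# Hypothetical incomes in dollars (invented values).
income <- c(12000, 35500, 48000, 61250, 99000, 150000)

# Bin into three ordered brackets; the cut points are arbitrary assumptions.
income_bin <- cut(
  income,
  breaks = c(0, 40000, 100000, Inf),
  labels = c("low", "middle", "high"),
  ordered_result = TRUE
)

class(income)      # the recorded data is numeric
class(income_bin)  # the binned version is an ordered factor
table(income_bin)  # counts per bracket
```

One consequence of binning is visible right away: 48000 and 99000 land in the same bracket, so the difference between them is no longer recoverable from `income_bin`.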

## What type of variable is age?

For each of the following scenarios where age could be a variable, choose the most appropriate type according to the Taxonomy of Data.

1. Ages of television audiences/demographics
2. Ages of UC Berkeley students
3. The age of a rock

Answer at pollev.com.


# Problem Set 1: Taxonomy of Data

20:00

# Break

05:00

# Concept Questions: Coding

• Time to make a series of educated guesses. Close your laptops!

## Educated Guess 1

What will happen here?

Answer at pollev.com/<name>

1 + "one"

## Educated Guess 2

What will happen here?

Answer at pollev.com/<name>

a <- c(1, 2, 3, 4)
sqrt(log(a))

## Educated Guess 3

What will happen here?

Answer at pollev.com/<name>

a <- 1 + 2
a + 1

## Educated Guess 4

What will happen here?

Answer at pollev.com/<name>

a <- c(1, 3.14, "seven")
class(a)

# Time to work on Lab 0

25:00