Taxonomy of Data

STAT 20: Introduction to Probability and Statistics

Agenda

  • Announcements
  • Concept Questions: Conceptual
  • Problem Set 1: Taxonomy of Data
  • Break
  • Concept Questions: Coding
  • Lab 0 work time

Announcements

  • Problem Set 1 and Lab 0 are both due Tuesday, January 23rd at 9am on Gradescope
  • RQ: Summarizing Categorical Data due Monday, January 22nd at 11:59pm on Gradescope
  • Please read the Ed expectations post before posting on Ed!

Concept Questions: Conceptual

Concept Question 1 - Quick Refresher

  • Head to PollEverywhere for a quick set of questions regarding Taxonomy of Data!
01:00

Concept Question 2

There’s no escape from the bird…

Images as data

  • Images are composed of pixels (this image is 1520 by 1012)
  • The color of each pixel is stored in RGB: three bands (red, green, blue)
  • Each band takes an integer value from 0 to 255
  • This image is therefore data with 1520 x 1012 x 3 values.
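A minimal R sketch of this array view of an image. It assumes the photo is already held in memory as a plain numeric array; reading a real image file would need an extra package, which is not shown. Random integers stand in for the real pixel intensities, and treating 1520 as the rows and 1012 as the columns is an arbitrary choice for illustration. The total count of values is the same either way.

```r
# Hypothetical stand-in for the shoebill photo: a 1520 x 1012 x 3 array.
# The three slices along the third dimension are the red, green, and blue bands.
set.seed(20)
img <- array(
  sample(0:255, 1520 * 1012 * 3, replace = TRUE),  # random intensities in 0-255
  dim = c(1520, 1012, 3)
)

dim(img)     # 1520 1012 3
length(img)  # total number of values: 1520 * 1012 * 3 = 4614720
range(img)   # smallest and largest values, both within 0-255

# The color of a single pixel (row 1, column 1) is three numbers: R, G, B
img[1, 1, ]
```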

A shoebill with a duck in its mouth.

Grayscale

  • Grayscale images have only one band
  • 0 is black, 255 is white
  • This image is data with 1520 x 1012 x 1 values.
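With only one band, a grayscale image fits in an ordinary matrix. The sketch below collapses a random stand-in RGB array into one band. The weights 0.299, 0.587, and 0.114 are a common luma convention and are an assumption here. They are not necessarily how this particular grayscale image was made.

```r
# Hypothetical RGB stand-in with the same shape as the color photo
set.seed(20)
rgb_img <- array(sample(0:255, 1520 * 1012 * 3, replace = TRUE),
                 dim = c(1520, 1012, 3))

# Collapse the three bands into one with a weighted sum.
# Assumption: luma weights 0.299, 0.587, 0.114 (they sum to 1, so the
# result stays on the 0-255 scale); other conversions exist.
gray <- round(0.299 * rgb_img[, , 1] +
              0.587 * rgb_img[, , 2] +
              0.114 * rgb_img[, , 3])

dim(gray)     # 1520 1012: a single band, so a matrix is enough
length(gray)  # 1520 * 1012 * 1 = 1538240 values
# As before, 0 is black and 255 is white
```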

A shoebill with a duck in its mouth in grayscale.

Grayscale

  • To simplify, assume our photos are 8 x 8 grayscale images.

An 8 x 8 grayscale image

Images in a Data Frame

Consider the following images, which are our data:

  • Let’s simplify them to 8 x 8 grayscale images

Images in a Data Frame

If you were to put the data from these (8 x 8 grayscale) images into a data frame, what would the dimensions of that data frame be in rows x columns? Answer at pollev.com.

01:00

Concept Question 3

A note on variables

“Variable” can refer to three different things:

  1. a phenomenon
  2. how the phenomenon is being recorded or measured into data
    • what values can it take? (this is often an intent- or value-laden exercise!)
    • for numerical variables, what unit should we express it in?
  3. how the recorded data is being analyzed
    • might you bin (discretize) income data? What are the consequences of this?
  • For the following question, you may work under the second definition.
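Here is a sketch of the third meaning, binning a numerical variable into a categorical one, using base R's cut(). The income values and break points are made up for illustration.

```r
# Hypothetical annual incomes in dollars (made-up values)
income <- c(18000, 42000, 55000, 67000, 91000, 130000, 250000)

# Bin into three ordered categories. right = FALSE makes each interval
# include its lower endpoint, e.g. [50000, 100000).
income_bin <- cut(income,
                  breaks = c(0, 50000, 100000, Inf),
                  labels = c("low", "middle", "high"),
                  right = FALSE,
                  ordered_result = TRUE)

table(income_bin)
# The binned variable is categorical (ordinal). 55000 and 91000 now fall
# in the same category, so the difference between them is lost: one
# consequence of discretizing.
```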

What type of variable is age?

For each of the following scenarios where age could be a variable, choose the most appropriate taxonomy according to the Taxonomy of Data.

  1. Ages of television audiences/demographics
  2. Ages of UC Berkeley students
  3. The age of a rock

Answer at pollev.com.

01:00

Problem Set 1: Taxonomy of Data

20:00

Break

05:00

Concept Questions: Coding

  • Time to make a series of educated guesses. Close your laptops!

Educated Guess 1

What will happen here?


Answer at pollev.com/<name>

1 + "one"

01:00

Educated Guess 2

What will happen here?


Answer at pollev.com/<name>

a <- c(1, 2, 3, 4)
sqrt(log(a))

01:00

Educated Guess 3

What will happen here?


Answer at pollev.com/<name>

a <- 1 + 2
a + 1

01:00

Educated Guess 4

What will happen here?


Answer at pollev.com/<name>

a <- c(1, 3.14, "seven")
class(a)

01:00

Time to work on Lab 0

25:00

End of Lecture