Taxonomy of Data

STAT 20: Introduction to Probability and Statistics

Agenda

  • Concept Questions: Taxonomy of Data
  • Reading Questions
  • Worksheet on Paper
  • Break
  • Worksheet via RStudio

Concept Questions

Types of Variables

A variable can refer to any of three things:

  1. a phenomenon
  2. how the phenomenon is being recorded or measured into data
    • what values can it take? (this is often an intent- or value-laden exercise!)
    • for numerical variables, what unit should we express it in?
  3. how the recorded data are being analyzed
    • binning/discretizing income data
    • using a histogram when a bar chart would have too many bars
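
The third sense, where the analysis changes the type of a variable, can be sketched in a few lines. The incomes, bracket cutoffs, and labels below are all made up for illustration; the point is only that the same recorded numbers can be analyzed as an ordinal categorical variable.

```python
from bisect import bisect_right

# Hypothetical bracket boundaries in dollars; the cutoffs are illustrative only.
CUTOFFS = [30_000, 75_000, 150_000]
LABELS = ["low", "middle", "upper-middle", "high"]


def bin_income(income):
    """Map a numerical income to an ordinal bracket label."""
    # bisect_right counts how many cutoffs are <= income,
    # which is the index of the bracket the income falls in.
    return LABELS[bisect_right(CUTOFFS, income)]


# Made-up incomes, recorded as a continuous numerical variable.
incomes = [12_500, 30_000, 48_200, 151_000]

# The same data, analyzed as an ordinal categorical variable.
brackets = [bin_income(x) for x in incomes]
print(brackets)  # ['low', 'middle', 'middle', 'high']
```

Different cutoffs would give a different categorical variable from the same recorded data, which is why binning is a choice made by the analyst and not a property of the phenomenon.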



What type of variable is age?


Answer at pollev.com/<name>

01:00

Images as data

  • Images are composed of pixels (this image is 1020 by 1520)

  • The color of each pixel is stored in RGB: three bands for red, green, and blue

  • Each band takes an integer value from 0 to 255

  • This image is data with 1020 x 1520 x 3 values.

A shoebill with a duck in its mouth.
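
To make the count concrete, here is a minimal sketch of the arithmetic behind "1020 x 1520 x 3 values", followed by a tiny hypothetical 2 x 2 image stored as nested lists. The nested-list layout and the pixel colors are assumptions chosen for illustration, not how any particular image library stores data.

```python
# Dimensions taken from the "1020 x 1520 x 3" bullet; which number is the
# height and which is the width is an assumption and does not affect the count.
HEIGHT, WIDTH, BANDS = 1020, 1520, 3

n_pixels = HEIGHT * WIDTH
n_values = n_pixels * BANDS
print(n_pixels)  # 1550400
print(n_values)  # 4651200

# A tiny hypothetical 2 x 2 RGB image: rows -> pixels -> (R, G, B).
tiny = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

# Flatten to a single list of numbers to count the stored values.
flat = [value for row in tiny for pixel in row for value in pixel]
print(len(flat))  # 12, i.e. 2 * 2 * 3

# Every band value is an integer between 0 and 255.
assert all(0 <= v <= 255 for v in flat)
```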

Grayscale

  • Grayscale images have only one band
  • 0 is black, 255 is white, and values in between are shades of gray
  • This image is data with 1020 x 1520 x 1 values.

To simplify, assume our photos are 8 x 8 grayscale images.

A shoebill with a duck in its mouth in grayscale.
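
One way to see how three bands become one is the sketch below, which averages the red, green, and blue values of each pixel. The averaging rule is an assumption made for simplicity (image software often uses a weighted combination instead), and the 2 x 2 image is made up.

```python
def to_grayscale(image):
    """Collapse each (R, G, B) pixel to a single band by simple averaging."""
    return [[round(sum(pixel) / 3) for pixel in row] for row in image]


# Hypothetical 2 x 2 RGB image: rows -> pixels -> (R, G, B).
tiny_rgb = [
    [(0, 0, 0), (255, 255, 255)],
    [(255, 0, 0), (30, 60, 90)],
]

gray = to_grayscale(tiny_rgb)
print(gray)  # [[0, 255], [85, 60]]
```

Each pixel now holds one number, so the grayscale version stores a third as many values as the RGB version.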

Images in a Data Frame

If you were to put the data from these (8 x 8 grayscale) images into a data frame, what would the dimensions of that data frame be in rows x columns?

01:00

Reading Questions

Worksheet on Paper

20:00

Worksheet via RStudio

Demo

Your turn

20:00