✅ R Software, https://cloud.r-project.org/
✅ RStudio IDE, https://posit.co/products/open-source/rstudio/?sid=1
✅ Quarto, https://quarto.org/docs/get-started/
✅ Created an RStudio Project on your computer
✅ Created a Quarto document

| Data type | Example | Column header |
|---|---|---|
| logical | TRUE | lgl |
| integer | 1L | int |
| double | 1.5 | dbl |
| character | "A" | chr |
| factor | factor("A") | fct |
| ordered | ordered("a") | ord |
install.packages("tidyverse") # Includes readr for importing
install.packages("janitor") # For cleaning column names
install.packages("lubridate") # For date handling
library(tidyverse)
library(janitor)
library(lubridate)

read_csv()
df <- read_csv("path/to/file.csv")
read_csv() Parameters
col_names: Specify if first row contains column names
skip: Skip rows at the beginning
n_max: Read only first n rows
na: Specify what values should be treated as NA
col_types: Specify column types explicitly
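The parameters above can be combined in a single call. The following is a minimal sketch, assuming a small made-up file that is written to a temporary path so the example runs on its own. The column names `id`, `score`, and `group`, the junk first line, and the `-99` missing-value code are illustrative assumptions, not part of the slides.

```r
library(readr)

# Hypothetical sample file: one junk line, a header row, then four data rows
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Exported by survey tool",
  "id,score,group",
  "1,4.5,A",
  "2,-99,B",
  "3,3.2,A",
  "4,5.0,B"
), path)

df <- read_csv(
  path,
  skip = 1,                  # skip the junk line before the header
  col_names = TRUE,          # the first row read contains column names
  n_max = 3,                 # read only the first 3 data rows
  na = c("", "NA", "-99"),   # treat these strings as missing values
  col_types = cols(          # declare column types explicitly
    id = col_integer(),
    score = col_double(),
    group = col_character()
  )
)

df
```

Declaring `col_types` up front means a type mismatch is reported as a parsing problem instead of being silently guessed.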
🤯 Problem: Column names with spaces, special characters, or inconsistent cases are hard to work with
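One way to address this is `janitor::clean_names()`, which by default converts names to snake_case. A minimal sketch, assuming a made-up tibble whose messy column names are purely illustrative:

```r
library(tibble)
library(janitor)

# Hypothetical data with spaces, special characters, and mixed case in the names
messy <- tibble(
  `First Name` = c("Ana", "Ben"),
  `AGE (years)` = c(31, 45),
  `zipCode` = c("1000", "2000")
)

clean <- clean_names(messy)

names(messy)  # original names
names(clean)  # lower-case names with underscores only
```

After cleaning, columns can be referenced without backticks, for example `clean$first_name`.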
Convert character to factor
Convert character to date
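Both conversions can be sketched in one `mutate()` call. This example assumes a made-up tibble (`visits`, with `severity` and `visit_date` columns) and dates stored as year-month-day strings. For other date layouts, lubridate offers parsers such as `dmy()` or `mdy()`.

```r
library(dplyr)
library(lubridate)

# Hypothetical data: both columns start out as character
visits <- tibble(
  severity = c("mild", "severe", "moderate", "mild"),
  visit_date = c("2024-01-15", "2024-02-03", "2024-02-20", "2024-03-01")
)

visits_typed <- visits %>%
  mutate(
    # character -> factor, with the level order stated explicitly
    severity = factor(severity, levels = c("mild", "moderate", "severe")),
    # character -> Date, parsed as year-month-day
    visit_date = ymd(visit_date)
  )

levels(visits_typed$severity)
class(visits_typed$visit_date)
```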
Factors are categorical variables with defined levels
They’re useful for:
Export to CSV
Export to Excel
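A minimal sketch of both exports, assuming a small made-up data frame and temporary output paths. The slides do not name a package for Excel export. `writexl::write_xlsx()` is used here as one possible option and only runs if that package is installed.

```r
library(readr)

# Hypothetical results to export
results <- data.frame(id = 1:3, score = c(4.5, 3.2, 5.0))

# Export to CSV with readr
csv_path <- tempfile(fileext = ".csv")
write_csv(results, csv_path)

# Export to Excel; writexl is an assumption, not named in the slides
if (requireNamespace("writexl", quietly = TRUE)) {
  xlsx_path <- tempfile(fileext = ".xlsx")
  writexl::write_xlsx(results, xlsx_path)
}
```

In a real project, replace the temporary paths with paths inside the RStudio Project folder.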
Excel files (requires readxl package)
SPSS, Stata, SAS files (requires haven package)
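The readers for these formats follow the same pattern as `read_csv()`. The sketch below uses example files that ship with the readxl and haven packages, so it should run without any data of your own. With real data, pass your own file path instead.

```r
library(readxl)
library(haven)

# Excel: example workbook bundled with readxl
xlsx_file <- readxl_example("datasets.xlsx")
excel_sheets(xlsx_file)                        # list the sheet names
first_sheet <- read_excel(xlsx_file, sheet = 1)

# SPSS: example file bundled with haven
sav_file <- system.file("examples", "iris.sav", package = "haven")
iris_spss <- read_sav(sav_file)

# Stata and SAS files use read_dta() and read_sas() in the same way
```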
read_csv() instead of read.csv()
janitor::clean_names()
col_types parameter

“how to use visualization and transformation to explore your data in a systematic way”
penguins
Live on three islands: Biscoe, Dream, and Torgersen.


install.packages("tidyverse")
install.packages("palmerpenguins")
library(tidyverse)
library(palmerpenguins)
data(penguins)
View(penguins)

Exploring Categorical Variables
Key Question: Which species live on which islands?
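Counting the combinations of the two categorical variables is one simple way to approach this question. A minimal sketch, assuming the `penguins` data from the palmerpenguins package:

```r
library(dplyr)

penguins <- palmerpenguins::penguins

# One row per island/species combination that actually occurs in the data
species_by_island <- penguins %>%
  count(island, species)

species_by_island
```

Combinations that never occur do not appear in the result, so missing rows show which species are absent from an island.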
Outliers are observations that are very different from the others. They can be:
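One common way to flag candidate outliers is the 1.5 × IQR rule used by boxplots. The slides do not state a detection method, so treat the rule, the choice of `body_mass_g`, and the grouping by species as assumptions made for this sketch.

```r
library(dplyr)

penguins <- palmerpenguins::penguins

outliers <- penguins %>%
  filter(!is.na(body_mass_g)) %>%
  group_by(species) %>%
  mutate(
    q1 = quantile(body_mass_g, 0.25),
    q3 = quantile(body_mass_g, 0.75),
    iqr = q3 - q1
  ) %>%
  # keep rows outside the boxplot whiskers for their own species
  filter(body_mass_g < q1 - 1.5 * iqr | body_mass_g > q3 + 1.5 * iqr) %>%
  ungroup() %>%
  select(species, island, sex, body_mass_g)

outliers
```

A flagged row is only a candidate: it still needs to be checked against the data source before deciding whether to keep or exclude it.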
EDA is about asking and answering questions!
QUESTION: Size difference between sexes?
QUESTION: Do penguins on different islands have different sizes?
QUESTION: Are bill dimensions related to body size?
QUESTION: Has penguin size changed over the years?
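Each of the four questions above can be given a first, rough answer with a grouped summary. The sketch below uses body mass as the measure of "size", which is an assumption, since the slides do not say which variable they use. The island comparison should also be read with care, because species are not evenly spread across islands.

```r
library(dplyr)

penguins <- palmerpenguins::penguins

# Size difference between sexes? (rows with unknown sex are dropped)
by_sex <- penguins %>%
  filter(!is.na(sex)) %>%
  group_by(sex) %>%
  summarise(mean_mass_g = mean(body_mass_g, na.rm = TRUE), n = n())

# Do penguins on different islands have different sizes?
by_island <- penguins %>%
  group_by(island) %>%
  summarise(mean_mass_g = mean(body_mass_g, na.rm = TRUE), n = n())

# Are bill dimensions related to body size? (Pearson correlations)
bill_vs_mass <- penguins %>%
  summarise(
    r_bill_length = cor(bill_length_mm, body_mass_g, use = "complete.obs"),
    r_bill_depth = cor(bill_depth_mm, body_mass_g, use = "complete.obs")
  )

# Has penguin size changed over the years?
by_year <- penguins %>%
  group_by(year) %>%
  summarise(mean_mass_g = mean(body_mass_g, na.rm = TRUE), n = n())

by_sex
by_island
bill_vs_mass
by_year
```

These summaries are starting points. Repeating them with `group_by(species, ...)` shows whether a pattern holds within each species or only appears when species are pooled.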
