Phase IV: The Analyst
Duration: Weeks 11–13 | Points: 225
Analyze data using R for statistical insights
Learning Objectives
By the end of Phase IV, you will:
- Import, clean, and transform data in R using the tidyverse
- Create frequency tables and publication-quality visualizations with ggplot2
- Select and conduct the appropriate inferential statistical test for your variables
- Interpret results in both APA format and plain English
- Calculate and contextualize effect sizes
- Document all data transformation and analysis steps
Phase IV Overview
This phase is where your research becomes quantitative. Weeks 10–13 each get two full class sessions dedicated to a single R/analysis chapter, giving you time to practice in class. You will apply data wrangling, visualization, and statistical testing to the unified_music dataset from coursepackR. The three assignments build on each other sequentially.
Weekly Breakdown
| Week | Reading | Focus | Assignment Due |
|---|---|---|---|
| 11 | Ch 18: Seeing Patterns | ggplot2 grammar of graphics; publication-quality visualization | Data Wrangling [R] (50 pts) |
| 12 | Ch 19: The Surprise Detector | Hypothesis testing; chi-square, correlation, regression; APA reporting | Describing Data [R] (75 pts) |
| 13 | Ch 20: Interpreting the Call | Effect sizes; power; honest discussion writing | Inferencing Data [R] (100 pts) |
Assignments
Data Wrangling [R] (50 pts) — Week 13
Using the unified_music dataset from coursepackR:
- Hand-code a variable for a 30-song sample
- Import the raw dataset and clean it (column names, factors, missing values)
- Join your coded data back to the main dataset
- Export a clean dataset as coding_data_clean.RDS
This assignment teaches the full data pipeline: raw input to analysis-ready output.
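The clean, join, and export steps above can be sketched as follows. This is only an illustration on a made-up six-row table: the column names (`Song ID`, `Genre`, `Tempo`), the hand-coded `theme` variable, and the temporary output folder are all assumptions, not the actual structure of unified_music or the assignment's required approach.

```r
library(dplyr)

# Toy stand-in for the raw data; the real unified_music columns may differ.
raw_songs <- tibble(
  `Song ID` = 1:6,
  `Genre`   = c("pop", "Pop", "rock", NA, "rock", "pop"),
  `Tempo`   = c(120, 98, NA, 140, 110, 125)
)

# Hypothetical hand-coded variable for a small sample of songs.
hand_coded <- tibble(
  song_id = c(1L, 2L, 5L),
  theme   = c("love", "party", "love")
)

clean_songs <- raw_songs |>
  # Standardize column names: lower case, spaces replaced by underscores.
  rename_with(~ gsub(" ", "_", tolower(.x))) |>
  # Harmonize category labels, then store them as a factor.
  mutate(genre = factor(tolower(genre))) |>
  # Drop rows that are missing a variable needed for analysis.
  filter(!is.na(genre), !is.na(tempo)) |>
  # Join the hand-coded values back; uncoded songs get NA for theme.
  left_join(hand_coded, by = "song_id")

# Written to a temporary folder here; in the assignment you would save
# into your own project folder instead.
out_path <- file.path(tempdir(), "coding_data_clean.RDS")
saveRDS(clean_songs, out_path)
```

Using `left_join()` keeps every song from the main table, so songs outside the coded sample stay in the data with a missing `theme` rather than being dropped.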
Describing Data [R] (75 pts) — Week 14
Using coding_data_clean.RDS:
- Create frequency tables for key categorical variables
- Build ggplot2 visualizations (bar charts, trend lines) with professional formatting
- Create a cross-tabulation showing the relationship between two variables
- Write 2–3 sentences of narrative interpreting each output
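A minimal sketch of the three kinds of output listed above, using an invented eight-song table. The `genre` and `decade` columns and the plot labels are placeholders for illustration; substitute the variables from your own coding_data_clean.RDS.

```r
library(dplyr)
library(ggplot2)

# Made-up data for illustration only.
songs <- tibble(
  genre  = c("pop", "pop", "rock", "rock", "rock", "hip-hop", "pop", "hip-hop"),
  decade = c("2000s", "2010s", "2000s", "2000s", "2010s", "2010s", "2010s", "2010s")
)

# Frequency table: counts plus proportions for one categorical variable.
genre_freq <- songs |>
  count(genre, name = "n") |>
  mutate(prop = n / sum(n))

# Cross-tabulation: rows are genres, columns are decades.
genre_by_decade <- table(songs$genre, songs$decade)

# Bar chart with explicit labels and a clean theme.
genre_plot <- ggplot(genre_freq, aes(x = reorder(genre, -n), y = n)) +
  geom_col() +
  labs(
    title = "Songs per genre (toy data)",
    x = "Genre",
    y = "Number of songs"
  ) +
  theme_minimal()

print(genre_freq)
print(genre_by_decade)
```

In a Quarto or R Markdown report, `knitr::kable(genre_freq)` is one way to render the frequency table more cleanly than the console printout.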
Inferencing Data [R] (100 pts) — Week 15
Using coding_data_clean.RDS:
- State your hypothesis in plain English before running any code
- Select the appropriate test based on your variable types:
  - Two categorical variables: Chi-Square test of independence
  - One categorical (2 groups) × one continuous: Independent t-test
  - Two continuous variables: Pearson correlation
- Run the test in R and report in APA format
- Interpret the result in plain English
- Calculate and report the effect size
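The selection rule and the three tests above might look like this in base R. Treat it as a sketch on fabricated data: the variables (`explicit`, `genre`, `danceability`, `popularity`) and the helper `choose_test()` are hypothetical, and the effect sizes are computed by hand here (Cramér's V, Cohen's d with a pooled standard deviation, and Pearson's r) rather than with any particular package.

```r
# Fabricated data for illustration only.
songs <- data.frame(
  explicit     = factor(rep(c("no", "yes"), each = 6)),
  genre        = factor(c("pop", "pop", "pop", "pop", "rock", "rock",
                          "rock", "rock", "rock", "rock", "pop", "pop")),
  danceability = c(0.52, 0.61, 0.58, 0.66, 0.49, 0.55,
                   0.71, 0.68, 0.74, 0.63, 0.77, 0.70),
  popularity   = c(48, 55, 60, 62, 45, 50, 70, 66, 75, 58, 80, 69)
)

# Hypothetical helper: pick a test from the variable types.
choose_test <- function(x, y) {
  x_cat <- is.factor(x) || is.character(x)
  y_cat <- is.factor(y) || is.character(y)
  if (x_cat && y_cat) return("chi-square test of independence")
  if (!x_cat && !y_cat) return("Pearson correlation")
  grp <- if (x_cat) x else y
  if (length(unique(na.omit(grp))) == 2) return("independent t-test")
  "not covered by the three tests listed here"
}

# Two categorical variables: chi-square, with Cramer's V as effect size.
# suppressWarnings() hides the small-expected-count warning that this
# tiny toy table triggers; correct = FALSE turns off the Yates correction.
tab <- table(songs$genre, songs$explicit)
chi <- suppressWarnings(chisq.test(tab, correct = FALSE))
cramers_v <- unname(sqrt(chi$statistic / (sum(tab) * (min(dim(tab)) - 1))))

# One two-group categorical x one continuous: Welch t-test (R's default),
# with Cohen's d based on the pooled standard deviation.
tt  <- t.test(danceability ~ explicit, data = songs)
yes <- songs$danceability[songs$explicit == "yes"]
no  <- songs$danceability[songs$explicit == "no"]
pooled_sd <- sqrt(((length(yes) - 1) * var(yes) + (length(no) - 1) * var(no)) /
                    (length(yes) + length(no) - 2))
cohens_d <- (mean(yes) - mean(no)) / pooled_sd

# Two continuous variables: Pearson correlation; r is itself the effect size.
ct    <- cor.test(songs$danceability, songs$popularity)
r_est <- unname(ct$estimate)

# Building blocks for an APA-style sentence; adjust to the course's format.
cat(sprintf("t(%.2f) = %.2f, p = %.3f, d = %.2f\n",
            tt$parameter, tt$statistic, tt$p.value, cohens_d))
```

The effsize package from the required list provides `cohen.d()`, which can replace the manual Cohen's d calculation once you are comfortable with what it computes.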
R Environment
All R assignments use the unified_music dataset (1,792 songs with Billboard, Spotify, and Genius data). Load it with:
library(coursepackR)
data(unified_music)
Required packages: tidyverse, knitr, kableExtra, scales, effsize
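Before starting an assignment, it can help to confirm that everything is installed. The helper below, `missing_pkgs()`, is a hypothetical convenience function (not part of coursepackR) that reports which packages cannot be loaded.

```r
# Packages named in this section, plus coursepackR itself.
required <- c("coursepackR", "tidyverse", "knitr", "kableExtra", "scales", "effsize")

# Hypothetical helper: return the packages that are not installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

not_installed <- missing_pkgs(required)
if (length(not_installed) > 0) {
  message("Still to install: ", paste(not_installed, collapse = ", "))
} else {
  message("All required packages are available.")
}
```

Packages on CRAN can be installed with `install.packages()`. Follow the course instructions for installing coursepackR, since its source is not stated here.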
Practice Activities
- Pokemon Wrangling Practice: Apply the full Import → Diagnose → Clean → Export pipeline to a TidyTuesday Pokemon dataset. Includes a downloadable .qmd file to work through in RStudio.