Phase IV: The Analyst

Duration: Weeks 11–13 | Points: 225

Analyze data using R for statistical insights.


Learning Objectives

By the end of Phase IV, you will:

  • Import, clean, and transform data in R using the tidyverse
  • Create frequency tables and publication-quality visualizations with ggplot2
  • Select and conduct the appropriate inferential statistical test for your variables
  • Interpret results in both APA format and plain English
  • Calculate and contextualize effect sizes
  • Document all data transformation and analysis steps

Phase IV Overview

This phase is where your research becomes quantitative. Each of Weeks 11–13 gets two full class sessions dedicated to a single R/analysis chapter, giving you time to practice in class. You will apply data wrangling, visualization, and statistical testing to the unified_music dataset from coursepackR. The three assignments build on each other in sequence: the clean dataset you export in Data Wrangling is the input for both Describing Data and Inferencing Data.


Weekly Breakdown

Week | Reading | Focus | Assignment Due
11 | Ch 18: Seeing Patterns | ggplot2 grammar of graphics; publication-quality visualization | Data Wrangling [R] (50 pts)
12 | Ch 19: The Surprise Detector | Hypothesis testing; chi-square, correlation, regression; APA reporting | Describing Data [R] (75 pts)
13 | Ch 20: Interpreting the Call | Effect sizes; power; honest discussion writing | Inferencing Data [R] (100 pts)

Assignments

Data Wrangling [R] (50 pts) | Due: Week 11

Using the unified_music dataset from coursepackR:

  1. Hand-code a variable for a 30-song sample
  2. Import the raw dataset and clean it (column names, factors, missing values)
  3. Join your coded data back to the main dataset
  4. Export a clean dataset as coding_data_clean.RDS

This assignment teaches the full data pipeline: raw input to analysis-ready output.
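
As a rough illustration of the four steps above, here is a minimal sketch that runs on a tiny made-up table instead of unified_music. The column names (song_id, genre, peak_position) and the hand-coded variable (theme) are placeholders, not the real dataset's fields, and a 3-song sample stands in for your 30. Dropping rows with a missing genre is just one possible way to handle missing values.

```r
library(dplyr)

# Toy stand-in for the raw data; column names are hypothetical
raw <- tibble(
  `Song ID`       = 1:6,
  `Artist Genre`  = c("pop", "rap", NA, "pop", "country", "rap"),
  `Peak Position` = c(1, 15, 40, NA, 7, 22)
)

# Clean: syntactic column names, a factor for the categorical column,
# and an explicit decision about missing values (rows with no genre are dropped)
clean <- raw %>%
  rename(
    song_id       = `Song ID`,
    genre         = `Artist Genre`,
    peak_position = `Peak Position`
  ) %>%
  mutate(genre = factor(genre)) %>%
  filter(!is.na(genre))

# Hand-code a variable for a small random sample (3 songs here, 30 in the assignment)
set.seed(42)
coded <- clean %>%
  slice_sample(n = 3) %>%
  select(song_id) %>%
  mutate(theme = c("love", "party", "love"))  # hypothetical hand-coded values

# Join the coded sample back to the main data; uncoded songs get NA for theme
joined <- left_join(clean, coded, by = "song_id")

# Export (a temp folder here; in the assignment, your project folder)
out_path <- file.path(tempdir(), "coding_data_clean.RDS")
saveRDS(joined, out_path)
```

Keeping the uncoded songs (with NA for theme) rather than dropping them means the exported file still contains the full cleaned dataset, which is what a left join gives you here.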


Describing Data [R] (75 pts) | Due: Week 12

Using coding_data_clean.RDS:

  1. Create frequency tables for key categorical variables
  2. Build ggplot2 visualizations (bar charts, trend lines) with professional formatting
  3. Create a cross-tabulation showing the relationship between two variables
  4. Write 2–3 sentences of narrative interpreting each output
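
The outputs listed above could be produced along these lines. This is only a sketch on made-up data: genre and explicit are hypothetical column names, so substitute the variables you actually have in coding_data_clean.RDS.

```r
library(dplyr)
library(ggplot2)

# Toy data; column names are hypothetical stand-ins
songs <- tibble(
  genre    = c("pop", "pop", "rap", "rap", "rap", "country", "pop", "country"),
  explicit = c("no", "yes", "yes", "yes", "no", "no", "no", "no")
)

# Frequency table: counts plus a formatted percentage column
freq <- songs %>%
  count(genre, name = "n") %>%
  mutate(pct = scales::percent(n / sum(n), accuracy = 0.1))

# Bar chart with readable labels and a clean theme
p <- ggplot(freq, aes(x = reorder(genre, -n), y = n)) +
  geom_col() +
  labs(
    title = "Songs per genre (toy data)",
    x     = "Genre",
    y     = "Number of songs"
  ) +
  theme_minimal()

# Cross-tabulation of two categorical variables
xtab <- table(genre = songs$genre, explicit = songs$explicit)

print(freq)
print(xtab)
```

Printing p (or calling ggsave()) renders the chart; knitr::kable(freq) is one way to turn the frequency table into a formatted table in a rendered document.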

Inferencing Data [R] (100 pts) | Due: Week 13

Using coding_data_clean.RDS:

  1. State your hypothesis in plain English before running any code
  2. Select the appropriate test based on your variable types:
    • Two categorical variables: Chi-Square test of independence
    • One categorical variable (2 groups) and one continuous variable: Independent t-test
    • Two continuous variables: Pearson correlation
  3. Run the test in R and report in APA format
  4. Interpret the result in plain English
  5. Calculate and report the effect size
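
To make the test-selection rule above concrete, here is a hedged sketch that runs each of the three tests on simulated data and prints an APA-style line with an effect size. All variable names (explicit, genre, energy, weeks) are invented for illustration. The small apa() helper is an assumption of this sketch, not a course function. Note that t.test() runs Welch's version by default, and the chi-square is run without the continuity correction so that Cramér's V follows its usual formula; check which variants your instructor expects.

```r
library(effsize)  # provides cohen.d()

# Simulated toy data; every variable name is a hypothetical stand-in
set.seed(1)
songs <- data.frame(
  explicit = rep(c("no", "yes"), each = 20),
  genre    = c(rep("pop", 14), rep("rap", 6), rep("pop", 7), rep("rap", 13)),
  energy   = c(rnorm(20, mean = 0.55, sd = 0.10), rnorm(20, mean = 0.65, sd = 0.10)),
  weeks    = round(runif(40, min = 1, max = 30))
)

# Helper (assumption): format a number and drop the leading zero, as APA does
# for statistics that cannot exceed 1 (p, r, V)
apa <- function(x, digits = 2) {
  sub("^(-?)0\\.", "\\1.", formatC(x, format = "f", digits = digits))
}

# Two categorical variables: chi-square test of independence + Cramér's V
tab <- table(songs$explicit, songs$genre)
chi <- chisq.test(tab, correct = FALSE)
cramers_v <- sqrt(unname(chi$statistic) / (sum(tab) * (min(dim(tab)) - 1)))
cat(sprintf("Chi-square: X2(%d, N = %d) = %.2f, p = %s, V = %s\n",
            chi$parameter, sum(tab), chi$statistic,
            apa(chi$p.value, 3), apa(cramers_v)))

# One categorical (2 groups) and one continuous: independent t-test + Cohen's d
energy_no  <- songs$energy[songs$explicit == "no"]
energy_yes <- songs$energy[songs$explicit == "yes"]
tt <- t.test(energy ~ explicit, data = songs)  # Welch's t-test by default
d  <- cohen.d(energy_no, energy_yes)
cat(sprintf("t-test: t(%.2f) = %.2f, p = %s, d = %.2f\n",
            tt$parameter, tt$statistic, apa(tt$p.value, 3), d$estimate))

# Two continuous variables: Pearson correlation (r is itself the effect size)
ct <- cor.test(songs$energy, songs$weeks)
cat(sprintf("Correlation: r(%d) = %s, p = %s\n",
            ct$parameter, apa(ct$estimate), apa(ct$p.value, 3)))
```

A p-value that prints as .000 would normally be written as p < .001 in a report. The plain-English interpretation still has to come from you: the code only supplies the numbers.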

R Environment

All R assignments use the unified_music dataset (1,792 songs with Billboard, Spotify, and Genius data). Load it with:

library(coursepackR)
data(unified_music)

Required packages: tidyverse, knitr, kableExtra, scales, effsize
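
One way to check that these packages are available before you start is sketched below. It only reports what is missing and prints a suggested install command; it does not install anything itself. The variable names are illustrative.

```r
# Packages listed as required for the R assignments
required <- c("tidyverse", "knitr", "kableExtra", "scales", "effsize")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
available    <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!available]

if (length(missing_pkgs) > 0) {
  message(
    "Missing packages. You could install them with: install.packages(c(",
    paste0('"', missing_pkgs, '"', collapse = ", "),
    "))"
  )
} else {
  message("All required packages are installed.")
}
```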


Practice Activities

  • Pokemon Wrangling Practice — Apply the full Import → Diagnose → Clean → Export pipeline to a TidyTuesday Pokemon dataset. Includes a downloadable .qmd file to work through in RStudio.
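
The Import → Diagnose → Clean → Export pipeline named above might look roughly like this on a tiny made-up file. The columns and values are invented for illustration and will not match the actual TidyTuesday dataset or the provided .qmd file; the emphasis here is on the Diagnose step.

```r
library(dplyr)

# Import: write a tiny made-up CSV to a temp file, then read it back
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Name,Type 1,Type 2,HP",
  "Bulbasaur,grass,poison,45",
  "Charmander,fire,,39",
  "Squirtle,water,,unknown",
  "Pikachu,electric,,35"
), csv_path)
pokemon_raw <- read.csv(csv_path, check.names = FALSE,
                        na.strings = "", stringsAsFactors = FALSE)

# Diagnose: how many missing values per column, and what type is each column?
na_counts <- colSums(is.na(pokemon_raw))
col_types <- vapply(pokemon_raw, function(col) class(col)[1], character(1))
print(na_counts)
print(col_types)  # HP is read as character because of the stray "unknown"

# Clean: tidy names, turn the stray text into NA, fix types
pokemon_clean <- pokemon_raw %>%
  rename(name = Name, type_1 = `Type 1`, type_2 = `Type 2`, hp = HP) %>%
  mutate(
    hp     = as.integer(na_if(hp, "unknown")),
    type_1 = factor(type_1)
  )

# Export: save the analysis-ready data
rds_path <- file.path(tempdir(), "pokemon_clean.RDS")
saveRDS(pokemon_clean, rds_path)
```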