📊 Phase IV: The Analyst

Duration: Weeks 13-15 (Apr 06 - Apr 20) | Points: 225
Analyze data using R for statistical insights


Learning Objectives

By the end of Phase IV, you will:

  • ✅ Import and clean data in R
  • ✅ Create descriptive statistics and visualizations
  • ✅ Document all data transformation steps
  • ✅ Perform preliminary analysis
  • ✅ Generate visualizations and summary tables
  • ✅ Communicate findings and limitations

What You’ll Do

Phase 4 is where you execute your plan:

Collect - Gather data using your approved methods (surveys, interviews, scraping, etc.)
Clean - Verify quality, handle missing values, fix formatting
Analyze - Apply statistical or qualitative methods to answer your questions
Visualize - Create tables, charts, and figures that show your findings
Document - Track every decision you make so others (and you, later!) understand what you did


Phase 4 Content & Activities

Activity 1: Data Collection (Week 8)

Follow your approved data collection plan:

  1. Collect data - Surveys, interviews, archival data, API requests, web scraping, etc.
  2. Track what’s happening - Keep notes in 00_Inbox/ on collection dates, sample sizes, and any issues
  3. Store raw data - Save it exactly as received in 03_Project/03_Data/raw/
  4. Document sources - Who provided the data, when was it collected, and where did it come from?

Organize your data folder:

03_Project/03_Data/
├── raw/              # Exactly as received (don't modify!)
├── clean/            # After cleaning + transformation
├── README.md         # Overview of datasets + versions
└── data_log.txt      # Timeline of what was collected when
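
The folder layout above can be set up once with a short script. This is a minimal sketch in Python (one of the tools listed for this phase), using only the standard library. The function names `init_data_folder` and `log_collection` and the log line format are assumptions for illustration, not course requirements.

```python
from datetime import date
from pathlib import Path


def init_data_folder(project_root):
    """Create the 03_Data layout; existing folders and files are left alone."""
    data_dir = Path(project_root) / "03_Project" / "03_Data"
    for sub in ("raw", "clean"):
        (data_dir / sub).mkdir(parents=True, exist_ok=True)
    for name in ("README.md", "data_log.txt"):
        (data_dir / name).touch(exist_ok=True)
    return data_dir


def log_collection(data_dir, source, n_rows, note=""):
    """Append one dated line to data_log.txt and return that line."""
    parts = [date.today().isoformat(), f"source={source}", f"n={n_rows}"]
    if note:
        parts.append(note)
    entry = " | ".join(parts)
    with open(Path(data_dir) / "data_log.txt", "a", encoding="utf-8") as fh:
        fh.write(entry + "\n")
    return entry
```

Calling `init_data_folder(".")` from the top of your project creates the folders, and `log_collection(data_dir, "campus survey", 127, "first export")` would add one line to the timeline. Appending, rather than overwriting, keeps the earlier entries intact.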

Activity 2: Data Cleaning (Week 9)

Transform raw data into analysis-ready data:

  1. Check quality - Are there missing values? Duplicates? Odd formatting?
  2. Fix issues - Use R, Python, Excel, or whatever tool fits your data
  3. Document every step - Save a cleaning script that shows exactly what you changed
  4. Create a log - List decisions: “Removed 3 duplicate responses”, “Standardized date format”, etc.

Create: 03_Project/03_Data/cleaning_script.[R|py|xlsx] + cleaning_log.md
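
Before changing anything, the quality check in step 1 can be run as a read-only report. The sketch below counts missing values per column and lists repeated IDs. The column names, the sample rows, and the set of tokens treated as missing are assumptions. Adjust them to your own data.

```python
import csv
import io
from collections import Counter

# Assumption: these tokens (case-insensitive) count as missing.
MISSING = {"", "na", "n/a", "null"}


def quality_report(rows, id_column):
    """Count missing values per column and list IDs that appear more than once."""
    missing = Counter()
    ids = Counter()
    for row in rows:
        ids[row[id_column]] += 1
        for column, value in row.items():
            if value is None or value.strip().lower() in MISSING:
                missing[column] += 1
    duplicates = sorted(i for i, count in ids.items() if count > 1)
    return {"n_rows": len(rows), "missing": dict(missing), "duplicate_ids": duplicates}


# Tiny in-memory sample so the sketch runs without a file on disk.
SAMPLE = """response_id,age_group,date_submitted
1,18-24,2024-01-03
2,,2024-01-04
2,,2024-01-04
3,25-34,N/A
"""

sample_rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = quality_report(sample_rows, "response_id")
print(report)
```

For a real file, replace the `io.StringIO(SAMPLE)` part with `open(path, newline="", encoding="utf-8")`. The report only describes the data. Deciding what to do about each issue is a separate, documented step.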

Cleaning Log Example:

## Data Cleaning Log

**Raw Data:** survey_responses_raw.csv (127 rows)

### Steps Taken:
1. Removed 2 duplicate response IDs (#45, #103)
2. Converted date_submitted from text to date format
3. Filled missing age_group values (n=5) using mode imputation (most common category)
4. Standardized text responses to lowercase
5. Removed 1 response with >50% missing values

**Result:** clean_data.csv (124 rows, 0 missing values)
**Date:** 2024-01-15 | **Tool:** R 4.3.2 | **Done by:** [Your Name]
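
A log like the one above is easiest to keep accurate when the cleaning script writes it. The following is a minimal sketch under stated assumptions: rows are dictionaries of text values, the ID column is called `response_id`, and dates arrive as `YYYY-MM-DD`. Imputation is left out on purpose because it is a judgment call that depends on your variables.

```python
from datetime import datetime


def clean_rows(rows, id_column="response_id"):
    """Return (clean_rows, log); log describes each change in plain language."""
    log = [f"Raw data: {len(rows)} rows"]

    # 1. Drop repeated IDs, keeping the first occurrence.
    seen, deduped = set(), []
    for row in rows:
        if row[id_column] not in seen:
            seen.add(row[id_column])
            deduped.append(dict(row))  # copy so the raw rows stay untouched
    log.append(f"Removed {len(rows) - len(deduped)} duplicate response IDs")

    # 2. Standardize text: trim whitespace and lowercase.
    for row in deduped:
        for column, value in row.items():
            row[column] = value.strip().lower()
    log.append("Standardized text responses to lowercase")

    # 3. Drop responses where more than half of the fields are empty.
    kept = [r for r in deduped if sum(v == "" for v in r.values()) <= len(r) / 2]
    log.append(f"Removed {len(deduped) - len(kept)} responses with >50% missing values")

    # 4. Parse dates (assumed format YYYY-MM-DD); empty dates become None.
    for row in kept:
        text = row["date_submitted"]
        row["date_submitted"] = datetime.strptime(text, "%Y-%m-%d").date() if text else None
    log.append("Converted date_submitted from text to date format")

    log.append(f"Result: {len(kept)} rows")
    return kept, log
```

Writing `"\n".join(log)` into cleaning_log.md after each run keeps the log and the script in step, so the row counts in the log always match what the code did.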

Activity 3: Preliminary Analysis (Weeks 9-10)

Apply your research methods:

  1. Descriptive stats - How many? What’s the range? What’s typical?
  2. Inferential stats (if applicable) - Test your hypotheses
  3. Qualitative coding (if applicable) - Find themes and patterns
  4. Create visualizations - 2-3 core figures/tables that directly answer your research questions

Save: Analysis scripts in 03_Project/04_Drafts/
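
For the descriptive step, "How many? What's the range? What's typical?" maps onto a handful of summary numbers. This sketch uses Python's standard `statistics` module. The helper names `describe` and `frequency_table` and the sample values are made up for illustration.

```python
import statistics
from collections import Counter


def describe(values):
    """Basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


def frequency_table(categories):
    """Counts and percentages per category, most common first."""
    counts = Counter(categories)
    total = len(categories)
    return [(label, n, round(100 * n / total, 1)) for label, n in counts.most_common()]


# Hypothetical data: ratings on a 1-5 scale and an age group per respondent.
ratings = [3, 4, 4, 5, 2, 4, 3, 5]
age_groups = ["18-24", "25-34", "18-24", "35-44"]

print(describe(ratings))
for label, n, pct in frequency_table(age_groups):
    print(f"{label}: {n} ({pct}%)")
```

Numeric variables go through `describe`, while categorical ones go through `frequency_table`. Reporting the mean and median together makes skew easier to spot.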

Activity 4: Share First Results (Week 10)

Get feedback before finalizing:

  1. Create a short memo - What you found, what it means, what’s uncertain
  2. Include visuals - Share 2-3 key tables/charts
  3. List limitations - What could be affecting your results?
  4. Share with Dr. Leith - Email or submit for feedback

Template:

# Data Analysis Memo

## Overview
**Research Question:** [Your question]  
**Sample Size:** [N]  
**Analysis Date:** [Date]

## Key Findings
1. [Finding 1 with figure/table reference]
2. [Finding 2 with figure/table reference]
3. [Finding 3 with figure/table reference]

## Interpretation
[What do these findings mean? How do they answer your research question?]

## Limitations
- [Limitation 1]
- [Limitation 2]
- [How might these affect your conclusions?]

## Next Steps
[What still needs clarification or additional analysis?]

Tools & Approaches

Data Analysis Tools

  • 📊 R & RStudio - Statistical computing, reproducible scripts (Setup Guide)
  • 🐍 Python - Data wrangling with pandas, visualization with matplotlib (Setup Guide)
  • 📈 Excel/Sheets - Quick descriptive stats, pivot tables, charts (Beginner Tips)
  • 💾 Git & GitHub - Version control for data + scripts (Setup Guide)

Analytical Methods

Quantitative:

  • Descriptive statistics (mean, median, range, distribution)
  • Hypothesis testing (t-tests, chi-square, ANOVA)
  • Correlation & regression
  • Visualization (histograms, scatter plots, bar charts)

Qualitative:

  • Thematic coding
  • Pattern analysis
  • Narrative summaries
  • Qualitative comparison matrices
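
As one worked case from the quantitative methods, correlation and simple regression can be computed directly from their definitions. The sketch below is written out by hand so every step is visible. In practice a statistics package would do this for you. The variable names and sample numbers are hypothetical.

```python
import math
import statistics


def _sums(x, y):
    """Shared pieces: means plus sums of squares and cross-products."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return mx, my, sxy, sxx, syy


def pearson_r(x, y):
    """Pearson correlation coefficient, between -1 and 1."""
    _, _, sxy, sxx, syy = _sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my, sxy, sxx, _ = _sums(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical data: weekly study hours and quiz scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
slope, intercept = fit_line(hours, scores)
print(f"r = {pearson_r(hours, scores):.3f}, score ≈ {intercept:.1f} + {slope:.1f} × hours")
```

Both functions fail with a division by zero if one variable never changes, which is itself a useful signal that there is nothing to correlate.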


Resources for Phase 4

  • 📊 Data Cleaning Checklist (15 min) - Step-by-step checklist for preparing data for analysis. [Content to be added]
  • 📈 Data Visualization Best Practices (Intermediate) - Guidelines for creating clear, honest visualizations that support your story. [Content to be added]
  • 🔧 Analysis Scripts Template (Intermediate) - Sample R and Python scripts for common analyses (descriptive stats, plotting, testing). [Content to be added]


What Success Looks Like

By the end of Phase 4, you should have:

Raw Data - Stored unchanged in 03_Project/03_Data/raw/
Clean Data - Documented cleaning process
Cleaning Log - Every step explained
Analysis Scripts - Reproducible code showing your work
2-3 Key Visualizations - Tables/figures directly answering your questions
Data Memo - Summary of findings, limitations, next steps
Weeks 8-10 Journal Entries - Reflections on your analysis journey
Ready for Phase 5 - Findings ready to present


Milestone Timeline

| Week | Activity | Deliverable |
|---|---|---|
| Week 8 | Data collection | Raw data file + collection log |
| Week 9 | Data cleaning | Clean data + cleaning script/log |
| Week 10 | Analysis & memo | 2-3 visualizations + analysis memo |

Common Data Issues & Solutions

| Problem | Solution |
|---|---|
| Missing values | Document why they’re missing, decide: remove, impute, or keep |
| Duplicates | Identify (check IDs), document, remove if not intentional |
| Formatting inconsistencies | Standardize dates, text case, number format |
| Outliers | Document, investigate cause, decide whether to keep/remove |
| Encoding issues | Check file format (UTF-8), re-import if needed |
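
For the outliers entry, a common first pass is the 1.5 × IQR rule. It flags values for investigation and leaves the data unchanged. This is one convention among several, not a course requirement. The function name and the multiplier default are assumptions.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical response times in minutes; one entry looks like a typo.
minutes = [10, 12, 11, 13, 12, 11, 95]
print(flag_outliers(minutes))
```

A flagged value is a prompt to check the raw record and note the decision in your cleaning log, whether you keep it, correct it, or remove it.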

Next Phase

Once you complete Phase 4:

Phase V: The Publisher - Create your web portfolio and final report


Questions & Support

  • Tool help? Check the software setup guides in Resources
  • Statistical question? Visit office hours or check analysis templates
  • Stuck on something? Email Dr. Leith with your memo draft