📊 Phase IV: The Analyst
Duration: Weeks 13-15 (Apr 06 - Apr 20) | Points: 225
Analyze data using R for statistical insights
Learning Objectives
By the end of Phase IV, you will:
- ✅ Import and clean data in R
- ✅ Create descriptive statistics and visualizations
- ✅ Document all data transformation steps
- ✅ Perform preliminary analysis
- ✅ Generate visualizations and summary tables
- ✅ Communicate findings and limitations
What You’ll Do
Phase 4 is where you execute your plan:
- Collect - Gather data using your approved methods (surveys, interviews, scraping, etc.)
- Clean - Verify quality, handle missing values, fix formatting
- Analyze - Apply statistical or qualitative methods to answer your questions
- Visualize - Create tables, charts, and figures that show your findings
- Document - Track every decision you make so others (and you, later!) understand what you did
Phase 4 Content & Activities
Activity 1: Data Collection (Week 8)
Follow your approved data collection plan:
- Collect data - Surveys, interviews, archival data, API scraping, etc.
- Track what’s happening - Keep notes in 00_Inbox/ on collection date, sample size, and issues
- Store raw data - Save exactly as received in 03_Project/03_Data/raw/
- Document sources - Who, when, where did this data come from?
Organize your data folder:
03_Project/03_Data/
├── raw/ # Exactly as received (don't modify!)
├── clean/ # After cleaning + transformation
├── README.md # Overview of datasets + versions
└── data_log.txt # Timeline of what was collected when
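The layout above can be set up with a short script. This is a minimal sketch using only the Python standard library; the function names `init_data_folder` and `log_collection` and the README placeholder text are illustrative assumptions, not part of the course materials.

```python
from datetime import date
from pathlib import Path


def init_data_folder(project_root):
    """Create 03_Project/03_Data with raw/, clean/, README.md and data_log.txt.

    Existing folders and files are left untouched, so re-running is safe.
    """
    data_dir = Path(project_root) / "03_Project" / "03_Data"
    for sub in ("raw", "clean"):
        (data_dir / sub).mkdir(parents=True, exist_ok=True)

    readme = data_dir / "README.md"
    if not readme.exists():
        # Placeholder text (an assumption); replace with your own overview.
        readme.write_text("# Datasets\n\nOverview of datasets and versions.\n",
                          encoding="utf-8")

    (data_dir / "data_log.txt").touch(exist_ok=True)
    return data_dir


def log_collection(data_dir, note, when=None):
    """Append one dated line to data_log.txt (the collection timeline)."""
    when = when or date.today()
    with open(Path(data_dir) / "data_log.txt", "a", encoding="utf-8") as fh:
        fh.write(f"{when.isoformat()}  {note}\n")
```

For example, calling `init_data_folder(".")` from the top of your project and then `log_collection(data_dir, "Collected first batch of survey responses")` adds one timestamped line per collection event. Appending, rather than rewriting, keeps the timeline intact.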
Activity 2: Data Cleaning (Week 9)
Transform raw data into analysis-ready data:
- Check quality - Are there missing values? Duplicates? Odd formatting?
- Fix issues - Use R, Python, Excel, or whatever tool fits your data
- Document every step - Save cleaning script showing exactly what you changed
- Create a log - List decisions: “Removed 3 duplicate responses”, “Standardized date format”, etc.
Create: 03_Project/03_Data/cleaning_script.[R|py|xlsx] + cleaning_log.md
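A cleaning script can produce the log entries as it goes, so the log never drifts from what the code did. The sketch below is one possible shape for such a script, using only the Python standard library. The column name `response_id`, the 50% missing-value threshold, and the function names are assumptions for illustration; adapt them to your own data.

```python
import csv


def load_raw(path):
    """Read a CSV file into a list of dicts (one per row), all values as text."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def clean_rows(rows, id_field="response_id", max_missing_share=0.5):
    """Return (clean_rows, log_lines). The input rows are not modified."""
    log = [f"Raw data: {len(rows)} rows"]

    # Step 1: drop repeated IDs, keeping the first occurrence.
    seen, deduped, dup_ids = set(), [], []
    for row in rows:
        rid = row[id_field]
        if rid in seen:
            dup_ids.append(rid)
            continue
        seen.add(rid)
        deduped.append(dict(row))  # copy so the raw rows stay unchanged
    dup_text = ", ".join(map(str, dup_ids)) or "none"
    log.append(f"Removed {len(dup_ids)} duplicate response IDs ({dup_text})")

    # Step 2: trim whitespace and lowercase every text field except the ID.
    for row in deduped:
        for key, value in row.items():
            if key != id_field and isinstance(value, str):
                row[key] = value.strip().lower()
    log.append("Standardized text responses to lowercase")

    # Step 3: drop rows where too many fields are empty.
    kept = []
    for row in deduped:
        fields = [k for k in row if k != id_field]
        missing = sum(1 for k in fields if row[k] in ("", None))
        if fields and missing / len(fields) > max_missing_share:
            continue
        kept.append(row)
    log.append(f"Removed {len(deduped) - len(kept)} responses with more than "
               f"{max_missing_share:.0%} missing values")

    log.append(f"Result: {len(kept)} rows")
    return kept, log
```

The returned `log_lines` can be pasted into cleaning_log.md as the "Steps Taken" list. Because the function copies each row before changing it, the data loaded from raw/ stays exactly as received.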
Cleaning Log Example:
## Data Cleaning Log
**Raw Data:** survey_responses_raw.csv (127 rows)
### Steps Taken:
1. Removed 2 duplicate response IDs (#45, #103)
2. Converted date_submitted from text to date format
3. Filled missing age_group values (n=5) with the most common category (mode imputation)
4. Standardized text responses to lowercase
5. Removed 1 response with >50% missing values
**Result:** clean_data.csv (124 rows, 0 missing values)
**Date:** 2024-01-15 | **Tool:** R 4.3.2 | **Done by:** [Your Name]
Activity 3: Preliminary Analysis (Week 9-10)
Apply your research methods:
- Descriptive stats - How many? What’s the range? What’s typical?
- Inferential stats (if applicable) - Test your hypotheses
- Qualitative coding (if applicable) - Find themes and patterns
- Create visualizations - 2-3 core figures/tables that directly answer your research questions
Save: Analysis scripts in 03_Project/04_Drafts/
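The descriptive questions above ("How many? What’s the range? What’s typical?") map onto a handful of summary numbers. The following is a minimal sketch with Python's standard `statistics` module; the function names `describe` and `frequency_table` are illustrative, and missing values are assumed to be stored as `None`.

```python
import statistics
from collections import Counter


def describe(values):
    """Summarize a numeric column, skipping missing entries (None)."""
    values = list(values)
    data = sorted(v for v in values if v is not None)
    if not data:
        raise ValueError("no non-missing values to summarize")
    return {
        "n": len(data),                      # how many?
        "missing": len(values) - len(data),
        "min": data[0],                      # what's the range?
        "max": data[-1],
        "mean": statistics.mean(data),       # what's typical?
        "median": statistics.median(data),
        # Sample standard deviation needs at least two values.
        "stdev": statistics.stdev(data) if len(data) > 1 else None,
    }


def frequency_table(values):
    """Count each category; return (category, count, share of non-missing)."""
    data = [v for v in values if v is not None]
    counts = Counter(data)
    return [(cat, n, n / len(data)) for cat, n in counts.most_common()]
```

`describe` suits numeric columns and `frequency_table` suits categorical ones such as an age group. Reporting the missing count alongside the summary makes it easier to state limitations later.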
Tools & Approaches
Data Analysis Tools
- R & RStudio - Statistical computing, reproducible scripts
- Python - Data wrangling with pandas, visualization with matplotlib
- Excel/Sheets - Quick descriptive stats, pivot tables, charts
- Git & GitHub - Version control for data + scripts
Analytical Methods
Quantitative:
- Descriptive statistics (mean, median, range, distribution)
- Hypothesis testing (t-tests, chi-square, ANOVA)
- Correlation & regression
- Visualization (histograms, scatter plots, bar charts)
Qualitative:
- Thematic coding
- Pattern analysis
- Narrative summaries
- Qualitative comparison matrices
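To make the hypothesis-testing item in the quantitative list concrete, here is a hand-computed two-sample t statistic. It is a sketch under stated assumptions: it uses Welch's version of the test (which does not assume equal variances), the function name `welch_t` is illustrative, and it stops at the test statistic and degrees of freedom because the Python standard library has no t-distribution. For a p-value, use a statistics package such as R's `t.test`.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Return (t, df) for Welch's two-sample t-test on two numeric lists."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")

    # Squared standard error of each group mean: sample variance / n.
    se2_a = statistics.variance(group_a) / n_a
    se2_b = statistics.variance(group_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both groups have zero variance")

    # Difference in means, scaled by its combined standard error.
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(se2_a + se2_b)

    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df
```

Writing the statistic out once shows what the software is doing: a large |t| means the gap between group means is large relative to the noise within the groups.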
Resources for Phase 4
📊 Data Cleaning Checklist
15 min
Step-by-step checklist for preparing data for analysis.
[Content to be added]
📈 Data Visualization Best Practices
Intermediate
Guidelines for creating clear, honest visualizations that support your story.
[Content to be added]
🔧 Analysis Scripts Template
Intermediate
Sample R and Python scripts for common analyses (descriptive stats, plotting, testing).
[Content to be added]
What Success Looks Like
By the end of Phase 4, you should have:
✅ Raw Data - Stored unchanged in 03_Project/03_Data/raw/
✅ Clean Data - Documented cleaning process
✅ Cleaning Log - Every step explained
✅ Analysis Scripts - Reproducible code showing your work
✅ 2-3 Key Visualizations - Tables/figures directly answering your questions
✅ Data Memo - Summary of findings, limitations, next steps
✅ Week 8-10 Journal Entries - Reflections on your analysis journey
✅ Ready for Phase 5 - Findings ready to present
Milestone Timeline
| Week | Activity | Deliverable |
|---|---|---|
| Week 8 | Data collection | Raw data file + collection log |
| Week 9 | Data cleaning | Clean data + cleaning script/log |
| Week 10 | Analysis & memo | 2-3 visualizations + analysis memo |
Common Data Issues & Solutions
| Problem | Solution |
|---|---|
| Missing values | Document why they’re missing, then decide whether to remove, impute, or keep them |
| Duplicates | Identify (check IDs), document, remove if not intentional |
| Formatting inconsistencies | Standardize dates, text case, number format |
| Outliers | Document, investigate cause, decide whether to keep/remove |
| Encoding issues | Check file format (UTF-8), re-import if needed |
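
For the outlier row in the table, one common screening rule flags values more than 1.5 interquartile ranges outside the quartiles. The sketch below assumes that rule and the illustrative name `flag_outliers_iqr`; it only flags values for investigation and does not remove anything.

```python
import statistics


def flag_outliers_iqr(values, k=1.5):
    """Return (index, value) pairs lying outside [Q1 - k*IQR, Q3 + k*IQR].

    Missing entries (None) are skipped; indices refer to the original list.
    """
    data = [v for v in values if v is not None]
    if len(data) < 2:
        return []  # quartiles are undefined for fewer than two values

    q1, _median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr

    return [(i, v) for i, v in enumerate(values)
            if v is not None and (v < low or v > high)]
```

Each flagged value still needs the steps from the table: document it, investigate the cause, then decide whether to keep or remove it. Quartile conventions differ between tools, so borderline values may be flagged differently elsewhere.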
Next Phase
Once you complete Phase 4:
→ Phase V: The Publisher - Create your web portfolio and final report
Questions & Support
- Tool help? Check the software setup guides in Resources
- Statistical question? Visit office hours or check analysis templates
- Stuck on something? Email Dr. Leith with your memo draft