Phase III: The Builder
Duration: Weeks 5-12 (Feb 09 - Mar 30) | Points: 250
Construct your research framework and measurement tools
Learning Objectives
By the end of Phase III, you will:
- ✅ Apply theory to frame research questions
- ✅ Define variables conceptually and operationally
- ✅ Build a comprehensive codebook
- ✅ Design and pilot test a sampling plan
- ✅ Complete ethics certification (CITI)
- ✅ Test definitions with pilot data
- ✅ Document coding decisions and edge cases
- ✅ Assess inter-coder reliability
- ✅ Iterate on definitions based on pilot testing
What You’ll Do
A data dictionary is like a translation guide for your data. It tells anyone (including future you!) exactly what each variable means:
- Variable name - What is the variable called in your dataset?
- Definition - What does this variable measure?
- Type - Is it text, number, date, or category?
- Allowed values - What answers are valid?
- Examples - Real items showing how each value is coded
- Special rules - Edge cases and how to handle them
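If you keep your dictionary alongside scripts, the six parts above map naturally onto a small record type. The sketch below shows one way to do that in Python; the class name `VariableEntry`, its field names, and the `to_markdown_row` helper are hypothetical and not part of the course template.

```python
from dataclasses import dataclass, field


@dataclass
class VariableEntry:
    """One row of a data dictionary (hypothetical field names, not the course template)."""
    name: str            # how the variable is stored in your data
    definition: str      # what the variable measures
    var_type: str        # e.g. "Categorical", "Numeric", "Text", "Date", "Boolean"
    allowed_values: str  # which answers are valid
    examples: list[str] = field(default_factory=list)       # real coded samples
    special_rules: list[str] = field(default_factory=list)  # edge cases and how to handle them

    def to_markdown_row(self) -> str:
        """Render the entry as one Markdown table row for the dictionary file."""
        cells = [
            self.name,
            self.definition,
            self.var_type,
            self.allowed_values,
            "; ".join(self.examples),
            "; ".join(self.special_rules),
        ]
        return "| " + " | ".join(cells) + " |"


entry = VariableEntry(
    name="content_type",
    definition="Primary content category shared",
    var_type="Text",
    allowed_values="Any string ≤100 chars",
    examples=['"news_article"'],
    special_rules=["Screenshots of articles are coded as news_article"],
)
print(entry.to_markdown_row())
```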
Phase 3 Content & Activities
Activity 1: Start Your Data Dictionary (Week 5)
- Copy the Template - Get the Data Dictionary Template from your vault
- List Your Variables - From your research questions, what will you measure?
- Define Each One - Write clear definitions for every variable
- Specify Types - Is each variable numeric, text, categorical, etc.?
Example:
| Variable | Definition | Type | Values | Example |
|---|---|---|---|---|
| age_group | Age category of respondent | Categorical | 18-25, 26-35, 36-45, 46+ | “26-35” |
| engagement_score | 1-10 scale of user interaction frequency | Numeric | 1-10 (whole numbers) | 7 |
| content_type | Primary content category shared | Text | Any string ≤100 chars | “news_article” |
Submit: First draft in 03_Project/02_Codebook/Data_Dictionary_v1.md
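Once items are coded, a short script can flag values that break the rules in the example table. This is a minimal sketch, assuming each coded item is a Python dict keyed by variable name; `validate_record` is a hypothetical helper, and it uses `46+` for the top age band so that band does not overlap `36-45`.

```python
# Allowed categories for age_group; "46+" keeps the top band from overlapping "36-45"
AGE_GROUPS = {"18-25", "26-35", "36-45", "46+"}


def validate_record(record: dict) -> list:
    """Return a list of problems in one coded record (an empty list means it passed)."""
    problems = []

    age = record.get("age_group")
    if age not in AGE_GROUPS:
        problems.append(f"age_group: {age!r} is not an allowed category")

    score = record.get("engagement_score")
    # Whole numbers 1-10 only; bool is a subclass of int, so rule it out explicitly
    if not isinstance(score, int) or isinstance(score, bool) or not 1 <= score <= 10:
        problems.append(f"engagement_score: {score!r} is not a whole number from 1 to 10")

    content = record.get("content_type")
    if not isinstance(content, str) or len(content) > 100:
        problems.append("content_type: must be a string of at most 100 characters")

    return problems


print(validate_record({"age_group": "26-35", "engagement_score": 7, "content_type": "news_article"}))
# → []
print(validate_record({"age_group": "45", "engagement_score": 7.5, "content_type": "news_article"}))
# reports two problems: an unknown age_group and a non-whole engagement_score
```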
Activity 2: Pilot Code Your Data (Week 6)
Test your definitions on a small sample:
- Select 10-20 items - Sample from your actual data
- Code them - Use your definitions to code each item
- Track problems - What was ambiguous or unclear?
- Document decisions - How did you resolve edge cases?
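Picking pilot items at random, rather than the first 10-20 you come across, helps keep the pilot from containing only easy cases, and fixing the random seed lets you or a peer redraw exactly the same items later. A minimal sketch using Python's standard `random` module; the item IDs and the `draw_pilot_sample` name are hypothetical.

```python
import random


def draw_pilot_sample(item_ids: list, n: int = 15, seed: int = 2024) -> list:
    """Draw a reproducible simple random sample of n items for pilot coding."""
    if n > len(item_ids):
        raise ValueError(f"asked for {n} items but only {len(item_ids)} are available")
    rng = random.Random(seed)               # same seed -> same sample on every run
    return sorted(rng.sample(item_ids, n))  # sorted only to make the list easy to read


# Hypothetical dataset of 200 items identified as item_001 ... item_200
all_items = [f"item_{i:03d}" for i in range(1, 201)]
pilot = draw_pilot_sample(all_items, n=15)
print(len(pilot), pilot[:3])
```

Recording the seed next to your sample makes the pilot reproducible.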
Create: 03_Project/02_Codebook/Pilot_Decisions_Log.md
Example entry:
VARIABLE: content_type
PROBLEM: Item #5 is a news article shared as a screenshot
DECISION: Code as "news_article" (focus on content, not format)
APPLIED: Items #5, #12, #18
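If you prefer to keep decisions in a structured form and paste them into the log, a small record type can produce entries in the same layout as the example above. This is a sketch; the `Decision` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """One entry in the pilot decisions log (hypothetical structure)."""
    variable: str
    problem: str
    decision: str
    applied_to: list  # item numbers the decision was applied to, e.g. ["#5", "#12"]

    def to_log_text(self) -> str:
        """Format the entry in the VARIABLE / PROBLEM / DECISION / APPLIED layout."""
        return "\n".join([
            f"VARIABLE: {self.variable}",
            f"PROBLEM: {self.problem}",
            f"DECISION: {self.decision}",
            f"APPLIED: Items {', '.join(self.applied_to)}",
        ])


entry = Decision(
    variable="content_type",
    problem="Item #5 is a news article shared as a screenshot",
    decision='Code as "news_article" (focus on content, not format)',
    applied_to=["#5", "#12", "#18"],
)
print(entry.to_log_text())
```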
Activity 3: Get Feedback (Week 7)
Optional: Inter-Coder Reliability Check
If possible, have a peer code the same 10-20 items independently:
- Compare your coding - How often did you agree?
- Discuss disagreements - Why did you code differently?
- Update your definitions - Make them clearer if needed
Measure: Calculate % agreement or Cohen’s Kappa (your instructor will guide this)
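Percent agreement is the share of items both coders labeled the same. Cohen's kappa adjusts that figure for the agreement you would expect by chance, given how often each coder used each category: κ = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement. The sketch below computes both so you can check your arithmetic; follow your instructor's guidance on which one to report. The coder labels are made up for illustration.

```python
import math
from collections import Counter


def percent_agreement(coder_a: list, coder_b: list) -> float:
    """Share of items both coders labeled identically (0.0 to 1.0)."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same, non-empty set of items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a: list, coder_b: list) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_observed = percent_agreement(coder_a, coder_b)
    n = len(coder_a)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: for each category, P(A uses it) * P(B uses it), summed
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1:
        return math.nan  # both coders used one identical category throughout: kappa undefined
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical sentiment codes for the same 10 items
coder_a = ["positive", "neutral", "negative", "positive", "neutral",
           "positive", "negative", "neutral", "positive", "negative"]
coder_b = ["positive", "neutral", "negative", "neutral", "neutral",
           "positive", "negative", "positive", "positive", "negative"]

print(percent_agreement(coder_a, coder_b))         # → 0.8
print(round(cohens_kappa(coder_a, coder_b), 2))    # → 0.7
```

Here the coders agree on 8 of 10 items, but because each category is used often by both, chance alone would produce agreement of p_e = 0.34, so kappa comes out lower than 0.8.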
Activity 4: Finalize & Iterate (Week 7)
Based on your pilot testing:
- Clarify definitions - Rewrite anything that was confusing
- Add more examples - Include edge cases you found
- Update allowed values - Add any new categories you discovered
- Document assumptions - What did you decide to do and why?
Submit: 03_Project/02_Codebook/Data_Dictionary_v2_Final.md + Pilot Summary
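When updating allowed values, it can help to list every code used during piloting that is not yet in the v1 dictionary, along with how often it appeared. A minimal sketch; the category names and the `find_new_categories` helper are hypothetical.

```python
from collections import Counter


def find_new_categories(coded_values: list, allowed: set) -> Counter:
    """Count codes used in the pilot that are missing from the allowed values."""
    return Counter(v for v in coded_values if v not in allowed)


allowed_v1 = {"news_article", "meme", "video"}  # hypothetical v1 categories
pilot_codes = ["news_article", "meme", "podcast", "news_article", "podcast", "poll"]  # hypothetical
print(dict(find_new_categories(pilot_codes, allowed_v1)))
# → {'podcast': 2, 'poll': 1}
```

Each code it reports needs a decision: add it as a new category, or fold it into an existing one and record why in your decisions log.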
Key Concepts
Variable Types
Numeric - Numbers you can do math with (age, score, count)
Categorical - Limited set of categories (gender, region, type)
Text/String - Free-form text (name, comment, description)
Date - Time-based (2024-01-15)
Boolean - True/False or Yes/No
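Coded data often arrives as text (for example, from a spreadsheet export), so a quick check that each raw value fits its declared type catches typos early. This sketch assumes lowercase type names, YYYY-MM-DD dates, and yes/no or true/false for Boolean; `check_value` is a hypothetical helper.

```python
import math
from datetime import datetime
from typing import Optional


def check_value(raw: str, var_type: str, categories: Optional[set] = None) -> bool:
    """Return True if a raw text value fits the declared variable type."""
    if var_type == "numeric":
        try:
            return math.isfinite(float(raw))  # rejects "abc", "nan", "inf"
        except ValueError:
            return False
    if var_type == "categorical":
        return categories is not None and raw in categories
    if var_type == "text":
        return isinstance(raw, str)
    if var_type == "date":
        try:
            datetime.strptime(raw, "%Y-%m-%d")  # e.g. 2024-01-15
            return True
        except ValueError:
            return False
    if var_type == "boolean":
        return raw.strip().lower() in {"true", "false", "yes", "no"}
    raise ValueError(f"unknown variable type: {var_type!r}")


print(check_value("42", "numeric"))                                    # → True
print(check_value("2024-13-01", "date"))                               # → False (no month 13)
print(check_value("Yes", "boolean"))                                   # → True
print(check_value("purple", "categorical", {"blue", "red", "green"}))  # → False
```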
Measurement Scales
Nominal - Categories with no order (blue, red, green)
Ordinal - Ranked categories (low, medium, high)
Interval - Numbers with equal spacing but no true zero (temperature in °C)
Ratio - Numbers with true zero (height, weight, age)
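The scale determines which summaries make sense: a mode for nominal data, a median for ordinal data (using the declared category order, not alphabetical order), and a mean only once spacing is equal. The sketch below encodes that common rule of thumb; `typical_value` and its arguments are hypothetical.

```python
from statistics import mean, median_low, mode
from typing import Optional


def typical_value(values: list, scale: str, order: Optional[list] = None):
    """Return a measure of central tendency suited to the measurement scale."""
    if scale == "nominal":
        return mode(values)  # no order: most frequent category
    if scale == "ordinal":
        if order is None:
            raise ValueError("ordinal data needs the category order, e.g. ['low', 'medium', 'high']")
        # Convert categories to ranks, take the lower median rank, map back to a label
        return order[median_low(order.index(v) for v in values)]
    if scale in ("interval", "ratio"):
        return mean(values)  # equal spacing makes averaging meaningful
    raise ValueError(f"unknown scale: {scale!r}")


print(typical_value(["blue", "red", "blue"], "nominal"))               # → blue
print(typical_value(["low", "high", "medium", "medium", "low"], "ordinal",
                    order=["low", "medium", "high"]))                  # → medium
print(typical_value([20, 22, 24], "interval"))                         # → 22
```

Ratio data supports the same averages as interval data; the difference is that statements like "twice as much" only make sense on a ratio scale, because its zero is a true zero.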
Resources for Phase 3
📋 Data Dictionary Template (5 min)
Ready-to-use template in your vault. Copy and customize for your project.
📚 Codebook Best Practices (Beginner)
Tips for writing clear variable definitions that others can follow.
[Content to be added]
🔍 Inter-Coder Reliability Guide (Intermediate)
How to test agreement between multiple coders and calculate kappa statistics.
[Content to be added]
What Success Looks Like
By the end of Phase 3, you should have:
✅ Complete Data Dictionary - All variables defined with types, values, examples
✅ Pilot Testing Results - Coded 10-20 sample items
✅ Decisions Log - Edge cases documented and resolved
✅ Updated Dictionary v2 - Refined based on pilot feedback
✅ Weeks 5-7 Journal Entries - Reflections on your coding process
✅ Ready for Phase 4 - Your definitions are clear and tested
Milestone Timeline
| Week | Activity | Deliverable |
|---|---|---|
| Week 5 | Start data dictionary | Draft with variable list |
| Week 6 | Pilot code 10-20 items | Decisions log + coded samples |
| Week 7 | Refine & finalize | Data Dictionary v2 + summary |
Example: Simple Data Dictionary
Project: Analyzing tweet sentiment about climate change
| Variable | Definition | Type | Values | Example |
|---|---|---|---|---|
| tweet_id | Unique identifier | Text | Any string | “1234567890” |
| date_posted | When tweet was published | Date | YYYY-MM-DD | “2024-01-15” |
| text | Full text of tweet | Text | Free-form ≤280 chars | “Climate action is…” |
| sentiment | Overall emotional tone | Categorical | positive, neutral, negative | “positive” |
| has_action_call | Does tweet ask readers to do something? | Boolean | yes, no | “yes” |
| reach | Number of impressions | Numeric | 0-9,999,999 | 1250 |
Next Phase
Once you complete Phase 3:
→ Phase IV: The Analyst - Analyze your data using R
Common Questions
Q: What if I don’t know all variables yet?
A: Start with what you know. You can add more in Phase 4.
Q: How detailed should my definitions be?
A: Detailed enough that someone else could code the same way.
Q: What’s a “good” inter-coder agreement score?
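A: For simple percent agreement, 80% or higher is a common benchmark and 90% or higher is excellent. Cohen's kappa is never higher than percent agreement because it subtracts chance agreement, so expect lower kappa values for the same coding.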