Phase III: The Builder

Duration: Weeks 5-12 (Feb 09 - Mar 30) | Points: 250
Construct your research framework and measurement tools

Learning Objectives

By the end of Phase III, you will:

✅ Apply theory to frame research questions
✅ Define variables conceptually and operationally
✅ Build a comprehensive codebook
✅ Design and pilot test a sampling plan
✅ Complete ethics certification (CITI)

✅ Test definitions with pilot data
✅ Document coding decisions and edge cases
✅ Assess inter-coder reliability
✅ Iterate on definitions based on pilot testing

What You’ll Do

A data dictionary is like a translation guide for your data. It tells anyone (including future you!) exactly what each variable means:

Variable names - How is this stored in your data?
Definitions - What does this variable measure?
Types - Is it text, number, date, category?
Allowed values - What answers are valid?
Examples - Real samples of how to code things
Special rules - Edge cases and how to handle them
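
The six components above can also be kept as a small data structure, which makes it easy to spot entries that are still incomplete. The following is a minimal sketch in Python (used here only for illustration); the class name `VariableEntry`, its field names, and the helper `missing_parts` are hypothetical and not part of the course template.

```python
from dataclasses import dataclass, field


@dataclass
class VariableEntry:
    """One data dictionary entry (field names are illustrative)."""
    name: str             # how the variable is stored in the data
    definition: str       # what the variable measures
    var_type: str         # e.g. "text", "numeric", "date", "categorical"
    allowed_values: str   # description of the valid answers
    examples: list[str] = field(default_factory=list)       # real coded samples
    special_rules: list[str] = field(default_factory=list)  # edge-case handling


def missing_parts(entry: VariableEntry) -> list[str]:
    """Return the names of the components that are still empty."""
    return [key for key, value in vars(entry).items() if not value]


draft = VariableEntry(
    name="content_type",
    definition="Primary content category shared",
    var_type="text",
    allowed_values="Any string of 100 characters or fewer",
)
print(missing_parts(draft))  # ['examples', 'special_rules']
```

An entry with nothing missing returns an empty list, which is one quick way to check a draft before submitting it.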

Phase 3 Content & Activities

Activity 1: Start Your Data Dictionary (Week 5)

Copy the Template - Get the Data Dictionary Template from your vault
List Your Variables - From your research questions, what will you measure?
Define Each One - Write clear definitions for every variable
Specify Types - Is each variable numeric, text, categorical, etc.?

Example:

Variable	Definition	Type	Values	Example
`age_group`	Age category of respondent	Categorical	18-25, 26-35, 36-45, 46+	“26-35”
`engagement_score`	1-10 scale of user interaction frequency	Numeric	1-10 (whole numbers)	7
`content_type`	Primary content category shared	Text	Any string ≤100 chars	“news_article”

Submit: First draft in 03_Project/02_Codebook/Data_Dictionary_v1.md

Activity 2: Pilot Code Your Data (Week 6)

Test your definitions on a small sample:

Select 10-20 items - Sample from your actual data
Code them - Use your definitions to code each item
Track problems - What was ambiguous or unclear?
Document decisions - How did you resolve edge cases?
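
One way to select the pilot items is a simple random draw with a fixed seed, so the same sample can be reproduced later. This is a sketch under assumptions: the function name `draw_pilot_sample`, the seed value, and the dataset of 200 numbered items are all hypothetical, and your project may call for a different sampling approach.

```python
import random


def draw_pilot_sample(item_ids, sample_size=15, seed=2024):
    """Draw a reproducible simple random sample of item IDs for pilot coding."""
    items = list(item_ids)
    if sample_size > len(items):
        raise ValueError("sample_size exceeds the number of available items")
    rng = random.Random(seed)  # local generator: same seed, same sample
    return sorted(rng.sample(items, sample_size))


# Hypothetical dataset of 200 items numbered 1 to 200.
all_items = range(1, 201)
pilot_items = draw_pilot_sample(all_items, sample_size=15)
print(pilot_items)
```

Recording the seed alongside the sample lets a peer draw exactly the same items for an independent coding pass.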

Create: 03_Project/02_Codebook/Pilot_Decisions_Log.md

Example entry:

VARIABLE: content_type
PROBLEM: Item #5 is a news article shared as a screenshot
DECISION: Code as "news_article" (focus on content, not format)
APPLIED: Items #5, #12, #18
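
If you prefer to generate log entries rather than type them, the layout of the example entry above can be produced by a small helper. This is only a sketch: the `Decision` class and its `to_log_entry` method are hypothetical names, and writing entries by hand in the log file works just as well.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    variable: str
    problem: str
    decision: str
    applied_to: list[int]  # item numbers the decision was applied to

    def to_log_entry(self) -> str:
        """Render the decision in the four-line layout of the example entry."""
        items = ", ".join(f"#{n}" for n in self.applied_to)
        return (
            f"VARIABLE: {self.variable}\n"
            f"PROBLEM: {self.problem}\n"
            f"DECISION: {self.decision}\n"
            f"APPLIED: Items {items}"
        )


entry = Decision(
    variable="content_type",
    problem="Item #5 is a news article shared as a screenshot",
    decision='Code as "news_article" (focus on content, not format)',
    applied_to=[5, 12, 18],
)
print(entry.to_log_entry())
```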

Activity 3: Get Feedback (Week 7)

Optional: Inter-Coder Reliability Check

If possible, have a peer code the same 10-20 items independently:

Compare your coding - How often did you agree?
Discuss disagreements - Why did you code differently?
Update your definitions - Make them clearer if needed

Measure: Calculate % agreement or Cohen’s Kappa (your instructor will guide this)
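
Both measures can be computed with a short script. The sketch below assumes each coder's labels are stored as a list in the same item order; the function names and the ten sample labels are made up for illustration. Percent agreement is the share of items coded identically, and Cohen's Kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each coder's category frequencies.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of items that both coders labeled identically (0.0 to 1.0)."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's Kappa for two coders: agreement corrected for chance."""
    n = len(coder_a)
    p_o = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability both coders pick the same category at random,
    # given how often each coder used each category.
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when both coders use a single category")
    return (p_o - p_e) / (1 - p_e)


# Made-up labels for 10 pilot items.
coder_a = ["positive", "positive", "neutral", "negative", "positive",
           "neutral", "negative", "negative", "positive", "neutral"]
coder_b = ["positive", "neutral", "neutral", "negative", "positive",
           "neutral", "negative", "positive", "positive", "neutral"]

print(f"Percent agreement: {percent_agreement(coder_a, coder_b):.2f}")  # 0.80
print(f"Cohen's Kappa:     {cohens_kappa(coder_a, coder_b):.2f}")       # 0.70
```

With these made-up labels the coders agree on 8 of 10 items, but kappa comes out lower (about 0.70) because some of that agreement would be expected by chance alone.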

Activity 4: Finalize & Iterate (Week 7)

Based on your pilot testing:

Clarify definitions - Rewrite anything that was confusing
Add more examples - Include edge cases you found
Update allowed values - Add any new categories you discovered
Document assumptions - What did you decide to do and why?

Submit: 03_Project/02_Codebook/Data_Dictionary_v2_Final.md + Pilot Summary

Key Concepts

Variable Types

Numeric - Numbers you can do math with (age, score, count)
Categorical - Limited set of categories (gender, region, type)
Text/String - Free-form text (name, comment, description)
Date - Time-based (2024-01-15)
Boolean - True/False or Yes/No

Measurement Scales

Nominal - Categories with no order (blue, red, green)
Ordinal - Ranked categories (low, medium, high)
Interval - Numbers with equal spacing (temperature in °C)
Ratio - Numbers with true zero (height, weight, age)
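
The scale of a variable affects which summaries make sense for it. The sketch below follows a common convention (mode for nominal data, median for ordinal data, mean for interval and ratio data); the function name `summarize` and the sample values are hypothetical, and your instructor may recommend other summaries.

```python
import statistics


def summarize(values, scale, order=None):
    """Return a summary that suits the measurement scale.

    nominal        -> mode (most frequent category)
    ordinal        -> median category, using the ranking given in `order`
    interval/ratio -> arithmetic mean
    """
    if scale == "nominal":
        return statistics.mode(values)
    if scale == "ordinal":
        if order is None:
            raise ValueError("ordinal data needs the category order, lowest first")
        ranks = [order.index(v) for v in values]
        # median_low returns an observed rank, so it maps back to a real category
        return order[statistics.median_low(ranks)]
    if scale in ("interval", "ratio"):
        return statistics.mean(values)
    raise ValueError(f"unknown scale: {scale!r}")


print(summarize(["blue", "red", "blue", "green"], "nominal"))   # blue
print(summarize(["low", "high", "medium", "medium", "low"],
                "ordinal", order=["low", "medium", "high"]))    # medium
print(summarize([170, 180, 175], "ratio"))                      # 175
```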

Resources for Phase 3

📋 Data Dictionary Template

5 min

Ready-to-use template in your vault. Copy and customize for your project.

📚 Codebook Best Practices

Beginner

Tips for writing clear variable definitions that others can follow.

[Content to be added]

🔍 Inter-Coder Reliability Guide

Intermediate

How to test agreement between multiple coders and calculate kappa statistics.

[Content to be added]

What Success Looks Like

By the end of Phase 3, you should have:

✅ Complete Data Dictionary - All variables defined with types, values, examples
✅ Pilot Testing Results - Coded 10-20 sample items
✅ Decisions Log - Edge cases documented and resolved
✅ Updated Dictionary v2 - Refined based on pilot feedback
✅ Weeks 5-7 Journal Entries - Reflections on your coding process
✅ Ready for Phase 4 - Your definitions are clear and tested

Milestone Timeline

Week	Activity	Deliverable
Week 5	Start data dictionary	Draft with variable list
Week 6	Pilot code 10-20 items	Decisions log + coded samples
Week 7	Refine & finalize	Data Dictionary v2 + summary

Example: Simple Data Dictionary

Project: Analyzing tweet sentiment about climate change

Variable	Definition	Type	Values	Example
`tweet_id`	Unique identifier	Text	Any string	“1234567890”
`date_posted`	When tweet was published	Date	YYYY-MM-DD	“2024-01-15”
`text`	Full text of tweet	Text	Free-form ≤280 chars	“Climate action is…”
`sentiment`	Overall emotional tone	Categorical	positive, neutral, negative	“positive”
`has_action_call`	Does tweet ask readers to do something?	Boolean	yes, no	“yes”
`reach`	Number of impressions	Numeric	0-9,999,999	1250
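
A dictionary like this one can double as a set of checks on coded rows. The following is a minimal sketch, assuming each coded tweet is stored as a Python dict keyed by the variable names in the table; the function name `validate_row` and the sample rows are hypothetical.

```python
from datetime import datetime

SENTIMENTS = {"positive", "neutral", "negative"}
ACTION_CALL = {"yes", "no"}
MAX_REACH = 9_999_999


def validate_row(row):
    """Return a list of problems found in one coded row (empty list = valid)."""
    problems = []
    tweet_id = row.get("tweet_id")
    if not isinstance(tweet_id, str) or not tweet_id:
        problems.append("tweet_id: must be a non-empty string")
    try:
        # Checks that the value parses as a real year-month-day calendar date.
        datetime.strptime(row.get("date_posted"), "%Y-%m-%d")
    except (TypeError, ValueError):
        problems.append("date_posted: must be a valid YYYY-MM-DD date")
    text = row.get("text")
    if not isinstance(text, str) or len(text) > 280:
        problems.append("text: must be a string of at most 280 characters")
    if row.get("sentiment") not in SENTIMENTS:
        problems.append("sentiment: must be positive, neutral, or negative")
    if row.get("has_action_call") not in ACTION_CALL:
        problems.append("has_action_call: must be yes or no")
    reach = row.get("reach")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(reach, int) or isinstance(reach, bool) or not 0 <= reach <= MAX_REACH:
        problems.append(f"reach: must be a whole number from 0 to {MAX_REACH}")
    return problems


good_row = {"tweet_id": "1234567890", "date_posted": "2024-01-15",
            "text": "Climate action is…", "sentiment": "positive",
            "has_action_call": "yes", "reach": 1250}
bad_row = dict(good_row, sentiment="mixed", reach=-5)

print(validate_row(good_row))  # []
print(validate_row(bad_row))   # two problems: sentiment and reach
```

Running a check like this on the pilot items can surface values that the current definitions do not cover, which is a useful input for the decisions log.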

Next Phase

Once you complete Phase 3:

→ Phase IV: The Analyst - Analyze your data using R

Common Questions

Q: What if I don’t know all variables yet?
A: Start with what you know. You can add more in Phase 4.

Q: How detailed should my definitions be?
A: Detailed enough that someone else could code the same way.

Q: What’s a “good” inter-coder agreement score?
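A: For percent agreement, 80%+ is typical and 90%+ is excellent. Cohen’s Kappa corrects for chance agreement, so its value on the same data is never higher than the raw percent agreement and should not be judged against the same thresholds.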