From Vibes to Variables: Building Your Quantitative Codebook
You’ve spent weeks immersed in your media source — reading articles, watching videos, scrolling through posts. You’ve noticed patterns. You’ve written about them in your qualitative memo. Now it’s time to turn those observations into something you can count.
This chapter walks you through the process of building a quantitative codebook — the document that translates your qualitative insights into numeric codes that R can analyze.
What Is a Quantitative Codebook?
A codebook is a measurement manual. It tells you (and anyone else) exactly how to assign a number to something you observe in your data. Every variable in your research gets:
- A name: short, descriptive, no spaces (e.g., tone, source_type, article_length)
- A conceptual definition: what the variable means in theory
- An operational definition: exactly how you measure it
- Response categories: the specific values it can take, each with a number
Example: Music Dataset
In our class dataset (music_data_raw.csv), the variable mode works like this:
| Element | Value |
|---|---|
| Variable name | mode |
| Conceptual definition | The modality of a musical track — whether it sounds “happy” (major) or “sad” (minor) |
| Operational definition | Spotify’s audio analysis algorithm classifies each track as major (1) or minor (0) |
| Response categories | 0 = Minor, 1 = Major |
Notice: the data never stores the words “happy” or “sad.” It stores numbers (0 and 1), and the codebook says what each number means. This is what makes it quantitative.
From Qualitative Observations to Numeric Codes
Your qualitative memo described patterns you noticed. Now you need to formalize those patterns into variables. Here’s the translation process:
Step 1: List Your Observations
Go back to your qualitative memo. What patterns did you notice? Write each one as a short phrase:
- Example: “Most articles about the topic were negative in tone”
- Example: “Stories from wire services were shorter than local reporting”
- Example: “Videos with faces in thumbnails got more views”
Step 2: Convert Each Pattern to a Variable
For each observation, ask: What am I actually measuring?
| Observation | Variable | Type |
|---|---|---|
| “Most articles were negative” | tone | Categorical (positive, negative, neutral) |
| “Wire stories were shorter” | source_type | Categorical (local, wire, syndicated) |
| “Wire stories were shorter” | word_count | Continuous (actual number) |
| “Thumbnails with faces got more views” | face_in_thumbnail | Binary (yes/no) |
Step 3: Assign Numeric Codes
Every category needs a number. This is non-negotiable: R can store words, but numeric codes are quicker to enter, cannot be misspelled, and convert cleanly into labeled categories later.
| Variable | Category | Code |
|---|---|---|
| tone | Positive | 1 |
| tone | Neutral | 2 |
| tone | Negative | 3 |
| source_type | Local | 1 |
| source_type | Wire | 2 |
| source_type | Syndicated | 3 |
| face_in_thumbnail | No | 0 |
| face_in_thumbnail | Yes | 1 |
In your coding spreadsheet, always enter the number, not the word. Enter 1, not "Positive". Enter 0, not "No". When you import this into R later, you’ll convert these numbers back into labeled categories (called “factors”). But the raw data should be numeric.
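The conversion from numeric codes to labeled factors might look like the sketch below. It uses base R only, and the small `coded` data frame is a hypothetical stand-in for your own spreadsheet; the codes and labels come from the table above.

```r
# Hypothetical coded data, entered as numbers exactly as the codebook specifies
coded <- data.frame(
  item_id = 1:4,
  tone = c(3, 2, 1, 3),
  source_type = c(1, 2, 1, 3),
  face_in_thumbnail = c(0, 1, 1, 0)
)

# Attach the codebook labels: levels are the numeric codes, labels are the names
coded$tone <- factor(coded$tone, levels = c(1, 2, 3),
                     labels = c("Positive", "Neutral", "Negative"))
coded$source_type <- factor(coded$source_type, levels = c(1, 2, 3),
                            labels = c("Local", "Wire", "Syndicated"))
coded$face_in_thumbnail <- factor(coded$face_in_thumbnail, levels = c(0, 1),
                                  labels = c("No", "Yes"))

# Count how many items fall into each tone category
table(coded$tone)
```

Any value that is not listed in `levels` becomes `NA`, which is a quick way to spot typos in the raw codes.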
Setting Up Your Coding Spreadsheet
Open Excel, Google Sheets, or any spreadsheet program. Set it up like this:
| Column A | Column B | Column C | Column D | Column E | Column F |
|---|---|---|---|---|---|
| item_id | date | source_type | tone | word_count | coder_notes |
| 1 | 2026-01-15 | 1 | 3 | 487 | clear negative framing |
| 2 | 2026-01-16 | 2 | 2 | 312 | wire service, neutral |
| 3 | 2026-01-16 | 1 | 1 | 621 | positive local feature |
Rules for your spreadsheet:
- Row 1 is always the header row (variable names)
- Each row after that is one item (one article, one post, one video)
- Each column is one variable
- No merged cells, no colors-as-data, no blank rows in the middle
- One column for notes — this is your decision log for ambiguous cases
- Variable names: lowercase, no spaces, use underscores (word_count, not Word Count)
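Several of these rules can be checked automatically once the spreadsheet is exported as a CSV file. The following is a minimal sketch in base R; the inline `csv_text` is a hypothetical stand-in for your exported file, and the specific checks are one possible reading of the rules above rather than a complete validation.

```r
# Hypothetical export of the coding spreadsheet.
# With a real file you would call read.csv("your_file.csv") instead.
csv_text <- 'item_id,date,source_type,tone,word_count,coder_notes
1,2026-01-15,1,3,487,clear negative framing
2,2026-01-16,2,2,312,"wire service, neutral"
3,2026-01-16,1,1,621,positive local feature'

sheet <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Rule: variable names are lowercase with underscores and no spaces
names_ok <- grepl("^[a-z][a-z0-9_]*$", names(sheet))

# Rule: no blank rows (a row of empty cells reads in as all NA or empty strings)
blank_rows <- apply(sheet, 1, function(r) all(is.na(r) | trimws(r) == ""))

# Rule: each row is one item, so item_id should never repeat
dup_ids <- anyDuplicated(sheet$item_id) > 0

names(sheet)[!names_ok]   # column names that break the naming rule
which(blank_rows)         # positions of blank rows, if any
dup_ids                   # TRUE if any item_id appears twice
```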
Your codebook from the Codebook & Qual Memo assignment is the foundation. But after your pilot test, you may have discovered variables that need adjustment — categories that overlap, definitions that are ambiguous, or codes you never actually used. Update your codebook now, before you code 100+ items. It’s much easier to fix the measurement tool than to re-code the data.
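One way to find codes you never used, or codes that are not in the codebook at all, is to tabulate your pilot data against the allowed values. This sketch assumes a hypothetical vector of pilot codes for tone; the allowed codes 1, 2, 3 come from the codebook above.

```r
# Hypothetical pilot codes for tone (the codebook allows 1, 2, and 3)
pilot_tone <- c(3, 3, 1, 3, 1, 4, 3)
allowed <- c(1, 2, 3)

# Codes that appear in the data but not in the codebook (likely typos)
invalid <- setdiff(pilot_tone, allowed)

# How often each allowed code was used; a zero count flags an unused category
usage <- table(factor(pilot_tone, levels = allowed))
unused <- names(usage)[usage == 0]

invalid
usage
unused
```

An unused category is not automatically a problem, but it is worth asking whether the definition is too narrow or the category overlaps with another one.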
Decision Rules and Anchor Examples
The hardest part of coding isn’t the first 10 items — it’s item 75, when your definitions start to feel fuzzy. Two tools help:
Decision Rules
A decision rule is a specific instruction for handling ambiguous cases. Write them into your codebook.
- Example: “If an article contains both positive and negative framing, code the dominant tone (whichever occupies more than 50% of the content). If roughly equal, code as neutral (2).”
- Example: “If the source is listed as ‘AP’ or ‘Reuters,’ code as wire (2) regardless of which outlet published it.”
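Writing a decision rule as a small function is one way to test whether it is specific enough to apply consistently. The sketch below is illustrative only: `code_tone`, `code_source`, and their arguments are hypothetical names, and it simplifies "roughly equal" to "neither tone exceeds 50%".

```r
# Decision rule 1: code the dominant tone.
# share_positive and share_negative are hypothetical inputs, e.g. the
# proportion of an article's content with positive or negative framing.
code_tone <- function(share_positive, share_negative) {
  if (share_positive > 0.5) return(1)  # Positive
  if (share_negative > 0.5) return(3)  # Negative
  2                                    # Neutral when neither tone dominates
}

# Decision rule 2: AP or Reuters is always wire (2),
# regardless of which outlet published the story.
code_source <- function(listed_source, default_code) {
  if (listed_source %in% c("AP", "Reuters")) 2 else default_code
}

code_tone(0.7, 0.2)
code_tone(0.45, 0.45)
code_source("Reuters", default_code = 1)
```

If you cannot express a rule this plainly, a second coder will probably struggle to apply it the same way you do.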
Anchor Examples
An anchor example is a specific item that clearly represents each category. Keep 3–5 anchors per variable and re-code them at the start of each coding session.
| Variable | Category | Anchor Item | Why It’s a Clear Example |
|---|---|---|---|
| tone | Positive (1) | Item #3 | Celebratory headline, quotes from supporters, no counterpoint |
| tone | Neutral (2) | Item #7 | Factual reporting, balanced quotes, no evaluative language |
| tone | Negative (3) | Item #1 | Critical framing, emphasis on failure, negative headline |
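Re-coding your anchors at the start of a session only helps if you compare the new codes against the reference codes. A minimal sketch of that comparison follows; the `session` data frame is a hypothetical set of re-codes, while the reference codes come from the anchor table above.

```r
# Reference codes for the tone anchors (from the anchor table)
anchors <- data.frame(
  item_id = c(3, 7, 1),
  tone_reference = c(1, 2, 3)
)

# Hypothetical re-codes made at the start of today's session
session <- data.frame(
  item_id = c(1, 3, 7),
  tone_today = c(3, 1, 3)
)

# Line up the two sets of codes by item and flag any disagreement
check <- merge(anchors, session, by = "item_id")
check$match <- check$tone_reference == check$tone_today

# Anchor items where today's code differs from the reference
drifted <- check$item_id[!check$match]
drifted
```

A mismatch on an anchor is a signal to reread the definition and decision rules before coding new items.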
Quality Check Before Moving On
Before you start collecting your full dataset, verify:
Try It Yourself
Look at the music dataset’s playlist_genre variable. It contains categories like “pop,” “rock,” “r&b,” “rap,” “latin,” and “edm.”
- Write a conceptual definition for playlist_genre (what does genre mean in this context?).
- Write an operational definition (how was genre determined? In this dataset, by Spotify’s playlist classification).
- If you were coding genre yourself (not using Spotify’s classification), what decision rules would you need for songs that blend genres?