From Vibes to Variables: Building Your Quantitative Codebook

You’ve spent weeks immersed in your media source — reading articles, watching videos, scrolling through posts. You’ve noticed patterns. You’ve written about them in your qualitative memo. Now it’s time to turn those observations into something you can count.

This chapter walks you through the process of building a quantitative codebook — the document that translates your qualitative insights into numeric codes that R can analyze.

What Is a Quantitative Codebook?

A codebook is a measurement manual. It tells you (and anyone else) exactly how to assign a number to something you observe in your data. Every variable in your research gets:

A name — short, descriptive, no spaces (e.g., tone, source_type, article_length)
A conceptual definition — what the variable means in theory
An operational definition — exactly how you measure it
Response categories — the specific values it can take, each with a number

Example: Music Dataset

In our class dataset (music_data_raw.csv), the variable mode works like this:

Element	Value
Variable name	`mode`
Conceptual definition	The modality of a musical track — whether it sounds “happy” (major) or “sad” (minor)
Operational definition	Spotify’s audio analysis algorithm classifies each track as major (1) or minor (0)
Response categories	0 = Minor, 1 = Major

Notice that the data itself never says “happy-sounding” or “sad-sounding.” It stores numbers (0 and 1), and the codebook tells you what each number means. This is what makes it quantitative.
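If it helps to see the idea in R, here is a minimal sketch that stores the mode entry as a named list and uses it to decode a few values. The list layout and the five sample values are made up for illustration; they are not taken from music_data_raw.csv.

```r
# Codebook entry for `mode`, stored as a plain named list (hypothetical layout).
mode_entry <- list(
  name        = "mode",
  conceptual  = "Modality of a track: major or minor",
  operational = "Spotify's audio analysis classifies each track as major (1) or minor (0)",
  categories  = c(Minor = 0, Major = 1)
)

# Made-up values standing in for the `mode` column of the dataset.
mode_raw <- c(1, 0, 1, 1, 0)

# Decode the numbers using the categories recorded in the codebook entry.
mode_labeled <- factor(
  mode_raw,
  levels = unname(mode_entry$categories),
  labels = names(mode_entry$categories)
)

print(mode_labeled)
```

The numbers stay in the data; the labels live in the codebook and are attached only when you need them.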

From Qualitative Observations to Numeric Codes

Your qualitative memo described patterns you noticed. Now you need to formalize those patterns into variables. Here’s the translation process:

Step 1: List Your Observations

Go back to your qualitative memo. What patterns did you notice? Write each one as a short phrase:

Example: “Most articles about the topic were negative in tone”
Example: “Stories from wire services were shorter than local reporting”
Example: “Videos with faces in thumbnails got more views”

Step 2: Convert Each Pattern to a Variable

For each observation, ask: What am I actually measuring?

Observation	Variable	Type
“Most articles were negative”	`tone`	Categorical (positive, neutral, negative)
“Wire stories were shorter”	`source_type`	Categorical (local, wire, syndicated)
“Wire stories were shorter”	`word_count`	Numeric (count of words)
“Thumbnails with faces got more views”	`face_in_thumbnail`	Binary (yes/no)

Step 3: Assign Numeric Codes

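Every category needs a number. This is non-negotiable for this project: R can store words, but numeric codes prevent inconsistent spellings (“Negative,” “negative,” “neg”) and tie every entry back to your codebook.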

Variable	Category	Code
`tone`	Positive	1
`tone`	Neutral	2
`tone`	Negative	3
`source_type`	Local	1
`source_type`	Wire	2
`source_type`	Syndicated	3
`face_in_thumbnail`	No	0
`face_in_thumbnail`	Yes	1

Convention: Use Numbers, Not Words

In your coding spreadsheet, always enter the number, not the word. Enter 1, not "Positive". Enter 0, not "No". When you import this into R later, you’ll convert these numbers back into labeled categories (called “factors”). But the raw data should be numeric.
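As a preview of that later step, the sketch below converts numeric codes into labeled factors with base R's factor(). The three rows are made-up sample data, and the new column names (tone_f, source_type_f) are just one possible naming choice.

```r
# Made-up sample of coded data: numbers only, as entered in the spreadsheet.
coded <- data.frame(
  item_id     = 1:3,
  source_type = c(1, 2, 1),
  tone        = c(3, 2, 1)
)

# Attach labels from the codebook. `levels` lists the numeric codes,
# `labels` lists the category names in the same order.
coded$tone_f <- factor(coded$tone, levels = c(1, 2, 3),
                       labels = c("Positive", "Neutral", "Negative"))
coded$source_type_f <- factor(coded$source_type, levels = c(1, 2, 3),
                              labels = c("Local", "Wire", "Syndicated"))

# Count items per category. Unused categories still appear, with a count of 0.
print(table(coded$tone_f))
print(table(coded$source_type_f))
```

Keeping the original numeric columns alongside the labeled ones means you can always check the labels against what you actually entered.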

Setting Up Your Coding Spreadsheet

Open Excel, Google Sheets, or any spreadsheet program. Set it up like this:

Column A	Column B	Column C	Column D	Column E	Column F
item_id	date	source_type	tone	word_count	coder_notes
1	2026-01-15	1	3	487	clear negative framing
2	2026-01-16	2	2	312	wire service, neutral
3	2026-01-16	1	1	621	positive local feature

Rules for your spreadsheet:

Row 1 is always the header row (variable names)
Each row after that is one item (one article, one post, one video)
Each column is one variable
No merged cells, no colors-as-data, no blank rows in the middle
One column for notes — this is your decision log for ambiguous cases
Variable names: lowercase, no spaces, use underscores (word_count, not Word Count)
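Once the spreadsheet is exported as a CSV, a few of these rules can be checked in R. The sketch below is one way to do it, not a required procedure. It reads the example rows from a text string so it runs on its own; with a real file you would pass a file name to read.csv() instead. The specific checks are illustrative assumptions about what "clean" means here.

```r
# The example spreadsheet as CSV text. A note containing a comma is quoted,
# which is how spreadsheet programs normally export it.
csv_text <- 'item_id,date,source_type,tone,word_count,coder_notes
1,2026-01-15,1,3,487,clear negative framing
2,2026-01-16,2,2,312,"wire service, neutral"
3,2026-01-16,1,1,621,positive local feature'

coded <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Rule: variable names are lowercase, with underscores instead of spaces.
names_ok <- grepl("^[a-z][a-z0-9_]*$", names(coded))

# Rule: no blank rows in the middle. A blank spreadsheet row usually exports
# as a row of empty cells, which reads in with a missing item_id.
no_blank_rows <- !any(is.na(coded$item_id))

# Rule: each row is one item, so item_id should never repeat.
ids_unique <- !any(duplicated(coded$item_id))

# Coded columns should have been read as numbers, not text.
codes_numeric <- all(sapply(coded[c("source_type", "tone", "word_count")], is.numeric))

print(c(all_names_ok = all(names_ok), no_blank_rows = no_blank_rows,
        ids_unique = ids_unique, codes_numeric = codes_numeric))
```

If a coded column comes in as text, the usual cause is a stray word or symbol typed into a cell that should hold a number.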

Connection to Your Project

Your codebook from the Codebook & Qual Memo assignment is the foundation. But after your pilot test, you may have discovered variables that need adjustment — categories that overlap, definitions that are ambiguous, or codes you never actually used. Update your codebook now, before you code 100+ items. It’s much easier to fix the measurement tool than to re-code the data.

Decision Rules and Anchor Examples

The hardest part of coding isn’t the first 10 items — it’s item 75, when your definitions start to feel fuzzy. Two tools help:

Decision Rules

A decision rule is a specific instruction for handling ambiguous cases. Write them into your codebook.

Example: “If an article contains both positive and negative framing, code the dominant tone (whichever occupies more than 50% of the content). If roughly equal, code as neutral (2).”
Example: “If the source is listed as ‘AP’ or ‘Reuters,’ code as wire (2) regardless of which outlet published it.”
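Writing a decision rule as a small function can be a useful test of whether it is specific enough. The sketch below encodes the two example rules. The function names, the idea of estimating each framing's share of the content as a proportion, and the outlet name in the usage example are all hypothetical. Treating "neither framing exceeds 50%" as neutral is an assumption that goes slightly beyond the rule as written.

```r
# Tone rule: code the dominant framing; otherwise code neutral (2).
# `share_positive` and `share_negative` are the coder's estimates (0 to 1)
# of how much of the content each framing occupies.
code_tone <- function(share_positive, share_negative) {
  if (share_positive > 0.5) return(1)  # positive framing dominates
  if (share_negative > 0.5) return(3)  # negative framing dominates
  2                                    # roughly equal, or neither dominates
}

# Source rule: AP or Reuters is always wire (2), whatever the coder would
# otherwise have chosen based on the publishing outlet.
code_source_type <- function(listed_source, coder_choice) {
  ifelse(listed_source %in% c("AP", "Reuters"), 2, coder_choice)
}

print(code_tone(0.7, 0.3))
print(code_tone(0.5, 0.5))
print(code_source_type(c("AP", "Daily Herald", "Reuters"), c(1, 1, 3)))
```

You will still apply these rules by hand while coding. The point of the exercise is that a rule you cannot write down this precisely will probably feel fuzzy again by item 75.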

Anchor Examples

An anchor example is a specific item that clearly represents one category of a variable. Keep 3–5 anchors per variable, covering every category, and re-code them at the start of each coding session to check that your judgments have not drifted.

Variable	Category	Anchor Item	Why It’s a Clear Example
`tone`	Positive (1)	Item #3	Celebratory headline, quotes from supporters, no counterpoint
`tone`	Neutral (2)	Item #7	Factual reporting, balanced quotes, no evaluative language
`tone`	Negative (3)	Item #1	Critical framing, emphasis on failure, negative headline
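If you log your warm-up re-codes, comparing them against the anchor codes takes a few lines of R. In the sketch below, the anchor items and their codes come from the table above; the codes in recoded_today are made up, with one deliberate disagreement so the check has something to find.

```r
# Anchor items and the codes they should receive, from the codebook.
anchors <- data.frame(
  item_id       = c(3, 7, 1),
  variable      = "tone",
  expected_code = c(1, 2, 3)
)

# Made-up codes from today's warm-up re-coding of the same three items.
recoded_today <- data.frame(
  item_id    = c(1, 3, 7),
  today_code = c(3, 1, 3)
)

# Line the two up by item and flag any disagreement.
check <- merge(anchors, recoded_today, by = "item_id")
check$agrees <- check$expected_code == check$today_code

drifted <- check[!check$agrees, ]
print(drifted)
```

Any row that shows up in drifted is a cue to re-read the definition and decision rules for that variable before coding new items.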

Quality Check Before Moving On

Before you start collecting your full dataset, verify:

Every variable in your codebook has a name, conceptual definition, operational definition, and numeric codes
Your coding spreadsheet has clean column headers matching your variable names
You have at least one decision rule for each categorical variable
You have 3–5 anchor examples you can re-code at the start of each session
Your pilot data (from the Sampling Plan assignment) is entered in the spreadsheet and looks correct
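Two of these checks (headers match the codebook, and the pilot data "looks correct") can be partly automated. The sketch below assumes the valid codes for each categorical variable are kept in a named list and uses a made-up pilot data frame with one deliberate typo. Both structures are illustrative, not a required format.

```r
# Valid numeric codes per categorical variable, copied from the codebook.
valid_codes <- list(
  tone        = c(1, 2, 3),
  source_type = c(1, 2, 3)
)

# Made-up pilot data. The 4 in source_type is a deliberate entry error.
pilot <- data.frame(
  item_id     = 1:4,
  source_type = c(1, 2, 1, 4),
  tone        = c(3, 2, 1, 2)
)

# Codebook variables that have no matching column header in the data.
missing_columns <- setdiff(names(valid_codes), names(pilot))

# For each coded variable, the item_ids whose value is not a valid code.
# Empty cells (NA) are flagged too, because NA is not in the codebook.
bad_items <- lapply(names(valid_codes), function(v) {
  pilot$item_id[!(pilot[[v]] %in% valid_codes[[v]])]
})
names(bad_items) <- names(valid_codes)

print(missing_columns)
print(bad_items)
```

A check like this catches typos and out-of-range codes. It cannot tell you whether a valid code was the right judgment, which is what your decision rules and anchors are for.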

Try It Yourself

Look at the music dataset’s playlist_genre variable. It contains categories like “pop,” “rock,” “r&b,” “rap,” “latin,” and “edm.”

Write a conceptual definition for playlist_genre (what does genre mean in this context?).
Write an operational definition (how was genre determined? Hint: it comes from Spotify’s playlist classification).
If you were coding genre yourself (not using Spotify’s classification), what decision rules would you need for songs that blend genres?