From Vibes to Variables: Building Your Quantitative Codebook

You’ve spent weeks immersed in your media source — reading articles, watching videos, scrolling through posts. You’ve noticed patterns. You’ve written about them in your qualitative memo. Now it’s time to turn those observations into something you can count.

This chapter walks you through the process of building a quantitative codebook — the document that translates your qualitative insights into numeric codes that R can analyze.

What Is a Quantitative Codebook?

A codebook is a measurement manual. It tells you (and anyone else) exactly how to assign a number to something you observe in your data. Every variable in your research gets:

  1. A name — short, descriptive, no spaces (e.g., tone, source_type, article_length)
  2. A conceptual definition — what the variable means in theory
  3. An operational definition — exactly how you measure it
  4. Response categories — the specific values it can take, each with a number

Example: Music Dataset

In our class dataset (music_data_raw.csv), the variable mode works like this:

| Element | Value |
| --- | --- |
| Variable name | mode |
| Conceptual definition | The modality of a musical track — whether it sounds “happy” (major) or “sad” (minor) |
| Operational definition | Spotify’s audio analysis algorithm classifies each track as major (1) or minor (0) |
| Response categories | 0 = Minor, 1 = Major |

Notice that the data itself never contains the words “happy-sounding” or “sad-sounding.” It contains numbers (0 and 1), and the codebook tells you what each number means. This is what makes it quantitative.
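The four elements of a codebook entry fit naturally into a small record. The sketch below is an illustration in Python rather than R, and the class name `CodebookEntry` and its field names are hypothetical, chosen only to mirror the elements listed above. The values for `mode` are copied from the example.

```python
from dataclasses import dataclass


@dataclass
class CodebookEntry:
    """One variable in a codebook (hypothetical structure, for illustration only)."""
    name: str
    conceptual_definition: str
    operational_definition: str
    categories: dict  # numeric code -> category label

    def label(self, code):
        """Return the label for a numeric code; reject codes the codebook does not define."""
        if code not in self.categories:
            raise ValueError(f"{code!r} is not a valid code for {self.name}")
        return self.categories[code]


# The `mode` variable from the music dataset example.
mode = CodebookEntry(
    name="mode",
    conceptual_definition="The modality of a musical track (major or minor)",
    operational_definition="Spotify's audio analysis classifies each track as major (1) or minor (0)",
    categories={0: "Minor", 1: "Major"},
)

print(mode.label(1))  # Major
```

Rejecting undefined codes is a design choice in this sketch: a value such as 2 for `mode` points to a data-entry error, so it is better to stop than to guess.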

From Qualitative Observations to Numeric Codes

Your qualitative memo described patterns you noticed. Now you need to formalize those patterns into variables. Here’s the translation process:

Step 1: List Your Observations

Go back to your qualitative memo. What patterns did you notice? Write each one as a short phrase:

  • Example: “Most articles about the topic were negative in tone”
  • Example: “Stories from wire services were shorter than local reporting”
  • Example: “Videos with faces in thumbnails got more views”

Step 2: Convert Each Pattern to a Variable

For each observation, ask: What am I actually measuring?

| Observation | Variable | Type |
| --- | --- | --- |
| “Most articles were negative” | tone | Categorical (positive, negative, neutral) |
| “Wire stories were shorter” | source_type | Categorical (local, wire, syndicated) |
| “Wire stories were shorter” | word_count | Continuous (actual number) |
| “Thumbnails with faces got more views” | face_in_thumbnail | Binary (yes/no) |

Step 3: Assign Numeric Codes

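Every category needs a number. This is non-negotiable: numeric codes are quick to enter and avoid the typos and inconsistent capitalization that creep in when you type words, and R can attach the category labels later.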

| Variable | Category | Code |
| --- | --- | --- |
| tone | Positive | 1 |
| tone | Neutral | 2 |
| tone | Negative | 3 |
| source_type | Local | 1 |
| source_type | Wire | 2 |
| source_type | Syndicated | 3 |
| face_in_thumbnail | No | 0 |
| face_in_thumbnail | Yes | 1 |

Important Convention: Use Numbers, Not Words

In your coding spreadsheet, always enter the number, not the word. Enter 1, not "Positive". Enter 0, not "No". When you import this into R later, you’ll convert these numbers back into labeled categories (called “factors”). But the raw data should be numeric.
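Converting codes back into labels is, at its core, a lookup from number to category. In R this is done with factors. The sketch below shows the same idea in Python purely as an illustration. The codes are copied from the Step 3 table, while the dictionary layout and the function name `label_codes` are assumptions.

```python
# Codes copied from the Step 3 table; the nested-dictionary layout is an assumption.
CODEBOOK = {
    "tone": {1: "Positive", 2: "Neutral", 3: "Negative"},
    "source_type": {1: "Local", 2: "Wire", 3: "Syndicated"},
    "face_in_thumbnail": {0: "No", 1: "Yes"},
}


def label_codes(variable, codes):
    """Translate a list of numeric codes into their category labels."""
    labels = CODEBOOK[variable]
    return [labels[code] for code in codes]


raw_tone = [3, 2, 1]  # what you type into the spreadsheet
print(label_codes("tone", raw_tone))  # ['Negative', 'Neutral', 'Positive']
```

The raw data stays numeric, and the labels live in one place, so renaming a category later means editing the codebook rather than every row.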

Setting Up Your Coding Spreadsheet

Open Excel, Google Sheets, or any spreadsheet program. Set it up like this:

| Column A | Column B | Column C | Column D | Column E | Column F |
| --- | --- | --- | --- | --- | --- |
| item_id | date | source_type | tone | word_count | coder_notes |
| 1 | 2026-01-15 | 1 | 3 | 487 | clear negative framing |
| 2 | 2026-01-16 | 2 | 2 | 312 | wire service, neutral |
| 3 | 2026-01-16 | 1 | 1 | 621 | positive local feature |

Rules for your spreadsheet:

  • Row 1 is always the header row (variable names)
  • Each row after that is one item (one article, one post, one video)
  • Each column is one variable
  • No merged cells, no colors-as-data, no blank rows in the middle
  • One column for notes — this is your decision log for ambiguous cases
  • Variable names: lowercase, no spaces, use underscores (word_count, not Word Count)
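Several of these rules can be checked mechanically once the sheet is exported as CSV. The following is a minimal sketch in Python, not part of the course's R workflow. The function name `check_sheet` and the `ALLOWED_CODES` layout are hypothetical, and the sample rows are the three from the example spreadsheet above. It checks only variable names, blank rows, and valid codes for `source_type` and `tone`.

```python
import csv
import io
import re

# Allowed codes, copied from the Step 3 table (kept as text, as they arrive from CSV).
ALLOWED_CODES = {
    "source_type": {"1", "2", "3"},
    "tone": {"1", "2", "3"},
}
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*$")  # lowercase, underscores, no spaces


def check_sheet(csv_text):
    """Return a list of human-readable problems found in the exported sheet."""
    problems = []
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    for name in header:
        if not NAME_PATTERN.match(name):
            problems.append(f"bad variable name: {name!r}")

    for row_number, row in enumerate(data, start=2):  # row 1 is the header
        if not any(cell.strip() for cell in row):
            problems.append(f"row {row_number}: blank row")
            continue
        record = dict(zip(header, row))
        for variable, allowed in ALLOWED_CODES.items():
            value = record.get(variable, "").strip()
            if value not in allowed:
                problems.append(f"row {row_number}: {variable}={value!r} is not a valid code")
    return problems


sheet = """item_id,date,source_type,tone,word_count,coder_notes
1,2026-01-15,1,3,487,clear negative framing
2,2026-01-16,2,2,312,"wire service, neutral"
3,2026-01-16,1,1,621,positive local feature
"""
print(check_sheet(sheet))  # an empty list means no problems were found
```

A check like this catches a typed word such as "Positive" in the tone column, but it cannot tell whether a valid code was the right code. That still depends on your definitions.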

Tip: Connection to Your Project

Your codebook from the Codebook & Qual Memo assignment is the foundation. But after your pilot test, you may have discovered variables that need adjustment — categories that overlap, definitions that are ambiguous, or codes you never actually used. Update your codebook now, before you code 100+ items. It’s much easier to fix the measurement tool than to re-code the data.

Decision Rules and Anchor Examples

The hardest part of coding isn’t the first 10 items — it’s item 75, when your definitions start to feel fuzzy. Two tools help:

Decision Rules

A decision rule is a specific instruction for handling ambiguous cases. Write them into your codebook.

  • Example: “If an article contains both positive and negative framing, code the dominant tone (whichever occupies more than 50% of the content). If roughly equal, code as neutral (2).”
  • Example: “If the source is listed as ‘AP’ or ‘Reuters,’ code as wire (2) regardless of which outlet published it.”
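Writing a decision rule as a small function is one way to test whether it is specific enough to apply consistently. The sketch below, in Python for illustration, encodes the two example rules. The function names and the share parameters are hypothetical, and estimating what fraction of an article is positive or negative remains a human judgment. The tone rule leaves "roughly equal" open, so this sketch assumes that any case where neither tone exceeds half the content counts as neutral.

```python
WIRE_SERVICES = {"AP", "Reuters"}  # the two services named in the example rule


def code_tone(positive_share, negative_share):
    """Apply the dominant-tone rule. Shares are fractions of the content (0.0 to 1.0)."""
    if positive_share > 0.5:
        return 1  # Positive
    if negative_share > 0.5:
        return 3  # Negative
    return 2  # Neutral: neither tone occupies more than half (assumed reading of "roughly equal")


def code_source_type(listed_source, default_code):
    """Code AP or Reuters as wire (2), whatever outlet published the story."""
    if listed_source.strip() in WIRE_SERVICES:
        return 2
    return default_code  # otherwise keep the code the coder would have assigned


print(code_tone(0.7, 0.2))            # 1
print(code_tone(0.45, 0.45))          # 2
print(code_source_type("Reuters", 1))  # 2
```

If you cannot write a rule this way without inventing extra conditions, the rule probably needs another sentence in the codebook.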

Anchor Examples

An anchor example is a specific item that clearly represents one category. Keep 3–5 anchors per variable and re-code them at the start of each coding session to confirm that you are still applying the definitions the same way.

| Variable | Category | Anchor Item | Why It’s a Clear Example |
| --- | --- | --- | --- |
| tone | Positive (1) | Item #3 | Celebratory headline, quotes from supporters, no counterpoint |
| tone | Neutral (2) | Item #7 | Factual reporting, balanced quotes, no evaluative language |
| tone | Negative (3) | Item #1 | Critical framing, emphasis on failure, negative headline |
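Re-coding the anchors is only useful if you compare the result with the agreed codes. A minimal sketch of that comparison follows, written in Python for illustration. The agreed codes come from the anchor table above, while the function name `anchor_drift` and the sample session codes are hypothetical.

```python
# Agreed tone codes for the anchor items, from the table above (item_id -> code).
ANCHORS = {3: 1, 7: 2, 1: 3}


def anchor_drift(session_codes):
    """Compare today's re-coding of the anchors with the agreed codes.

    Returns {item_id: (agreed_code, todays_code)} for every mismatch.
    """
    return {
        item_id: (agreed, session_codes.get(item_id))
        for item_id, agreed in ANCHORS.items()
        if session_codes.get(item_id) != agreed
    }


todays_codes = {3: 1, 7: 3, 1: 3}  # hypothetical re-coding at the start of a session
print(anchor_drift(todays_codes))  # {7: (2, 3)}
```

A non-empty result means your reading of a category has shifted since the anchors were set, which is the moment to reread the definition before coding new items.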

Quality Check Before Moving On

Before you start collecting your full dataset, verify:

Try It Yourself

Look at the music dataset’s playlist_genre variable. It contains categories like “pop,” “rock,” “r&b,” “rap,” “latin,” and “edm.”

  1. Write a conceptual definition for playlist_genre (what does genre mean in this context?).
  2. Write an operational definition (how was genre determined? In this dataset, by Spotify’s playlist classification).
  3. If you were coding genre yourself (not using Spotify’s classification), what decision rules would you need for songs that blend genres?