Chapter 16: The Sampling Plan and Pilot Test
Learning Objectives
- Define the population and construct a sampling frame for content analysis
- Select an appropriate sampling strategy for your dataset
- Apply a codebook to a subsample of data systematically
- Calculate and interpret inter-coder reliability statistics
- Identify sources of disagreement between coders
- Revise codebook rules based on pilot testing results
- Document both the sampling and reliability testing process transparently

Before you can code your data, you need to make two critical decisions. First: which data will you code? This is the sampling plan. Second: does your codebook work? This is the pilot test.
These two steps bridge the gap between design and execution. The sampling plan applies the probability theory from Chapter 11 to your specific content analysis, ensuring your selected cases represent the broader population you want to describe. The pilot test applies the codebook from Chapter 15 to a subset of that sample, verifying that your measurement instrument produces reliable results before you commit to coding the full dataset.
Skip either step and you build your study on a foundation you haven’t inspected.
Part I: The Sampling Plan
Defining Your Population
The population is the complete set of content units your study aims to describe. You must define it precisely, because every subsequent sampling decision depends on this definition.
Vague: “Popular songs”
Precise: “All songs that appeared on the Billboard Hot 100 chart between January 1, 2015, and December 31, 2024”
Vague: “News coverage of immigration”
Precise: “All front-page articles about immigration published in the New York Times, Washington Post, and Los Angeles Times between January 1, 2020, and December 31, 2024”
The definition must specify: what content (songs, articles, posts, advertisements), from what source (Billboard, specific newspapers, a particular platform), and during what time period (bounded dates). Ambiguity in any of these dimensions introduces coverage error, the gap between what you intend to study and what you actually study.
Constructing the Sampling Frame
The sampling frame is the list from which you draw your sample. Ideally, it matches the population perfectly. In practice, it rarely does, and the gap requires acknowledgment.
For this course, your sampling frame is the unified dataset: songs with Billboard chart data, Spotify audio features, and Genius lyrics. This frame is comprehensive for songs that appeared on the Billboard Hot 100, but it may have gaps. Some songs may lack lyrics (instrumental tracks). Some may be missing Spotify features (older songs not on Spotify). Some may have data entry errors. Documenting these gaps is part of transparent research.
For a news content analysis, the sampling frame might be a database like Nexis Uni, which indexes newspaper articles. But Nexis Uni’s coverage varies by publication and date range, and some articles may be miscategorized. The sampling frame is an approximation of the population, and the quality of that approximation matters.
Census vs. Sample
Sometimes you can analyze the entire population. This is a census.
Census is appropriate when:
- The population is small enough to code completely (e.g., every State of the Union address delivered within a defined period, or every ad aired during a single Super Bowl)
- Every case matters equally and you can afford the time
Sampling is necessary when:
- The population is too large to code entirely (e.g., 10,000 songs across a decade, 50,000 newspaper articles)
- You need to complete the analysis within a fixed timeline
- The population is well-defined enough that a representative subset can stand in for the whole
For most semester projects involving the course dataset, you will sample rather than census. Coding 200 songs systematically is feasible; coding 10,000 is not.
Sampling Strategies for Content Analysis
Chapter 11 introduced sampling theory in the context of survey research. The same probability sampling strategies apply to content analysis, but with content-specific considerations.
Simple random sampling: Assign each unit in the sampling frame a random number and select the desired number of cases.
Application: Randomly select 200 songs from the full dataset using a random number generator in R or Excel.
Strength: Unbiased selection. Risk: May produce an unrepresentative sample by chance (e.g., disproportionately many songs from one year or one genre).
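A minimal sketch of this draw in R (using dplyr), assuming the unified dataset has been loaded into a data frame; the object name `songs` and the file name are placeholders:

```r
# Simple random sample: 200 songs drawn from the full sampling frame
library(dplyr)

set.seed(2026)                            # record the seed so the draw is reproducible
songs <- read.csv("unified_dataset.csv")  # hypothetical file name for the course dataset

simple_sample <- songs %>%
  slice_sample(n = 200)                   # 200 rows at random, without replacement
```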
Stratified random sampling: Divide the population into meaningful strata, then randomly sample within each stratum.
Application: Divide the dataset by year (2015-2024), then randomly select 20 songs from each year, ensuring equal temporal representation.
Strength: Guarantees representation of key subgroups. This is the most common sampling strategy for content analysis and is strongly recommended by Riffe, Lacy, Watson, and Lovejoy (2023). Consideration: Choose strata that matter for your research question. If your question is about genre differences, stratify by genre. If it’s about temporal trends, stratify by time period. If both matter, cross-stratify (e.g., 5 songs per genre per year).
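A stratified draw looks almost the same; a sketch reusing the hypothetical `songs` data frame, assuming a `year` column holds the chart year:

```r
# Stratified random sample: 20 songs per chart year, 2015-2024
set.seed(2026)

stratified_sample <- songs %>%
  filter(year >= 2015, year <= 2024) %>%
  group_by(year) %>%            # strata = chart year
  slice_sample(n = 20) %>%      # 20 random songs within each stratum
  ungroup()

count(stratified_sample, year)  # check: 20 songs in every year
```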
Systematic sampling: Select every nth case from an ordered list.
Application: Sort songs by chart entry date, then select every 15th song.
Strength: Simple and efficient. Risk: If the list has a hidden periodic pattern (e.g., certain genres dominate particular seasons), systematic sampling could introduce bias.
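A sketch of systematic selection under the same assumptions (the `chart_entry_date` column name is hypothetical):

```r
# Systematic sample: every 15th song from the date-ordered frame,
# starting from a random point within the first interval
set.seed(2026)
start <- sample(1:15, 1)

songs_ordered     <- songs[order(songs$chart_entry_date), ]
systematic_sample <- songs_ordered[seq(start, nrow(songs_ordered), by = 15), ]
```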
Constructed week sampling: A specialized technique developed for news content analysis (Riffe et al., 2023). Rather than selecting random days, you construct a composite “week” by randomly selecting one Monday from the entire study period, one Tuesday from the entire study period, and so on. This ensures representation of each day of the week while distributing the sample across the full time range.
Application (for news): To sample one year of newspaper coverage, randomly select one Monday from January-March, one Tuesday from April-June, one Wednesday from July-September, and so on, constructing a representative week from across the year.
Why it works: News content varies systematically by day of week (Monday papers are thinner, weekend editions have different section emphasis). Constructed week sampling accounts for this cyclical variation more efficiently than simple random sampling. Riffe et al. (2023) demonstrate that two constructed weeks often provide better representation than a simple random sample of the same size.
Application (for music): Constructed week sampling is less applicable to music chart data, which doesn’t vary by day of week in the same way. For music datasets, stratified random sampling is typically more appropriate.
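For completeness, a sketch of a constructed week for a one-year news study, assuming each sampling unit is a publication date:

```r
# Constructed week: one randomly chosen date for each day of the week,
# drawn from across the full study period
library(dplyr)
set.seed(2026)

dates <- data.frame(date = seq(as.Date("2024-01-01"), as.Date("2024-12-31"), by = "day"))
dates$weekday <- weekdays(dates$date)

constructed_week <- dates %>%
  group_by(weekday) %>%
  slice_sample(n = 1) %>%   # one Monday, one Tuesday, ..., one Sunday
  ungroup() %>%
  arrange(date)
```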
Sample Size for Content Analysis
How many units should you code? The answer depends on the population’s variability and the precision you need.
Practical guidelines from Riffe et al. (2023):
- For populations that are relatively homogeneous (e.g., songs from a single genre), samples of 150-200 units are often sufficient.
- For populations with high variability (e.g., songs spanning multiple genres, decades, and chart positions), larger samples (250-400) or careful stratification are needed.
- The critical factor is not raw sample size but the adequacy of representation across the dimensions that matter for your research question.
For this course: A sample of 200 songs, stratified by time period and/or genre, is appropriate for most research questions. This is large enough to detect moderate statistical effects and small enough to code within a semester.
The principle from Chapter 11 bears repeating: A representative sample of 200 is more informative than a biased sample of 10,000.
Documenting Your Sampling Plan
Transparency requires documenting every sampling decision. Create a note in Obsidian:
# Sampling Plan
**Population**: All songs that appeared on the Billboard Hot 100
between January 1, 2015, and December 31, 2024
**Sampling Frame**: Unified course dataset (Billboard + Spotify +
Genius), N = [total songs in frame]
**Sample Size**: 200 songs
**Sampling Strategy**: Stratified random sampling
- Strata: Year (2015-2024), 20 songs per year
- Within each year: simple random selection
**Exclusions**: Songs without available lyrics on Genius;
instrumental tracks (coded as N/A for lyric variables
but retained for audio feature analyses)
**Date drawn**: [date]
**Random seed**: [seed number, for reproducibility]

This documentation allows anyone to replicate your sample or evaluate whether it’s adequate for the claims you make.
Part II: The Pilot Test
You’ve built a codebook (Chapter 15) and drawn a sample. The codebook looks comprehensive. The categories are exhaustive and mutually exclusive. The decision rules address every edge case you documented during immersion. On paper, it’s solid.
But you won’t know if it actually works until you test it.
This is the uncomfortable truth about codebook development: the first version is never the final version. No matter how carefully you’ve anticipated ambiguities, real coding reveals problems you didn’t foresee. Coders interpret rules differently. Categories that seemed distinct on paper overlap in practice. Decision rules that felt clear turn out to be vague.
The pilot test is where theory meets reality, and where you discover what needs fixing.
Why Pilot Testing Matters
Consider two scenarios:
Scenario A (No Pilot Test):
You code all 200 songs using your codebook. Halfway through, you realize a problem: many songs don’t fit your “positive/negative/neutral” scheme cleanly. You’ve been making inconsistent judgment calls. But you can’t go back and recode everything; you’re already 100 songs in. Your data is unreliable, and you don’t discover this until analysis reveals nonsensical patterns.
Scenario B (With Pilot Test):
You code 30 songs. A second coder independently codes the same 30. You compare. Agreement is only 65%, below acceptable standards. You meet, discuss disagreements, and discover the problem: your definition of “mixed” sentiment is vague. You revise the codebook, adding clearer decision rules and examples. You test again on 20 new songs. Agreement rises to 82%. You proceed to full coding with confidence.
The pilot test costs time upfront but saves far more time than it costs. It’s the difference between discovering problems when they’re fixable versus discovering them when they’re catastrophic.
The Pilot Testing Process
Step 1: Select a Representative Pilot Subsample
Don’t pilot test on your easiest cases. Choose a subsample that represents the diversity and complexity of your full sample.
For a 200-song sample, a good pilot subsample might be:
- 30-40 songs total (15-20% of full sample)
- Stratified to include:
  - Different time periods (if your dataset spans years)
  - Different chart positions (top 10, middle, lower)
  - Different genres (if coding genre)
- Known edge cases from your immersion phase (Chapter 13)
- A few “easy” prototypical examples for each category
- Several ambiguous cases
Why include edge cases? Because that’s where disagreements happen. If your pilot sample contains only straightforward cases, you’ll get falsely high reliability and won’t identify the problems lurking in ambiguous data.
Document your pilot sampling:
# Pilot Test Sample Selection
**Total pilot sample**: 30 songs
**Sampling strategy**:
- 10 songs randomly selected from 2015-2017
- 10 songs randomly selected from 2018-2020
- 10 songs randomly selected from 2021-2024
**Includes**:
- 5 songs from top 10 peak positions
- 5 songs from #11-50
- Known edge cases: "Good 4 U" (Rodrigo), "Pumped Up
Kicks" (Foster the People)
**Date selected**: Feb 20, 2026

Step 2: Independent Coding
Critical rule: Coders must work independently. No discussion, no collaboration, no checking each other’s work until coding is complete.
Why? Because inter-coder reliability measures agreement without coordination. If coders discuss cases before coding, you’re measuring their ability to remember what they agreed on, not whether the codebook is clear enough to guide independent judgment.
Workflow for two coders:
- Both coders receive:
  - The codebook (identical version)
  - The list of 30 songs in the pilot sample
  - A blank coding sheet
- Independently, each coder:
  - Listens to each song
  - Reads the lyrics
  - Applies the codebook rules
  - Records codes in their coding sheet
- No communication until both have finished all 30 songs
Coding sheet format (simple spreadsheet or CSV):
| Song_ID | Song_Title | Artist | Coder_ID | Lyric_Sentiment | Emotional_Intensity | Notes |
|---------|------------------|----------|----------|-----------------|---------------------|-----------------------------|
| 001 | Happy | Pharrell | Coder_A | Positive | Medium | Clear case |
| 001 | Happy | Pharrell | Coder_B | Positive | Medium | |
| 002 | Someone Like You | Adele | Coder_A | Negative | Medium | |
| 002 | Someone Like You | Adele | Coder_B | Negative | High | Intensity judgment difficult |
Notice the structure: Each song gets two rows (one per coder). This makes comparison straightforward.
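A sketch of how you might compare the two coding sheets in R, assuming the sheet above has been saved as `pilot_codes.csv` (the file and column names follow the example table but are otherwise assumptions):

```r
library(dplyr)
library(tidyr)

codes <- read.csv("pilot_codes.csv")

# Reshape: one row per song, one column per coder, for the Lyric_Sentiment variable
sentiment_wide <- codes %>%
  select(Song_ID, Coder_ID, Lyric_Sentiment) %>%
  pivot_wider(names_from = Coder_ID, values_from = Lyric_Sentiment)

# Songs where the two coders assigned different codes
disagreements <- filter(sentiment_wide, Coder_A != Coder_B)
```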
Step 3: Calculate Inter-Coder Reliability
Once both coders have finished, compare their codes. How often did they agree?
Simple percent agreement is a starting point:
Agreement = (Number of cases where coders agreed) / (Total cases)
Example:
- 30 songs coded
- Coders agreed on Lyric Sentiment for 24 songs
- Agreement = 24/30 = 80%
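Continuing the hypothetical `sentiment_wide` data frame from Step 2, percent agreement is one line of R:

```r
# Proportion of pilot songs on which the two coders assigned the same code
percent_agreement <- mean(sentiment_wide$Coder_A == sentiment_wide$Coder_B)
percent_agreement   # in the running example: 24/30 = 0.80
```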
But percent agreement is flawed: It doesn’t account for chance agreement. If you have only two categories (positive/negative), coders could agree 50% of the time by random chance alone. Lombard, Snyder-Duch, and Bracken (2002) demonstrated that relying on percent agreement alone leads to inflated and misleading reliability estimates.
Better metrics correct for chance:
Cohen’s Kappa (κ) (Cohen, 1960)
Use when: Two coders, nominal or ordinal data
Formula (conceptual):
κ = (Observed agreement - Expected agreement by chance) / (1 - Expected agreement by chance)
Interpretation:
- κ = 1.0: Perfect agreement
- κ = 0.80-1.0: Excellent agreement
- κ = 0.70-0.79: Acceptable agreement
- κ = 0.60-0.69: Questionable; revise codebook
- κ < 0.60: Poor; major revisions needed
Example:
For Lyric Sentiment (4 categories: Positive, Negative, Neutral, Mixed):
- Observed agreement: 80%
- Expected agreement by chance: ~28% (calculated based on marginal distributions)
- κ = (0.80 - 0.28) / (1 - 0.28) = 0.52 / 0.72 = 0.72
Interpretation: Acceptable, but close to the threshold. Some revisions likely needed.
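In practice you would let software do this arithmetic. One option is the `kappa2()` function from the irr package; a sketch, reusing the hypothetical `sentiment_wide` object built from the pilot coding sheets:

```r
# install.packages("irr")   # once, if the package is not yet installed
library(irr)

# kappa2() expects one row per unit and one column per coder
kappa_result <- kappa2(sentiment_wide[, c("Coder_A", "Coder_B")])
kappa_result$value   # Cohen's kappa for Lyric Sentiment
```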
Krippendorff’s Alpha (α) (Hayes & Krippendorff, 2007)
Use when: Any number of coders, any level of measurement, can handle missing data
Interpretation (same benchmarks as kappa):
- α ≥ 0.80: Excellent
- α = 0.70-0.79: Acceptable
- α < 0.70: Unreliable
Why prefer Krippendorff’s α?
- Works with more than two coders
- Handles different levels of measurement
- Can accommodate missing data (if one coder skipped a case)
- More conservative than Cohen’s κ (stricter standard)
In practice: Use whichever metric your field prefers. Communication research increasingly favors Krippendorff’s α (Hayes & Krippendorff, 2007); psychology often uses Cohen’s κ. Both serve the same purpose: quantifying agreement beyond chance.
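Krippendorff’s α can also be computed with the irr package; a sketch under the same assumptions, with the four sentiment categories recoded as integers because `kripp.alpha()` expects a numeric coders-by-units matrix:

```r
library(irr)

sentiment_levels <- c("Positive", "Negative", "Neutral", "Mixed")
ratings <- rbind(
  Coder_A = as.integer(factor(sentiment_wide$Coder_A, levels = sentiment_levels)),
  Coder_B = as.integer(factor(sentiment_wide$Coder_B, levels = sentiment_levels))
)

kripp.alpha(ratings, method = "nominal")   # nominal-level alpha for Lyric Sentiment
```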
Step 4: Identify Disagreements
Numbers tell you if there’s a problem. To fix it, you need to know where disagreements occur.
Create a disagreement log:
# Pilot Test Disagreement Log
## Variable: Lyric Sentiment
### Song 1: "Good 4 U" (Olivia Rodrigo)
- Coder A: Mixed
- Coder B: Negative
- **Issue**: Coder A focused on upbeat sound; Coder B
focused on bitter lyrics
- **Resolution needed**: Clarify whether to code based on
lyrics alone or overall emotional impact
### Song 7: "Since U Been Gone" (Kelly Clarkson)
- Coder A: Positive (empowerment)
- Coder B: Mixed (pain + empowerment)
- **Issue**: Song describes breakup pain but emphasizes
moving on. Balance unclear.
- **Resolution needed**: Define threshold for "mixed" vs.
dominant sentiment
### Song 12: "Shake It Off" (Taylor Swift)
- Coder A: Positive
- Coder B: Positive
- **Notes**: Agreement. Clear case.
### Song 15: "Blinding Lights" (The Weeknd)
- Coder A: Mixed
- Coder B: Negative
- **Issue**: Upbeat music but dark lyrics about addiction.
Conflicting signals.
- **Resolution needed**: Add decision rule prioritizing
lyrics over musical elements

Pattern recognition: Are disagreements random or systematic?
Random disagreements: Different songs, no pattern. Might just be noise.
Systematic disagreements: All happen on songs with similar characteristics (e.g., all upbeat songs with dark lyrics). This signals a codebook weakness that must be addressed through revised decision rules.
Step 5: Meet and Discuss
After identifying disagreements, coders meet to discuss.
Purpose of the meeting:
- Understand why coders made different choices
- Identify ambiguities in the codebook
- Decide how to revise rules
What this is NOT:
- A negotiation to reach agreement on past codes
- An argument about who was “right”
- A chance to change old codes to inflate reliability
Example discussion:
Disagreement: “Good 4 U” (Rodrigo), Coder A said Mixed, Coder B said Negative
Coder A: “I coded it Mixed because the music is upbeat and energetic. It doesn’t sound negative.”
Coder B: “I focused on the lyrics, which are bitter and resentful. The music felt secondary to me.”
Discussion outcome: The codebook says “code based on overall emotional tone” but doesn’t specify whether to prioritize lyrics or music when they conflict.
Solution: Add a decision rule:
Rule 6: Lyric Priority: When lyric content and musical elements convey different emotions, code based on lyric content unless the music so strongly contradicts the lyrics that listeners would likely experience the overall tone as aligned with the music (rare cases). If genuinely uncertain, code as Mixed.
Step 6: Revise the Codebook
Based on the disagreement analysis and discussion, update your codebook.
Common revisions:
1. Clarify vague language
Before: “Code as Mixed if song contains both positive and negative elements.”
After: “Code as Mixed if positive and negative elements are roughly equal in frequency (no more imbalanced than a 60/40 split), or if one appears in the lyrics and the other in the musical delivery with neither clearly dominant.”
2. Add decision rules
New Rule: “For songs with empowerment themes that describe negative events (breakups, hardship), code based on the stance: growth/strength = Positive; ongoing suffering = Negative.”
3. Expand examples
Add songs from the pilot test that caused disagreement to the examples section, showing how they should be coded and why.
4. Refine category definitions
If a category is being over- or under-used, redefine boundaries.
Step 7: Test Again
After revising, test again on a new subsample (not the original pilot songs).
Why new songs? Because coders now know how the old songs “should” be coded. Testing on the same songs would inflate reliability.
Second pilot sample: 20-25 songs, similarly stratified
Process:
- Independent coding with revised codebook
- Calculate reliability
- Check for remaining disagreements
Stopping rule:
Iterate until:
- Reliability reaches acceptable threshold (κ or α ≥ 0.70, ideally ≥ 0.80)
- Disagreements are random rather than systematic
- You’ve reached diminishing returns (revisions no longer improve reliability)
Realistic expectation: 2-3 rounds of pilot testing is typical. The first version rarely works perfectly.
What “Good Enough” Looks Like
Benchmarks:
κ or α ≥ 0.80: Excellent. You can proceed with confidence.
κ or α = 0.70-0.79: Acceptable. Proceed, but acknowledge this as a limitation in your methods section. Consider having the coders discuss any remaining disagreements to consensus during full coding.
κ or α < 0.70: Unreliable. Do not proceed. More revisions needed.
But: Perfect agreement (κ = 1.0) is unrealistic for latent content coding. Human judgment is involved. Some variability is acceptable; that’s why we calculate reliability and report it transparently.
Documenting the Process
Research transparency requires documenting your pilot testing, just as it requires documenting your sampling plan. Both feed into the methods section of your final report.
In your methods section, report:
- Population definition and sampling frame
- Sample size and sampling strategy (with justification)
- Number of songs in pilot sample and pilot sampling strategy
- Number of coders and their training
- Reliability metric used (Cohen’s κ or Krippendorff’s α)
- Reliability values for each variable
- Major codebook revisions made after pilot testing
- Final reliability after revisions
Example methods paragraph:
“The population was defined as all songs appearing on the Billboard Hot 100 between 2015 and 2024. A stratified random sample of 200 songs was drawn, with 20 songs randomly selected from each year. Two coders independently coded a pilot sample of 30 songs stratified across time periods and chart positions. Initial inter-coder reliability for Lyric Sentiment was κ = 0.72, indicating acceptable agreement. Analysis of disagreements revealed ambiguity in handling songs with conflicting lyric and musical emotional cues. The codebook was revised to prioritize lyric content over musical elements (Decision Rule 6). A second pilot test of 20 new songs yielded κ = 0.83, indicating excellent agreement. The final codebook was then applied to the full sample.”
Common Pilot Testing Mistakes
Mistake 1: Skipping Pilot Testing
Why it’s tempting: “My codebook is clear. I’ll just code everything.”
Why it fails: You won’t discover problems until you’ve already coded hundreds of cases inconsistently.
Mistake 2: Using Too Few Cases
Problem: Piloting on 5-10 songs won’t reveal the range of complications in your full dataset.
Solution: Use at least 15-20% of your full sample, minimum 20-30 cases.
Mistake 3: Coding Together
Problem: If coders discuss cases before coding independently, you’re not testing the codebook; you’re testing memory of conversations.
Solution: Strict independence. No discussion until both have finished.
Mistake 4: Cherry-Picking Easy Cases
Problem: Testing only on prototypical examples inflates reliability.
Solution: Include edge cases and ambiguous examples from your immersion phase (Chapter 13).
Mistake 5: Changing Old Codes After Discussion
Problem: If you revise past codes to increase agreement, you’re inflating reliability artificially.
Solution: Accept disagreements as information. Don’t retroactively “fix” them to look better.
Mistake 6: Endless Iteration
Problem: Trying to achieve perfect reliability (κ = 1.0) leads to paralysis.
Solution: Once you reach κ ≥ 0.80 (or 0.70 with justification), proceed. Diminishing returns set in.
Practice: Sampling and Pilot Testing Skills
Exercise 16.1: Designing a Sampling Plan
For your research project:
- Define your population precisely (content type, source, time period).
- Describe your sampling frame. What gaps might exist between your frame and your population?
- Choose a sampling strategy (simple random, stratified, systematic) and justify your choice.
- Specify your sample size and explain why it’s adequate.
- Document any exclusion criteria (e.g., instrumental songs, non-English content).
Exercise 16.2: Non-Music Sampling Plan
Design a sampling plan for a content analysis of newspaper coverage of climate change:
- Define the population (which newspapers, what time period, what counts as “about climate change”).
- Would you use simple random sampling, stratified sampling, or constructed week sampling? Why?
- How large should your sample be?
- What strata would you use if stratifying?
Compare your plan to the music sampling plan you designed in Exercise 16.1. What’s the same? What differs?
Exercise 16.3: Interpreting Reliability
You pilot test a codebook for song genre with 2 coders and 40 songs. Results:
- Coders agreed on 32 of 40 songs
- Cohen’s κ = 0.68
Questions:
- What is the percent agreement?
- Is this reliability acceptable? Why or why not?
- What should you do next?
Exercise 16.4: Disagreement Analysis
Two coders coded 25 songs for emotional intensity (Low/Medium/High). They disagreed on 6 songs:
Disagreements:
- Song A: Coder 1 = Medium, Coder 2 = High
- Song B: Coder 1 = Low, Coder 2 = Medium
- Song C: Coder 1 = Medium, Coder 2 = High
- Song D: Coder 1 = Medium, Coder 2 = High
- Song E: Coder 1 = Low, Coder 2 = Medium
- Song F: Coder 1 = Medium, Coder 2 = Low
Questions:
- Is there a pattern in the disagreements?
- What might explain this pattern?
- How might you revise the codebook to address it?
Exercise 16.5: Planning Your Pilot Test
For your research project:
- Determine appropriate pilot sample size (15-20% of full sample)
- Describe your pilot sampling strategy (stratified? includes edge cases?)
- Identify what reliability metric you’ll use (Cohen’s κ or Krippendorff’s α) and why
- Set your reliability threshold (0.70? 0.80?) and justify your choice
- Write a brief plan for what you’ll do if initial reliability is below threshold
Reflection Questions
Sampling and Claims: The sample you draw constrains the claims you can make. If you sample only pop songs, you cannot generalize to hip-hop. If you sample only from 2020-2024, you cannot make claims about the full decade. How does your sampling plan limit the scope of your findings, and how will you acknowledge those limits?
The Reliability Threshold: Is κ = 0.70 really “acceptable,” or is it too low? What factors should determine whether you accept 0.70 or demand 0.80? How does the nature of your research question affect this decision?
Subjectivity vs. Systematicity: Pilot testing reveals that even with clear rules, human coders disagree. Does this mean content coding is unreliable, or does it mean we’re measuring something inherently interpretive? How do you balance acknowledging subjectivity while claiming systematic measurement?
Your Codebook: Think about the codebook you developed in Chapter 15. What cases do you predict will cause disagreements? Why? What can you do now to prevent those disagreements?
Chapter Summary
This chapter combined sampling and pilot testing into a unified methodology protocol:
Sampling Plan:
- Define the population precisely (content type, source, time period)
- Construct a sampling frame and document its gaps
- Choose a sampling strategy: simple random, stratified random, systematic, or constructed week (Riffe et al., 2023)
- Determine sample size based on population variability and analytical needs (typically 200-400 for content analysis)
- Document every sampling decision for transparency and replicability
Pilot Test:
- Pilot testing reveals problems in codebooks before full-scale coding
- Select a representative pilot subsample (15-20% of full sample, 20-30 cases minimum) including edge cases
- Independent coding is essential; no discussion until both coders finish
- Inter-coder reliability measures agreement beyond chance:
- Cohen’s kappa (κ) (Cohen, 1960): Two coders, nominal/ordinal data
- Krippendorff’s alpha (α) (Hayes & Krippendorff, 2007): Any number of coders, any measurement level
- Benchmarks: κ or α ≥ 0.80 excellent, 0.70-0.79 acceptable, < 0.70 unreliable
- Disagreement analysis identifies whether problems are random or systematic
- Codebook revision addresses ambiguities revealed by disagreements
- Iterate until reliability is acceptable (2-3 rounds typical)
- Document both sampling and reliability in the methods section
- Perfect reliability (κ = 1.0) is unrealistic for latent content; accept some variability
Key Terms
- Census: Analyzing every unit in the population rather than a sample
- Cohen’s kappa (κ): Reliability statistic for two coders that corrects for chance agreement (Cohen, 1960)
- Constructed week sampling: Sampling technique selecting one day per day-of-week across the study period (Riffe et al., 2023)
- Coverage error: Gap between the target population and the sampling frame
- Disagreement log: Systematic record of coding discrepancies and their causes
- Independent coding: Coders working separately without discussion until complete
- Inter-coder reliability (ICR): Consistency of coding between independent coders
- Krippendorff’s alpha (α): Reliability statistic for any number of coders and measurement levels (Hayes & Krippendorff, 2007)
- Percent agreement: Simple proportion of cases where coders agreed (doesn’t account for chance)
- Pilot test: Trial application of codebook to subsample before full coding
- Population: The complete set of content units a study aims to describe
- Reliability threshold: Minimum acceptable level of agreement (typically 0.70-0.80)
- Sampling frame: The list from which a sample is drawn
- Stratified random sampling: Dividing the population into subgroups and sampling randomly within each
- Systematic disagreement: Pattern of disagreements clustering around specific types of cases
References
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37-46. https://doi.org/10.1177/001316446002000104
Hayes, A. F., & Krippendorff, K. (2007). Answering the call for a standard reliability measure for coding data. Communication Methods and Measures, 1(1), 77-89. https://doi.org/10.1080/19312450709336664
Krippendorff, K. (2018). Content analysis: An introduction to its methodology (4th ed.). Sage. https://doi.org/10.4135/9781071878781
Lombard, M., Snyder-Duch, J., & Bracken, C. C. (2002). Content analysis in mass communication: Assessment and reporting of intercoder reliability. Human Communication Research, 28(4), 587-604. https://doi.org/10.1111/j.1468-2958.2002.tb00826.x
Riffe, D., Lacy, S., Watson, B. R., & Lovejoy, J. (2023). Analyzing media messages: Using quantitative content analysis in research (5th ed.). Routledge. https://doi.org/10.4324/9781003288428
Required Reading: Riffe, D., Lacy, S., Watson, B. R., & Lovejoy, J. (2023). Analyzing media messages: Using quantitative content analysis in research (5th ed.). Routledge. Read Chapter 5: “Sampling.”
Prompt: Riffe et al. provide the most thorough treatment of sampling for content analysis in the methods literature. Their chapter addresses issues specific to media content that general research methods textbooks overlook, including the constructed week technique, the problem of content cycling, and sample size recommendations calibrated to content analysis rather than survey research.
Riffe et al. argue that constructed week sampling often outperforms simple random sampling of equal size for news content analysis. Why? Under what conditions would this advantage disappear? Is constructed week sampling applicable to music content analysis? Why or why not?
The authors discuss the concept of content cycling, the idea that media content follows predictable temporal patterns (daily, weekly, seasonal). How does content cycling affect sampling strategy? Identify any cycling patterns in your own dataset (e.g., do certain genres dominate certain seasons? Do chart-topping songs cluster around particular release windows?).
Riffe et al. recommend that content analysis samples be large enough to produce stable estimates of the population parameters of interest. What does “stable” mean in this context? How would you determine whether your sample of 200 songs produces stable estimates of, say, the proportion of songs with negative sentiment?
Compare the sampling recommendations in Riffe et al. to the general survey sampling guidelines in Dillman et al. (2014), which you read for Chapter 11. Where do the recommendations converge? Where do they diverge? What explains the differences?
Design an optimal sampling plan for a content analysis of social media posts about a political controversy. Your population is all public posts on a specific platform during a two-week period. How would you apply the principles from Riffe et al. to a domain they don’t specifically address? What new sampling challenges does social media present that traditional media content analysis doesn’t face?
Looking Ahead
With your sampling plan defined, your codebook tested, and your reliability established, you’re ready to code your full sample. Part III is complete: you now have a conceptual understanding of four major research methods (Chapter 10, qualitative; Chapter 11, surveys; Chapter 12, experiments) and have built the complete content analysis infrastructure (Chapters 13-16) from immersion through pilot testing. Part IV begins with Chapter 17: Wrangling the Chaos, which marks the transition from qualitative to computational work, teaching you to import, clean, and transform your coded data in R for statistical analysis.