11 Inferential Analysis
11.1 Overview
Descriptive statistics help summarize your dataset. However, they cannot answer questions about whether observed differences or relationships are meaningful beyond the sample. Inferential statistics are used to evaluate whether patterns observed in a sample are likely to generalize to the larger population.
In social science research, inferential statistics are used to: - Compare means between groups (e.g., do anxiety levels differ by gender?) - Evaluate associations between variables (e.g., is time spent gaming associated with social phobia?) - Predict outcomes using one or more variables (e.g., does platform use and gender predict satisfaction with life?)
This chapter introduces the foundational inferential methods for hypothesis testing in media and communication research. These include:
- Chi-Square Test of Independence
- Single Sample T-Test
- Independent Samples T-Test
- One-Way ANOVA
- Two-Way ANOVA
- ANCOVA (Analysis of Covariance)
- Simple and Multiple Linear Regression
- Logistic Regression
Each section includes: - Conceptual explanation - Justification for use - Complete R code (tidyverse-compatible) - Statistical output and APA-style reporting - Notes on assumptions and effect sizes
We use the gaming_anxiety.csv
dataset for all examples. This dataset includes responses from over 13,000 gamers who completed demographic and psychological scales (GAD, SWL, SPIN), and reported on game habits, platforms, and motivations for play.
11.2 Chi-Square Test of Independence
Concept
The Chi-Square Test of Independence evaluates whether two categorical variables are statistically associated. It is used when both the independent and dependent variables are nominal (unordered categories).
Variables
- Gender: Male, Female, Other
- Platform: PC, Console, Mobile
Load the Dataset
gaming_data <- read.csv("https://github.com/SIM-Lab-SIUE/SIM-Lab-SIUE.github.io/raw/refs/heads/main/research-methods/data/data.csv", encoding = "ISO-8859-1")
Code: Chi-Square Test
library(tidyverse)
# Tabulate Gender × Platform
table_chi <- table(gaming_data$Gender, gaming_data$Platform)
# Run chi-square test
chisq.test(table_chi)
11.3 Single Sample T-Test
Concept
The Single Sample T-Test compares the sample mean of one variable to a known or hypothetical population value.
Research Question (RQ)
Is the average anxiety score (GAD_T) in this sample significantly different from the general population mean of 5.0?
Code: Single Sample T-Test
t.test(gaming_data$GAD_T, mu = 5)
Output
One Sample t-test
data: gaming_data$GAD_T
t = 5.2185, df = 13463, p-value = 1.831e-07
alternative hypothesis: true mean is not equal to 5
95 percent confidence interval:
5.132353 5.291593
sample estimates:
mean of x
5.211973
11.4 Independent Samples T-Test
Concept
A t-test compares the means of two groups to determine whether they are statistically different from each other. It is used when:
- You have one categorical independent variable with two groups (e.g., gender)
- You have one continuous dependent variable (e.g., GAD_T)
Research Question (RQ):
Do anxiety scores differ by gender identity?
Example: GAD_T by Gender
We will use t.test()
to test whether the average GAD score differs between gender groups.
Output
Welch Two Sample t-test
data: GAD_T by Gender
t = -4.5944, df = 51.177, p-value = 2.864e-05
alternative hypothesis: true difference in means between group Male and group Other is not equal to 0
95 percent confidence interval:
-6.490253 -2.543269
sample estimates:
mean in group Male mean in group Other
5.060162 9.576923
Interpretation
- t-value: The test statistic (t = -3.864) reflects the size of the difference relative to the variability in the data.
- p-value: The probability of observing such a difference by chance (p < .001), indicating a statistically significant difference between male and other-identifying participants.
- 95% CI: The true difference in population means is likely between -3.81 and -1.22.
- Group Means: Male gamers report lower GAD scores (M = 5.19) than Other-identifying gamers (M = 7.71).
APA Reporting Style
A Welch’s t-test indicated a significant difference in anxiety scores between male and other-identifying gamers, t(51.18) = -4.59, p < .001, 95% CI [-6.49, -2.54]. Other-identifying participants (M = 9.58) reported significantly higher GAD scores than male participants (M = 5.06).
11.5 One-Way ANOVA
Concept
Analysis of Variance (ANOVA) compares the means of three or more groups. It tests whether any of the group means are different from the others.
Research Question (RQ):
Does gaming platform (PC, Console, Mobile) affect social anxiety scores (SPIN_T)?
Output
Df Sum Sq Mean Sq F value Pr(>F)
Platform 2 1261 630.3 3.477 0.0309 *
Residuals 12811 2322676 181.3
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
650 observations deleted due to missingness
- F value: The ratio of between-group to within-group variance
- p-value: Indicates whether at least one group differs from the others (p = .030)
Post-Hoc Test: Tukey’s HSD
If the ANOVA is significant, you need to run post-hoc tests to identify which groups differ.
# Tukey post-hoc test
TukeyHSD(anova_model)
Output
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = SPIN_T ~ Platform, data = gaming_data)
$Platform
diff lwr upr p adj
PC-Console (PS, Xbox, ...) -1.643023 -3.833938 0.547893 0.1840353
Smartphone / Tablet-Console (PS, Xbox, ...) 4.164071 -3.057782 11.385925 0.3667104
Smartphone / Tablet-PC
APA Reporting Style
A one-way ANOVA revealed a statistically significant effect of platform on social phobia scores, F(2, 12811) = 3.48, p = .031. Tukey post-hoc comparisons indicated that while smartphone/tablet users scored higher than console and PC users, no pairwise differences reached statistical significance.
11.6 Two-Way ANOVA
Concept
A Two-Way ANOVA tests the effects of two categorical independent variables on a continuous outcome, including whether there is an interaction between them.
Research Question (RQ)
Do anxiety scores differ by gender and platform, and is there an interaction between the two?
Example Output
Df Sum Sq Mean Sq F value Pr(>F)
Gender 2 5341 2670.4 122.414 <2e-16 ***
Platform 2 85 42.7 1.958 0.141
Gender:Platform 4 138 34.4 1.579 0.177
Residuals 13455 293515 21.8
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
11.7 ANCOVA: Analysis of Covariance
Concept
ANCOVA examines the effect of a categorical independent variable on a continuous outcome while statistically controlling for another continuous variable (covariate).
Research Question (RQ)
Does platform predict social phobia scores after controlling for hours spent gaming?
Output
Df Sum Sq Mean Sq F value Pr(>F)
Platform 2 1188 594 3.288 0.0373 *
Hours 1 5561 5561 30.772 2.96e-08 ***
Residuals 12782 2309737 181
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
678 observations deleted due to missingness
11.8 Simple Linear Regression
Concept
Simple linear regression models the relationship between a single predictor (independent variable) and a continuous outcome (dependent variable). It estimates how much the outcome changes on average when the predictor increases by one unit.
This technique is foundational in statistical modeling. It is used when: - You have two numeric variables - You want to understand how one variable predicts another - You want to assess direction, magnitude, and significance of an association
Linear regression assumes a linear relationship, normally distributed residuals, and homoscedasticity (equal variance of residuals across levels of the predictor).
Research Question (RQ):
Do the number of hours spent gaming predict satisfaction with life?
Output
Call:
lm(formula = SWL_T ~ Hours, data = gaming_data)
Residuals:
Min 1Q Median 3Q Max
-14.8780 -5.7898 0.1771 6.1404 21.5312
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 19.8817056 0.0653373 304.293 < 2e-16 ***
Hours -0.0036766 0.0008863 -4.148 3.37e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.22 on 13432 degrees of freedom
(30 observations deleted due to missingness)
Multiple R-squared: 0.001279, Adjusted R-squared: 0.001205
F-statistic: 17.21 on 1 and 13432 DF, p-value: 3.371e-05
- Intercept = 19.93: The predicted SWL_T score for someone who plays 0 hours/week
- Slope (Hours) = -0.0034: For every additional hour of gameplay, life satisfaction decreases by 0.0034 points, on average
11.9 Multiple Linear Regression
Concept
Linear regression estimates the relationship between one outcome and one or more predictors. It helps you predict an outcome variable (e.g., SWL_T) from explanatory variables (e.g., Hours, Gender).
Research Question (RQ):
Do hours and gender predict life satisfaction?
Output (Simplified)
term estimate std.error statistic p.value
(Intercept) 19.037025769 0.2706934594 70.326878 0.0000000000
Hours -0.003168531 0.0008958389 -3.536943 0.0004061515
GenderMale 0.896374344 0.2776521546 3.228408 0.0012478040
GenderOther -3.166372560 1.0572902969 -2.994800 0.0027512578
- Intercept: Predicted SWL_T score when Hours = 0 and Gender = “Male”
- Hours: For each additional hour of gaming, life satisfaction decreases slightly
- GenderOther: Participants identifying as “Other” report significantly lower SWL_T scores than males
APA Reporting Style
A multiple linear regression was conducted to examine whether weekly hours played and gender identity predicted satisfaction with life. The overall model was statistically significant, F(3, 13,430) = 14.36, p < .001, but explained only a small proportion of variance in SWL_T (R² = .003, adjusted R² = .003).
Weekly hours played significantly predicted satisfaction with life, b = –0.0032, t = –3.54, p < .001, suggesting a very small negative relationship. Gender was also a significant predictor: male participants reported significantly higher SWL_T scores than female participants, b = 0.90, t = 3.23, p = .001, while participants identifying as “Other” reported significantly lower SWL_T scores than females, b = –3.17, t = –2.99, p = .003.
11.10 Logistic Regression
Research Question (RQ)
Are hours played and anxiety levels associated with whether someone plays on a PC (vs. other platforms)?
Output
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Call:
glm(formula = is_pc ~ Hours + GAD_T, family = binomial, data = gaming_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 4.015716 0.136520 29.415 <2e-16 ***
Hours 0.003833 0.004854 0.790 0.430
GAD_T -0.019881 0.013150 -1.512 0.131
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2439.6 on 13433 degrees of freedom
Residual deviance: 2436.9 on 13431 degrees of freedom
(30 observations deleted due to missingness)
AIC: 2442.9
Number of Fisher Scoring iterations: 7
Interpretation
- More hours played = higher odds of being a PC gamer
- Higher anxiety = slightly lower odds of being a PC gamer
- Odds ratios (OR < 1) indicate negative relationships
APA Style
A logistic regression was conducted to predict the likelihood of being a PC gamer based on hours played and anxiety levels. Neither predictor was statistically significant: Hours played, b = 0.0038, p = .43; GAD_T, b = -0.0199, p = .13. The model was not a significant improvement over the null model, χ²(2) = 2.72, p = .257.
11.11 Summary
In this chapter, you learned how to:
- Conduct and interpret a full suite of inferential tests
- Choose tests based on variable type and research question
- Report statistical results using APA 7 format
- Control for covariates and model binary outcomes
This prepares you for real-world research questions that involve group comparisons, associations, predictions, and interactions—all within the framework of social science methodology.