Experiments and Causal Research Designs

The Quest for “Why”: Beyond Correlation to Causation

For decades, a recurring and often heated debate has swirled around the potential effects of violent media. From comic books in the 1950s to television in the 1980s and video games today, the question remains a potent one: Does exposure to violent content cause aggression in its audience? A researcher using a survey, as we discussed in the previous chapter, could certainly investigate this question. They might design a questionnaire that measures both the amount of time an individual spends playing violent video games and their self-reported levels of aggressive behavior. Suppose the survey of a large, representative sample reveals a strong positive correlation: people who play more violent games also report higher levels of aggression. In that case, the researcher has found an interesting and important association.

But have they proven that the video games caused the aggression? The answer is no. A survey, in this case, leaves us with a classic chicken-and-egg problem. The correlation could mean that playing violent games leads to aggression. But it is equally plausible that people who are already predisposed to aggression are more drawn to violent video games in the first place. It is also possible that a third, unmeasured variable—such as a stressful home environment, social isolation, or a particular personality trait—is the actual cause of both the gaming habits and the aggressive behavior. A correlation, no matter how strong, cannot by itself untangle these competing explanations. It tells us that two variables are dancing together, but it cannot tell us which one is leading.

To move beyond describing relationships and begin to make credible claims about cause and effect, researchers need a different tool, one specifically designed to answer the “why” question. That tool is the experiment. The experiment is the gold standard for testing causal hypotheses. While other methods can provide suggestive evidence, the unique logic of the experiment, with its emphasis on manipulation and control, provides the strongest framework for isolating a cause and demonstrating its effect. It is a method designed not just to observe the world as it is, but to intervene in it systematically to understand how it works.

This chapter is a deep dive into the logic and practice of experimental research. We will begin by revisiting the three essential criteria that must be met to establish a causal relationship and see how the core components of a true experiment are specifically designed to satisfy them. We will then explore the architecture of common experimental designs, from the foundational pretest-posttest control group design to more complex factorial designs that allow for the investigation of multiple causal factors at once. A central theme of this chapter will be the fundamental trade-off between internal validity (the confidence in our causal claim) and external validity (the generalizability of our findings). Finally, we will consider variations like field experiments and quasi-experiments and address the unique ethical considerations that arise when a researcher’s work involves active intervention in the lives of their participants.

The Logic of Causal Inference

To say that one thing causes another is to make one of the strongest claims a researcher can advance. In the social sciences, we do not make such claims lightly. The logic of science requires that three specific criteria be met before we can confidently infer a causal relationship between an independent variable (the presumed cause) and a dependent variable (the presumed effect).

  1. Temporal Ordering: The cause must precede the effect in time. This is a simple and non-negotiable condition. For violent video games to cause aggression, the act of playing the games must occur before the aggressive behavior is observed.

  2. Association (or Correlation): The two variables must be empirically related; they must co-vary. As one variable changes, the other must also change in a patterned way. If there is no statistical association between video game playing and aggression, then one cannot be the cause of the other.

  3. Nonspuriousness: This is the most difficult criterion to satisfy. A relationship between two variables is spurious when it is not genuine but is instead caused by a third, confounding variable that is related to both the presumed cause and the presumed effect. Our earlier example of a stressful home environment potentially causing both a retreat into video games and aggressive outbursts illustrates a potentially spurious relationship. To establish a true causal link, the researcher must be able to rule out any and all plausible rival explanations.

While survey research can easily establish association and, in the case of longitudinal designs, can provide evidence of temporal ordering, it struggles mightily with the criterion of nonspuriousness. A survey researcher can measure and statistically control for known and anticipated confounding variables, but it is impossible to measure and control for all of them. The unique power of the experiment comes from its ability to address the problem of spuriousness head-on through its core design features.

The Core Components of a True Experiment

A true experiment is defined by three essential components that work in concert to satisfy the criteria for causality: (1) manipulation of the independent variable, (2) random assignment of participants to conditions, and (3) a high degree of control over the research environment.

Manipulation of the Independent Variable

Unlike a survey, where a researcher measures pre-existing characteristics of respondents, an experiment involves the researcher actively doing something to the participants. This is the act of manipulation. The researcher purposefully changes, alters, or influences the independent variable to see what effect this change has on the dependent variable.

To test our video game hypothesis, a researcher would manipulate the independent variable, “exposure to violent video game content.” This is typically done by creating at least two different conditions. The group of participants who receive the manipulation of interest is called the treatment group (or experimental group). In our example, they would be asked to play a violent video game for a set period. The comparison group, which does not receive the manipulation, is called the control group. They might be asked to play a nonviolent video game for the same amount of time, or to engage in some other unrelated activity. By actively creating the difference in the independent variable, the researcher satisfies the criterion of temporal ordering—the exposure to the stimulus (the cause) is guaranteed to happen before the measurement of the outcome (the effect).

Random Assignment: The “Great Equalizer”

Manipulation alone is not enough. If we let participants choose which group they want to be in, we would reintroduce the very problem of self-selection we were trying to solve. The most crucial component of a true experiment, and the one that gives it its unique causal power, is random assignment.

Random assignment, also called randomization, is the process of assigning participants from the sample to the different experimental conditions based on chance alone. This can be done by flipping a coin, using a random number generator, or any other process that ensures each participant has an equal probability of being placed in any given group. It is essential to distinguish random assignment from random sampling.

Random sampling is a method for selecting a representative sample from a population to enhance external validity (generalizability). Random assignment is a method for placing the participants you already have into different conditions to enhance internal validity (causal inference).
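The distinction can be made concrete with a short Python sketch. The function names `random_sample` and `random_assignment` are hypothetical, chosen only for illustration, and the population is an invented list of labels; the sketch relies only on Python's standard `random` module. Sampling decides who enters the study; assignment then shuffles the people already in the sample and deals them out to conditions, so each participant has the same chance of landing in each group.

```python
import random


def random_sample(population, n, seed=None):
    """Random sampling: draw n distinct members of a population, each equally
    likely to be chosen. This step supports external validity."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def random_assignment(participants, conditions=("treatment", "control"), seed=None):
    """Random assignment: shuffle the participants already in the sample, then
    deal them out to the conditions in turn. Every participant has the same
    chance of ending up in each condition, and group sizes differ by at most one.
    This step supports internal validity."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for position, person in enumerate(shuffled):
        groups[conditions[position % len(conditions)]].append(person)
    return groups


# Hypothetical population and sample size, for illustration only.
population = [f"person_{i}" for i in range(10_000)]
sample = random_sample(population, 40, seed=1)
groups = random_assignment(sample, seed=2)
print({condition: len(members) for condition, members in groups.items()})
# → {'treatment': 20, 'control': 20}
```

Shuffling and then dealing is one simple way to obtain equal-sized groups. Flipping a coin for each person would also be random, but it could leave the groups unequal in size.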

The purpose of random assignment is to create statistically equivalent groups before the manipulation of the independent variable occurs. By using chance to distribute the participants, the researcher ensures that the myriad individual differences among them (personality, mood, intelligence, background, prior experiences) are spread across the groups without any systematic pattern, and the larger the sample, the more evenly these differences tend to balance out. This means that, before the treatment is introduced, the treatment group and the control group can be expected to be, on average, the same on every conceivable variable, both those we can measure and those we cannot. Random assignment is the “great equalizer.” It is the mechanism that allows the researcher to control for all possible confounding variables simultaneously, thereby satisfying the criterion of nonspuriousness. If the groups were equivalent at the start, and the only systematic difference in their experience during the study was the manipulation of the independent variable, then any significant difference observed in the dependent variable at the end of the study can be confidently attributed to that manipulation.
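Why random assignment balances even unmeasured variables can be illustrated with a small simulation. This is a sketch under stated assumptions: each hypothetical participant carries a hidden trait that the researcher never measures, drawn here from a normal distribution with mean 50 and standard deviation 10 (values chosen arbitrarily). The code randomly splits participants into two halves many times and reports the average size of the gap between the halves' means on that trait. The function names are illustrative, not from any established package.

```python
import random
import statistics


def simulate_gap(n, seed):
    """One simulated experiment: n hypothetical participants each carry a hidden
    trait the researcher never measures. Randomly split them into two halves and
    return the difference between the halves' mean trait scores."""
    rng = random.Random(seed)
    hidden_trait = [rng.gauss(50, 10) for _ in range(n)]  # assumed mean 50, SD 10
    order = list(range(n))
    rng.shuffle(order)
    treatment = [hidden_trait[i] for i in order[: n // 2]]
    control = [hidden_trait[i] for i in order[n // 2:]]
    return statistics.mean(treatment) - statistics.mean(control)


def average_gap(n, repetitions=100):
    """Average absolute between-group gap over many repeated experiments."""
    return statistics.mean(abs(simulate_gap(n, seed)) for seed in range(repetitions))


for n in (20, 200, 2000):
    print(f"n = {n:>4}: average gap on the unmeasured trait = {average_gap(n):.2f}")
```

Running it should show the average gap shrinking as the sample grows (roughly in proportion to one over the square root of the sample size), which is why randomization is most dependable with larger samples. In any single small experiment, chance imbalances remain possible.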

Control Over the Research Environment

The third component of a true experiment is the researcher’s ability to exert a high degree of control over the experimental setting. To isolate the effect of the independent variable, the researcher must ensure that everything else in the participants’ experience is held constant across the different conditions. This is why many experiments are conducted in a laboratory, a controlled environment where the researcher can minimize the influence of extraneous variables.

In our video game experiment, the researcher would ensure that participants in both the violent and nonviolent game conditions are in the same type of room, receive the same instructions from the same researcher, play for the same amount of time, and complete the same measure of aggression afterward. By keeping all these other factors equivalent, the researcher eliminates them as potential alternative explanations for the results.

Common Experimental Designs

Experimental designs are the specific blueprints for how these core components are arranged. While many variations exist, a few classic designs form the foundation of most experimental research. These designs are often represented using a standard notation:

  • R = Random assignment of participants to conditions

  • X = The experimental treatment or manipulation (the independent variable)

  • O = An observation or measurement of the dependent variable

Pretest-Posttest Control Group Design

This is one of the most common and powerful experimental designs. It involves measuring the dependent variable both before and after the experimental manipulation.

Notation:

  • Group 1: R O1 X O2
  • Group 2: R O1 O2

Procedure: Participants are randomly assigned to either the treatment group or the control group. Both groups complete a pretest (O1), which is a measure of the dependent variable. The treatment group is then exposed to the manipulation (X), while the control group is not. Finally, both groups complete a posttest (O2), which is the same measure of the dependent variable.

Advantages: This design is very strong. The pretest allows the researcher to verify that the random assignment was successful in creating equivalent groups at the start. It also allows the researcher to measure the precise amount of change in the dependent variable for each group.

Disadvantages: The primary weakness is the potential for pretest sensitization (an interaction between testing and the treatment). The act of taking the pretest might alert participants to the purpose of the study or make them more sensitive to the experimental manipulation, which could influence their posttest scores in a way that would not happen in the real world. This is a threat to the study’s external validity.
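The change-score logic of the pretest-posttest control group design can be sketched in a few lines of Python. The scores below are invented for illustration and are not data from any study. The sketch assumes the simplest analysis, subtracting the control group's average change from the treatment group's average change; researchers also use other analyses, which this sketch does not show.

```python
from statistics import mean

# Invented aggression scores (0-10 scale) for five participants per group.
# These are illustrative numbers, not results from any study.
treatment = {"pretest": [4, 5, 3, 6, 5], "posttest": [6, 7, 5, 7, 6]}  # R O1 X O2
control = {"pretest": [5, 4, 4, 6, 5], "posttest": [5, 5, 4, 6, 5]}    # R O1   O2


def mean_change(group):
    """Average posttest-minus-pretest (O2 - O1) change within one group."""
    return mean(post - pre for pre, post in zip(group["pretest"], group["posttest"]))


# The pretest lets us check whether randomization produced similar starting points.
pretest_gap = mean(treatment["pretest"]) - mean(control["pretest"])

# Simplest effect estimate: the treatment group's change beyond the control group's change.
effect = mean_change(treatment) - mean_change(control)

print(f"pretest gap: {pretest_gap:.2f}, estimated effect: {effect:.2f}")
# → pretest gap: -0.20, estimated effect: 1.40
```

Subtracting the control group's change matters because both groups can drift between O1 and O2 for reasons unrelated to the treatment; only the extra change in the treatment group is credited to X.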

Posttest-Only Control Group Design

To address the problem of pretest sensitization, researchers can use a design that omits the pretest.

Notation:

  • Group 1: R X O1
  • Group 2: R O1

Procedure: Participants are randomly assigned to the treatment or control group. The treatment group is exposed to the manipulation (X). Then, the dependent variable is measured for both groups (O1).

Advantages: This design eliminates the possibility of pretest sensitization. It is also often more efficient and less time-consuming to implement.

Disadvantages: The researcher cannot be certain that the groups were equivalent at the start, although with a sufficiently large sample, random assignment makes this highly probable. The researcher also cannot measure the amount of change, only the final difference between the groups.
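Because this design yields only a final difference between the groups, the analysis centers on that difference. One way to judge whether it could plausibly have arisen by chance is a randomization (permutation) test, which mirrors the logic of random assignment: it reshuffles the observed scores into new random groups many times and asks how often a difference at least as large as the observed one appears. The scores are invented, and the choice of a permutation test (rather than, say, a t-test) is this sketch's assumption, not a prescription from the text.

```python
import random
from statistics import mean

# Invented posttest scores (O1) for illustration only; not results from any study.
treatment = [6, 7, 5, 7, 6, 8, 6]  # R X O1
control = [5, 5, 4, 6, 5, 6, 4]    # R   O1


def difference_in_means(a, b):
    return mean(a) - mean(b)


def permutation_p_value(a, b, repetitions=10_000, seed=0):
    """Two-sided randomization test: if the treatment had no effect, any split of
    the pooled scores would have been just as likely, so re-split them at random
    many times and count how often the gap is at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(difference_in_means(a, b))
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(repetitions):
        rng.shuffle(pooled)
        gap = abs(difference_in_means(pooled[: len(a)], pooled[len(a):]))
        if gap >= observed - 1e-12:  # small tolerance for floating-point ties
            at_least_as_extreme += 1
    return at_least_as_extreme / repetitions


observed = difference_in_means(treatment, control)
p_value = permutation_p_value(treatment, control)
print(f"difference in means: {observed:.2f}, approximate p-value: {p_value:.3f}")
```

A small p-value says that a gap this large would rarely arise from the luck of the assignment alone, which is exactly the alternative explanation random assignment is meant to make unlikely.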

Solomon Four-Group Design

This is the most rigorous and complex of the classic designs. It is essentially a combination of the previous two designs, created specifically to test for the presence of pretest effects.

Notation:

  • Group 1: R O1 X O2
  • Group 2: R O1 O2
  • Group 3: R X O2
  • Group 4: R O2

Procedure: Participants are randomly assigned to one of four groups. The first two groups form a standard pretest-posttest control group design. The last two groups form a posttest-only control group design.

Advantages: This design allows the researcher to make several powerful comparisons. By comparing the posttest scores of all four groups, the researcher can determine not only the effect of the treatment but also the effect of the pretest itself, as well as any interaction between the pretest and the treatment.

Disadvantages: The primary drawback is its complexity and the large number of participants required, which makes it costly and challenging to implement in practice.
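The Solomon comparisons can be expressed as simple arithmetic on the four posttest means by reading them as a 2 x 2 table: pretested or not, treated or not. The means below are hypothetical, and the effect labels are names chosen for this sketch. The pretest-by-treatment term is the difference between the treatment effect among pretested groups and among unpretested groups; a value near zero would suggest little pretest sensitization.

```python
# Hypothetical posttest (O2) means for the four Solomon groups; invented numbers.
posttest_means = {
    "group1": 7.0,  # R O1 X O2  (pretested, treated)
    "group2": 5.5,  # R O1   O2  (pretested, control)
    "group3": 6.0,  # R    X O2  (not pretested, treated)
    "group4": 5.0,  # R      O2  (not pretested, control)
}


def solomon_effects(m):
    """Read the four posttest means as a 2 x 2 table: pretest (yes/no) by treatment (yes/no)."""
    effect_with_pretest = m["group1"] - m["group2"]
    effect_without_pretest = m["group3"] - m["group4"]
    return {
        # Treatment effect averaged over pretested and unpretested groups.
        "treatment_effect": (effect_with_pretest + effect_without_pretest) / 2,
        # Effect of merely taking the pretest, averaged over treated and control groups.
        "pretest_effect": ((m["group1"] + m["group2"]) - (m["group3"] + m["group4"])) / 2,
        # Sensitization check: how much larger the treatment effect is when a pretest came first.
        "pretest_by_treatment": effect_with_pretest - effect_without_pretest,
    }


print(solomon_effects(posttest_means))
# → {'treatment_effect': 1.25, 'pretest_effect': 0.75, 'pretest_by_treatment': 0.5}
```

In practice these comparisons would be tested statistically (for example, with a two-way analysis of the posttest scores); the sketch only shows which differences are being compared.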

Factorial Designs

The designs discussed so far have involved a single independent variable. However, communication phenomena are often complex, with multiple factors influencing an outcome. Factorial designs are experimental designs that involve more than one independent variable (or “factor”). This allows researchers to examine not only the separate effect of each independent variable (its main effect) but also how the independent variables work together to influence the dependent variable (their interaction effect).

A simple example is a 2x2 factorial design. Imagine a researcher is interested in the effects of both message source credibility (Factor A, with two levels: high credibility vs. low credibility) and the use of evidence (Factor B, with two levels: statistical evidence vs. narrative evidence) on the persuasiveness of a message. This design would have four unique conditions (2 x 2 = 4):

  1. High Credibility Source / Statistical Evidence

  2. High Credibility Source / Narrative Evidence

  3. Low Credibility Source / Statistical Evidence

  4. Low Credibility Source / Narrative Evidence

Participants would be randomly assigned to one of these four conditions. The analysis would allow the researcher to see if there is a main effect for source credibility (are high-credibility sources more persuasive overall?), a main effect for evidence type (is statistical evidence more persuasive overall?), and, most interestingly, an interaction effect. An interaction might reveal, for example, that statistical evidence is only more persuasive when it comes from a high-credibility source. Factorial designs allow for a more nuanced and realistic examination of the complex causal processes at play in communication.
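The 2 x 2 example can be sketched in Python: `itertools.product` enumerates the four cells, participants are shuffled and dealt evenly across them, and the main effects and interaction are computed from cell means. The cell means are invented to match the hypothetical pattern just described (statistical evidence helps only when the source is highly credible). Here the interaction is defined as the difference between the two simple effects of evidence type; other texts may scale it differently.

```python
import random
from itertools import product

# The four cells of the hypothetical 2 x 2 design.
credibility_levels = ("high", "low")            # Factor A: source credibility
evidence_levels = ("statistical", "narrative")  # Factor B: evidence type
cells = list(product(credibility_levels, evidence_levels))


def assign_to_cells(participants, cells, seed=None):
    """Shuffle participants, then deal them evenly across the cells."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {cell: shuffled[i::len(cells)] for i, cell in enumerate(cells)}


# Invented mean persuasiveness ratings per cell (not real data), built so that
# statistical evidence helps only when the source is highly credible.
cell_means = {
    ("high", "statistical"): 6.0,
    ("high", "narrative"): 4.0,
    ("low", "statistical"): 4.0,
    ("low", "narrative"): 4.0,
}


def factorial_effects(m):
    high = (m[("high", "statistical")] + m[("high", "narrative")]) / 2
    low = (m[("low", "statistical")] + m[("low", "narrative")]) / 2
    statistical = (m[("high", "statistical")] + m[("low", "statistical")]) / 2
    narrative = (m[("high", "narrative")] + m[("low", "narrative")]) / 2
    # Simple effects of evidence type within each credibility level.
    evidence_effect_high = m[("high", "statistical")] - m[("high", "narrative")]
    evidence_effect_low = m[("low", "statistical")] - m[("low", "narrative")]
    return {
        "credibility_main_effect": high - low,
        "evidence_main_effect": statistical - narrative,
        "interaction": evidence_effect_high - evidence_effect_low,
    }


groups = assign_to_cells([f"participant_{i}" for i in range(40)], cells, seed=3)
print({cell: len(members) for cell, members in groups.items()})
print(factorial_effects(cell_means))
# → {'credibility_main_effect': 1.0, 'evidence_main_effect': 1.0, 'interaction': 2.0}
```

Note how both main effects look modest (1.0 each) while the interaction (2.0) reveals that the entire advantage of statistical evidence is concentrated in the high-credibility cells, the kind of nuance a single-factor design would miss.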

Validity in Experiments: The Fundamental Trade-Off

The primary reason for conducting an experiment is to achieve a high degree of confidence in our causal conclusions. This confidence is known as internal validity. However, this often comes at a cost to the generalizability of our findings, or their external validity. This trade-off is a central dilemma in experimental research.

Internal Validity: Confidence in Causality

Internal validity refers to the degree to which a research design allows us to conclude that the independent variable, and not some other extraneous or confounding variable, was responsible for the observed change in the dependent variable. A study has high internal validity if it successfully rules out plausible alternative explanations for its findings.

As we have seen, the true experiment, with its use of manipulation, a control group, and especially random assignment, is the research design that provides the highest possible degree of internal validity. It is specifically designed to control for the common threats to internal validity, such as history (external events), maturation (natural changes in participants), selection bias, and so on. By ensuring the groups are equivalent at the start and are treated identically except for the manipulation, the experiment isolates the causal mechanism of interest.

External Validity: The Question of Generalizability

External validity refers to the extent to which the findings of a study can be generalized to other people, settings, and times. A study has high external validity if its results are likely to hold true in the “real world,” outside the specific confines of the research study itself.

It is precisely the features that give the laboratory experiment its high internal validity—its tight control and artificial setting—that often threaten its external validity. Several factors can limit the generalizability of experimental findings:

  • Artificiality of the Setting: The controlled environment of a laboratory is, by definition, not a naturalistic setting. The way people behave when they know they are being observed in a study (a phenomenon known as the Hawthorne effect) may be different from how they behave in their everyday lives.

  • Sample Characteristics: Much experimental research in communication, for practical reasons, relies on convenience samples of college students. Findings from a sample of 19-year-old undergraduates may not generalize to the broader population of adults.

  • Forced Exposure: In many media effects experiments, participants are required to view, read, or play with media content that they might never choose to engage with on their own. This “forced exposure” condition is different from the self-selected media environment of the real world, which can limit the applicability of the findings.

This creates a fundamental trade-off. Researchers often must choose whether to prioritize the high internal validity of a controlled lab experiment or the high external validity of a study conducted in a more naturalistic setting. The choice depends on the goals of the research. If the goal is to test a specific theoretical proposition about a causal mechanism, internal validity is paramount. If the goal is to understand how a phenomenon operates in the real world, external validity may be more critical.

Beyond the Lab: Field Experiments and Quasi-Experiments

To address the limitations of the laboratory experiment, researchers can turn to alternative designs that move the research into more naturalistic settings.

A field experiment is an experiment that is conducted in a real-world, natural setting rather than in a laboratory. For example, a researcher might randomly assign different versions of a political campaign flyer to various neighborhoods to see which one is more effective at increasing voter turnout. Field experiments retain the core experimental components of manipulation and random assignment, but because they occur in a natural environment, they tend to have higher external validity than lab experiments. The trade-off is that the researcher has less control over extraneous variables, which can introduce threats to internal validity.

A quasi-experimental design is a research design that has some of the features of a true experiment but lacks the crucial element of random assignment. Quasi-experiments are often used in applied settings where it is impossible or unethical to assign participants to conditions randomly. For example, a researcher wanting to study the effectiveness of a new teaching method might have to use two pre-existing classrooms, assigning the new method to one and the traditional method to the other. Because the students were not randomly assigned to the classrooms, the groups may not be equivalent at the start, which makes it much harder to rule out alternative explanations for any observed differences in outcomes. Common quasi-experimental designs include the nonequivalent control group design and the interrupted time-series design. While they provide weaker evidence for causality than true experiments, they are often the most practical option for conducting research in real-world settings.

Ethical Considerations in Experimental Research

The active and often intrusive nature of experimental research raises a number of important ethical considerations. The principles of respect for persons, beneficence, and justice, as discussed in Chapter 3, are paramount.

One of the most common ethical issues in experimental research is the use of deception. Researchers often need to conceal the true purpose of a study from participants to avoid demand characteristics, cues in the study that allow participants to guess the hypothesis and alter their behavior to either help or hinder the researcher. While deception can be necessary to ensure the validity of the results, it must be used with caution. It is only considered ethically permissible when the potential scientific value of the research outweighs the risks, and when there is no viable non-deceptive alternative. When deception is used, a thorough debriefing at the end of the study is an absolute requirement. The debriefing must fully explain the deception, answer any questions, and address any psychological distress the study may have caused.

Researchers must also be vigilant about minimizing any potential for harm. An experiment designed to induce fear or anxiety, for example, must include procedures to ensure that participants leave the study in a state of well-being no worse than when they arrived. Finally, when an experimental treatment is potentially beneficial (e.g., a new therapeutic communication technique), the principle of justice raises concerns about withholding that benefit from the control group. A common solution is to use a waiting-list control group, where the control group participants are offered the beneficial treatment after the data collection is complete.

Conclusion: The Power and Precision of the Experiment

The experiment stands as the most powerful and precise tool in the social scientist’s arsenal for investigating questions of cause and effect. Its unique logic, built on the foundational pillars of manipulation, random assignment, and control, provides a rigorous framework for isolating a causal relationship and ruling out the myriad alternative explanations that can plague other research methods. From the elegant simplicity of the posttest-only design to the nuanced complexity of a factorial study, the experiment allows researchers to move beyond mere description to the more ambitious goal of explanation.

This power, however, is not without its limitations. The very control that gives the experiment its internal validity can create an artificiality that threatens the external validity of its findings. The responsible researcher must always be aware of this fundamental trade-off, making conscious decisions about whether to prioritize causal confidence or real-world generalizability. The experiment is not the right tool for every research question, but for those questions that seek to unravel the intricate “why” behind the complex processes of mass communication, its logic is indispensable.

Journal Prompts

  1. Think of a headline or news story you’ve seen that claims one thing causes another (e.g., “Teens who use social media are more likely to be depressed”). Based on what you’ve learned in this chapter, explain why this claim may or may not be valid. What type of research design would be necessary to make such a claim confidently?

  2. Choose a communication-related research question you’re interested in (e.g., “Does political meme exposure influence voting confidence?”). Then briefly describe how you might set up a simple experiment to test that question. What would you manipulate? What would you measure? How would random assignment help strengthen your conclusions?

  3. Experiments often require researchers to deceive participants or control aspects of their environment. Reflect on how you feel about that. Do you think the benefits of experimental knowledge are worth these trade-offs? What would be essential to include in your debrief if you had to deceive participants in your study?