
Statistics By Jim

Making statistics intuitive

Quasi Experimental Design Overview & Examples

By Jim Frost

What is a Quasi Experimental Design?

A quasi experimental design is a method for identifying causal relationships that does not randomly assign participants to the experimental groups. Instead, researchers use a non-random process. For example, they might use an eligibility cutoff score or preexisting groups to determine who receives the treatment.


Quasi-experimental research is a design that closely resembles experimental research but is different. The term “quasi” means “resembling,” so you can think of it as a cousin to actual experiments. In these studies, researchers can manipulate an independent variable — that is, they change one factor to see what effect it has. However, unlike true experimental research, participants are not randomly assigned to different groups.

Learn more about Experimental Designs: Definition & Types.

When to Use Quasi-Experimental Design

Researchers typically use a quasi-experimental design because they can’t randomize due to practical or ethical concerns. For example:

  • Practical Constraints: A school interested in testing a new teaching method can only implement it in preexisting classes and cannot randomly assign students.
  • Ethical Concerns: A medical study might not be able to randomly assign participants to a treatment group for an experimental medication when they are already taking a proven drug.

Quasi-experimental designs also come in handy when researchers want to study the effects of naturally occurring events, like policy changes or environmental shifts, where they can’t control who is exposed to the treatment.

Quasi-experimental designs occupy a unique position in the spectrum of research methodologies, sitting between observational studies and true experiments. This middle ground offers a blend of both worlds, addressing some limitations of purely observational studies while navigating the constraints often accompanying true experiments.

A significant advantage of quasi-experimental research over purely observational and correlational studies is that it addresses the issue of directionality: determining which variable is the cause and which is the effect. In quasi-experiments, an intervention typically occurs during the investigation, and researchers record outcomes before and after it, increasing confidence that the intervention caused the observed changes.

However, it’s crucial to recognize its limitations as well. Controlling confounding variables is a larger concern for a quasi-experimental design than a true experiment because it lacks random assignment.

In sum, quasi-experimental designs offer a valuable research approach when random assignment is not feasible, providing a more structured and controlled framework than observational studies while acknowledging and attempting to address potential confounders.

Types of Quasi-Experimental Designs and Examples

Quasi-experimental studies use various methods, depending on the scenario.

Natural Experiments

This design uses naturally occurring events or changes to create the treatment and control groups. Researchers compare outcomes between those whom the event affected and those it did not. Because nothing balances confounders across these groups, researchers must also measure likely confounders and account for them with statistical controls.

Natural experiments are related to observational studies, but they allow for clearer causal inference because the external event or policy change provides both a form of quasi-random group assignment and a definite start date for the intervention.

For example, in a natural experiment utilizing a quasi-experimental design, researchers study the impact of a significant economic policy change on small business growth. The policy is implemented in one state but not in neighboring states. This scenario creates an unplanned experimental setup, where the state with the new policy serves as the treatment group, and the neighboring states act as the control group.

Researchers are primarily interested in small business growth rates but need to record various confounders that can impact growth rates. Hence, they record state economic indicators, investment levels, and employment figures. By recording these metrics across the states, they can include them in the model as covariates and control them statistically. This method allows researchers to estimate differences in small business growth due to the policy itself, separate from the various confounders.
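To make statistical control concrete, here is a minimal sketch in Python using simulated data (all numbers and variable names are hypothetical; only NumPy is used). A regression that includes the measured confounder recovers the policy effect, while a naive group comparison overstates it:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# 1 = state with the new policy; investment is a measured confounder
# that happens to be higher in the policy state
policy = rng.integers(0, 2, n).astype(float)
investment = rng.normal(0, 1, n) + 0.8 * policy
growth = 2.0 * policy + 1.5 * investment + rng.normal(0, 1, n)

# Naive model omits the confounder: growth ~ policy
X_naive = np.column_stack([np.ones(n), policy])
naive_effect = np.linalg.lstsq(X_naive, growth, rcond=None)[0][1]

# Adjusted model includes it as a covariate: growth ~ policy + investment
X_adj = np.column_stack([np.ones(n), policy, investment])
adj_effect = np.linalg.lstsq(X_adj, growth, rcond=None)[0][1]

print(naive_effect, adj_effect)  # naive estimate is biased upward; adjusted is near the true 2.0
```

In practice researchers would include every measured confounder as a covariate and report standard errors, but the core logic is the same: the adjusted coefficient on the policy indicator estimates the effect net of the measured confounders.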

Nonequivalent Groups Design

This method involves matching existing groups that are similar but not identical. Researchers attempt to find groups that are as equivalent as possible, particularly for factors likely to affect the outcome.

For instance, researchers use a nonequivalent groups quasi-experimental design to evaluate the effectiveness of a new teaching method in improving students’ mathematics performance. A school district considering the teaching method plans the study. Students are already divided into schools, preventing random assignment.

The researchers match two schools with similar demographics, baseline academic performance, and resources. The school using the traditional methodology serves as the control, while the other uses the new approach. Researchers then evaluate differences in educational outcomes between the two methods.

They perform a pretest to identify differences between the schools that might affect the outcome and include them as covariates to control for confounding. They also record outcomes before and after the intervention to have a larger context for the changes they observe.
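One common way to use the pretest is to include it as a covariate (an ANCOVA-style adjustment). Below is a minimal sketch with simulated data (school sizes, score scales, and the five-point effect are all hypothetical; only NumPy is used). The raw posttest difference mixes the treatment effect with the baseline gap, while the adjusted estimate isolates it:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200  # students per school

school = np.repeat([0.0, 1.0], n)                  # 1 = school using the new method
pretest = rng.normal(70, 8, 2 * n) + 5 * school    # nonequivalent baselines
posttest = pretest + 5.0 * school + rng.normal(0, 5, 2 * n)  # true effect = 5 points

# Raw posttest difference confounds the effect with the baseline gap
raw_diff = posttest[school == 1].mean() - posttest[school == 0].mean()

# ANCOVA-style adjustment: posttest ~ school + pretest
X = np.column_stack([np.ones(2 * n), school, pretest])
adj_effect = np.linalg.lstsq(X, posttest, rcond=None)[0][1]
```

Here the adjusted coefficient on the school indicator lands near the true five-point effect, while the raw difference also absorbs the preexisting baseline gap between schools.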

Regression Discontinuity

This process assigns subjects to a treatment or control group based on a predetermined cutoff point (e.g., a test score). The analysis primarily focuses on participants near the cutoff point, as they are likely similar except for the treatment received. By comparing participants just above and below the cutoff, the design controls for confounders that vary smoothly around the cutoff.

For example, in a regression discontinuity quasi-experimental design focusing on a new medical treatment for depression, researchers use depression scores as the cutoff point. Individuals with depression scores just above a certain threshold are assigned to receive the new treatment, while those just below the threshold do not receive it. This method creates two closely matched groups: one that barely qualifies for treatment and one that barely misses out.

By comparing the mental health outcomes of these two groups over time, researchers can assess the effectiveness of the new treatment. The assumption is that the only significant difference between the groups is whether they received the treatment, thereby isolating its impact on depression outcomes.
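The comparison near the cutoff is typically estimated with a local regression fit on either side of the threshold. Here is a minimal sketch with simulated data (the score scale, cutoff, bandwidth, and four-point effect are all hypothetical; only NumPy is used):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
score = rng.uniform(0, 40, n)            # baseline depression score
cutoff = 20.0
treated = (score >= cutoff).astype(float)

# Outcome worsens smoothly with the score, plus a drop of 4 points from treatment
outcome = 0.5 * score - 4.0 * treated + rng.normal(0, 1, n)

# Local linear regression within a narrow bandwidth around the cutoff,
# allowing different slopes on each side
bw = 2.0
near = np.abs(score - cutoff) <= bw
c = score[near] - cutoff                 # centered running variable
X = np.column_stack([np.ones(c.size), treated[near], c, c * treated[near]])
rd_effect = np.linalg.lstsq(X, outcome[near], rcond=None)[0][1]
```

The coefficient on the treatment indicator estimates the jump in the outcome at the cutoff, which is attributed to the treatment because people just above and just below the threshold are otherwise nearly identical.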

Controlling Confounders in a Quasi-Experimental Design

Accounting for confounding variables is a challenging but essential task for a quasi-experimental design.

In a true experiment, the random assignment process equalizes confounders across the groups to nullify their overall effect. It’s the gold standard because it works on all confounders, known and unknown.

Unfortunately, the lack of random assignment can allow differences between the groups to exist before the intervention. These confounding factors might ultimately explain the results rather than the intervention.

Consequently, researchers must use other methods to roughly equalize the groups, such as matching or cutoff values, or statistically adjust for preexisting differences they measure to reduce the impact of confounders.

A key strength of quasi-experiments is their frequent use of “pre-post testing.” This approach involves testing participants before the intervention begins to check for preexisting differences between groups that could impact the study’s outcome. By identifying these variables early on and including them as covariates, researchers can more effectively control potential confounders in their statistical analysis.

Additionally, researchers frequently track outcomes before and after the intervention to better understand the context for changes they observe.

Statisticians consider these methods to be less effective than randomization. Hence, quasi-experiments fall somewhere in the middle when it comes to internal validity, or how well the study can identify causal relationships versus mere correlation. They’re more conclusive than correlational studies but not as solid as true experiments.

In conclusion, quasi-experimental designs offer researchers a versatile and practical approach when random assignment is not feasible. This methodology bridges the gap between controlled experiments and observational studies, providing a valuable tool for investigating cause-and-effect relationships in real-world settings. Researchers can address ethical and logistical constraints by understanding and leveraging the different types of quasi-experimental designs while still obtaining insightful and meaningful results.




7.3 Quasi-Experimental Research

Learning Objectives

  • Explain what quasi-experimental research is and distinguish it clearly from both experimental and correlational research.
  • Describe three different types of quasi-experimental research designs (nonequivalent groups, pretest-posttest, and interrupted time series) and identify examples of each one.

The prefix quasi means “resembling.” Thus quasi-experimental research is research that resembles experimental research but is not true experimental research. Although the independent variable is manipulated, participants are not randomly assigned to conditions or orders of conditions (Cook & Campbell, 1979). Because the independent variable is manipulated before the dependent variable is measured, quasi-experimental research eliminates the directionality problem. But because participants are not randomly assigned—making it likely that there are other differences between conditions—quasi-experimental research does not eliminate the problem of confounding variables. In terms of internal validity, therefore, quasi-experiments are generally somewhere between correlational studies and true experiments.

Quasi-experiments are most likely to be conducted in field settings in which random assignment is difficult or impossible. They are often conducted to evaluate the effectiveness of a treatment—perhaps a type of psychotherapy or an educational intervention. There are many different kinds of quasi-experiments, but we will discuss just a few of the most common ones here.

Nonequivalent Groups Design

Recall that when participants in a between-subjects experiment are randomly assigned to conditions, the resulting groups are likely to be quite similar. In fact, researchers consider them to be equivalent. When participants are not randomly assigned to conditions, however, the resulting groups are likely to be dissimilar in some ways. For this reason, researchers consider them to be nonequivalent. A nonequivalent groups design, then, is a between-subjects design in which participants have not been randomly assigned to conditions.

Imagine, for example, a researcher who wants to evaluate a new method of teaching fractions to third graders. One way would be to conduct a study with a treatment group consisting of one class of third-grade students and a control group consisting of another class of third-grade students. This would be a nonequivalent groups design because the students are not randomly assigned to classes by the researcher, which means there could be important differences between them. For example, the parents of higher achieving or more motivated students might have been more likely to request that their children be assigned to Ms. Williams’s class. Or the principal might have assigned the “troublemakers” to Mr. Jones’s class because he is a stronger disciplinarian. Of course, the teachers’ styles, and even the classroom environments, might be very different and might cause different levels of achievement or motivation among the students. If at the end of the study there was a difference in the two classes’ knowledge of fractions, it might have been caused by the difference between the teaching methods—but it might have been caused by any of these confounding variables.

Of course, researchers using a nonequivalent groups design can take steps to ensure that their groups are as similar as possible. In the present example, the researcher could try to select two classes at the same school, where the students in the two classes have similar scores on a standardized math test and the teachers are the same sex, are close in age, and have similar teaching styles. Taking such steps would increase the internal validity of the study because it would eliminate some of the most important confounding variables. But without true random assignment of the students to conditions, there remains the possibility of other important confounding variables that the researcher was not able to control.

Pretest-Posttest Design

In a pretest-posttest design, the dependent variable is measured once before the treatment is implemented and once after it is implemented. Imagine, for example, a researcher who is interested in the effectiveness of an antidrug education program on elementary school students’ attitudes toward illegal drugs. The researcher could measure the attitudes of students at a particular elementary school during one week, implement the antidrug program during the next week, and finally, measure their attitudes again the following week. The pretest-posttest design is much like a within-subjects experiment in which each participant is tested first under the control condition and then under the treatment condition. It is unlike a within-subjects experiment, however, in that the order of conditions is not counterbalanced because it typically is not possible for a participant to be tested in the treatment condition first and then in an “untreated” control condition.

If the average posttest score is better than the average pretest score, then it makes sense to conclude that the treatment might be responsible for the improvement. Unfortunately, one often cannot conclude this with a high degree of certainty because there may be other explanations for why the posttest scores are better. One category of alternative explanations goes under the name of history. Other things might have happened between the pretest and the posttest. Perhaps an antidrug program aired on television and many of the students watched it, or perhaps a celebrity died of a drug overdose and many of the students heard about it. Another category of alternative explanations goes under the name of maturation. Participants might have changed between the pretest and the posttest in ways that they were going to anyway because they are growing and learning. If it were a yearlong program, participants might become less impulsive or better reasoners and this might be responsible for the change.

Another alternative explanation for a change in the dependent variable in a pretest-posttest design is regression to the mean. This refers to the statistical fact that an individual who scores extremely on a variable on one occasion will tend to score less extremely on the next occasion. For example, a bowler with a long-term average of 150 who suddenly bowls a 220 will almost certainly score lower in the next game. Her score will “regress” toward her mean score of 150. Regression to the mean can be a problem when participants are selected for further study because of their extreme scores. Imagine, for example, that only students who scored especially low on a test of fractions are given a special training program and then retested. Regression to the mean all but guarantees that their scores will be higher even if the training program has no effect.

A closely related concept—and an extremely important one in psychological research—is spontaneous remission. This is the tendency for many medical and psychological problems to improve over time without any form of treatment. The common cold is a good example. If one were to measure symptom severity in 100 common cold sufferers today, give them a bowl of chicken soup every day, and then measure their symptom severity again in a week, they would probably be much improved. This does not mean that the chicken soup was responsible for the improvement, however, because they would have been much improved without any treatment at all. The same is true of many psychological problems. A group of severely depressed people today is likely to be less depressed on average in 6 months. In reviewing the results of several studies of treatments for depression, researchers Michael Posternak and Ivan Miller found that participants in waitlist control conditions improved an average of 10 to 15% before they received any treatment at all (Posternak & Miller, 2001).

Thus one must generally be very cautious about inferring causality from pretest-posttest designs.
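Regression to the mean is easy to demonstrate by simulation: select extreme scorers on a noisy test and retest them with no treatment at all. The sketch below uses hypothetical numbers and only NumPy; the "remedial program" selection mirrors the fractions example above:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
ability = rng.normal(100, 10, n)            # stable true ability
test1 = ability + rng.normal(0, 10, n)      # noisy first measurement
test2 = ability + rng.normal(0, 10, n)      # independent retest, no treatment

# Select the lowest scorers on test 1, as a remedial program might
low = test1 < np.percentile(test1, 10)
apparent_gain = test2[low].mean() - test1[low].mean()

print(apparent_gain)  # clearly positive, despite no intervention at all
```

The selected students score higher on the retest simply because part of their low first score was measurement noise that does not repeat, which is exactly why a no-treatment comparison group is needed.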

Does Psychotherapy Work?

Early studies on the effectiveness of psychotherapy tended to use pretest-posttest designs. In a classic 1952 article, researcher Hans Eysenck summarized the results of 24 such studies showing that about two thirds of patients improved between the pretest and the posttest (Eysenck, 1952). But Eysenck also compared these results with archival data from state hospital and insurance company records showing that similar patients recovered at about the same rate without receiving psychotherapy. This suggested to Eysenck that the improvement that patients showed in the pretest-posttest studies might be no more than spontaneous remission. Note that Eysenck did not conclude that psychotherapy was ineffective. He merely concluded that there was no evidence that it was, and he wrote of “the necessity of properly planned and executed experimental studies into this important field” (p. 323). You can read the entire article here:

http://psychclassics.yorku.ca/Eysenck/psychotherapy.htm

Fortunately, many other researchers took up Eysenck’s challenge, and by 1980 hundreds of experiments had been conducted in which participants were randomly assigned to treatment and control conditions, and the results were summarized in a classic book by Mary Lee Smith, Gene Glass, and Thomas Miller (Smith, Glass, & Miller, 1980). They found that overall psychotherapy was quite effective, with about 80% of treatment participants improving more than the average control participant. Subsequent research has focused more on the conditions under which different types of psychotherapy are more or less effective.

Hans Eysenck

In a classic 1952 article, researcher Hans Eysenck pointed out the shortcomings of the simple pretest-posttest design for evaluating the effectiveness of psychotherapy. (Image: Wikimedia Commons, CC BY-SA 3.0.)

Interrupted Time Series Design

A variant of the pretest-posttest design is the interrupted time-series design. A time series is a set of measurements taken at intervals over a period of time. For example, a manufacturing company might measure its workers’ productivity each week for a year. In an interrupted time-series design, a time series like this is “interrupted” by a treatment. In one classic example, the treatment was the reduction of work shifts in a factory from 10 hours to 8 hours (Cook & Campbell, 1979). Because productivity increased rather quickly after the shortening of the work shifts, and because it remained elevated for many months afterward, the researcher concluded that the shortening of the shifts caused the increase in productivity. Notice that the interrupted time-series design is like a pretest-posttest design in that it includes measurements of the dependent variable both before and after the treatment. It is unlike the pretest-posttest design, however, in that it includes multiple pretest and posttest measurements.

Figure 7.5 “A Hypothetical Interrupted Time-Series Design” shows data from a hypothetical interrupted time-series study. The dependent variable is the number of student absences per week in a research methods course. The treatment is that the instructor begins publicly taking attendance each day so that students know that the instructor is aware of who is present and who is absent. The top panel of Figure 7.5 “A Hypothetical Interrupted Time-Series Design” shows how the data might look if this treatment worked. There is a consistently high number of absences before the treatment, and there is an immediate and sustained drop in absences after the treatment. The bottom panel of Figure 7.5 “A Hypothetical Interrupted Time-Series Design” shows how the data might look if this treatment did not work. On average, the number of absences after the treatment is about the same as the number before. This figure also illustrates an advantage of the interrupted time-series design over a simpler pretest-posttest design. If there had been only one measurement of absences before the treatment at Week 7 and one afterward at Week 8, then it would have looked as though the treatment were responsible for the reduction. The multiple measurements both before and after the treatment suggest that the reduction between Weeks 7 and 8 is nothing more than normal week-to-week variation.

Figure 7.5 A Hypothetical Interrupted Time-Series Design

The top panel shows data that suggest that the treatment caused a reduction in absences. The bottom panel shows data that suggest that it did not.
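The analysis for a design like this can be sketched with a simple level-shift regression on simulated weekly counts (all values hypothetical; only NumPy is used). Using every pre- and post-treatment week, rather than a single pair of weeks, is what protects against mistaking normal week-to-week variation for a treatment effect:

```python
import numpy as np

rng = np.random.default_rng(4)
weeks = np.arange(1, 15)
after = (weeks >= 8).astype(float)        # attendance-taking begins in week 8
absences = 10 - 4 * after + rng.normal(0, 1, weeks.size)

# Level-shift model: absences ~ intercept + after
X = np.column_stack([np.ones(weeks.size), after])
level_shift = np.linalg.lstsq(X, absences, rcond=None)[0][1]
```

A fuller segmented regression would also include a time trend and a post-treatment change in slope, but even this minimal version pools all pre- and post-interruption weeks when estimating the drop.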

Combination Designs

A type of quasi-experimental design that is generally better than either the nonequivalent groups design or the pretest-posttest design is one that combines elements of both. There is a treatment group that is given a pretest, receives a treatment, and then is given a posttest. But at the same time there is a control group that is given a pretest, does not receive the treatment, and then is given a posttest. The question, then, is not simply whether participants who receive the treatment improve but whether they improve more than participants who do not receive the treatment.

Imagine, for example, that students in one school are given a pretest on their attitudes toward drugs, then are exposed to an antidrug program, and finally are given a posttest. Students in a similar school are given the pretest, not exposed to an antidrug program, and finally are given a posttest. Again, if students in the treatment condition become more negative toward drugs, this could be an effect of the treatment, but it could also be a matter of history or maturation. If it really is an effect of the treatment, then students in the treatment condition should become more negative than students in the control condition. But if it is a matter of history (e.g., news of a celebrity drug overdose) or maturation (e.g., improved reasoning), then students in the two conditions would be likely to show similar amounts of change. This type of design does not completely eliminate the possibility of confounding variables, however. Something could occur at one of the schools but not the other (e.g., a student drug overdose), so students at the first school would be affected by it while students at the other school would not.
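The logic of this combination design is the difference-in-differences estimator: the treatment group's pre-to-post change minus the control group's change, which subtracts out history and maturation effects common to both groups. A minimal sketch with simulated data (school sizes, score scale, and effect sizes are all hypothetical; only NumPy is used):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300  # students per school

school = np.repeat([0.0, 1.0], n)               # 1 = school with the antidrug program
pre = rng.normal(50, 5, 2 * n) + 2 * school     # nonequivalent baselines
history = 3.0                                   # shared change (e.g., a news event)
post = pre + history + 6.0 * school + rng.normal(0, 3, 2 * n)  # true effect = 6

change_treat = (post - pre)[school == 1].mean()
change_ctrl = (post - pre)[school == 0].mean()
did = change_treat - change_ctrl                # shared history cancels out

print(change_ctrl, did)  # control group changes too, but the difference isolates the effect
```

Note that the shared history term drops out of the estimate entirely, while a school-specific event (the text's drug-overdose example) would not, which is why this design reduces but does not eliminate confounding.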

Finally, if participants in this kind of design are randomly assigned to conditions, it becomes a true experiment rather than a quasi experiment. In fact, it is the kind of experiment that Eysenck called for—and that has now been conducted many times—to demonstrate the effectiveness of psychotherapy.

Key Takeaways

  • Quasi-experimental research involves the manipulation of an independent variable without the random assignment of participants to conditions or orders of conditions. Among the important types are nonequivalent groups, pretest-posttest, and interrupted time-series designs.
  • Quasi-experimental research eliminates the directionality problem because it involves the manipulation of the independent variable. It does not eliminate the problem of confounding variables, however, because it does not involve random assignment to conditions. For these reasons, quasi-experimental research is generally higher in internal validity than correlational studies but lower than true experiments.
  • Practice: Imagine that two college professors decide to test the effect of giving daily quizzes on student performance in a statistics course. They decide that Professor A will give quizzes but Professor B will not. They will then compare the performance of students in their two sections on a common final exam. List five other variables that might differ between the two sections that could affect the results.

Discussion: Imagine that a group of obese children is recruited for a study in which their weight is measured, then they participate for 3 months in a program that encourages them to be more active, and finally their weight is measured again. Explain how each of the following might affect the results:

  • regression to the mean
  • spontaneous remission

Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design & analysis issues in field settings. Boston, MA: Houghton Mifflin.

Eysenck, H. J. (1952). The effects of psychotherapy: An evaluation. Journal of Consulting Psychology, 16, 319–324.

Posternak, M. A., & Miller, I. (2001). Untreated short-term course of major depression: A meta-analysis of outcomes from studies using wait-list control groups. Journal of Affective Disorders, 66, 139–146.

Smith, M. L., Glass, G. V., & Miller, T. I. (1980). The benefits of psychotherapy. Baltimore, MD: Johns Hopkins University Press.

Research Methods in Psychology Copyright © 2016 by University of Minnesota is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Perspective | Published: 14 January 2021

Quantifying causality in data science with quasi-experiments

Tony Liu, Lyle Ungar & Konrad Kording (ORCID: 0000-0001-8408-4499)

Nature Computational Science, volume 1, pages 24–32 (2021)


Estimating causality from observational data is essential in many data science questions but can be a challenging task. Here we review approaches to causality that are popular in econometrics and that exploit (quasi) random variation in existing data, called quasi-experiments, and show how they can be combined with machine learning to answer causal questions within typical data science settings. We also highlight how data scientists can help advance these methods to bring causal estimation to high-dimensional data from medicine, industry and society.



Code availability

We provide interactive widgets of Figs. 2 – 4 in a Jupyter Notebook hosted in a public GitHub repository ( https://github.com/tliu526/causal-data-science-perspective ) and served through Binder (see link in the GitHub repository).

van Dyk, D. et al. ASA statement on the role of statistics in data science. Amstat News https://magazine.amstat.org/blog/2015/10/01/asa-statement-on-the-role-of-statistics-in-data-science/ (2015).

Pearl, J. The seven tools of causal inference, with reflections on machine learning. Commun. ACM 62 , 54–60 (2019).

Article   Google Scholar  

Hernán, M. A., Hsu, J. & Healy, B. Data science is science’s second chance to get causal inference right: a classification of data science tasks. Chance 32 , 42–49 (2019).

Caruana, R. et al. Intelligible models for healthcare: predicting pneumonia risk and hospital 30-day readmission. In Proc. 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 1721–1730 (ACM Press, 2015); https://doi.org/10.1145/2783258.2788613




Quasi-Experimental Research

The prefix quasi means “resembling.” Thus quasi-experimental research is research that resembles experimental research but is not true experimental research. Recall that in a true between-groups experiment, random assignment to conditions is used to ensure the groups are equivalent, and that in a true within-subjects design counterbalancing is used to guard against order effects. Quasi-experiments are missing one of these safeguards. Although an independent variable is manipulated, either a control group is missing or participants are not randomly assigned to conditions (Cook & Campbell, 1979) [1] .

Because the independent variable is manipulated before the dependent variable is measured, quasi-experimental research eliminates the directionality problem associated with non-experimental research. But because either counterbalancing techniques are not used or participants are not randomly assigned to conditions—making it likely that there are other differences between conditions—quasi-experimental research does not eliminate the problem of confounding variables. In terms of internal validity, therefore, quasi-experiments are generally somewhere between non-experimental studies and true experiments.

Quasi-experiments are most likely to be conducted in field settings in which random assignment is difficult or impossible. They are often conducted to evaluate the effectiveness of a treatment—perhaps a type of psychotherapy or an educational intervention. There are many different kinds of quasi-experiments, but we will discuss just a few of the most common ones in this chapter. 

  • Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design & analysis issues in field settings . Boston, MA: Houghton Mifflin. ↵

Research Methods in Psychology Copyright © 2019 by Rajiv S. Jhangiani, I-Chant A. Chiang, Carrie Cuttler, & Dana C. Leighton is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.



Quasi-Experimental Design | Definition, Types & Examples

Published on 11 April 2022 by Lauren Thomas. Revised on 22 January 2024.

Like a true experiment , a quasi-experimental design aims to establish a cause-and-effect relationship between an independent and dependent variable .

However, unlike a true experiment, a quasi-experiment does not rely on random assignment . Instead, subjects are assigned to groups based on non-random criteria.

Quasi-experimental design is a useful tool in situations where true experiments cannot be used for ethical or practical reasons.

Quasi-experimental design vs experimental design

Table of contents

  • Differences between quasi-experiments and true experiments
  • Types of quasi-experimental designs
  • When to use quasi-experimental design
  • Advantages and disadvantages
  • Frequently asked questions about quasi-experimental design

There are several common differences between true and quasi-experimental designs.

Assignment to treatment: In a true experimental design, the researcher randomly assigns subjects to control and treatment groups. In a quasi-experimental design, some other, non-random method is used to assign subjects to groups.

Control over treatment: In a true experiment, the researcher usually designs the treatment. In a quasi-experiment, the researcher often does not control the treatment, but instead studies pre-existing groups that received different treatments after the fact.

Use of control groups: A true experiment requires the use of control and treatment groups. In a quasi-experiment, control groups are not required (although they are commonly used).

Example of a true experiment vs a quasi-experiment

Suppose you want to test a new type of psychotherapy against the standard course of treatment at a mental health clinic. However, for ethical reasons, the directors of the mental health clinic may not give you permission to randomly assign their patients to treatments. In this case, you cannot run a true experiment.

Instead, you can use a quasi-experimental design: some patients at the clinic are already receiving the new therapy, while others are receiving the standard course of treatment.

You can use these pre-existing groups to study the symptom progression of the patients treated with the new therapy versus those receiving the standard course of treatment.


Many types of quasi-experimental designs exist. Here we explain three of the most common types: nonequivalent groups design, regression discontinuity, and natural experiments.

Nonequivalent groups design

In a nonequivalent groups design, the researcher chooses existing groups that appear similar, but where only one of the groups experiences the treatment.

In a true experiment with random assignment , the control and treatment groups are considered equivalent in every way other than the treatment. But in a quasi-experiment where the groups are not random, they may differ in other ways – they are nonequivalent groups .

When using this kind of design, researchers try to account for any confounding variables by controlling for them in their analysis or by choosing groups that are as similar as possible.

This is the most common type of quasi-experimental design.

Regression discontinuity

Many potential treatments that researchers wish to study are designed around an essentially arbitrary cutoff, where those above the threshold receive the treatment and those below it do not.

Near this threshold, the differences between the two groups are often so minimal as to be nearly nonexistent. Therefore, researchers can use individuals just below the threshold as a control group and those just above as a treatment group.

For example, suppose a selective school admits only students who score above a certain cutoff on an entrance exam. Since the exact cutoff score is arbitrary, the students near the threshold – those who just barely pass the exam and those who fail by a very small margin – tend to be very similar, with the small differences in their scores mostly due to random chance. You can therefore conclude that any outcome differences must come from the school they attended.
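The logic of this near-cutoff comparison can be sketched with simulated data (a hypothetical exam cutoff, treatment effect, and outcome, assuming the numpy library is available):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: exam scores on a 0-100 scale; students scoring at or
# above a cutoff of 60 attend the selective school (the "treatment").
scores = rng.uniform(0, 100, 5000)
treated = scores >= 60

# The outcome improves smoothly with the score, plus a true treatment
# effect of 5 points for attending the school.
outcome = 0.2 * scores + 5.0 * treated + rng.normal(0, 1, scores.size)

# Comparing all treated vs. untreated students is biased, because treated
# students also have much higher exam scores.
naive = outcome[treated].mean() - outcome[~treated].mean()

# Regression discontinuity: compare only students within a narrow bandwidth
# of the cutoff, where the two groups are nearly identical except for treatment.
bandwidth = 2.0
near = np.abs(scores - 60) < bandwidth
rdd = outcome[near & treated].mean() - outcome[near & ~treated].mean()
```

In this simulation the naive difference lands far above the true effect of 5, while the near-cutoff difference recovers it approximately; real analyses typically fit local regressions on each side of the cutoff rather than comparing raw means.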

Natural experiments

In both laboratory and field experiments, researchers normally control which group the subjects are assigned to. In a natural experiment, an external event or situation (‘nature’) results in the random or random-like assignment of subjects to the treatment group.

Even though such an event can assign subjects at random, natural experiments are not considered true experiments, because they are observational in nature: the researcher does not control the assignment.

Although the researchers have no control over the independent variable, they can exploit this event after the fact to study the effect of the treatment.

For example, when the Oregon state government wanted to expand Medicaid coverage, it could not afford to cover everyone it deemed eligible for the program, so it instead allocated spots in the program based on a random lottery.

Although true experiments have higher internal validity , you might choose to use a quasi-experimental design for ethical or practical reasons.

Sometimes it would be unethical to provide or withhold a treatment on a random basis, so a true experiment is not feasible. In this case, a quasi-experiment can allow you to study the same causal relationship without the ethical issues.

The Oregon Health Study is a good example. It would be unethical to randomly provide some people with health insurance but purposely prevent others from receiving it solely for the purposes of research.

However, since the Oregon government faced financial constraints and decided to provide health insurance via lottery, studying this event after the fact is a much more ethical approach to studying the same problem.

True experimental design may be infeasible to implement or simply too expensive, particularly for researchers without access to large funding streams.

At other times, too much work is involved in recruiting and properly designing an experimental intervention for an adequate number of subjects to justify a true experiment.

In either case, quasi-experimental designs allow you to study the question by taking advantage of data that has previously been paid for or collected by others (often the government).

Quasi-experimental designs have various pros and cons compared to other types of studies.

Advantages:

  • Higher external validity than most true experiments, because they often involve real-world interventions instead of artificial laboratory settings.
  • Higher internal validity than other non-experimental types of research, because they allow you to better control for confounding variables than other types of studies do.

Disadvantages:

  • Lower internal validity than true experiments – without randomisation, it can be difficult to verify that all confounding variables have been accounted for.
  • The use of retrospective data that has already been collected for other purposes can be inaccurate, incomplete, or difficult to access.

A quasi-experiment is a type of research design that attempts to establish a cause-and-effect relationship. The main difference between this and a true experiment is that the groups are not randomly assigned.

In experimental research, random assignment is a way of placing participants from your sample into different groups using randomisation. With this method, every member of the sample has a known or equal chance of being placed in a control group or an experimental group.

Quasi-experimental design is most useful in situations where it would be unethical or impractical to run a true experiment .

Quasi-experiments have lower internal validity than true experiments, but they often have higher external validity  as they can use real-world interventions instead of artificial laboratory settings.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation below.

Thomas, L. (2024, January 22). Quasi-Experimental Design | Definition, Types & Examples. Scribbr. Retrieved 30 July 2024, from https://www.scribbr.co.uk/research-methods/quasi-experimental/



Confounding Variables | Definition, Examples & Controls

Published on May 29, 2020 by Lauren Thomas. Revised on June 22, 2023.

In research that investigates a potential cause-and-effect relationship, a confounding variable is an unmeasured third variable that influences both the supposed cause and the supposed effect.

It’s important to consider potential confounding variables and account for them in your research design to ensure your results are valid. Left unchecked, confounding variables can introduce many research biases to your work, causing you to misinterpret your results.

Table of contents

  • What is a confounding variable?
  • Why confounding variables matter
  • How to reduce the impact of confounding variables
  • Frequently asked questions about confounding variables

Confounding variables (also known as confounders or confounding factors) are a type of extraneous variable that is related to both a study’s independent and dependent variables. A variable must meet two conditions to be a confounder:

  • It must be correlated with the independent variable. This may be a causal relationship, but it does not have to be.
  • It must be causally related to the dependent variable.

Example of a confounding variable

For instance, in a study of whether coffee drinking causes heart disease, smoking could be a confounding variable: smoking is correlated with coffee drinking, and it is causally related to heart disease. Unless smoking is accounted for, the study may wrongly attribute smokers’ higher disease rates to coffee.
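A small simulation (hypothetical variables, assuming numpy is available) shows how a variable meeting both conditions creates a spurious association: here z drives both x and y, and x has no causal effect on y at all.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical confounder z.
z = rng.normal(size=n)
x = 2.0 * z + rng.normal(size=n)  # z is correlated with the independent variable
y = 3.0 * z + rng.normal(size=n)  # z is causally related to the dependent variable

# x has no causal effect on y, yet the two are strongly correlated through z.
r_raw = np.corrcoef(x, y)[0, 1]

# Removing z's contribution (known exactly in this simulation) from both
# variables makes the spurious association vanish.
r_adj = np.corrcoef(x - 2.0 * z, y - 3.0 * z)[0, 1]
```

Here r_raw comes out strongly positive while r_adj sits near zero; with real data the confounder’s contribution must be estimated, for example via the regression approach discussed under statistical control below.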


To ensure the internal validity of your research, you must account for confounding variables. If you fail to do so, your results may not reflect the actual relationship between the variables that you are interested in, biasing your results.

For instance, you may find a cause-and-effect relationship that does not actually exist, because the effect you measure is caused by the confounding variable (and not by your independent variable). This can lead to omitted variable bias or placebo effects , among other biases.

Even if you correctly identify a cause-and-effect relationship, confounding variables can result in over- or underestimating the impact of your independent variable on your dependent variable.

There are several methods of accounting for confounding variables, which you can use when studying any type of subject (humans, animals, plants, chemicals, etc.). Each method has its own advantages and disadvantages.

Restriction

In this method, you restrict your treatment group by only including subjects with the same values of potential confounding factors.

Since these values do not differ among the subjects of your study, they cannot correlate with your independent variable and thus cannot confound the cause-and-effect relationship you are studying.

Advantages:

  • Relatively easy to implement

Disadvantages:

  • Restricts your sample a great deal
  • You might fail to consider other potential confounders
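A minimal sketch of restriction on simulated data (hypothetical binary confounder, numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 8000

# Hypothetical binary confounder (say, smoking) that raises both the chance
# of receiving the treatment and the outcome itself.
smoker = rng.random(n) < 0.4
treat = rng.random(n) < 0.3 + 0.4 * smoker
outcome = 1.0 * treat + 3.0 * smoker + rng.normal(0, 1, n)  # true effect = 1.0

# The unrestricted comparison is inflated by the confounder.
naive = outcome[treat].mean() - outcome[~treat].mean()

# Restriction: keep only subjects with the same confounder value
# (non-smokers), so the confounder cannot vary with treatment.
keep = ~smoker
restricted = outcome[keep & treat].mean() - outcome[keep & ~treat].mean()
```

The restricted estimate approximately recovers the true effect, at the cost of discarding all smokers from the sample.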

Matching

In this method, you select a comparison group that matches the treatment group. Each member of the comparison group should have a counterpart in the treatment group with the same values of potential confounders, but different independent variable values.

This allows you to eliminate the possibility that differences in confounding variables cause the variation in outcomes between the treatment and comparison group. If you have accounted for any potential confounders, you can thus conclude that the difference in the independent variable must be the cause of the variation in the dependent variable.

Advantages:

  • Allows you to include more subjects than restriction

Disadvantages:

  • Can prove difficult to implement, since you need pairs of subjects that match on every potential confounding variable
  • Other variables that you cannot match on might also be confounding variables
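Exact matching on a single discrete confounder can be sketched as follows (simulated data, numpy assumed): each treated subject is paired with a control drawn from the same confounder stratum.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6000

# Hypothetical discrete confounder (say, an age bracket 0/1/2) that affects
# both treatment uptake and the outcome.
age = rng.integers(0, 3, n)
treat = rng.random(n) < 0.2 + 0.25 * age
outcome = 1.0 * treat + 2.0 * age + rng.normal(0, 1, n)  # true effect = 1.0

# The unmatched comparison is biased: treated subjects are older on average.
naive = outcome[treat].mean() - outcome[~treat].mean()

# Matching: within each age bracket, compare treated subjects with controls
# drawn from the same bracket, then average the within-bracket differences.
diffs = []
for a in range(3):
    t_idx = np.where(treat & (age == a))[0]
    c_idx = np.where(~treat & (age == a))[0]
    c_match = rng.choice(c_idx, size=t_idx.size, replace=True)
    diffs.append(outcome[t_idx].mean() - outcome[c_match].mean())
matched = float(np.mean(diffs))
```

Because the matched controls share the treated subjects’ confounder values, the matched estimate lands near the true effect, while the naive estimate is inflated.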

Statistical control

If you have already collected the data, you can include the possible confounders as control variables in your regression models ; in this way, you will control for the impact of the confounding variable.

Any effect that the potential confounding variable has on the dependent variable will show up in the results of the regression and allow you to separate the impact of the independent variable.

Advantages:

  • Easy to implement
  • Can be performed after data collection

Disadvantages:

  • You can only control for variables that you observe directly; other confounding variables you have not accounted for might remain
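The regression approach can be sketched with simulated data (hypothetical variables, numpy assumed): regressing the outcome on the treatment alone overstates its effect, while adding the confounder as a control variable recovers it.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000

# Hypothetical data: confounder z raises both the treatment variable x
# and the outcome y. The true effect of x on y is 1.5.
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = 1.5 * x + 2.0 * z + rng.normal(size=n)

# Simple regression of y on x alone: the slope absorbs z's effect.
slope_naive = np.polyfit(x, y, 1)[0]

# Multiple regression with z included as a control variable.
X = np.column_stack([x, z, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
slope_controlled = coef[0]  # close to the true effect of 1.5
```

Separating the confounder into its own regressor is what lets the model attribute to x only the variation in y that is not explained by z.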

Randomization

Another way to minimize the impact of confounding variables is to randomize the values of your independent variable. For instance, if some of your participants are assigned to a treatment group while others are in a control group , you can randomly assign participants to each group.

Randomization ensures that with a sufficiently large sample, all potential confounding variables—even those you cannot directly observe in your study—will have the same average value between different groups. Since these variables do not differ by group assignment, they cannot correlate with your independent variable and thus cannot confound your study.

Since this method allows you to account for all potential confounding variables, which is nearly impossible to do otherwise, it is often considered to be the best way to reduce the impact of confounding variables.

Advantages:

  • Allows you to account for all possible confounding variables, including ones that you may not observe directly
  • Considered the best method for minimizing the impact of confounding variables

Disadvantages:

  • Most difficult to carry out
  • Must be implemented prior to beginning data collection
  • You must ensure that only those in the treatment (and not control) group receive the treatment
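The balancing property of randomization can be checked in a quick simulation (hypothetical unobserved confounder, numpy assumed): non-random selection leaves a large gap in the confounder between groups, while random assignment leaves almost none.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000

# Hypothetical confounder the researcher cannot observe.
hidden = rng.normal(size=n)

# Non-random assignment driven by the confounder: the groups differ sharply.
selected = hidden > 0
gap_selected = hidden[selected].mean() - hidden[~selected].mean()

# Random assignment: every participant has an equal chance of treatment, so
# the confounder's average is nearly identical across groups.
treat = rng.random(n) < 0.5
gap_random = abs(hidden[treat].mean() - hidden[~treat].mean())
```

Because assignment is independent of every participant characteristic, this balancing works even for confounders that were never measured, which is what the other methods cannot offer.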


Receive feedback on language, structure, and formatting

Professional editors proofread and edit your paper by focusing on:

  • Academic style
  • Vague sentences
  • Style consistency

See an example

confounding variables in quasi experiments

A confounding variable, also called a confounder or confounding factor, is a third variable in a study examining a potential cause-and-effect relationship.

A confounding variable is related to both the supposed cause and the supposed effect of the study. It can be difficult to separate the true effect of the independent variable from the effect of the confounding variable.

In your research design, it's important to identify potential confounding variables and plan how you will reduce their impact.

A confounding variable is closely related to both the independent and dependent variables in a study. An independent variable represents the supposed cause, while the dependent variable is the supposed effect. A confounding variable is a third variable that influences both the independent and dependent variables.

Failing to account for confounding variables can cause you to wrongly estimate the relationship between your independent and dependent variables.

An extraneous variable is any variable that you’re not investigating that can potentially affect the dependent variable of your research study.

A confounding variable is a type of extraneous variable that not only affects the dependent variable, but is also related to the independent variable.

To ensure the internal validity of your research, you must consider the impact of confounding variables. If you fail to account for them, you might over- or underestimate the causal relationship between your independent and dependent variables, or even find a causal relationship where none exists.

There are several methods you can use to decrease the impact of confounding variables on your research: restriction, matching, statistical control and randomization.

In restriction, you restrict your sample by only including subjects that have the same values of potential confounding variables.

In matching, you match each of the subjects in your treatment group with a counterpart in the comparison group. The matched subjects have the same values on any potential confounding variables, and only differ in the independent variable.

In statistical control, you include potential confounders as variables in your regression.

In randomization, you randomly assign the treatment (or independent variable) in your study to a sufficiently large number of subjects, which allows you to control for all potential confounding variables.
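A minimal sketch of exact matching on one categorical confounder, assuming NumPy and entirely invented data (treatment and comparison pools with different age distributions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical subjects: a categorical confounder (age band) for a treatment
# group of 200 and a comparison pool of 600, with different distributions.
ages_t = rng.choice(["young", "middle", "old"], size=200, p=[0.5, 0.3, 0.2])
ages_c = rng.choice(["young", "middle", "old"], size=600, p=[0.2, 0.3, 0.5])

# Exact matching: pair each treated subject with an unused comparison
# subject in the same age band.
pools = {band: list(np.flatnonzero(ages_c == band))
         for band in ("young", "middle", "old")}
pairs = []
for i, band in enumerate(ages_t):
    if pools[band]:
        pairs.append((i, pools[band].pop()))

# Every matched pair now agrees on the confounder by construction.
assert all(ages_t[i] == ages_c[j] for i, j in pairs)
print(len(pairs), "matched pairs")
```

Within the matched sample the confounder no longer differs between groups, so it cannot drive an apparent treatment effect; the trade-off is that treated subjects without a suitable counterpart go unmatched.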

Source: Thomas, L. (2023, June 22). Confounding Variables | Definition, Examples & Controls. Scribbr. https://www.scribbr.com/methodology/confounding-variables/


Chapter 7: Nonexperimental Research

Quasi-Experimental Research

Learning Objectives

  • Explain what quasi-experimental research is and distinguish it clearly from both experimental and correlational research.
  • Describe three different types of quasi-experimental research designs (nonequivalent groups, pretest-posttest, and interrupted time series) and identify examples of each one.

The prefix  quasi  means “resembling.” Thus quasi-experimental research is research that resembles experimental research but is not true experimental research. Although the independent variable is manipulated, participants are not randomly assigned to conditions or orders of conditions (Cook & Campbell, 1979). [1] Because the independent variable is manipulated before the dependent variable is measured, quasi-experimental research eliminates the directionality problem. But because participants are not randomly assigned—making it likely that there are other differences between conditions—quasi-experimental research does not eliminate the problem of confounding variables. In terms of internal validity, therefore, quasi-experiments are generally somewhere between correlational studies and true experiments.

Quasi-experiments are most likely to be conducted in field settings in which random assignment is difficult or impossible. They are often conducted to evaluate the effectiveness of a treatment—perhaps a type of psychotherapy or an educational intervention. There are many different kinds of quasi-experiments, but we will discuss just a few of the most common ones here.

Nonequivalent Groups Design

Recall that when participants in a between-subjects experiment are randomly assigned to conditions, the resulting groups are likely to be quite similar. In fact, researchers consider them to be equivalent. When participants are not randomly assigned to conditions, however, the resulting groups are likely to be dissimilar in some ways. For this reason, researchers consider them to be nonequivalent. A  nonequivalent groups design , then, is a between-subjects design in which participants have not been randomly assigned to conditions.

Imagine, for example, a researcher who wants to evaluate a new method of teaching fractions to third graders. One way would be to conduct a study with a treatment group consisting of one class of third-grade students (Ms. Williams's, say) and a control group consisting of another class (Mr. Jones's). This design would be a nonequivalent groups design because the students are not randomly assigned to classes by the researcher, which means there could be important differences between them. For example, the parents of higher achieving or more motivated students might have been more likely to request that their children be assigned to Ms. Williams's class. Or the principal might have assigned the "troublemakers" to Mr. Jones's class because he is a stronger disciplinarian. Of course, the teachers' styles, and even the classroom environments, might be very different and might cause different levels of achievement or motivation among the students. If at the end of the study there was a difference in the two classes' knowledge of fractions, it might have been caused by the difference between the teaching methods, but it might have been caused by any of these confounding variables.

Of course, researchers using a nonequivalent groups design can take steps to ensure that their groups are as similar as possible. In the present example, the researcher could try to select two classes at the same school, where the students in the two classes have similar scores on a standardized math test and the teachers are the same sex, are close in age, and have similar teaching styles. Taking such steps would increase the internal validity of the study because it would eliminate some of the most important confounding variables. But without true random assignment of the students to conditions, there remains the possibility of other important confounding variables that the researcher was not able to control.

Pretest-Posttest Design

In a  pretest-posttest design , the dependent variable is measured once before the treatment is implemented and once after it is implemented. Imagine, for example, a researcher who is interested in the effectiveness of an antidrug education program on elementary school students’ attitudes toward illegal drugs. The researcher could measure the attitudes of students at a particular elementary school during one week, implement the antidrug program during the next week, and finally, measure their attitudes again the following week. The pretest-posttest design is much like a within-subjects experiment in which each participant is tested first under the control condition and then under the treatment condition. It is unlike a within-subjects experiment, however, in that the order of conditions is not counterbalanced because it typically is not possible for a participant to be tested in the treatment condition first and then in an “untreated” control condition.

If the average posttest score is better than the average pretest score, then it makes sense to conclude that the treatment might be responsible for the improvement. Unfortunately, one often cannot conclude this with a high degree of certainty because there may be other explanations for why the posttest scores are better. One category of alternative explanations goes under the name of  history . Other things might have happened between the pretest and the posttest. Perhaps an antidrug program aired on television and many of the students watched it, or perhaps a celebrity died of a drug overdose and many of the students heard about it. Another category of alternative explanations goes under the name of  maturation . Participants might have changed between the pretest and the posttest in ways that they were going to anyway because they are growing and learning. If it were a yearlong program, participants might become less impulsive or better reasoners and this might be responsible for the change.

Another alternative explanation for a change in the dependent variable in a pretest-posttest design is regression to the mean. This refers to the statistical fact that an individual who scores extremely on a variable on one occasion will tend to score less extremely on the next occasion. For example, a bowler with a long-term average of 150 who suddenly bowls a 220 will almost certainly score lower in the next game; her score will "regress" toward her mean score of 150. Regression to the mean can be a problem when participants are selected for further study because of their extreme scores. Imagine, for example, that only students who scored especially low on a test of fractions are given a special training program and then retested. Regression to the mean all but guarantees that their scores will be higher even if the training program has no effect.

A closely related concept (and an extremely important one in psychological research) is spontaneous remission. This is the tendency for many medical and psychological problems to improve over time without any form of treatment. The common cold is a good example. If one were to measure symptom severity in 100 common cold sufferers today, give them a bowl of chicken soup every day, and then measure their symptom severity again in a week, they would probably be much improved. This does not mean that the chicken soup was responsible for the improvement, however, because they would have been much improved without any treatment at all. The same is true of many psychological problems: a group of severely depressed people today is likely to be less depressed on average in 6 months. In reviewing the results of several studies of treatments for depression, researchers Michael Posternak and Ivan Miller found that participants in waitlist control conditions improved an average of 10 to 15% before they received any treatment at all (Posternak & Miller, 2001) [2].

Thus one must generally be very cautious about inferring causality from pretest-posttest designs.
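The fractions-test scenario can be simulated directly. This sketch (invented numbers, assuming NumPy) retests the lowest scorers with no treatment at all and still shows improvement:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000

# True skill is stable; each test score is skill plus independent noise.
skill = rng.normal(loc=100, scale=10, size=n)
test1 = skill + rng.normal(scale=10, size=n)
test2 = skill + rng.normal(scale=10, size=n)  # no treatment between tests

# Select only the lowest scorers on test 1, as a remedial program might.
low = test1 < np.percentile(test1, 20)

# Their retest scores rise even though nothing was done between the tests.
print(test1[low].mean())  # well below the population mean of 100
print(test2[low].mean())  # noticeably higher: regression to the mean
```

The selected group's first-test scores combine low skill with unlucky noise; the noise does not repeat, so the group's average moves back toward the mean on retest regardless of any intervention.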

Does Psychotherapy Work?

Early studies on the effectiveness of psychotherapy tended to use pretest-posttest designs. In a classic 1952 article, researcher Hans Eysenck summarized the results of 24 such studies showing that about two thirds of patients improved between the pretest and the posttest (Eysenck, 1952) [3] . But Eysenck also compared these results with archival data from state hospital and insurance company records showing that similar patients recovered at about the same rate  without  receiving psychotherapy. This parallel suggested to Eysenck that the improvement that patients showed in the pretest-posttest studies might be no more than spontaneous remission. Note that Eysenck did not conclude that psychotherapy was ineffective. He merely concluded that there was no evidence that it was, and he wrote of “the necessity of properly planned and executed experimental studies into this important field” (p. 323). You can read the entire article here: Classics in the History of Psychology .

Fortunately, many other researchers took up Eysenck’s challenge, and by 1980 hundreds of experiments had been conducted in which participants were randomly assigned to treatment and control conditions, and the results were summarized in a classic book by Mary Lee Smith, Gene Glass, and Thomas Miller (Smith, Glass, & Miller, 1980) [4] . They found that overall psychotherapy was quite effective, with about 80% of treatment participants improving more than the average control participant. Subsequent research has focused more on the conditions under which different types of psychotherapy are more or less effective.

Interrupted Time Series Design

A variant of the pretest-posttest design is the interrupted time-series design. A time series is a set of measurements taken at intervals over a period of time. For example, a manufacturing company might measure its workers' productivity each week for a year. In an interrupted time-series design, a time series like this one is "interrupted" by a treatment. In one classic example, the treatment was the reduction of the work shifts in a factory from 10 hours to 8 hours (Cook & Campbell, 1979) [5]. Because productivity increased rather quickly after the shortening of the work shifts, and because it remained elevated for many months afterward, the researcher concluded that the shortening of the shifts caused the increase in productivity. Notice that the interrupted time-series design is like a pretest-posttest design in that it includes measurements of the dependent variable both before and after the treatment. It is unlike the pretest-posttest design, however, in that it includes multiple pretest and posttest measurements.

Figure 7.3 shows data from a hypothetical interrupted time-series study. The dependent variable is the number of student absences per week in a research methods course. The treatment is that the instructor begins publicly taking attendance each day so that students know that the instructor is aware of who is present and who is absent. The top panel of  Figure 7.3 shows how the data might look if this treatment worked. There is a consistently high number of absences before the treatment, and there is an immediate and sustained drop in absences after the treatment. The bottom panel of  Figure 7.3 shows how the data might look if this treatment did not work. On average, the number of absences after the treatment is about the same as the number before. This figure also illustrates an advantage of the interrupted time-series design over a simpler pretest-posttest design. If there had been only one measurement of absences before the treatment at Week 7 and one afterward at Week 8, then it would have looked as though the treatment were responsible for the reduction. The multiple measurements both before and after the treatment suggest that the reduction between Weeks 7 and 8 is nothing more than normal week-to-week variation.
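A rough numerical sketch of the absence example (hypothetical counts mirroring Figure 7.3, assuming NumPy) shows how averaging the multiple pre- and post-treatment measurements lets a level shift be judged against week-to-week noise:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical weekly absence counts for a 14-week course, with attendance-
# taking introduced after week 7 (values chosen to mirror the top panel
# of Figure 7.3).
before = rng.integers(4, 9, size=7)   # treatment absent: 4-8 absences/week
after = rng.integers(0, 4, size=7)    # treatment present: 0-3 absences/week

# The design's strength: compare averages of many measurements on each side
# against the ordinary within-phase variation.
shift = before.mean() - after.mean()
noise = np.concatenate([before - before.mean(),
                        after - after.mean()]).std()
print(shift, noise)
```

A single week 7 vs. week 8 comparison could be mere fluctuation; a shift that dwarfs the within-phase variation across seven weeks on each side is much harder to explain away.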

Figure 7.3. (See image description below.)

Combination Designs

A type of quasi-experimental design that is generally better than either the nonequivalent groups design or the pretest-posttest design is one that combines elements of both. There is a treatment group that is given a pretest, receives a treatment, and then is given a posttest. But at the same time there is a control group that is given a pretest, does  not  receive the treatment, and then is given a posttest. The question, then, is not simply whether participants who receive the treatment improve but whether they improve  more  than participants who do not receive the treatment.

Imagine, for example, that students in one school are given a pretest on their attitudes toward drugs, then are exposed to an antidrug program, and finally are given a posttest. Students in a similar school are given the pretest, not exposed to an antidrug program, and finally are given a posttest. Again, if students in the treatment condition become more negative toward drugs, this change in attitude could be an effect of the treatment, but it could also be a matter of history or maturation. If it really is an effect of the treatment, then students in the treatment condition should become more negative than students in the control condition. But if it is a matter of history (e.g., news of a celebrity drug overdose) or maturation (e.g., improved reasoning), then students in the two conditions would be likely to show similar amounts of change. This type of design does not completely eliminate the possibility of confounding variables, however. Something could occur at one of the schools but not the other (e.g., a student drug overdose), so students at the first school would be affected by it while students at the other school would not.
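The logic of this combination design is a difference of differences: subtract the control group's change from the treatment group's change, so that history and maturation (which affect both groups) cancel out. A toy calculation with invented attitude scores (higher = more negative toward drugs):

```python
# Hypothetical pretest/posttest mean attitude scores for the antidrug-program
# example; all numbers are invented for illustration.
treat_pre, treat_post = 3.0, 4.4
ctrl_pre, ctrl_post = 3.1, 3.6

# Both groups shift between pretest and posttest (history, maturation),
# but the question is whether the treatment group improved *more*.
change_treat = treat_post - treat_pre   # change in the treatment group
change_ctrl = ctrl_post - ctrl_pre      # change shared with the control group
treatment_effect = change_treat - change_ctrl
print(round(treatment_effect, 2))  # 0.9
```

Under the design's assumption that both schools would have changed by the same amount without the program, the remaining 0.9 is attributable to the treatment; events unique to one school can still confound it, as the text notes.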

Finally, if participants in this kind of design are randomly assigned to conditions, it becomes a true experiment rather than a quasi-experiment. In fact, it is the kind of experiment that Eysenck called for (and that has now been conducted many times) to demonstrate the effectiveness of psychotherapy.

Key Takeaways

  • Quasi-experimental research involves the manipulation of an independent variable without the random assignment of participants to conditions or orders of conditions. Among the important types are nonequivalent groups designs, pretest-posttest, and interrupted time-series designs.
  • Quasi-experimental research eliminates the directionality problem because it involves the manipulation of the independent variable. It does not eliminate the problem of confounding variables, however, because it does not involve random assignment to conditions. For these reasons, quasi-experimental research is generally higher in internal validity than correlational studies but lower than true experiments.
  • Practice: Imagine that two professors decide to test the effect of giving daily quizzes on student performance in a statistics course. They decide that Professor A will give quizzes but Professor B will not. They will then compare the performance of students in their two sections on a common final exam. List five other variables that might differ between the two sections that could affect the results.

Image Descriptions

Figure 7.3 image description: Two line graphs charting the number of absences per week over 14 weeks. The first 7 weeks are without treatment and the last 7 weeks are with treatment. In the first line graph, there are between 4 to 8 absences each week. After the treatment, the absences drop to 0 to 3 each week, which suggests the treatment worked. In the second line graph, there is no noticeable change in the number of absences per week after the treatment, which suggests the treatment did not work.

  • Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design & analysis issues in field settings . Boston, MA: Houghton Mifflin. ↵
  • Posternak, M. A., & Miller, I. (2001). Untreated short-term course of major depression: A meta-analysis of outcomes from studies using wait-list control groups. Journal of Affective Disorders, 66, 139–146. ↵
  • Eysenck, H. J. (1952). The effects of psychotherapy: An evaluation. Journal of Consulting Psychology, 16 , 319–324. ↵
  • Smith, M. L., Glass, G. V., & Miller, T. I. (1980). The benefits of psychotherapy . Baltimore, MD: Johns Hopkins University Press. ↵

Glossary

  • Nonequivalent groups design: a between-subjects design in which participants have not been randomly assigned to conditions.
  • Pretest-posttest design: a design in which the dependent variable is measured once before the treatment is implemented and once after it is implemented.
  • History: a category of alternative explanations for differences between scores, such as events that happened between the pretest and posttest, unrelated to the study.
  • Maturation: an alternative explanation referring to ways participants might have changed between the pretest and posttest that they were going to anyway because they are growing and learning.
  • Regression to the mean: the statistical fact that an individual who scores extremely on a variable on one occasion will tend to score less extremely on the next occasion.
  • Spontaneous remission: the tendency for many medical and psychological problems to improve over time without any form of treatment.
  • Interrupted time-series design: a design based on a set of measurements taken at intervals over a period of time that are interrupted by a treatment.

Research Methods in Psychology - 2nd Canadian Edition Copyright © 2015 by Paul C. Price, Rajiv Jhangiani, & I-Chant A. Chiang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.



Recall that in a true between-groups experiment, random assignment to conditions is used to ensure the groups are equivalent, and that in a true within-subjects design, counterbalancing is used to guard against order effects. Quasi-experiments are missing one of these safeguards: although an independent variable is manipulated, either a control group is missing or participants are not randomly assigned to conditions (Cook & Campbell, 1979) [1].

Because the independent variable is manipulated before the dependent variable is measured, quasi-experimental research eliminates the directionality problem associated with non-experimental research. But because either counterbalancing techniques are not used or participants are not randomly assigned to conditions—making it likely that there are other differences between conditions—quasi-experimental research does not eliminate the problem of confounding variables. In terms of internal validity, therefore, quasi-experiments are generally somewhere between non-experimental studies and true experiments.


  • Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design & analysis issues in field settings . Boston, MA: Houghton Mifflin. ↵


Gastroenterol Hepatol Bed Bench, 5(2), Spring 2012

How to control confounding effects by statistical analysis

Mohamad Amin Pourhoseingholi

1 Department of Biostatistics, Shahid Beheshti University of Medical Sciences, Tehran, Iran

Ahmad Reza Baghestani

2 Department of Mathematics, Islamic Azad University - South Tehran Branch, Iran

Mohsen Vahedi

3 Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran

A confounder is a variable whose presence affects the variables being studied so that the results do not reflect the actual relationship. There are various ways to exclude or control confounding variables, including randomization, restriction and matching, but all of these methods are applicable only at the time of study design. When experimental designs are premature, impractical, or impossible, researchers must rely on statistical methods to adjust for potentially confounding effects. These statistical models (especially regression models) are flexible enough to eliminate the effects of confounders.

Introduction

Confounding variables, or confounders, are often defined as variables that correlate (positively or negatively) with both the dependent variable and the independent variable (1). A confounder is an extraneous variable whose presence affects the variables being studied, so that the results do not reflect the actual relationship between the variables under study.

The aim of major epidemiological studies is to search for the causes of diseases based on associations with various risk factors. There may also be other factors that are associated with the exposure and affect the risk of developing the disease, and they will distort the observed association between the disease and the exposure under study. A hypothetical example would be a study of the relation between coffee drinking and lung cancer. If people who entered the study as coffee drinkers were also more likely to be cigarette smokers, and the study measured only coffee drinking but not smoking, the results might seem to show that coffee drinking increases the risk of lung cancer, which may not be true. However, if the confounding factor (in this example, smoking) is recognized, adjustments can be made in the study design or data analysis so that the effects of the confounder are removed from the final results. Simpson's paradox is another classic example of confounding (2): it refers to the reversal of the direction of an association when data from several groups are combined to form a single group.

Researchers therefore need to account for these variables, either through experimental design before data gathering or through statistical analysis afterward. In doing so they avoid a false positive (Type I) error: a false conclusion that the dependent variable is in a causal relationship with the independent variable. Thus, confounding is a major threat to the validity of inferences made about cause and effect (internal validity). There are various ways to modify a study design to actively exclude or control confounding variables (3), including randomization, restriction and matching.

Randomization is the random assignment of study subjects to exposure categories, breaking any links between exposure and confounders. It reduces the potential for confounding by generating groups that are fairly comparable with respect to known and unknown confounding variables.

Restriction eliminates variation in the confounder (for example, if an investigator selects only subjects of the same age or sex, the study eliminates confounding by age or sex). Matching involves selecting a comparison group that matches the distribution of one or more potential confounders.

Matching is commonly used in case-control studies (for example, if age and sex are the matching variables, a 45-year-old male case is matched to a male control of the same age).

But all the methods mentioned above are applicable only at the time of study design, before the process of data gathering. When experimental designs are premature, impractical, or impossible, researchers must rely on statistical methods to adjust for potentially confounding effects (4).

Statistical analysis to eliminate confounding effects

Unlike selection or information bias, confounding is one type of bias that can be adjusted for after data gathering, using statistical models. To control for confounding in the analysis, investigators should measure the confounders in the study; researchers usually do this by collecting data on all known, previously identified confounders. There are two main options for dealing with confounders at the analysis stage: stratification and multivariate methods.

1. Stratification

The objective of stratification is to fix the level of the confounders and produce groups within which the confounder does not vary, and then to evaluate the exposure-outcome association within each stratum of the confounder. Within each stratum, the confounder cannot confound because it does not vary.

After stratification, the Mantel-Haenszel (M-H) estimator can be employed to provide a single result adjusted across strata. If the crude result differs from the adjusted result, confounding is likely; if the crude result does not differ from the adjusted result, confounding is unlikely.
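A minimal implementation sketch of the M-H adjusted odds ratio (plain Python; the two strata of counts are illustrative, not from a real study):

```python
def mantel_haenszel_or(strata):
    """M-H adjusted odds ratio across 2x2 strata given as (a, b, c, d):
    a, b = exposed with/without the outcome; c, d = unexposed with/without.
    Formula: sum(a*d/n) / sum(b*c/n), with n the stratum total."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Hypothetical two-stratum example (counts invented for illustration):
strata = [(10, 50, 50, 200),   # stratum 1 of the confounder
          (40, 450, 10, 180)]  # stratum 2 of the confounder
print(round(mantel_haenszel_or(strata), 2))
```

Comparing this adjusted value with the odds ratio from the pooled (crude) table is the practical test for confounding described above.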

2. Multivariate Models

Stratified analysis works best when there are few strata and only one or two confounders to control. If the number of potential confounders or the number of their levels is large, multivariate analysis offers the only solution.

Multivariate models can handle large numbers of covariates (including confounders) simultaneously. For example, in a study that aimed to measure the relation between body mass index and dyspepsia, one could control for other covariates such as age, sex, smoking, alcohol and ethnicity in the same model.

2.1. Logistic Regression

Logistic regression is a mathematical model that produces results interpretable as odds ratios, and it is easy to use in any statistical package. What makes logistic regression particularly useful is that it can control for numerous confounders at once (given a large enough sample size), yielding an odds ratio that is controlled for multiple confounders. This is known as the adjusted odds ratio, because its value has been adjusted for the other covariates (including confounders).
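As an illustration (not the authors' analysis), the sketch below simulates a binary confounder, fits y ~ x + z by Newton-Raphson (the standard iteratively reweighted least-squares approach for logistic models), and reads off the adjusted odds ratio. It assumes NumPy, and all parameter values are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000

# Synthetic data: binary confounder z raises both exposure and outcome odds.
z = rng.binomial(1, 0.5, size=n)
x = rng.binomial(1, np.where(z == 1, 0.7, 0.3))  # exposure depends on z
logit = -1.0 + 0.7 * x + 1.5 * z                 # true adjusted log-OR for x is 0.7
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Fit y ~ x + z by Newton-Raphson.
X = np.column_stack([np.ones(n), x, z])
beta = np.zeros(3)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))             # fitted probabilities
    W = p * (1 - p)                             # IRLS weights
    beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))

adjusted_or = np.exp(beta[1])  # should sit near exp(0.7), about 2.0
print(adjusted_or)
```

Exponentiating the fitted coefficient for the exposure gives the adjusted odds ratio; a crude cross-tabulation of x and y on the same data would give a noticeably larger, confounded value.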

2.2. Linear Regression

Linear regression is another statistical model that can be used to examine the association between multiple covariates and a numeric outcome. It can be employed as multiple linear regression to see through confounding and isolate the relationship of interest (5). For example, in research on the relationship between LDL cholesterol level and age, multiple linear regression lets you answer the question: how does LDL level vary with age, after accounting for blood sugar and lipids (the confounding factors)? As with logistic regression, investigators can include many covariates at one time. The process of accounting for covariates is likewise called adjustment, and comparing the results of simple and multiple linear regressions clarifies how much the confounders in the model distort the relationship between exposure and outcome.

2.3. Analysis of Covariance

The Analysis of Covariance (ANCOVA) is a type of Analysis of Variance (ANOVA) that is used to control for potential confounding variables. ANCOVA is a statistical linear model with a continuous outcome variable (quantitative, scaled) and two or more predictor variables where at least one is continuous (quantitative, scaled) and at least one is categorical (nominal, non-scaled). ANCOVA is a combination of ANOVA and linear regression. ANCOVA tests whether certain factors have an effect on the outcome variable after removing the variance for which quantitative covariates (confounders) account. The inclusion of this analysis can increase the statistical power.

Practical example

Suppose that, in a cross-sectional study, we are investigating the relation between infection with Helicobacter pylori (HP) and dyspepsia symptoms. The study was conducted on 550 persons positive for HP and 440 persons without HP. The results appear in the crude 2×2 table ( Table 1 ), which indicates an inverse association between HP infection and dyspepsia (OR = 0.60, 95% CI: 0.42-0.94). Now suppose that weight is a potential confounder in this study. We therefore break the crude table down into two strata according to the subjects' weight (normal weight or overweight) and recalculate the OR for each stratum. If the stratum-specific ORs are similar to the crude OR, there is no appreciable confounding. In this example the ORs differ across strata (normal-weight group OR = 0.80, 95% CI: 0.38-1.69; overweight group OR = 1.60, 95% CI: 0.79-3.27).

Table 1. The crude contingency table of the association between H. pylori and dyspepsia

                      Dyspepsia (positive)   Dyspepsia (negative)
H. pylori positive    50                     500
H. pylori negative    60                     380

Table 2. The contingency table of the association between H. pylori and dyspepsia for persons in the normal-weight group

                      Dyspepsia (positive)   Dyspepsia (negative)
H. pylori positive    10                     50
H. pylori negative    50                     200

Table 3. The contingency table of the association between H. pylori and dyspepsia for persons in the overweight group

                      Dyspepsia (positive)   Dyspepsia (negative)
H. pylori positive    40                     450
H. pylori negative    10                     180

This shows that there is a confounding effect, introduced by weight, in this study. The example is a case of Simpson's paradox, so the crude OR is not valid here. We calculated the Mantel-Haenszel (M-H) estimator as an alternative statistical analysis to remove the confounding effect (OR = 1.16, 95% CI: 0.71-1.90). A logistic regression model (with weight included in the multiple model) can also be fitted to control for the confounder; its result is similar to the M-H estimator (OR = 1.15, 95% CI: 0.71-1.89).
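The stratum-specific and Mantel-Haenszel calculations can be reproduced in a few lines of Python from the counts in the stratified tables; the helper names below are our own, and the confidence intervals require variance formulas not shown here.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table laid out as rows = exposure (+/-), columns = outcome (+/-)."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled OR across strata, each given as an (a, b, c, d) table."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

normal = (10, 50, 50, 200)    # normal-weight stratum counts
over = (40, 450, 10, 180)     # overweight stratum counts

print(odds_ratio(*normal))                  # 0.80
print(odds_ratio(*over))                    # 1.60
print(mantel_haenszel_or([normal, over]))   # ≈ 1.16
```

The stratum-specific ORs (0.80 and 1.60) and the pooled M-H estimate (1.16) match the values quoted in the text.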

The results of this example clearly indicate that if the effects of confounders are not accounted for in the analysis, the results can mislead researchers.

Confounders are common causes of both treatment/exposure and of response/outcome. Confounding is better taken care of by randomization at the design stage of the research ( 6 ).

A successful randomization minimizes confounding by unmeasured as well as measured factors, whereas statistical control addresses only confounding by measured factors and can itself introduce confounding through inappropriate control ( 7 – 9 ).

Confounding can persist even after adjustment. In many studies, confounders are not adjusted for because they were not measured during data gathering. In some situations, confounder variables are measured with error, or their categories are improperly defined (for example, age categories too coarse to capture age's confounding effect) ( 10 ). There is also the possibility that variables controlled as confounders were in fact not confounders.

Before applying a statistical correction method, one has to decide which factors are confounders; this is sometimes a complex issue ( 11 – 13 ). Common strategies for deciding whether a variable is a confounder that should be adjusted for rely mostly on statistical criteria, but the strategy should be based on knowledge of the field and on a conceptual framework and causal model. Expert judgment should therefore be involved in evaluating confounders. Statistical models (especially regression models) are a flexible way of investigating the separate or joint effects of several risk factors for disease or ill health ( 14 ). But researchers should note that wrong assumptions about the form of the relationship between a confounder and the disease can lead to wrong conclusions about exposure effects as well.

( Please cite as: Pourhoseingholi MA, Baghestani AR, Vahedi M. How to control confounding effects by statistical analysis. Gastroenterol Hepatol Bed Bench 2012;5(2):79-83.)
