
One-way ANOVA | When and How to Use It (With Examples)

Published on March 6, 2020 by Rebecca Bevans. Revised on May 10, 2024.

ANOVA, which stands for Analysis of Variance, is a statistical test used to analyze the difference between the means of more than two groups.

A one-way ANOVA uses one independent variable, while a two-way ANOVA uses two independent variables.

Table of contents

  • When to use a one-way ANOVA
  • How does an ANOVA test work
  • Assumptions of ANOVA
  • Performing a one-way ANOVA
  • Interpreting the results
  • Post-hoc testing
  • Reporting the results of ANOVA
  • Other interesting articles
  • Frequently asked questions about one-way ANOVA

Use a one-way ANOVA when you have collected data about one categorical independent variable and one quantitative dependent variable. The independent variable should have at least three levels (i.e. at least three different groups or categories).

ANOVA tells you if the dependent variable changes according to the level of the independent variable. For example:

  • Your independent variable is social media use, and you assign groups to low, medium, and high levels of social media use to find out if there is a difference in hours of sleep per night.
  • Your independent variable is brand of soda, and you collect data on Coke, Pepsi, Sprite, and Fanta to find out if there is a difference in the price per 100ml.
  • Your independent variable is type of fertilizer, and you treat crop fields with mixtures 1, 2, and 3 to find out if there is a difference in crop yield.

The null hypothesis (H0) of ANOVA is that there is no difference among group means. The alternative hypothesis (Ha) is that at least one group differs significantly from the overall mean of the dependent variable.

If you only want to compare two groups, use a t test instead.


ANOVA determines whether the groups created by the levels of the independent variable are statistically different by calculating whether the means of the treatment levels are different from the overall mean of the dependent variable.

If any of the group means is significantly different from the overall mean, then the null hypothesis is rejected.

ANOVA uses the F test for statistical significance. This allows for comparison of multiple means at once, because the error is calculated for the whole set of comparisons rather than for each individual pairwise comparison (as would happen with separate t tests).

The F test compares the variance among the group means with the variance within the groups. If the variance within groups is smaller than the variance between groups, the F test will return a higher F value, and therefore a higher likelihood that the difference observed is real and not due to chance.

The assumptions of the ANOVA test are the same as the general assumptions for any parametric test:

  • Independence of observations: the data were collected using statistically valid sampling methods, and there are no hidden relationships among observations. If your data fail to meet this assumption because you have a confounding variable that you need to control for statistically, use an ANOVA with blocking variables.
  • Normally distributed response variable: the values of the dependent variable follow a normal distribution.
  • Homogeneity of variance: the variation within each group being compared is similar for every group. If the variances differ among the groups, then ANOVA probably isn’t the right fit for the data.
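If you want to check these assumptions directly before running the test, R has built-in tools. A minimal sketch, assuming a data frame crop.data with a numeric yield column and a fertilizer factor (hypothetical names, matching the crop example analyzed below):

```r
# Homogeneity of variance across fertilizer groups (Bartlett's test).
bartlett.test(yield ~ fertilizer, data = crop.data)

# Normality of the response within each group (Shapiro-Wilk test).
tapply(crop.data$yield, crop.data$fertilizer, shapiro.test)
```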

While you can perform an ANOVA by hand, it is difficult to do so with more than a few observations. We will perform our analysis in the R statistical program because it is free, powerful, and widely available. For a full walkthrough of this ANOVA example, see our guide to performing ANOVA in R.

The sample dataset from our imaginary crop yield experiment contains data about:

  • fertilizer type (type 1, 2, or 3)
  • planting density (1 = low density, 2 = high density)
  • planting location in the field (blocks 1, 2, 3, or 4)
  • final crop yield (in bushels per acre).

This gives us enough information to run various ANOVA tests and see which model is the best fit for the data.

For the one-way ANOVA, we will only analyze the effect of fertilizer type on crop yield.

Sample dataset for ANOVA

After loading the dataset into our R environment, we can use the command aov() to run an ANOVA. In this example we will model the differences in the mean of the response variable, crop yield, as a function of type of fertilizer.

To view the summary of a statistical model in R, use the summary() function.
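For reference, a minimal sketch of these two steps (the data frame crop.data and the model name one.way are hypothetical names; your own objects may differ):

```r
# Make sure the grouping variable is a factor, fit the one-way ANOVA,
# and view the summary table.
crop.data$fertilizer <- as.factor(crop.data$fertilizer)
one.way <- aov(yield ~ fertilizer, data = crop.data)
summary(one.way)
```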

The summary of an ANOVA test (in R) looks like this:

One-way ANOVA summary

The ANOVA output provides an estimate of how much variation in the dependent variable can be explained by the independent variable.

  • The first column lists the independent variable along with the model residuals (aka the model error).
  • The Df column displays the degrees of freedom for the independent variable (the number of levels in the variable minus 1), and the degrees of freedom for the residuals (the total number of observations minus the number of groups).
  • The Sum Sq column displays the sum of squares (i.e. the variation between the group means and the overall mean) explained by that variable. The sum of squares for the fertilizer variable is 6.07, while the sum of squares of the residuals is 35.89.
  • The Mean Sq column is the mean of the sum of squares, calculated by dividing the sum of squares by the degrees of freedom.
  • The F value column is the test statistic from the F test: the mean square of each independent variable divided by the mean square of the residuals. The larger the F value, the more likely it is that the variation associated with the independent variable is real and not due to chance.
  • The Pr(>F) column is the p value of the F statistic. This shows how likely it is that the F value calculated from the test would have occurred if the null hypothesis of no difference among group means were true.

Because the p value of the independent variable, fertilizer, is statistically significant (p < 0.05), it is likely that fertilizer type does have a significant effect on average crop yield.

ANOVA will tell you if there are differences among the levels of the independent variable, but not which differences are significant. To find out how the treatment levels differ from one another, perform a Tukey’s Honestly Significant Difference (Tukey HSD) post-hoc test.

The Tukey test runs pairwise comparisons among each of the groups, and uses a conservative error estimate to find the groups which are statistically different from one another.
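In R, the test takes the fitted model object from aov() directly. A minimal sketch, reusing the hypothetical one.way model from above:

```r
# Tukey's HSD pairwise comparisons on the fitted ANOVA model.
TukeyHSD(one.way)
```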

The output of the Tukey HSD test looks like this:

Tukey summary one-way ANOVA

First, the table reports the model being tested (‘Fit’). Next it lists the pairwise differences among groups for the independent variable.

Under the ‘$fertilizer’ section, we see the mean difference between each fertilizer treatment (‘diff’), the lower and upper bounds of the 95% confidence interval (‘lwr’ and ‘upr’), and the p value, adjusted for multiple pairwise comparisons.

The pairwise comparisons show that fertilizer type 3 has a significantly higher mean yield than both fertilizer 2 and fertilizer 1, but the difference between the mean yields of fertilizers 2 and 1 is not statistically significant.

When reporting the results of an ANOVA, include a brief description of the variables you tested, the F value, degrees of freedom, and p values for each independent variable, and explain what the results mean.

If you want to provide more detailed information about the differences found in your test, you can also include a graph of the ANOVA results , with grouping letters above each level of the independent variable to show which groups are statistically different from one another:

One-way ANOVA graph
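One way to draw such a graph is with the ggplot2 package (assumed installed). A minimal sketch; the grouping letters ("a", "a", "b") are typed in by hand from the Tukey results above rather than computed automatically:

```r
library(ggplot2)

# Boxplot of yield by fertilizer type, with manually placed grouping
# letters above each group (letters taken from the Tukey output).
ggplot(crop.data, aes(x = fertilizer, y = yield)) +
  geom_boxplot() +
  annotate("text", x = 1:3, y = max(crop.data$yield) * 1.05,
           label = c("a", "a", "b")) +
  labs(x = "Fertilizer type", y = "Yield (bushels per acre)")
```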

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

  • Chi square test of independence
  • Statistical power
  • Descriptive statistics
  • Degrees of freedom
  • Pearson correlation
  • Null hypothesis

Methodology

  • Double-blind study
  • Case-control study
  • Research ethics
  • Data collection
  • Hypothesis testing
  • Structured interviews

Research bias

  • Hawthorne effect
  • Unconscious bias
  • Recall bias
  • Halo effect
  • Self-serving bias
  • Information bias

The only difference between one-way and two-way ANOVA is the number of independent variables . A one-way ANOVA has one independent variable, while a two-way ANOVA has two.

  • One-way ANOVA: Testing the relationship between shoe brand (Nike, Adidas, Saucony, Hoka) and race finish times in a marathon.
  • Two-way ANOVA: Testing the relationship between shoe brand (Nike, Adidas, Saucony, Hoka), runner age group (junior, senior, master’s), and race finish times in a marathon.

All ANOVAs are designed to test for differences among three or more groups. If you are only testing for a difference between two groups, use a t test instead.

A factorial ANOVA is any ANOVA that uses more than one categorical independent variable. A two-way ANOVA is a type of factorial ANOVA.

Some examples of factorial ANOVAs include:

  • Testing the combined effects of vaccination (vaccinated or not vaccinated) and health status (healthy or pre-existing condition) on the rate of flu infection in a population.
  • Testing the effects of marital status (married, single, divorced, widowed), job status (employed, self-employed, unemployed, retired), and family history (no family history, some family history) on the incidence of depression in a population.
  • Testing the effects of feed type (type A, B, or C) and barn crowding (not crowded, somewhat crowded, very crowded) on the final weight of chickens in a commercial farming operation.

In ANOVA, the null hypothesis is that there is no difference among group means. If any group differs significantly from the overall group mean, then the ANOVA will report a statistically significant result.

Significant differences among group means are calculated using the F statistic, which is the ratio of the mean sum of squares (the variance explained by the independent variable) to the mean square error (the variance left over).

If the F statistic is higher than the critical value (the value of F that corresponds with your alpha value, usually 0.05), then the difference among groups is deemed statistically significant.
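In R, the critical value can be looked up with qf(). A minimal sketch with hypothetical degrees of freedom:

```r
# Critical F value for alpha = 0.05 with 2 and 21 degrees of freedom
# (df values chosen only for illustration); roughly 3.47.
qf(p = 0.95, df1 = 2, df2 = 21)
```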

Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age).

Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).

You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results.



Experimental design tab: One-way ANOVA


Prism offers four related tests that compare three or more groups. Your choice of test depends on these decisions:

Experimental Design

Choose a repeated measures test when the columns of data are matched. Here are some examples:

• You measure a variable in each subject several times, perhaps before, during and after an intervention.

• You recruit subjects as matched groups, matched for variables such as age, ethnic group, and disease severity.

• You run a laboratory experiment several times, each time with several treatments handled in parallel. Since you anticipate experiment-to-experiment variability, you want to analyze the data in such a way that each experiment is treated as a matched set.

Matching should not be based on the variable you are comparing. If you are comparing blood pressures in three groups, it is OK to match based on age or zip code, but it is not OK to match based on blood pressure.

The term repeated measures applies strictly when you give treatments repeatedly to one subject (the first example above). The other two examples are called randomized block experiments (each set of subjects is called a block, and you randomly assign treatments within each block). The analyses are identical for repeated measures and randomized block experiments, and Prism always uses the term repeated measures.

Choose "no matching" if you have a completely randomized design.

Assume Gaussian distribution?

Nonparametric tests, unlike ANOVA, are not based on the assumption that the data are sampled from a Gaussian distribution. But nonparametric tests have less power, and report only P values but not confidence intervals. Deciding when to use a nonparametric test is not straightforward.

If no matching: Assume homoscedasticity?

One assumption underlying the usual ANOVA F test is homogeneity of variance. That means that each group is sampled from populations with the same variance (and thus the same standard deviation) even if the means differ.

Starting with Prism 8, you choose whether or not to assume equal population variances. If you choose not to make that assumption, Prism performs two alternative forms of ANOVA and reports both results. Welch's ANOVA and Brown-Forsythe ANOVA both adjust the calculations of the F ratio and degrees of freedom to account for heterogeneity of within-group variances. The P value can be interpreted in the same manner as in the analysis of variance table.
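For readers working outside Prism, base R offers a comparable option: oneway.test() with var.equal = FALSE performs Welch's ANOVA. A minimal sketch (the data frame df and its columns are hypothetical names):

```r
# Welch's one-way ANOVA: does not assume equal group variances.
oneway.test(response ~ group, data = df, var.equal = FALSE)
```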

• Why use these special forms of ANOVA rather than use a nonparametric Kruskal-Wallis test? Because while the Kruskal-Wallis test does not assume that the data are sampled from Gaussian distributions, it does assume that the dispersion or spread of the distributions are the same.

• As an alternative to these tests, consider transforming your data (logarithms, reciprocals, etc.) and analyzing the transformed values with ordinary ANOVA.

• This Brown-Forsythe test to compare means is distinct from another test, also named Brown-Forsythe, that compares variances.

If repeated measures: Assume sphericity?

The concept of sphericity

The concept of sphericity is tricky to understand. Briefly, it means that you waited long enough between treatments for any treatment effect to wash away. This concept is not relevant if your data are not repeated measures, or if you choose a nonparametric test.

For each subject, subtract the value in column B from the value in column A, and compute the standard deviation of this list of differences. Now do the same thing for the difference between columns A and C, between B and C, etc. If the assumption of sphericity is true, all these standard deviations should have similar values, with any differences being due to chance. If there are large, systematic differences between these standard deviations, the assumption of sphericity is not valid.
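A minimal sketch of this check in R, assuming a matrix scores with one row per subject and one column per treatment (a hypothetical layout):

```r
# Standard deviation of the paired differences for every pair of
# treatment columns; similar values are consistent with sphericity.
pair.sds <- combn(ncol(scores), 2, function(cols) {
  sd(scores[, cols[1]] - scores[, cols[2]])
})
pair.sds
```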

How to decide whether to assume sphericity

If each row of data represents a set of matched observations, then there is no reason to doubt the assumption of sphericity. This is sometimes called a randomized block experimental design.

If each row of data represents a single subject given successive treatments, then you have a  repeated measures experimental design. The assumption of sphericity is unlikely to be an issue if the order of treatments is randomized for each subject, so one subject gets treatments A then B then C, while another gets B, then A, then C... But if all subjects are given the treatments in the same order, it is better to not assume sphericity.

If you aren't sure, we recommend that you do not assume sphericity.

How your choice affects Prism's calculations

If you choose to not assume sphericity, Prism will:

• Include the Geisser-Greenhouse correction when computing the repeated measures ANOVA P value. The resulting P value will be higher than it would have been without that correction.

• Quantify violations of sphericity by reporting epsilon.

• Compute multiple comparisons tests differently.

If you ask Prism to assume sphericity, but in fact that assumption is violated, the P value from ANOVA will be too low. For that reason, if you are unsure whether or not to assume sphericity, we recommend that you check the option to not assume sphericity.

Test summary

Test                               Matched   Nonparametric   Assume equal variances?
Ordinary one-way ANOVA             No        No              Yes
Brown-Forsythe and Welch ANOVA     No        No              No
Repeated measures one-way ANOVA    Yes       No              NA
Kruskal-Wallis test                No        Yes             NA
Friedman test                      Yes       Yes             NA


University of Colorado

ANOVA and Experimental Design

This course is part of Statistical Modeling for Data Science Applications Specialization


Recommended experience

Intermediate level

Calculus, linear algebra, and probability theory.

What you'll learn

Identify and interpret the two-way ANOVA (and ANCOVA) model(s) as a linear regression model.

Use the two-way ANOVA and ANCOVA models to answer research questions using real data.

Define and apply the concepts of replication, repeated measures, and full factorial design in the context of two-way ANOVA.

Skills you'll gain

  • Probability & Statistics
  • General Statistics


There are 4 modules in this course

This second course in statistical modeling will introduce students to the study of the analysis of variance (ANOVA), analysis of covariance (ANCOVA), and experimental design. ANOVA and ANCOVA, presented as a type of linear regression model, will provide the mathematical basis for designing experiments for data science applications. Emphasis will be placed on important design-related concepts, such as randomization, blocking, factorial design, and causality. Some attention will also be given to ethical issues raised in experimentation.

This course can be taken for academic credit as part of CU Boulder’s Master of Science in Data Science (MS-DS) degree offered on the Coursera platform. The MS-DS is an interdisciplinary degree that brings together faculty from CU Boulder’s departments of Applied Mathematics, Computer Science, Information Science, and others. With performance-based admissions and no application process, the MS-DS is ideal for individuals with a broad range of undergraduate education and/or professional experience in computer science, information science, mathematics, and statistics. Learn more about the MS-DS program at https://www.coursera.org/degrees/master-of-science-data-science-boulder.

Introduction to ANOVA and Experimental Design

In this module, we will introduce the basic conceptual framework for experimental design and define the models that will allow us to answer meaningful questions about the differences between group means with respect to a continuous variable. Such models include the one-way Analysis of Variance (ANOVA) and Analysis of Covariance (ANCOVA) models.

What's included

9 videos 3 readings 9 quizzes 2 programming assignments 1 peer review 1 discussion prompt 2 ungraded labs

9 videos • Total 86 minutes

  • Introduction to Experimental Design • 10 minutes • Preview module
  • The One-Way ANOVA and ANCOVA Models • 6 minutes
  • ANOVA Variance Decomposition • 8 minutes
  • ANOVA Sums of Squares and the F-test • 14 minutes
  • ANOVA and ANCOVA as Regression Models • 10 minutes
  • One-Way ANOVA Interpretation in the Regression Context • 10 minutes
  • The ANCOVA Model • 15 minutes
  • ANCOVA with Interactions • 7 minutes
  • ANCOVA with Interactions in R • 4 minutes

3 readings • Total 30 minutes

  • Earn Academic Credit for your Work! • 10 minutes
  • Course Support • 10 minutes
  • Assessment Expectations • 10 minutes

9 quizzes • Total 270 minutes

  • Introduction to ANOVA and Experimental Design • 30 minutes
  • The One-Way ANOVA and ANCOVA Models • 30 minutes
  • ANOVA Variance Decomposition • 30 minutes
  • ANOVA Sums of Squares and the F-Test • 30 minutes
  • ANOVA and ANCOVA as Regression Models • 30 minutes
  • One-Way ANOVA Interpretation in the Regression Context • 30 minutes
  • The ANCOVA Model • 30 minutes
  • ANCOVA with Interactions • 30 minutes
  • ANCOVA with Interactions in R • 30 minutes

2 programming assignments • Total 120 minutes

  • Module 1 Autograded • 60 minutes
  • Optional Introduction to Jupyter and R • 60 minutes

1 peer review • Total 60 minutes

  • Module 1 Peer-Review Submission • 60 minutes

1 discussion prompt • Total 10 minutes

  • Introduce Yourself • 10 minutes

2 ungraded labs • Total 120 minutes

  • ANCOVA with Interactions in R • 60 minutes
  • Module 1 Peer-Review Lab • 60 minutes

Hypothesis Testing in the ANOVA Context

In this module, we will learn how statistical hypothesis testing and confidence intervals, in the ANOVA/ANCOVA context, can help answer meaningful questions about the differences between group means with respect to a continuous variable.

6 videos 2 readings 4 quizzes 1 programming assignment 1 peer review 2 ungraded labs

6 videos • Total 91 minutes

  • Beyond the Full F-test • 12 minutes • Preview module
  • Planned Comparisons: Defining Contrasts • 16 minutes
  • Planned Comparisons: Hypothesis Testing with Contrasts • 14 minutes
  • Post Hoc Comparisons • 13 minutes
  • Post Hoc Comparisons in R • 16 minutes
  • Type II Error and Power in the ANOVA Context • 18 minutes

2 readings • Total 20 minutes

  • Patrizio E. Tressoldi and David Giofré: "The pervasive avoidance of prospective statistical power: major consequences and practical solutions" • 10 minutes
  • Optional: Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors • 10 minutes

4 quizzes • Total 120 minutes

  • Beyond the Full F-test • 30 minutes
  • Planned Comparisons: Defining Contrasts • 30 minutes
  • Planned and Unplanned Comparisons • 30 minutes
  • Type II Error and Power in the ANOVA Context • 30 minutes

1 programming assignment • Total 120 minutes

  • Module 2 Autograded Assignment • 120 minutes

1 peer review • Total 60 minutes

  • Module 2 Peer-Review Submission • 60 minutes

2 ungraded labs • Total 120 minutes

  • Planned Comparisons Using Contrasts in R • 60 minutes
  • Module 2 Peer-Review Lab • 60 minutes

Two-Way ANOVA and Interactions

In this module, we will study the two-way ANOVA model and use it to answer research questions using real data.

7 videos 6 quizzes 1 programming assignment 1 peer review 1 ungraded lab

7 videos • Total 78 minutes

  • Motivating the Two-way ANOVA Model • 10 minutes • Preview module
  • The two-way ANOVA model • 9 minutes
  • The Two-way ANOVA Model as a Regression Model • 9 minutes
  • Interaction Terms in the Two-way ANOVA Model: Definitions and Visualizations • 13 minutes
  • Interactions in the Two-way ANOVA Model: Formal Tests • 15 minutes
  • Two-way ANOVA Hypothesis Testing (no interaction) • 14 minutes
  • Looking Ahead: Two-Way ANOVA and Experimental Design • 5 minutes

6 quizzes • Total 180 minutes

  • Motivating the Two-way ANOVA Model • 30 minutes
  • The Two-way ANOVA Model • 30 minutes
  • The Two-way ANOVA Model as a Regression Model • 30 minutes
  • Interaction Terms in the Two-way ANOVA Model: Definitions and Visualizations • 30 minutes
  • Interactions in the Two-way ANOVA Model: Formal Tests • 30 minutes
  • Two-way ANOVA Hypothesis Testing (no interaction) • 30 minutes

1 programming assignment • Total 180 minutes

  • Module 3 Autograded Assignment • 180 minutes

1 peer review • Total 60 minutes

  • Module 3 Peer-Review Submission • 60 minutes

1 ungraded lab • Total 60 minutes

  • Module 3 Peer-Review Lab • 60 minutes

Experimental Design: Basic Concepts and Designs

In this module, we will study fundamental experimental design concepts, such as randomization, treatment design, replication, and blocking. We will also look at basic factorial designs as an improvement over elementary “one factor at a time” methods. We will combine these concepts with the ANOVA and ANCOVA models to conduct meaningful experiments.

7 videos 2 readings 5 quizzes 1 programming assignment 1 peer review 2 ungraded labs

7 videos • Total 79 minutes

  • The Conceptual Framework of Experimental Design • 19 minutes • Preview module
  • The Completely Randomized Design • 12 minutes
  • The Randomized Complete Block Design (RCBD) • 8 minutes
  • The Randomized Complete Block Design (RCBD): Hypothesis Testing • 8 minutes
  • The Factorial Design • 10 minutes
  • Further Issues in Experimental Design • 7 minutes
  • Ethical Issues in Experimental Design • 12 minutes

2 readings • Total 20 minutes

  • Causation and Experimental Design • 10 minutes
  • Resources on Ethics • 10 minutes

5 quizzes • Total 150 minutes

  • The Conceptual Framework of Experimental Design • 30 minutes
  • The Completely Randomized Design • 30 minutes
  • The Randomized Complete Block Design (RCBD) • 30 minutes
  • The Factorial Design • 30 minutes
  • Further Issues in Experimental Design • 30 minutes

1 programming assignment • Total 120 minutes

  • Module 4 Autograded Assignment • 120 minutes

1 peer review • Total 60 minutes

  • Module 4 Peer-Review Submission • 60 minutes

2 ungraded labs • Total 180 minutes

  • A Completely Randomized Design (CRD) in R • 60 minutes
  • Module 4 Peer-Review Lab • 120 minutes

Instructor: Brian Zaharatos



The one-way ANOVA model


The statistical model can be written as:

y_ij = μ + τ_i + ε_ij

On the left-hand side of the equation, the response variable y_ij identifies each observation by its treatment level i and replication j:

i = 1, …, t, where t is the number of treatment levels.

j = 1, …, n_i, where n_i is the number of replications of treatment level i.

Note that the number of replications does not have to be the same for every treatment level. The total number of observations is N = Σ n_i.

On the right-hand side is a linear combination of model parameters:

μ is a constant (the overall mean).

τ_i is the effect of treatment i.

ε_ij is a random effect attached to treatment i and replication j, called the error.

We typically assume that the ε_ij are independent and identically distributed (iid), following a normal distribution with equal variance; in short, ε_ij ~ N(0, σ²).
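To make the model concrete, here is a minimal sketch that simulates data from it in R and fits a one-way ANOVA (all numbers are made up for illustration):

```r
set.seed(42)
t.levels <- 3                             # t: number of treatment levels
n.rep <- 10                               # n_i: replications per level (equal here)
treatment <- factor(rep(1:t.levels, each = n.rep))
mu <- 10                                  # the constant mu
tau <- c(0, 1.5, -0.5)                    # treatment effects tau_i
eps <- rnorm(t.levels * n.rep, sd = 1)    # errors, iid N(0, sigma^2)
y <- mu + tau[treatment] + eps            # y_ij = mu + tau_i + eps_ij
summary(aov(y ~ treatment))
```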


The Experimental Design Assistant

A free resource from the NC3Rs used by over 5,000 researchers worldwide to help you design robust experiments more likely to yield reliable and reproducible results.

The EDA helps you build a diagram representing your experimental plan, which can be critiqued by the system to provide bespoke feedback. The EDA also:

  • Recommends statistical analysis methods
  • Provides support for randomisation and blinding
  • Performs sample size calculations

For an overview of how the EDA works, watch our 1-minute video.

The EDA website also provides information about the different concepts of experimental design, and how to apply these in your experiments.



Statistics By Jim

Making statistics intuitive

Repeated Measures Designs: Benefits and an ANOVA Example

By Jim Frost

Repeated measures designs, also known as within-subjects designs, can seem like oddball experiments. When you think of a typical experiment, you probably picture an experimental design that uses mutually exclusive, independent groups. These experiments have a control group and treatment groups with clear divisions between them. Each subject is in only one of these groups.

These rules for experiments seem crucial, but repeated measures designs regularly violate them! For example, a subject is often in all the experimental groups. Far from causing problems, repeated measures designs can yield significant benefits.

In this post, I’ll explain how repeated measures designs work along with their benefits and drawbacks. Additionally, I’ll work through a repeated measures ANOVA example to show you how to analyze this type of design and interpret the results.

To learn more about ANOVA tests, read my ANOVA Overview .

Drawbacks of Independent Groups Designs

To understand the benefits of repeated measures designs, let’s first look at the independent groups design to highlight a problem. Suppose you’re conducting an experiment on drugs that might improve memory. In a typical independent groups design, each subject is in one experimental group. They’re either in the control group or one of the treatment groups. After the experiment, you score them on a memory test and then compare the group means.

In this design, you obtain only one score from each subject. You don’t know whether a subject scores higher or lower on the test because of an inherently better or worse memory. Some portion of the observed scores is based on the memory traits of the subjects rather than because of the drug. This example illustrates how the subjects themselves introduce an uncontrollable factor into the study.

Imagine that a person in the control group scores high while someone else in a treatment group scores low, not due to the treatment, but due to differing baseline memory capabilities. This “fuzziness” makes it harder to assess differences between the groups.

If only there were some way to know whether subjects tend to measure high or low. We need some way of incorporating each person’s variability into the model. Oh wait, that’s what we’re talking about—repeated measures designs!

How Repeated Measures Designs Work

As the name implies, you need to measure each subject multiple times in a repeated measures design. Shocking! They are longitudinal studies. However, there’s more to it. The subjects usually experience all of the experimental conditions, which allows them to serve as experimental blocks or as their own control. Statisticians refer to this as dependent samples because one observation provides information about another observation. What does that mean? Let me break this down one piece at a time.

The effects of the controllable factors in an experiment are what you really want to learn. However, as we saw in our example above, there can also be uncontrolled sources of variation that make it harder to learn about those things that we can control.

Experimental blocks explain some of the uncontrolled variability in an experiment. While you can’t control the blocks, you can include them in the model to reduce the amount of unexplained variability. By accounting for more of the uncontrolled variability, you can learn more about the controllable variables that are the entire point of your experiment.

Let’s go back to our longitudinal study for the drug’s effectiveness. We saw how subjects are an uncontrolled factor that makes it harder to assess the effects of the drugs. However, if we took multiple measurements from each person, we gain more information about their personal outcome measures under a variety of conditions. We might see that some subjects tend to score high or low on the memory tests. Then, we can compare their scores for each treatment group to their general baseline.

And, that’s how repeated measures designs work. You understand each person better so that you can place their personal reaction to each experimental condition into their particular context. Repeated measures designs use dependent samples because one observation provides information about another observation.

Related posts : Independent and Dependent Samples and Longitudinal Studies: Overview, Examples & Benefits .

Benefits of Repeated Measures Designs

In statistical terms, we say that experimental blocks reduce the variance and bias of the model’s error by controlling for factors that cause variability between subjects. The error term contains only the variability within-subjects and not the variability between subjects. The result is that the error term tends to be smaller, which produces the following benefits:

Greater statistical power: By controlling for differences between subjects, this type of design can have much more statistical power. If an effect exists, your statistical test is more likely to detect it.

Requires a smaller number of subjects: Because of the increased power, you can recruit fewer people and still have a good probability of detecting an effect that truly exists. If you’d need 20 people in each group for a design with independent groups, you might only need a total of 20 for repeated measures.

Faster and less expensive: The time and costs associated with administering repeated measures designs can be much lower because there are fewer people to recruit, train, and compensate.

Time-related effects: As we saw, an independent groups design collects only one measurement from each person. By collecting data from multiple points in time for each subject, repeated measures designs can assess effects over time. This tracking is particularly useful when there are potential time effects, such as learning or fatigue.

Managing the Challenges of Repeated Measures Designs

Repeated measures designs have some great benefits, but there are a few drawbacks that you should consider. The largest downside is the problem of order effects, which can happen when you expose subjects to multiple treatments. These effects are associated with the treatment order but are not caused by the treatment.

Order effects can impede the ability of the model to estimate the effects correctly. For example, in a wine taste test, subjects might give a dry wine a lower score if they sample it after a sweet wine.

You can use different strategies to minimize this problem. These approaches include randomizing or reversing the treatment order and providing sufficient time between treatments. Don’t forget, using an independent groups design is an efficient way to eliminate order effects.

Crossover Repeated Measures Designs

I’ve diagrammed a crossover repeated measures design, which is a very common type of experiment. Study volunteers are assigned randomly to one of the two groups. Everyone in the study receives all of the treatments, but the order is reversed for the second group to reduce the problem of order effects. In the diagram, there are two treatments, but the experimenter can add more treatment groups.

Diagram of a crossover repeated measures design.

Studies from a diverse array of subject areas use crossover designs. These areas include weight loss plans, marketing campaigns, and educational programs among many others. Even our theoretical memory pill study can use it.

Repeated measures designs come in many flavors, and it’s impossible to cover them all here. You need to look at your study area and research goals to determine which type of design best meets your requirements. Weigh the benefits and challenges of repeated measures designs to decide whether you can use one for your study.

Repeated Measures ANOVA Example

Let’s imagine that we used a repeated measures design to study our hypothetical memory drug. For our study, we recruited five people, and we tested four memory drugs. Everyone in the study tried all four drugs and took a memory test after each one. We obtain the data below. You can also download the CSV file: Repeated_measures_data.

Image displaying the data for the repeated measures ANOVA.

In the dataset, you can see that each subject has an ID number so we can associate each person with all of their scores. We also know which drug they took for each score. Together, this allows the model to develop a baseline for each subject and then compare the drug-specific scores to that baseline.

How do we fit this model? In your preferred statistical software package, you need to fit an ANOVA model like this:

  • Score is the response variable.
  • Subject and Drug are the factors.
  • Subject should be a random factor.

Subject is a random factor because we randomly selected the subjects from the population and we want them to represent the entire population. If we were to include Subject as a fixed factor, the results would apply only to these five people and would not be generalizable to the larger population.

Drug is a fixed factor because we picked these drugs intentionally and we want to estimate the effects of these four drugs particularly.
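In R, for example, a minimal sketch might look like this, assuming a long-format data frame memory with columns Score, Drug, and Subject (hypothetical names):

```r
memory$Subject <- as.factor(memory$Subject)
memory$Drug <- as.factor(memory$Drug)

# Specifying Subject as an error stratum treats it as a random factor:
rm.fit <- aov(Score ~ Drug + Error(Subject), data = memory)
summary(rm.fit)
```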

Repeated Measures ANOVA Results

After we fit the repeated measures ANOVA model, we obtain the following results.

Output for repeated measures ANOVA.

The P-value for Drug is 0.000. This low P-value indicates that not all four group means are equal. Because the model includes Subjects, we know that the Drug effect and its P-value account for the variability between subjects.

Below is the main effects plot for Drug, which displays the fitted mean for each drug.

Main effects plot for the repeated measures ANOVA example.

Clearly, drug 4 is the best. Tukey’s multiple comparisons (not shown) indicate that Drug 4 – Drug 3 and Drug 4 – Drug 2 are statistically significant.

Have you used a repeated measures design for your study?


Reader Interactions


December 15, 2023 at 2:24 pm

thanks for these posts and comments. question – in a repeated measures analysis within SPSS, the first output is the multivariate effect, and the second is the within-subjects effect. I imagine both analyses approach the effect from a different point of view. I’m trying to understand the difference, similarity, when to use multivariate vs. within-subjects. My data has three time points. one between-subjects factor.


November 30, 2022 at 11:14 am

Hi Jim – Thank you for your posts, which are always comprehensive and value-adding.

If my subjects are not individual respondents, but an aggregated group of respondents in a geography (example: respondents in a geographic area forms my subjects G1, G2, …,Gn), do I need to normalize the output variable to handle the fluctuation across the subjects due to population variations across geographies? Or will the Repeated Measures ANOVA handle that if I add Subject (Geography) as my factor?


September 26, 2022 at 6:37 am

Hi and thank you for a clarifying page! But I still haven’t found what I’m looking for… I have conducted a test with 2 groups, approx 25 persons randomly allocated to each group. They were given two different drug treatments. We measured several variables before the drug was given. After the drug was given, the same variables were measured after 1, 5, 20 and 60 minutes. Let’s say these variables were AA, BB, CC, DD and EE. Let’s assume they are normally distributed at all times. Variable types are heart rate, blood pressure, and such. How am I supposed to perform statistics in this case? Just comparing drug effects at each time point will inevitably produce Type I errors? These are repeated measurements, but is R.M. ANOVA really appropriate here?


September 26, 2022 at 8:28 pm

Hi Tony, yes, I think you need to use repeated measures MANOVA. That should allow you to accomplish all that while controlling Type I errors by avoiding multiple tests.


August 3, 2022 at 3:56 am

Hi Jim, I have 3 samples (say A, B & C) that are being tasted and rated on a hedonic scale by panelists. Each panelist will be given the 3 samples (one at a time) to be tasted and evaluated. A total of 100 respondents were selected from a particular population. Can repeated measures ANOVA be used? This is considered related, right? If not, can you suggest the appropriate test to use.


June 25, 2022 at 1:06 am

I’m very mathematically challenged and your posts really simplify things. I’m trying to help a singer determine which factors interactively determine his commission during livestreams, as the commission is different each time. For each date, I have the amount of coins gifted from viewers, the average viewer count, the total number of viewers, and the average time viewers have spent watching. The dependent variable is the commission. Would I use an ANOVA for this?


December 7, 2021 at 11:24 pm

Hi Jim, please, if I have the following data, which test is most appropriate: comparing means of BMI, diastolic pressure, and cholesterol between two age groups (15 to 30 and above 30 years)? Thank you

December 9, 2021 at 6:21 pm

Hi Salma, I’m not sure about your IV and DVs. What’s your design? I can’t answer your question without knowing what you want to test. What do you want to learn using those variables?


October 25, 2021 at 4:32 pm

Jim, Isn’t there a sphericity requirement for data in repeated measures anova?

October 25, 2021 at 10:48 pm

Spherical errors are those that have no autocorrelation and have a constant variance. In my post about OLS assumptions , they’re assumptions #4 and #5 with a note to that effect in the text for #5. It’s a standard linear models assumption.


May 6, 2021 at 5:09 pm

Hi Jim, we have data from analysis of different sources of gluten free flour analysed together and compared to wheat flour for different properties. What would be the best test to use in this case please.


September 14, 2020 at 6:41 pm

Hi Jim, I found your post helpful and was wondering if a repeated measures ANOVA would be an appropriate analysis for a project I am working on. I have collected pre, post, and delayed post survey data. All participants first complete a pre survey, then engage in an intervention; directly after the intervention they all complete a post survey. Then 4 months later they all complete a delayed post survey. My interest is to see if there are any long-term impacts of the intervention. Would the repeated measures ANOVA be appropriate to use to compare the participants’ pre, post, and delayed post scores?


June 12, 2020 at 8:28 pm

Thank you for another great post! I am doing a study protocol and the primary hypothesis is that a VR intervention will show improvement in postural control (4 CoP parameters), comparing the experimental and inactive control group (post-intervention). I was advised to use a repeated measures ANOVA to test the primary hypothesis but reading your post made me realize that might not be correct because my study subjects are not experiencing all the experimental conditions. Do you recommend another type of ANOVA?

Thanks in advance.

June 12, 2020 at 9:07 pm

I should probably clarify this better in the post. The subjects don’t have to experience all the treatment conditions, but many studies use these designs for this reason. But, it’s not a requirement. If you’ve measured your subjects multiple times, you probably do need to use a repeated measures design.


June 2, 2020 at 11:31 am

Thank you so much for your helpful posts about statistics! I’ve tried doing a repeated measures analysis but have gotten a bit confused. I administered 3 different questionnaires on social behavior (all continuous outcomes, but on different scales [two ranging 0-50, the third 0-90]) on 4 different time points. The questionnaires are correlated to each other so I would prefer to put them in the same analysis. I was planning on doing this by making one within subject variable “time” and one within subject variable “questionnaire”. I would like to know what the effect is of time on social behavior and whether this effect is different depending on the specific questionnaire used. Is it ok to add these questionnaires in the same analysis even though they do not have the same range of scores or should I first center the total scores of the questionnaires?

Many thanks, Laura

June 3, 2020 at 7:44 pm

ANOVA can handle DVs that use different measurement units/scales without problems. However, if you want to determine which DV/survey is more important, you might consider standardizing them. Read more about that in my post about identifying the most important variables in your model . It discusses it in the regression context but the same applies to ANOVA.

You’ll obtain valid and consistent results using either standardized and unstandardized values. It just depends on what you want to learn.

I hope that helps!


May 30, 2020 at 4:53 pm

Hi Jim, thanks for your effort and time to make statistics understandable to the wider public. Your style of teaching is quite simple.

I didn’t see any questions or responses from 2019 to date, but I hope you’re still there anyway.

I have this stat problem I need your opinion on. There are 8 drinking water wells clustered at different distances around an injection well. To simulate direction and concentration of contaminant within the subsurface around the well area, a contaminant was injected/pumped continuously into the subsurface through the injection well. This happened for 6 weeks; pH samples were taken from the 8 wells daily for the 6 weeks. I need to test for 2 things, namely: 1. Is there any statistically significant difference in pH within the wells over the 6 weeks (6 weeks as a single time period)? 2. Is there any statistically significant difference in pH for each well within the weeks (6-week time step)?

Which statistical test best captures this analysis? I’m thinking of repeated measures ANOVA; what do you think?

May 30, 2020 at 4:55 pm

Yes, because you’re looking at the same subjects (wells) over time, you need repeated measures ANOVA.


December 24, 2018 at 2:31 pm

Vidya Kulkarni writes: Shall appreciate a reply. My friend has performed experiments with rats in 3 groups by administering a certain drug. Group 1 is not given any drug, group 2 is given 50 mg, and group 3 is given 100 mg. In each group there are 3 rats, and for each of these rats their tumor volume has been recorded for 9 consecutive days. Thus for each group we have 27 observations. We want to show that the difference in their means is significant at some confidence level. Please let me know what statistical test we should use, and if you can send a link to some similar example, that would be a great help. Looking forward to quick help. Thanks


December 11, 2018 at 8:44 pm

I wanted to thank you for your post! It was really helpful for me. In my design I have 30 subjects with 10 readings (from different electrodes on the scalp) for each subject in two sessions (immediate test, post test). I used repeated measures ANOVA and I found a significant main effect of sessions and also a significant interaction of sessions and electrodes. The main effect means I have a significant difference between session 1 data and session 2 data, but I am not sure about the interaction effect. I would appreciate it if you could help me with that.

Thanks, Mary

December 12, 2018 at 9:32 am

I’m not sure what your outcome variable is or what the electrodes variable measures precisely. But, here’s how you’d interpret the results generally.

The relationship between sessions and your outcome variable depends on the value of your electrodes variable. While there is a significant difference between sessions, that difference depends on the value of electrodes. If you create an interactions plot, it should be easier to see what is going on! For more information, see my post about interaction effects .


October 23, 2018 at 4:12 pm

Hello Jim! I am very pleased to meet you and I greatly appreciate your work!

The Repeated Measures ANOVA that I have encountered in my study is as follows:

A number of subject groups, of n people each, selected e.g. by age, are all tested repeatedly the same number of times, with the same drug! I.e., there is only one drug!

The score is the effectiveness of the drug on a specific body parameter, e.g. on blood pressure. And the question is to assess the effectiveness of the drug.

The subjects group is not a random factor, as it is an age group. The score also is not an independent r.v., as it reflects the effect of the drug on the previous day.

Do you have any notes on this type of problem, or can you recommend literature I can access from the web?

My best regards, Elias, Athens / Greece

October 24, 2018 at 4:26 pm

It’s OK to not have more than one drug. You just need to be able to compare the one drug to not taking the drug. You can do that both in a traditional control group/treatment group setting or by using repeated measures. However, given that you talk about repeated measures and everyone taking the drug, my guess is that it is some type of crossover design, which I describe in this post.

In this scenario, everyone would eventually take the same drug over the course of the study, but some subjects might start out by not taking the drug while the other subjects do. Then, the subjects switch.

You can include Subjects as a random factor if you randomly selected them from the population. Then, include Age as an additional fixed factor if you’re specifying the age groups, or as a covariate if you’re using their actual age (rather than dividing them into groups based on age ranges).

I hope this helps!


August 27, 2018 at 2:24 pm

I am getting conflicting advice. I ran a pre-test, intervention, post-test study, where I had 4 groups (3 experimental and one control). I tested hamstring strength. In my repeated measures ANOVA I had an effect of time but NO interaction effect. I have been told that due to no interaction effect I do NOT run a post-hoc analysis. Is this correct? Someone else has told me the complete opposite (I only run a post-hoc analysis when I do not have an interaction effect).

August 28, 2018 at 11:11 pm

The correct action depends on the specifics of your study, which might be why you’re getting conflicting advice!

As a general statistical principle, it’s perfectly fine to perform post-hoc tests regardless of whether the interaction effect is significant or not. The only time that it makes no sense to perform a post hoc test is when no terms in your model are statistically significant. Although, even in that case, post hoc tests can sometimes detect statistical significance–but that’s another story. But, in a nutshell, you can perform post hoc tests whether or not your interaction term is significant.

However, I suspect that the real question is what makes sense given the pre-test/post-test nature of your study. You have measurements before and after the intervention. If the intervention is effective, you’d expect the differences to show up after the intervention but not before. Consequently, that is an interaction effect because the treatment difference depends on the time of measurement. Read my blog post about interaction effects to see how these are “it depends” effects. So, if your interaction effect is not significant, it might not make sense to analyze your data further.

If the main effect for the treatment group variable is significant but not the interaction effect, it’s a bit difficult because it says that the treatment groups cause a difference between group means even in the pre-test measurement! That might represent only the differences between the subjects within those groups–it’s hard to say. You really want that interaction term to be significant!

If only the time effect is significant and nothing else, it’s probably not worth further investigation.

One thing I can say definitively is that the person who said that you can only perform a post-hoc analysis when the interaction is not significant is wrong! As a general principle, it’s OK to perform post-hoc analyses when an interaction term is significant. For your study, you particularly want a significant interaction term!



Beyond t-Test and ANOVA: applications of mixed-effects models for more rigorous statistical analysis in neuroscience research

1. Department of Statistics, Donald Bren School of Information and Computer Sciences, University of California, Irvine, CA 92697-3425

7. The Center for Neural Circuit Mapping, University of California, Irvine, Irvine, CA 92697

Michele Guindani

Steven F. Grieco

2. Department of Anatomy and Neurobiology, School of Medicine, University of California, Irvine, CA 92697-1275

Todd C. Holmes

3. Department of Physiology and Biophysics, School of Medicine, University of California, Irvine, CA 92697-4560

Xiangmin Xu

4. Department of Biomedical Engineering, University of California, Irvine, CA 92697-2715

5. Department of Microbiology and Molecular Genetics, University of California, Irvine, Irvine, CA 92697-4025

6. Department of Computer Science, University of California, Irvine, Irvine, CA 92697-3435


Associated Data

The data that support the findings of this study are available from the corresponding author upon reasonable request.

In basic neuroscience research, data are often clustered or collected with repeated measures, hence correlated. The most widely used methods such as t-test and ANOVA do not take data dependence into account, and thus are often misused. This Primer introduces linear and generalized mixed-effects models that consider data dependence, and provides clear instruction on how to recognize when they are needed and how to apply them. The appropriate use of mixed-effects models will help researchers improve their experimental design, and will lead to data analyses with greater validity and higher reproducibility of the experimental findings.

1. Overview

The importance of using appropriate statistical methods for experimental design and data analysis is well recognized across scientific disciplines. The growing concern over reproducibility in biomedical research is often referred to as a “problem of inadequate rigor” ( Kilkenny et al., 2009 ; Prinz et al., 2011 ). The reproducibility crisis has been attributed to various factors that include lack of adherence to good scientific practices, underdeveloped experimental designs, and the misuse of statistical methods ( Landis et al., 2012 ; Steward and Balice-Gordon, 2014 ). Further compounding these challenges, we are in the midst of an ever-expanding biomedical research revolution. “Big Data” are being produced at an unprecedented rate ( Margolis et al., 2014 ). Proper analysis of Big Data requires up-to-date statistical methodologies that take complex features of data such as explicit and implicit data dependencies into consideration. Better matching of statistical models that take data characteristics into account will allow for better interpretation of data outcomes. It will also boost the confidence in biomedical research of all stakeholders in the scientific enterprise, including the industry and the taxpaying public ( Alberts et al., 2014 ; Freedman et al., 2015 ; Macleod et al., 2014 ). Despite recent advances in statistical methods, current neuroscience research is often conducted using a limited set of well-known statistical tools. Many models and tests assume that the observations are independent of each other. Failure to account for this dependency in the data often leads to an increased number of false positives, a major cause of the irreproducibility crisis ( Aarts et al., 2014 ).

The t-test and ANOVA are familiar methods to all neuroscience researchers. Both methods assume that individual observations are independent of each other. For example, data measurements from multiple mice observed under different conditions, e.g., different mouse genetic models, are treated as unique, independent data points. However, this assumption of independence is false for animals clustered into cages or litters, and for neuroanatomical and neurophysiological studies that rely on large-scale longitudinal recordings and involve repeated measurements over time of the same neurons and/or animals ( Aarts et al., 2014 ; Galbraith et al., 2010 ; Wilson et al., 2017 ). In those cases, data are structured as clusters of multiple measurements collected from single units of analysis (neurons and/or animals), leading to natural dependence and correlation between the observations ( Figure 1 ).

Figure 1. A graphical representation of potential sources of correlated data. (A) The data are correlated because neurons from the same animal tend to be more similar to each other than neurons from different animals. (B) Observations taken from the same animal over time are dependent, while data from different animals are independent. (C) Correlation arises from two sources: individual observations are made on neurons from three different mice before and after a drug treatment.

A quick examination of recently published articles indicates that reported results in basic neuroscience research often rely on inappropriate statistical methods that do not take the experimental design and the ensuing data dependencies into account ( Aarts et al., 2014 ; Boisgontier and Cheval, 2016 ). Our conclusion is supported by our survey of studies published in prestigious journals over the past few years. In total we identified over 100 articles where recordings of individual neurons from multiple animals were pooled for statistical testing. Alarmingly, about 50% of these articles failed to account for data dependencies in any meaningful way. Our finding is in agreement with an investigation published a few years ago ( Aarts et al., 2014 ), which found that 53% of neuroscience articles failed to account for the dependence in their data. Representative descriptions of the inappropriate analyses read, "t(28656)=314 with p<10⁻¹⁰ over a total of n=28657 neurons pooled across six mice", "n=377 neurons from four mice, two-sided Wilcoxon signed rank test …", "610 A cells, 987 B cells and 2584 C cells from 10 mice, one-way ANOVA and Kruskal–Wallis test", "two-sided paired t-test, n=1597 neurons from 11 animals, d.f.=1596", among numerous others. Such analyses can lead to astonishingly high type I error (false positive) rates (see below). Even in cases for which data dependencies are obvious, investigators continue to use repeated-measures ANOVA, the paired t-test, or their nonparametric versions. In many cases, errors due to the use of inappropriate statistics affect the article's main conclusion ( Fiedler, 2011 ).

Statisticians have developed effective methods to analyze correlated data. Among the widely used statistical tools that take data dependencies into account are the linear and generalized linear mixed-effects models, which include the t-test and ANOVA as special cases. Although the value of properly analyzing correlated data has been increasingly recognized in many scientific disciplines, including clinical research, genetics, and psychological science, mixed-effects (ME) models have been under-utilized in basic neuroscience research.

The purpose of our article is to provide a readable primer for neuroscience experimentalists who do not have extensive training in statistics. We illustrate and discuss which features of the experimental questions, design, and data structure require appropriate consideration, and how the proper use of mixed-effects models leads to more rigorous analysis, better reproducibility, and richer conclusions. We provide concrete data examples of how to properly use mixed-effects models. In addition to providing an improved perspective on appropriate statistical analyses, we provide easy-to-follow instructions for the implementation of mixed-effects models, with access to code and practice data sets for all interested users. See Glossary Box 1 for a useful glossary related to this Primer.

Clustered data : In neuroscience research, the data from a study are often obtained from a number of different experimental units (referred to as clusters). The key feature of clustered data is that observations from the same cluster tend to be correlated with each other.

Dependent versus independent : For dependent samples, the selection of subjects for consideration (e.g., neurons, animals) in one sample is affected by the selection of subjects in the other samples. For independent samples, the selection of subjects for consideration (e.g., neurons, animals) is not affected by the selection of subjects in the other sample.

Effect size : An effect size is a numerical quantity for the magnitude of a certain relationship such as the difference between population means or the association between two quantitative variables.

Fixed versus random effects : Fixed effects often refer to fixed but unknown population parameters such as coefficients in the traditional linear model (LM). Random effects often refer to effects at the individual or subject level that are included in the model to take into account the heterogeneity/variability of individual observations but are usually not of direct interest.

Frequentist versus Bayesian approaches in mixed-effects models : In frequentist analysis, a fixed effect is a fixed but unknown population parameter, whereas a random effect is a value drawn from a distribution to capture individual variability. In Bayesian analysis, both fixed and random effects are random variables drawn from distributions (priors); the inference is conducted by computing the posterior distribution for the fixed effects and the variance-covariance of the random effects. The posterior distribution updates the prior information using the observed data.

Hypothesis testing : A hypothesis is a statement about a parameter (or a set of parameters) of interest. Statistical hypothesis testing is formalized to make a decision between rejecting or not rejecting a null hypothesis, on the basis of a set of experimental observations and measurements. Two types of errors can result from any decision rule (test): 1) rejecting the null hypothesis when it is true (a Type I error, “false positive”), and 2) failing to reject the null hypothesis when it is false (a Type II error, “false negative”).

Independently and identically distributed : A set of random variables are independently and identically distributed (i.i.d.) if they are mutually independent and each of them follows the same distribution.

Linear regression model (or linear model) : A linear regression model is an approach to model the linear relationship between a response variable and one or more explanatory variables.

Linear mixed-effects model (LME) and generalized linear mixed model (GLMM) : The LME is an extension of the linear regression model to consider both fixed and random effects. It is particularly useful when the data are clustered or have repeated measurements. The GLMM is an extension to the generalized linear model, in which the linear predictor contains random effects in addition to the fixed effects.

Parameters : Parameters are the characteristic values of an entire population, such as the mean and standard deviation of a normal distribution. Samples can be used to estimate population parameters.

Parametric versus nonparametric tests : A parametric test assumes that the data follow an underlying statistical distribution. A nonparametric test does not impose a specific distribution on data. Nonparametric tests are more robust than parametric tests as they are valid over a broader range of situations.

2. Introduction to linear and generalized linear mixed-effects (LME, GLMM) models

2.1. Important concepts and definitions related to statistical testing

To understand the practical issues of mixed-effects models in the context of neuroscience research, we next introduce several important concepts and definitions using real-world data illustrations. Considering 5000 cells measured from five mice, what is the effective sample size in this study? Is it 5000 or 5? Perhaps it is neither. The number of biological units, experimental units, and observational units can be quite distinct from each other. A detailed discussion of sample size in cell cultures and animal experiments is provided by an earlier paper ( Lazic et al., 2018 ). Here we will use an example data set collected from our laboratory to illustrate the concept and definition of intra-class-correlation (ICC), which is a metric to quantify the degree of correlation due to clustering. We also introduce the concepts of “design effect” and “effective sample size,” and discuss why conventional methods such as t-test and ANOVA are not appropriate for this example.

ICC is a widely used metric to quantify the degree to which measurements from the same group are correlated. Depending on the specific setting of concern, different definitions have been proposed. For simplicity, let us consider the simple one-way ANOVA setting, where each animal is considered as a class. The total variance of the data can be partitioned into the between- (inter-) and within- (intra-) class variances. The population ICC ( Fischer, 1944 ) is defined as the ratio of the between-class variance to the total variance:

ICC = σ_b² / (σ_b² + σ_e²),

where σ_b² denotes the between-class variance and σ_e² denotes the within-class variance. For naturally occurring clusters, the ICC often falls between 0 and 1. If ICC=0, the data can be treated as uncorrelated; if ICC=1, all the observations in each cluster are perfectly correlated.

In our study of ketamine effects on neuroplasticity (Example 1, see detailed information in Section 3 below), we measured pCREB immunoreactivity of 1200 putative excitatory neurons of mouse visual cortex at different time points: collected at baseline (saline), 24, 48, 72 hours, and 1 week following ketamine treatment from 24 mice ( Figure 2 ). The original data and full description of the experiments can be found in ( Grieco et al., 2020 ). For this example, a large ICC suggests that neurons from the same mouse tend to be more similar to each other than neurons from different mice. For larger values of ICC, there is greater homogeneity within clusters and greater heterogeneity between clusters. As shown in Figure 2 , the pCREB values of the 357 neurons in the saline group tend to cluster into groups indexed by the seven mice. The estimated ICC ( Wolak and Wolak, 2015a ) is 0.61, which implies that the 357 observations should not be treated as independent data points.
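For readers who want to reproduce this kind of calculation, the sketch below estimates the ICC for a single treatment group. It assumes a data frame `saline` with columns `mouse` (animal ID) and `pCREB` (normalized staining intensity); the data frame and column names are illustrative stand-ins, not the names used in our supplemental code.

```r
# A minimal sketch of ICC estimation, assuming a data frame `saline` with
# columns `mouse` (factor of animal IDs) and `pCREB` (numeric intensities).
library(ICC)     # Wolak's ICC package, cited above
ICCest(x = mouse, y = pCREB, data = saline)  # ICC estimate with a confidence interval

# Equivalently, from the variance components of an intercept-only LME:
library(lme4)
fit <- lmer(pCREB ~ 1 + (1 | mouse), data = saline)
vc  <- as.data.frame(VarCorr(fit))
vc$vcov[1] / sum(vc$vcov)   # between-mouse variance / total variance
```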

Figure 2. Normalized pCREB staining intensity values from 1200 neurons (Example 1). The values in each cluster were from one animal. In total, pCREB values were measured for 1200 neurons from 24 mice under five conditions: saline (7 mice, ICC=0.61), 24h (6 mice, ICC=0.33), 48h (3 mice, ICC=0.02), 72h (3 mice, ICC=0.63), and 1wk (5 mice, ICC=0.54) after treatment. According to the ICC, observations at 48h and 72h show the smallest and largest intra-class correlations, respectively.

To understand why conventional methods (t-test, ANOVA) fail when data dependencies are not taken into account, it is helpful to quantify the magnitude of clustering in an experiment using the design effect (D_eff) ( Kish, 1965 ), which is defined as:

D_eff = 1 + (M − 1) × ICC,

where M denotes the average cluster size of the experimental design. The design effect is a useful metric for recalibrating the standard error of an estimate in the presence of clustering, or for adjusting the sample size when designing an experiment. For the saline group, with n=357 and ICC=0.61, the design effect is 32, i.e., on average 32 neurons under the current design are equivalent to one uncorrelated neuron. This experimental design may call for more measurements, but how many should be made?

Another closely related concept that helps answer this question is the effective sample size (n_eff), which is the equivalent sample size if there were no clustering/correlation. It is defined as n_eff = n / D_eff, where n is the total sample size (number of observations). This definition also interpolates between the two extreme cases of ICC=0 and 1, with ICC=0 leading to n_eff = n (no correlation) and ICC=1 leading to n_eff = n / M (complete correlation). In sample size calculations, the design effect can be interpreted as a multiplying factor to obtain the desired sample size under the assumption of independence. With D_eff=32 in the saline example, the effective sample size based on the 357 neurons is n_eff = 357/32 ≈ 11, which is only about 50% more than the number of mice. The ICC, design effects, and effective sample sizes for the five groups are shown in Table 1. The results indicate that there is substantial dependence in the data. Unfortunately, when researchers analyze data under such circumstances, the methods they choose often make the wrong assumption that all the observations are independent of each other. One well-known consequence of ignoring correlations in data is an increased number of false positives, which will be discussed in the next sub-section.
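As a concrete illustration, the design effect and effective sample size for the saline group can be computed directly from the quantities reported above; this small sketch uses base R only.

```r
# Design effect and effective sample size for the saline group
# (n = 357 neurons from 7 mice, ICC = 0.61), following Kish (1965).
n    <- 357
M    <- n / 7                  # average cluster size (neurons per mouse)
icc  <- 0.61
Deff <- 1 + (M - 1) * icc      # design effect, about 32
neff <- n / Deff               # effective sample size, about 11
c(design.effect = Deff, effective.n = neff)
```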

Table 1. ICC, design effect, and effective sample size for the five groups in Example 1. ICC and the design effect were the lowest at 48h, when the data were relatively homogeneous across animals. At baseline and 72h, the data are noticeably heterogeneous across animals, leading to high ICC.

| | Saline (7 mice) | 24h (6 mice) | 48h (3 mice) | 72h (3 mice) | 1wk (5 mice) |
| --- | --- | --- | --- | --- | --- |
| # of cells | 357 | 209 | 139 | 150 | 245 |
| ICC | 0.61 | 0.33 | 0.02 | 0.63 | 0.54 |
| Design effect | 32.0 | 17.7 | 1.8 | 31.8 | 26.8 |
| Effective sample size | 11.1 | 17.5 | 76.9 | 4.7 | 9.1 |

2.2. Failing to account for data dependence leads to high type I error (false positive) rates

When dependence is ignored in the data analysis, null hypotheses can be erroneously rejected and confidence intervals do not have enough coverage. In the statistical literature, the action of erroneously rejecting a null hypothesis (see Glossary Box 1 and Appendix 1 of the Supplemental Materials ) is called a “false positive”. For a given test, its size, or type I error rate, is defined as the probability that the null hypothesis is erroneously rejected. We say that a test has an inflated type I error rate when its type I error rate is greater than its significance level, which is often denoted as α . To evaluate the severity of inflated type I error rates due to failure to consider data dependencies in realistic scenarios, we simulated data using the dependence structure of Example 1. The number of neurons from each of the 24 animals, the number of animals from each of the five groups, and the ICCs from Example 1, illustrated in Figure 2 and Table 1 , were used to generate simulated data. To ensure that the data were simulated under the null hypothesis, the responses in each of the five groups were simulated from a multivariate normal distribution with mean 0 and correlation structure based on the ICC of that group. Thus, the a priori known ground truth is that the five groups (baseline (saline), 24, 48, 72 hours, and 1 week) share the same population mean.

We simulated 10,000 data sets, each of which was analyzed both with the linear model (LM), pooling all neurons, and with the linear mixed-effects (LME) model, to test the equality of the population means of the five groups. The histogram of LM p-values indicates that most of the p-values are small (Figure 3A, left panel); the type I error rate is about 90% when α = 0.05 is used. Thus, with no difference between the five groups, the probability that the LM will reject the null hypothesis is 90%. This strikingly large type I error rate of the LM confirms that when substantial data dependency exists, the cost of failing to take that dependency into account is very serious, due to the much higher probability of false positives.
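A minimal version of this simulation can be sketched as follows; the group sizes and ICC here are illustrative rather than the exact Example 1 design, but the qualitative conclusion (a badly inflated type I error rate for the LM) is the same.

```r
# Null simulation: cells clustered within mice, no true group difference,
# so every rejection is a false positive. Illustrative sizes, not Example 1.
library(lmerTest)   # loads lme4 and adds p-values to lmer() anova tables
set.seed(1)

one_run <- function(n_mice = 6, cells_per_mouse = 50, icc = 0.6) {
  d <- expand.grid(cell = seq_len(cells_per_mouse), mouse = seq_len(2 * n_mice))
  d$group <- factor(ifelse(d$mouse <= n_mice, "A", "B"))
  mouse_effect <- rnorm(2 * n_mice, sd = sqrt(icc))       # between-mouse variance
  d$y <- mouse_effect[d$mouse] + rnorm(nrow(d), sd = sqrt(1 - icc))
  p_lm  <- anova(lm(y ~ group, data = d))[["Pr(>F)"]][1]  # pools all cells
  p_lme <- anova(lmer(y ~ group + (1 | mouse), data = d))[["Pr(>F)"]][1]
  c(LM = p_lm, LME = p_lme)
}

p <- replicate(500, one_run())
rowMeans(p < 0.05)  # LM's rate is far above the nominal 0.05; LME's is near it
```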

Figure 3. Histograms of p-values using simulated data that assume (1) no treatment effects and (2) the same sample sizes and correlation structure as Example 1. (A) The histogram of p-values from the inappropriate method (LM) shows that ignoring the correlation structure of the data leads to a surprisingly high type I error rate (90%) at significance level α=0.05. (B) Histogram of the p-values from LME.

In comparison, the histogram of LME p-values is approximately uniform between 0 and 1 ( Figure 3b , right panel); if the significance level is chosen at α =0.05, the estimated type I error rate is 8.6%, which indicates that the LME test is effective in accounting for data dependency. This convincingly illustrates the need for use of the LME in neuroscience research. Next, we provide some background and describe the method of linear mixed-effects model (LME).

2.3. Linear mixed-effects model

The word “mixed” in linear mixed-effects (LME) means that the model consists of both fixed and random effects. Fixed effects refer to fixed but unknown coefficients for the variables of interest and the explanatory covariates, as identified in the traditional linear model (LM) developed by Francis Galton more than a century ago. Random effects, first proposed in ( Fisher, 1919 ), refer to variables that are not of direct interest - however, they may potentially lead to correlated outcomes. A major difference between fixed and random effects is that the fixed effects are considered as unknown parameters whereas the random effects are considered as random variables drawn from a distribution (e.g., a normal distribution). LME was pioneered by C. R. Henderson in his series of work on animal breeding ( Henderson, 1949 ). It is now widely accepted and has been successfully applied in various scientific disciplines such as economics, genetics, psychology, medicine and sociology ( Fitzmaurice et al., 2012 ; Jiang and Nguyen, 2021 ; Laird and Ware, 1982 ). Depending on the disciplines and application domains, alternative names have been used for LME, including random-effects model, multi-level model, hierarchical model, and variance component model. In order to apply LME, it is necessary to understand its assumptions and representation in sufficient detail, especially with respect to simpler methods. We start by reviewing the two-sample t-test, one-way ANOVA, and the linear model, and then introduce the linear mixed-effects model.

2.3.1. Background: Two-sample t-test, one-way ANOVA, and linear model

We start from the familiar two-sample case with n 0 observations ( Y 1 , …, Y n 0 ) from a control group and n 1 observations from a treatment group ( Y n 0+1 , …, Y n 0+ n 1 ). Under independence and normality assumptions, the t-test statistic, which standardizes the difference of the sample means by its standard error, follows a t distribution. Equivalently, one can use a simple linear model (LM) to model the difference between treatment and control.

Let x_i denote a covariate (predictor) variable such that x_i = 1 if the observed outcome Y_i is from a subject assigned to the treatment group and x_i = 0 otherwise. Then, we can assume a linear relationship between the outcome and the treatment assignment as follows:

Y_i = β_0 + β_1 x_i + ε_i, i = 1, …, n_0 + n_1.

In this model, β_0 is the mean of the control group and (β_0 + β_1) is the mean of the treatment group. The null hypothesis of no effect of the treatment versus control is expressed as H_0: β_1 = 0, and the test statistic of the well-known t-test is identical to the least-squares estimate of the coefficient β_1 divided by its standard error. The term ε_i is the random error. The generalization from one treatment to p treatments is achieved by introducing p indicator variables, also known as dummy variables, one for each of the treatment labels:

Y_i = β_0 + β_1 x_i,1 + … + β_p x_i,p + ε_i, i = 1, …, n,

where n is the total number of observations. In the above multiple linear regression, β_0 indicates the population mean of the reference group (which is often just the control group). Then each coefficient β_k is the difference in population means between the k-th treatment and the reference group, since x_i,k = 1 if observation i belongs to the k-th treatment group and x_i,k = 0 otherwise. Most often, we are interested in whether there is any difference in population means among all the (p+1) groups, i.e., H_0: β_1 = … = β_p = 0. If the random errors ε_i are independently and identically distributed (i.i.d.) from a normal distribution, we can use an F-test to assess the null hypothesis H_0. The same F-test is probably more familiar to practitioners from the one-way analysis of variance (ANOVA). The idea is to decompose the total variance of the data into different sources. The two sources modeled in the multiple linear regression are the variation due to different treatments and the variation due to randomness. The F statistic used in the F-test characterizes the variation due to treatments relative to the variation due to randomness. Thus, ANOVA, in a broad sense, is a method of understanding the contributions of different factors to an outcome variable by studying the proportion of variance explained by each factor ( Gelman, 2005 ).
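These equivalences are easy to verify numerically. The sketch below, on simulated data, shows that the equal-variance two-sample t-test, the t-test on the LM group coefficient, and the one-way ANOVA F-test agree for two groups.

```r
# Equivalence of the two-sample t-test, the LM coefficient test, and
# one-way ANOVA for two groups, on simulated data.
set.seed(1)
y <- c(rnorm(10, mean = 0), rnorm(10, mean = 0.5))
g <- factor(rep(c("control", "treatment"), each = 10))

t.test(y ~ g, var.equal = TRUE)   # classical two-sample t-test
summary(lm(y ~ g))                # same p-value for the group coefficient
anova(lm(y ~ g))                  # one-way ANOVA: F equals the squared t
```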

Unfortunately, ANOVA is frequently misused in neuroanatomical and neurophysiological studies because practitioners fail to account for the collection of multiple observations from the same animal. Many investigators tend to use the default setup in statistical software or packages, and may not be familiar with more advanced regression frameworks. Mixed-effects models are a generalization of the previous methods (t-test, ANOVA, LM) and provide researchers with an effective strategy for analyzing correlated data by taking dependence into account.

2.3.2. A practical guidance to the linear mixed-effects model

We consider the data in Example 1. The data consist of 1200 observed pCREB immunoreactivity values from 24 mice in five groups: the baseline group (7 mice) and the groups at 24 hours (6 mice), 48 hours (3 mice), 72 hours (3 mice), and 1 week (5 mice) after ketamine treatment, as shown in Table 1 and Figure 2. Here the data are recorded as multiple measurements from each mouse, which represents a single unit (cluster) of analysis. Let Y_ij indicate the j-th observation of the i-th mouse, and let (x_ij,1, …, x_ij,4) be the dummy variables for the treatment labels, with x_ij,1 = 1 for 24 hours, x_ij,2 = 1 for 48 hours, x_ij,3 = 1 for 72 hours, and x_ij,4 = 1 for 1 week after ketamine treatment, respectively. Because there are multiple observations from the same animal, the data are naturally clustered by animal. We account for the resulting dependence by adding an animal-specific effect to the regression framework discussed in the previous section, as follows:

Y_ij = β_0 + β_1 x_ij,1 + β_2 x_ij,2 + β_3 x_ij,3 + β_4 x_ij,4 + u_i + ε_ij, j = 1, …, n_i; i = 1, …, 24,

where n_i is the number of observations from the i-th mouse, u_i indicates the deviation between the overall intercept β_0 and the mean specific to the i-th mouse, and ε_ij represents the deviation in pCREB immunoreactivity of observation (cell) j in mouse i from the mean pCREB immunoreactivity of mouse i. Among the coefficients, those of the fixed-effects component, (β_0, β_1, β_2, β_3, β_4), are assumed to be fixed but unknown, whereas (u_1, …, u_24) are treated as independent and identically distributed random variables from a normal distribution with mean 0 and a variance parameter that reflects the variation across animals. It is important to note that the cluster/animal-specific means are more generally referred to as random intercepts in an LME. Equivalently, one could write the previous equation using a vector (z_ij,1, …, z_ij,24) of dummy variables for the cluster/animal IDs, such that z_ij,k = 1 for i = k and 0 otherwise:

Y_ij = β_0 + β_1 x_ij,1 + … + β_4 x_ij,4 + u_1 z_ij,1 + … + u_24 z_ij,24 + ε_ij.

In the model above, Y_ij is modeled by four components: the overall intercept β_0, which is the population mean of the reference group in this example; the fixed effects of the covariates (x_ij,1, …, x_ij,4); the random effects due to the clustering (z_ij,1, …, z_ij,24); and the random errors ε_ij, assumed to be i.i.d. from a normal distribution with mean 0.

In the application of these methods, one practical issue is to determine which effects should be treated as fixed and which should be considered as random. A number of definitions of fixed-effects and random-effects have been given ( Gelman and Hill, 2006 ). It is generally agreed that a fixed effect captures a parameter at the population level; as such, it should be a constant across subjects / clusters. Population-level treatment effects, which are often of direct scientific interest, are included in the fixed-effects. When scientifically relevant, predictors (such as age and gender) whose effects are not expected to change across subjects should also be treated as fixed effects. In contrast, a random effect captures cluster-specific effects (e.g., due to the animal or the cell considered), which are only relevant for capturing the dependence among observations and are typically of no direct relevance for assessing scientific hypotheses. Indeed, the mice in a study are a sample from a large population and they are randomly chosen among all possible mice. Thus, the animal-specific effects are often not of primary interest; hence, they are added to the random-effects component. In Example 1, the mean in pCREB immunoreactivity from a particular mouse is not relevant for the final analysis; however, including the mouse-specific means accounts for the correlation between observations from the same animal.

In addition to cluster-specific means, a linear mixed effects model may include additional terms that describe the variability observed within a cluster (e.g., an animal or cell). Most often, this is the case when measurements are taken at different times from within the same animal and cell and it may be important to account for possibly different cluster-specific trajectories over time. We will discuss this in more detail as it pertains to Example 3 in Section 3 below.

2.3.3. The LME in a matrix format

It is often convenient to write the LME in a very general matrix form, which was first derived in ( Henderson et al., 1959 ). This format gives a compact expression of the linear mixed-effects model:

Y = 1β_0 + Xβ + Zu + ε,

where Y is an n-by-1 vector of individual observations, 1 is the n-by-1 vector of ones, the columns of X are predictors whose coefficients β, a p-by-1 vector, are assumed to be fixed but unknown, the columns of Z are the variables whose coefficients u, a q-by-1 vector, are random variables drawn from a distribution with mean 0 and a partially or completely unknown covariance matrix, and ε is the residual random error.

In addition to being compact, the matrix form is convenient from a data analysis perspective, since many software packages for LMEs often require that the data are organized according to the so-called “long format”, i.e., each row of the dataset contains only the values for one observation. For example, using the long format, the data in Example 1 can be stored in a matrix with 1200 rows; the dummy variables introduced in Section 2.3.2 for the treatment labels and the cluster / animal IDs are used as the columns for X and Z , respectively. Because many software packages such as Matlab and R can take categorical variables and convert them to dummy variables automatically in their internal computation, the data for Example 1 can be stored in a 1200-by-3 matrix, with the first column being the pCREB immunoreactivity values, the second column being the treatment labels, and the last column being the animal identification numbers (see the Supplemental Materials Part I ).
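With the data in long format, fitting the random-intercept model of Section 2.3.2 takes one line. The sketch below assumes a data frame `ex1` with columns `pCREB`, `treatment` (saline, 24h, 48h, 72h, 1wk), and `mouse`; these names are illustrative stand-ins for the practice data set in the Supplemental Materials.

```r
# Fitting the random-intercept LME of Section 2.3.2, assuming a long-format
# data frame `ex1` with columns `pCREB`, `treatment`, and `mouse`.
library(lme4)

ex1$treatment <- relevel(factor(ex1$treatment), ref = "saline")
fit_lme <- lmer(pCREB ~ treatment + (1 | mouse), data = ex1)
summary(fit_lme)   # fixed effects: each time point versus saline

# For contrast, the naive LM pools all neurons and ignores clustering:
fit_lm <- lm(pCREB ~ treatment, data = ex1)
```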

Since the LME model consists of both fixed and random effects, it is highly versatile and includes the traditional linear regression model (LM), the random-effects model, the t-test, the paired t-test, ANOVA, and repeated-measures ANOVA as special cases. In fact, software implementing the LME model can also be used to implement the LM, ANOVA, the two-sample t-test, the paired t-test, and other methods. In order to determine whether and which LME model should be used, one needs to understand the sources of correlation. Data visualization, as depicted in Figure 2, is the first step we recommend for developing a good understanding of the data. It is helpful to visually inspect model assumptions, especially regarding whether there is any data dependency due to factors that should be modeled. The decision chart in Figure 4 provides a user-friendly guide for determining whether some variables should be included in the matrix Z to model the correlation in animal experiments appropriately. Please also refer to the sections below for implementation details.

Figure 4. A basic decision chart showing, in a step-wise fashion, how to identify the ME application scenarios and random effects.

2.4. Generalized linear mixed-effects model (GLMM)

In this section we discuss how to model data dependency for a broader range of outcome types. Traditional linear models and the LME are designed to model a continuous outcome variable under the fundamental assumption that its variance does not change with its mean. This assumption can be violated for commonly collected outcome variables, such as the choice made in a two-alternative forced-choice task (binary data), the proportion of neurons activated (proportional data), the number of neural spikes in a given time window, and the number of behavioral freezes in each session (count data). For example, a natural choice of distribution for count data is the Poisson distribution, whose mean and variance are equal. This violates the homoscedasticity ("constant variance") assumption that is fundamental to the standard linear regression model. In addition, negative predicted values might occur in a linear model, which is undesirable for count or proportional data. These issues can be addressed by the generalized linear model framework, which is an important extension of the linear model.

We first present a unified framework to analyze various outcome types, known as the generalized linear model (GLM) ( McCullagh and Nelder, 2019 ; Nelder and Wedderburn, 1972 ). It includes conventional linear regression (for continuous variables), logistic regression (for binary outcomes), and Poisson regression (for count data) as special cases. Let Y_i be the i-th outcome variable and X_i = (X_i,1, …, X_i,p) be the corresponding covariates. The critical operation of the GLM is to link the expected value of Y_i and a linear predictor (i.e., a linear combination of the covariates) through a "link" function g:

g(E(Y_i | X_i)) = β_0 + β_1 X_i,1 + … + β_p X_i,p.

The link function g connects the expected mean of the outcome variable to the linear predictor. An equivalent expression is E(Y_i | X_i) = g⁻¹(β_0 + β_1 X_i,1 + … + β_p X_i,p), where g⁻¹ denotes the inverse function of g. For example, the link function of the linear regression model is the identity function, which implies that

E(Y_i | X_i) = β_0 + β_1 X_i,1 + … + β_p X_i,p.

To further help understand the link function g, consider the situation where the outcome variable is binary, which is often modeled using logistic regression. Note that logistic regression is a special GLM with the link function g being the logit function; in other words, we model the logit-transformed success probability using a linear combination of the covariates:

logit(π_i) = log(π_i / (1 − π_i)) = β_0 + β_1 X_i,1 + … + β_p X_i,p,

where the success probability π_i = E(Y_i | X_i) = Pr(Y_i = 1 | X_i), with the last equality due to the fact that Y_i is either 0 or 1. The logit function ensures that the estimated success probabilities are always between 0 and 1, thus preventing predicted values below 0 or above 1. To complete the specification of the model, a data-generating mechanism for the outcomes is needed. One natural choice is the Bernoulli distribution, i.e.:

Y_i | X_i ~ Bernoulli(π_i).

The corresponding likelihood function can then be used to make inference on the parameters using maximum likelihood. The distributional assumptions can be relaxed by specifying only the relationship between the mean and the variance, rather than the full distribution, which is expected to improve robustness. This approach is known as the quasi-likelihood method; we refer interested readers to ( Wedderburn, 1974 ). The GLM generalizes the conventional LM to various types of outcomes by using appropriate link functions and distributional assumptions. Like the conventional LM, all coefficients in the GLM are assumed to be unknown but fixed parameters. Next, we further extend the GLM to the generalized linear mixed-effects model so that the data dependence arising from the underlying experimental design can be appropriately accounted for by including random effects.
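As a quick illustration of the GLM building block before random effects are added, the sketch below fits a logistic regression to simulated data; all names are illustrative.

```r
# A plain GLM: logistic regression with the (default) logit link,
# on simulated data with one covariate x and a binary outcome y.
set.seed(1)
x <- rnorm(100)
y <- rbinom(100, size = 1, prob = plogis(-0.5 + 0.8 * x))

fit_glm <- glm(y ~ x, family = binomial)
summary(fit_glm)
exp(coef(fit_glm)["x"])   # odds ratio per one-unit increase in x
```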

To account for data dependency, the GLM has been extended to the generalized linear mixed-effects model (GLMM) ( Breslow and Clayton, 1993 ; Liang and Zeger, 1986 ; Stiratelli et al., 1984 ; Wolfinger and O'connell, 1993 ; Zeger and Karim, 1991 ; Zeger and Liang, 1986 ); for example, with cluster-specific random intercepts u_i:

g(E(Y_ij | X_ij, u_i)) = β_0 + β_1 X_ij,1 + … + β_p X_ij,p + u_i.

The random-effects terms in the LME (equation 4) and the GLMM (equation 10) play the same role; they explicitly model the dependence structure by specifying subject-specific or other relevant random effects and their joint distribution. With appropriate assumptions on the distribution of the outcome variables Y_ij and the mean assumption specified in equation (10), likelihood-based approaches are often used for parameter estimation. Compared to the LME, the computation involved in a GLMM with non-normal data is substantially more challenging, both in speed and in stability. As a result, several strategies have been developed to approximate the likelihood ( Bolker et al., 2009 ).

A robust alternative is the generalized estimating equation (GEE) approach ( Zeger et al., 1988 ). GEE makes assumptions on the first two moments rather than imposing explicit distributional assumptions. The idea of GEE is to estimate coefficients using a "working" correlation structure, which does not have to be identical to the unknown underlying true correlation. An incorrect working correlation structure would not bias the estimates, but it would affect the estimate of their variance; thus, a correction approach is applied to obtain consistent estimates of variance and covariance. However, caution is merited, as GEE and GLMM might lead to different estimates and interpretations ( Fitzmaurice et al., 2012 ). Moreover, the correction procedure in GEE relies on aggregating information across subject-level data, so for animal studies that use only a few animals in an experiment, the accuracy of GEE results may be questionable.
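For completeness, a GEE fit can be sketched with the geepack package, reusing the illustrative `ex1` data frame described in Section 2.3.3; the exchangeable working correlation shown here is one common choice, not the only valid one.

```r
# A GEE alternative with geepack, assuming the long-format `ex1` data frame
# (columns `pCREB`, `treatment`, `mouse`) introduced earlier.
library(geepack)

ex1 <- ex1[order(ex1$mouse), ]   # geeglm expects rows grouped by cluster id
fit_gee <- geeglm(pCREB ~ treatment, id = mouse, data = ex1,
                  family = gaussian, corstr = "exchangeable")
summary(fit_gee)   # robust (sandwich) standard errors
```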

2.5. Bayesian analysis

In the LME and GLMM framework, the random-effects coefficients are drawn from a given distribution (typically Gaussian). Therefore, Bayesian analysis provides a natural alternative for analyzing the data considered in this Primer. One inherent advantage of Bayesian analysis is that it is easy to incorporate prior information on all the parameters in the model, including both the fixed-effects coefficients and the parameters involved in the variance-covariance matrices. In particular, the Bayesian framework allows practitioners to consider distributions of the random effects that are far from Gaussian, or to consider more flexible covariance structures needed to characterize the underlying data generating process. In the frequentist framework (see Glossary Box 1 and Appendix 1 of the Supplemental Materials ), computational algorithms can become formidably complex and prohibitive in those cases. The Bayesian framework obtains inference on the parameters of interest by means of the posterior distribution, which results from combining the prior information with the data using the Bayes’ theorem. Therefore, Bayesian inference does not rely on asymptotic approximations that may be invalid with limited sample sizes.

To describe how Bayesian analysis works for mixed-effects models, consider again the model (equation 4) in Section 2.3:

Y_ij = β_0 + β_1 x_ij,1 + … + β_p x_ij,p + u_i + ε_ij.

For simplicity of presentation, and to avoid the advanced statistical and mathematical details required for more general models, we assume independently and identically distributed (i.i.d.) random effects, i.e., the u_i are i.i.d. from N(0, σ_u²). We also assume the errors ε_ij are i.i.d. from N(0, σ²). While we focus here for simplicity on the linear model (equation 4) from Section 2.3, our discussion can also be extended to the generalized linear framework of Section 2.4. Using Bayes' theorem, the posterior distribution f(β_0, β_1, …, β_p, σ_u², σ² | Y) is proportional to the product of the likelihood function f(Y | β_0, β_1, …, β_p, σ_u², σ²) and the prior distribution π(β_0, β_1, …, β_p, σ_u², σ²) (summarizing the available knowledge on the parameters):

f(β_0, β_1, …, β_p, σ_u², σ² | Y) = f(Y | β_0, β_1, …, β_p, σ_u², σ²) × π(β_0, β_1, …, β_p, σ_u², σ²) / f(Y),

where f(Y) is a normalizing constant that depends only on the observed data and not on the model parameters. If possible, the prior distribution π(β_0, β_1, …, β_p, σ_u², σ²) should be chosen to reflect the beliefs or information investigators have about the parameters. In the absence of prior knowledge about the parameters, uninformative prior distributions are often employed. These types of priors are also known as flat, weak, objective, vague, or diffuse priors. For example, a uniform distribution over a wide range or a normal distribution with a very large variance can serve as a weak prior for the fixed-effects coefficients.

Once the likelihood and the priors have been specified, Bayesian inference often requires the use of sophisticated sampling methods to draw from the posterior distribution, generally known as Markov chain Monte Carlo (MCMC) algorithms, such as Gibbs sampling ( Gelfand and Smith, 1990 ), the Metropolis-Hastings algorithm ( Casella and George, 1992 ; Hastings, 1970 ; Metropolis et al., 1953 ), and the Hamiltonian Monte Carlo algorithm ( Betancourt, 2017 ; Duane et al., 1987 ; Hoffman and Gelman, 2014 ; Neal, 2011 ; Shahbaba et al., 2014 ). However, in practical applications, it is possible to employ existing software packages to conduct Bayesian analysis of mixed-effects models without in-depth knowledge of the underlying computational details ( Bürkner, 2017 ; Bürkner, 2018 ; Fong et al., 2010 ; Hadfield, 2010 ). Inference on a parameter can then be conducted using its marginal posterior distribution. For example, one can use the mean of the posterior distribution as a point estimate of the unknown parameter, together with a 95% credible interval as the Bayesian counterpart of the frequentist confidence interval. In the Bayesian framework, the 95% credible interval is an uncertainty estimate that identifies the shortest interval containing 95% of the posterior distribution of the parameter of interest (the highest posterior density interval). Hypothesis testing on the parameters of mixed-effects models can be conducted by comparing the marginal likelihoods under two competing models via the so-called Bayes factor. The use of a Bayesian approach and Bayes factors has sometimes been advocated as an alternative to p-values, since the Bayes factor represents a direct measure of the evidence for one model versus the other ( Benjamin and Berger, 2019 ; Held and Ott, 2018 ; Kass and Raftery, 1995 ).

3. Practical applications of the linear mixed-effects model (LME) and generalized linear mixed-effects model (GLMM)

We provide practical examples to demonstrate why conventional LMs, including the t-test and ANOVA, fail for the analysis of correlated data, and why the LME should be used instead; the advantages of the LME are explained in each practical example.

3.1. Example 1.

As described in Section 2.1, we measured pCREB immunoreactivity of 1200 putative excitatory neurons in mouse visual cortex at different time points: at baseline (saline) and at 24, 48, 72 hours, and 1 week following ketamine treatment, from 24 mice ( Figure 2 ). If we use ANOVA or a linear model (LM) to compare each time point to the baseline (saline), as shown in Table 2, we find that the p-values of all comparisons are less than 0.05 and the overall difference between the five groups is highly significant (p=1.2×10⁻⁷⁸). However, recall that the 1200 neurons are clustered in 24 mice. The ICC, design effect, and effective sample sizes ( Table 1 ) indicate that the dependency due to clustering is substantial. Therefore, the 1200 neurons should not be treated as 1200 independent cells. The lesson from this example is that the number of observational units is much larger than the number of experimental units (see reference ( Lazic et al., 2018 ) for helpful discussion). We used an LME with animal-specific random effects to handle the dependency due to clustering. The resulting p-values are much larger than those from the LM, and thus less likely to reach the threshold of significance ( Table 2 ). Note that the differences between saline and 72h or 1wk are not significant by the LME analysis after accounting for the dependency in the data.

Table 2. P-values for comparing pCREB immunoreactivity at each time point (24, 48, 72 hours, and 1 week) after ketamine treatment to the baseline (saline). The "Overall" column corresponds to the null hypothesis of no difference among the five groups (Example 1). The LME p-values are based upon the lme function in the nlme package, in which the denominator degrees of freedom are determined by the animal grouping level (Pinheiro and Bates, 2006). The methods for obtaining more accurate p-values with adjustments for multiple comparisons can be found in the Supplemental Materials.

| | Overall | 24h | 48h | 72h | 1wk |
| --- | --- | --- | --- | --- | --- |
| Linear Model (ANOVA) | 1.2×10⁻⁷⁸ | 6.0×10⁻ | 6.8×10⁻ | 0.0291 | 1.1×10⁻ |
| LME | 0.0029 | 0.0049 | 0.0164 | 0.5601 | 0.2525 |
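A sketch of the nlme call behind the LME row, again assuming the illustrative long-format data frame `ex1` from Section 2.3.3, is:

```r
# The LME fit referenced in the Table 2 note; nlme::lme determines the
# denominator degrees of freedom from the animal grouping level.
library(nlme)

fit_nlme <- lme(pCREB ~ treatment, random = ~ 1 | mouse, data = ex1)
summary(fit_nlme)   # t-tests for each time point versus the reference group
anova(fit_nlme)     # overall F-test across the five groups
```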

3.2. Example 2.

Data were derived from an experiment designed to determine how in vivo calcium (Ca++) activity of PV cells (measured longitudinally) changes over time after ketamine treatment ( Grieco et al., 2020 ). Ca++ event frequencies were measured from brain cells of four mice at 24h, 48h, 72h, and 1 week after ketamine treatment; Ca++ event frequencies at 24h were compared to the other three time points. In total, Ca++ event frequencies of 1,724 neurons were measured. The boxplot in Figure 5A and the LM (or ANOVA, t-test) analysis results in Table 3 indicate significantly reduced Ca++ activity at 48h relative to 24h (p=4.8×10⁻⁶) and significantly increased Ca++ event frequency at 1 week compared to 24h (p=2.4×10⁻³). However, if we account for repeated measures due to cells clustered in mice using LME with random intercepts (the model is similar to Equation (4) in Section 2.3.2), most of the p-values are greater than 0.05 and thus fail to reach significance, except that the overall p-value is 0.04.

Figure 5. When data from different animals are naively pooled, the result can be dominated by the data from a single animal (Example 2). To illustrate this point, we present boxplots of Ca++ event frequencies measured at four time points in two different ways. (A) Boxplot of Ca++ event frequencies using the pooled neurons from four mice. ANOVA or t-test showed that Ca++ activity was significantly reduced at 48h relative to 24h (p=4.8×10⁻⁶) and significantly increased at 1wk compared to 24h (p=2.4×10⁻³). (B) Boxplots of Ca++ event frequencies stratified by individual mice show that these changes occur only in Mouse 2. Mouse 2 contributed 43% of the cells, which likely explains why the pooled data are more similar to Mouse 2 than to the other mice. Note that the comparisons are not significant if we account for repeated measures due to cells clustered in mice using LME, thus avoiding an erroneous conclusion.

Table 3. The results (estimates ± s.e., and p-values) for the Ca++ event frequency data using LM and LME (Example 2).

| | 48h | 72h | 1wk |
| --- | --- | --- | --- |
| LM (est) | −0.078±0.017 | 0.009±0.017 | 0.050±0.016 |
| LM (p) | 4.8×10⁻⁶ | 0.595 | 2.4×10⁻³ |
| LME (est) | −0.011±0.014 | 0.020±0.014 | 0.025±0.014 |
| LME (p) | 0.424 | 0.150 | 0.069 |

To understand the discrepancy between the results from LM and LME, we created boxplots for the pooled data and for each mouse ( Figure 5B ). Although the pooled data ( Figure 5A ) and the corresponding p-value from the LM show significant reduction in Ca ++ activities from 24h to 48h, we noticed that the only mouse showing a noticeable reduction was Mouse 2. In fact, close examination of Figure 5B suggests that there might be small increases in the other three mice. To examine why the pooled data follow the pattern of Mouse 2 and not that of other mice, we checked the number of neurons in each of the mouse-by-time combinations ( Table 4 ). The last column of Table 4 shows that Mouse 2 contributed 43% of all cells, which likely explains why the pooled data are more similar to Mouse 2 than to the other mice. The lesson from this example is that naively pooling data from different animals is a potentially dangerous practice, as the results can be dominated by a single animal that can misrepresent a substantial proportion of the measured data. Investigators limited to using LM often notice outlier data of a single animal and they may agonize about whether they are justified in “tossing that animal” from their analysis, sometimes by applying “overly creative post-hoc exclusion criteria”. The other way out of this thorny problem is the brute force approach of repeating the experiment with a much larger sample size – a more honest, but expensive solution. The application of LME solves this troubling potential problem as it takes dependency and weighting into account.

Table 4. The number of neurons by mouse and time in Example 2. In total, Ca++ event frequencies of 1,724 neurons were measured. When splitting the number by mouse, Mouse 2 has the largest number of measured neurons (43%). Thus, when pooling the cells naïvely, the overall results would be dominated by the results observed in Mouse 2.

| | 24h | 48h | 72h | 1wk | Total |
| --- | --- | --- | --- | --- | --- |
| Mouse 1 | 81 | 254 | 88 | 43 | 466 (27%) |
| Mouse 2 | 206 | 101 | 210 | 222 | 739 (43%) |
| Mouse 3 | 33 | 18 | 51 | 207 | 309 (18%) |
| Mouse 4 | 63 | 52 | 58 | 37 | 210 (12%) |
| Total | 383 | 425 | 407 | 509 | 1,724 (100%) |

In this example there are only four mice. This number may be smaller than what is typically recommended for random-effects models. However, as discussed in ( Gelman and Hill, 2006 ), using a random-effects model in this situation will not provide much gain over simpler analyses, but probably will not do much harm either. An alternative would be to include the animal ID variable as a factor with fixed animal effects in the conventional linear regression. However, a recent study suggests that clusters should be modeled using random effects as long as the software does not run into computational issues such as convergence problems (Oberpriller et al., 2021). Note that neither of these two analyses is the same as fitting a linear model to the pooled cells, which erroneously ignores the between-animal heterogeneity and fails to account for the data dependency due to within-animal similarity. In a more extreme case, such as an experiment using only two monkeys, naively pooling the neurons from the two animals risks drawing conclusions mainly from one animal and rests on unrealistic assumptions of homogeneity across animals, as discussed above. A more appropriate approach is to analyze the animals separately and check whether the results from the two animals "replicate" each other. Exploratory analysis such as data visualization is highly recommended to identify potential issues. The sketch below illustrates the three modeling options.
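Assuming an illustrative data frame `ex2` with columns `freq` (Ca++ event frequency), `time` (24h/48h/72h/1wk), and `mouse` (names are stand-ins, not those of the practice data set):

```r
# Three ways to handle the animal variable in Example 2 (names illustrative).
library(lme4)

fit_pooled <- lm(freq ~ time, data = ex2)                  # naive pooling (risky)
fit_fixed  <- lm(freq ~ time + factor(mouse), data = ex2)  # animal as fixed factor
fit_random <- lmer(freq ~ time + (1 | mouse), data = ex2)  # animal as random effect
```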

3.3. Example 3.

In this experiment, Ca++ event integrated amplitudes are compared between baseline (saline) and 24h after ketamine treatment ( Grieco et al., 2020 ). 1248 cells were sampled from 11 mice, and each cell was measured twice (at baseline and after ketamine treatment). As a result, correlation arises from both cells and animals, which creates a three-level structure: repeated measurements (baseline and after treatment) within cells, and cells within animals. It is clear that the ketamine treatment should be included as a fixed effect. The choice of the random effects deserves more careful consideration. The hierarchical structure, i.e., two observations per cell and multiple cells per animal, suggests that the random effects of the cells should be nested within individual mice. We first consider a basic model that includes random intercepts at both the cell and animal levels:

Y_ijk = β_0 + β_1 x_ijk + u_i + v_ij + ε_ijk,

where the indices i, j, k stand for the i-th mouse, the j-th cell, and the k-th measurement of neuron j from mouse i; u_i is the random intercept of mouse i, and v_ij is the random intercept of cell j nested within mouse i. Here, x_ijk = 1 if the measurement is taken after treatment and 0 if it is taken at baseline. By including the cell variable in the random effects, we implicitly capture the change from "before" to "after" treatment for each cell. This is similar to how paired data are handled in a paired t-test. Moreover, by specifying that the cells are nested within individual mice, we essentially model the correlations at both the mouse and cell levels. As explained in the Supplemental Materials (Part II, Example 3), when the cell IDs are not unique, specifying nested random effects is necessary; otherwise two cells with the same cell ID from two different mice will be considered as sharing a cell-specific effect (known as crossed random effects, in contrast to nested random effects), which does not make sense. We recommend that users employ unique cell IDs across animals to avoid confusion and mistakes in the model specification.
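In lme4 syntax, the nested random-intercept model can be sketched as follows, assuming a long-format data frame `ex3` with columns `amplitude`, `treatment` (baseline vs 24h), `mouse`, and `cell`; the names are illustrative.

```r
# Nested random intercepts for Example 3: cells within mice.
library(lme4)

fit_nested <- lmer(amplitude ~ treatment + (1 | mouse / cell), data = ex3)
# (1 | mouse/cell) expands to (1 | mouse) + (1 | mouse:cell), so two cells
# that happen to share an ID in different mice are not treated as one cell.
summary(fit_nested)
```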

For the treatment effect, LME and LM produce similar estimates; however, the standard error of the LM was larger. Thus, the p-value based on LME was smaller (0.0036 for LM vs 0.0001 for LME). In this example, since the two measures from each cell are positively correlated ( Figure 6 ), the variance of the differences is smaller when treating the data as paired than as independent. As a result, the more rigorous practice of using cell effects as random effects leads to lower but more accurate p-values. The lesson in this example is that the LME can actually yield lower p-values than conventional approaches. This is opposite to Example 1 and Example 2 and dispels the potential notion that LME incurs a “cost” by always leading to greater p-values. Rigorous statistical analysis is not a hunt for the smallest p-value (commonly known as p-hacking or significance chasing); the objective of the experimenter should be always to use the most appropriate and thorough analysis method.

Figure 6. (A) The scatter plot of Ca++ event integrated amplitude at baseline vs 24h after treatment for the neurons from four example mice (labeled 1, 2, 3, and 4) indicates that the baseline and after-treatment measures are positively correlated. (B) Boxplot of the baseline and after-treatment correlations for all 11 mice. Due to the positive correlations in the data, the variance of the differences is smaller when treating the data as paired rather than independent. As a result, LME produced a smaller p-value than the t-test.

In this example, the random effects involve more than one level, and the LME model we fit includes neuron-specific and animal-specific random intercepts. Sometimes, models incorporating additional random effects might be appropriate to account for additional sources of variability ( Barr et al., 2013 ; Ferron et al., 2002 ; Heisig and Schaeffer, 2019 ; Kwok et al., 2007 ; Matuschek et al., 2017 ). For example, both the overall mean levels and the treatment effects may vary across animals and neurons. A mouse may have a higher (or lower) treatment response than the average population response, e.g., due to unobserved individual physiology. The plausibility of including extra random effects can often be assessed visually by linearly interpolating the observed response over the values of the predictor of interest in each cluster (e.g., all the recorded Ca++ event integrated amplitudes pre- and post-treatment within a specific animal); that is, by conducting an LM regression within each cluster. Suppose the interpolation suggests that the slopes of the regression differ across clusters/animals along with their intercepts. In that case, the LME may incorporate both random intercepts and random slopes to capture how each mouse responds differently to the treatment. It might also be helpful to allow correlations between the different random-effect components. In the example considered here, there is a nested structure of clusters: cells within animals. Therefore, it is possible to conceive three other models with additional random effects: a model that includes random slopes only at the neuron level, a model with random slopes only at the animal level, and a model with random slopes for both neurons and animals. By conducting likelihood ratio tests to compare these models (see the sketch below), we find that including random slopes at the neuron level leads to a substantial improvement in the likelihood. On the other hand, random slopes at the animal level seem unnecessary. More detailed analyses and technical remarks are provided in our accompanying Supplemental Materials. It should be noted that modeling decisions should not be based on tests and p-values alone, as the result might be significant even with a very small effect size if the sample size is large enough, or insignificant with a moderate or large effect size if the sample size is small. Rather, modeling decisions should always be guided by the combined information provided by the study design, scientific reasoning, and previous evidence. For example, different animals are expected to have different mean levels on outcome variables; thus, it is reasonable to model the variation due to animals by considering animal-specific random effects. A similar argument supports the inclusion of baseline covariates such as age in many biomedical studies even when they are not significant. Also, when random slopes are included, it is typically recommended to include the corresponding random intercepts. If random slopes (for treatment) are included at the animal level, it is sensible to also include the animal-specific random intercepts.
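Under the same illustrative `ex3` setup, the model comparison described above can be sketched as follows; whether the random-slope model is estimable depends on the data, so convergence messages should be checked.

```r
# Comparing random-effects structures by likelihood ratio test; anova()
# on lmer fits refits with ML where needed for the comparison.
library(lme4)

m_int   <- lmer(amplitude ~ treatment + (1 | mouse / cell), data = ex3)
m_slope <- lmer(amplitude ~ treatment + (1 | mouse) +
                  (1 + treatment | mouse:cell), data = ex3)
anova(m_int, m_slope)   # does a neuron-level random slope improve the fit?
```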

3.4. Example 4.

In this example, we will illustrate how to use both frequentist and Bayesian GLMM approaches to analyze binary outcomes. The data set analyzed here is simulated based on a published study ( Wei et al., 2020 ), in which eight mice were trained in a tactile delayed response task to use their whiskers to predict the location (left or right) of a water pole and report it with directional licking (lick left or lick right). The behavioral outcome we are interested in is whether the animals made the correct predictions. Therefore, we code correct left or right licks as 1 and erroneous licks as 0. In total, 512 trials were generated in our simulation, which includes 216 correct trials and 296 wrong trials. One question we would like to answer is whether a particular neuron is associated with the prediction. For this purpose, we analyze the prediction outcome and mean neural activity levels (measured by neuronal calcium signal changes, dF/F) from the 512 trials using a GLMM. The importance of modeling correlated data by introducing random effects has been shown in the previous examples. In this example, we focus on how to interpret results from a GLMM model for the mouse behavioral and imaging experiment.

The result from the frequentist approach shows that, for each one-percent increase in mean calcium intensity (dF/F), the odds that the mice make a correct prediction increase by 6.4% (95% confidence interval: 2.4%–10.7%); the corresponding p-value is 0.0016 based on the large-sample Wald test. The large-sample likelihood ratio test and a parametric bootstrap test give similar p-values.
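A minimal frequentist sketch with lme4::glmer, assuming a trial-level data frame trials with columns correct (0/1), dFF (mean dF/F per trial), and mouse (hypothetical names):

```r
library(lme4)

# Logistic GLMM: mouse-specific random intercepts capture the correlation
# among trials recorded from the same animal
fit  <- glmer(correct ~ dFF + (1 | mouse), data = trials, family = binomial)
fit0 <- glmer(correct ~ 1   + (1 | mouse), data = trials, family = binomial)

summary(fit)                   # Wald z-test for the dFF coefficient
exp(fixef(fit)["dFF"])         # odds ratio per 1% increase in dF/F
cc <- confint(fit, method = "Wald")
exp(cc["dFF", ])               # 95% confidence interval, odds-ratio scale
anova(fit0, fit)               # large-sample likelihood ratio test
```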

The Bayesian analysis requires specifying the prior distributions for the model parameters. Given the lack of prior information, we select priors that are relatively non-informative, i.e., priors with large variances around their means. More specifically, we use a normal prior with mean 0 and a large standard deviation of 10 for the fixed-effect coefficients. For the variances of the random intercept and the errors, we impose a half-Cauchy distribution with a scale parameter of 5. The results show that the odds that the mice make a correct prediction increase by 6.2% (95% credible interval: 2.0%–10.6%) with a 1% increase in dF/F. The Bayes factor of the model with dF/F versus the null model is 5.02; that is, the posterior odds of the model with dF/F relative to the null model are five times the prior odds, suggesting a moderate association between dF/F and prediction ( Held and Ott, 2018 ; Kass and Raftery, 1995 ). These results are comparable to those from the frequentist GLMM in the preceding paragraph.
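The corresponding Bayesian model can be fit with brms along the following lines; this is a sketch under the same hypothetical variable names, with the Cauchy prior placed on the standard-deviation scale, which brms truncates to the positive half-line (hence half-Cauchy):

```r
library(brms)

priors <- c(
  prior(normal(0, 10), class = "b"),  # fixed-effect coefficients
  prior(cauchy(0, 5),  class = "sd")  # brms truncates this to a half-Cauchy
)

bfit <- brm(correct ~ dFF + (1 | mouse),
            data = trials, family = bernoulli(),
            prior = priors, chains = 4, iter = 2000, seed = 1,
            save_pars = save_pars(all = TRUE))  # needed for bayes_factor()

summary(bfit)  # posterior summaries and 95% credible intervals
exp(fixef(bfit)["dFF", c("Estimate", "Q2.5", "Q97.5")])  # odds-ratio scale

# Bayes factor of the dF/F model against the null model
bfit0 <- update(bfit, formula. = correct ~ 1 + (1 | mouse))
bayes_factor(bfit, bfit0)
```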

4. Resources

We provide effective and easy-to-follow instructions for implementing LME and GLMM, with access to the R code and practice data sets to support the analysis and the interpretation of results, in the Supplemental Materials . We choose R because it is free, open-source software (CRAN) ( R Development Core Team, 2020 ) that is widely adopted by the data science community. One major advantage of R over other open-source or commercial software is its rich collection of user-contributed packages (over 15,000), which greatly facilitates both the programming environment for developers and access to cutting-edge statistical methods. A selected (but not complete) list of packages that provide statistical inference and tools for mixed-effects models is summarized in Table 5 . Our sample code, explanations, and interpretations of results from lme4 ( Bates et al., 2014 ), nlme ( Pinheiro et al., 2007 ), ICC ( Wolak and Wolak, 2015b ), pbkrtest ( Halekoh and Højsgaard, 2014 ), brms ( Bürkner, 2017 ; Bürkner, 2018 ), lmerTest ( Kuznetsova et al., 2017 ), emmeans ( Lenth et al., 2019 ), car ( Fox and Weisberg, 2018 ), and sjPlot ( Lüdecke, 2018 ) are provided in the Supplemental Materials .
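As a one-time setup sketch, all of the packages discussed here can be installed from CRAN in a single call; note that the CRAN name of the icc package is "ICC":

```r
# Install the mixed-effects packages listed in Table 5 from CRAN
install.packages(c("lme4", "nlme", "ICC", "pbkrtest", "brms",
                   "lmerTest", "emmeans", "car", "sjPlot"))

library(lmerTest)  # attaches lme4 and adds denominator-df tests to lmer()
```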

Table 5. Selected R packages and functions for mixed-effects modeling and statistical inference.

Package name    Functions related to mixed-effects modeling
lme4            lmer: fit a linear mixed-effects model; glmer: fit a generalized linear mixed-effects model
nlme            lme: fit a linear mixed-effects model
brms            brm: conduct Bayesian mixed-effects modeling
lmerTest        hypothesis tests on fixed and random effects for models fit with lme4::lmer
emmeans         adjusted p-values for pairwise and treatment-versus-control comparisons
pbkrtest        F tests (Kenward-Roger and Satterthwaite-type) and parametric bootstrap tests
car             car::Anova: large-sample Wald tests or F tests with Kenward-Roger denominator degrees of freedom
sjPlot          visualization and manuscript-style tables
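A hedged sketch of how several of these packages interoperate on a single fitted model, reusing the hypothetical dat, amplitude, treatment, and animal names from above:

```r
library(lmerTest)   # lmer() with Satterthwaite/Kenward-Roger F tests
library(pbkrtest)
library(emmeans)
library(car)

m  <- lmer(amplitude ~ treatment + (1 | animal), data = dat)
m0 <- lmer(amplitude ~ 1         + (1 | animal), data = dat)

anova(m)                                 # F test, Satterthwaite df (default)
anova(m, ddf = "Kenward-Roger")          # F test, Kenward-Roger df
car::Anova(m, test.statistic = "Chisq")  # large-sample Wald test
PBmodcomp(m, m0, nsim = 500)             # parametric bootstrap test
emmeans(m, pairwise ~ treatment)         # adjusted pairwise comparisons
```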

5. Discussion and Conclusions

Our goal is to raise awareness of the widespread problem of analyzing correlated data with t-tests and ANOVA, to introduce effective solutions, and to provide clear guidance on how to analyze data that are clustered or contain repeated measurements. Ideally, the issues raised in this article should be considered in the first steps of experimental design rather than as post-hoc fixes. Prior knowledge from direct experience, the published literature, or pilot studies on the plausible range of the ICC is useful for optimizing statistical power with fixed available resources. For repeated measurements involving a single level of clustering, formulas are available to determine the optimal number of clusters (such as animals) and the number of observations per cluster (such as cells) ( Aarts et al., 2014 ). For more complicated scenarios, simulation-based methods are more suitable for accurate power analysis and sample size calculations ( Green and MacLeod, 2016 ), as sketched below.
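For instance, with the simr package, power for the treatment effect can be estimated by simulating from a model fit to pilot data and then extending the design; pilot_dat and the variable names are hypothetical:

```r
library(simr)  # attaches lme4

pilot <- lmer(amplitude ~ treatment + (1 | animal), data = pilot_dat)

# Optionally set the target effect size before simulating, e.g.:
# fixef(pilot)["treatmentpost"] <- 0.5

# Monte Carlo power estimate for the treatment effect at the pilot size
powerSim(pilot, test = fixed("treatment"), nsim = 200)

# Extend the design to 20 animals and re-estimate power
bigger <- extend(pilot, along = "animal", n = 20)
powerSim(bigger, test = fixed("treatment"), nsim = 200)
powerCurve(bigger, along = "animal", nsim = 200)  # power vs. number of animals
```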

One might be tempted to use summary statistics such as cluster means to remove correlations due to animal effects. These approaches are not applicable to all experimental designs, such as those involving crossed random effects ( Baayen et al., 2008 ). When methods based on summary statistics do apply, they give correct type I error rates, but they often have lower power than LME ( Aarts et al., 2014 ; Galbraith et al., 2010 ). Compared to LME, the paired t-test and repeated-measures ANOVA are far more familiar to most researchers, and for simple designs such as paired samples or balanced designs they remain valuable tools; however, they can be less efficient in the presence of missing data. For example, repeated-measures ANOVA implements list-wise deletion: an entire case is deleted if a single measurement is missing. Because an incomplete case still provides information about the parameters of interest, deleting the entire case does not make full use of the data. By contrast, through its likelihood-based approach, LME captures the information provided by incomplete cases. The sketch below contrasts the two approaches.
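Here the data are first collapsed to animal-level means for a paired t-test and then analyzed with an LME on all observations (same hypothetical names as above); the summary approach silently discards animals with missing cells, while lmer() uses every available row:

```r
# Summary-statistics approach: one mean per animal per treatment level,
# then a paired t-test on the animal-level means
animal_means <- aggregate(amplitude ~ animal + treatment, data = dat, FUN = mean)
pre  <- animal_means$amplitude[animal_means$treatment == "pre"]
post <- animal_means$amplitude[animal_means$treatment == "post"]
t.test(post, pre, paired = TRUE)  # assumes every animal has both levels,
                                  # so the two vectors align by animal

# LME on the full data: every cell-level observation contributes,
# including animals with incomplete measurements
library(lmerTest)
summary(lmer(amplitude ~ treatment + (1 | animal), data = dat))
```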

As generalizations of linear models, mixed-effects models (LME and GLMM) share many of the same challenges: model selection and diagnostics, heterogeneous variances, and adjustments for multiple comparisons. What if the outcome data are severely skewed? How should one jointly analyze multiple features? Statisticians have developed methods to address these challenges. For example, resampling methods have been proposed as robust alternatives to LME ( Halekoh and Højsgaard, 2014 ; Zeger et al., 1988 ). To relax the Gaussian assumption on the random errors, semiparametric methods have been proposed in which the treatment effects remain parametric while the distributions of the random effects are estimated nonparametrically ( Datta and Satten, 2005 ; Dutta and Datta, 2016 ; Rosner et al., 2006 ; Rosner and Grove, 1999 ). In addition, it is important to conduct model diagnostics on the random effects when fitting an LME, as illustrated below. Given the limited space, we cannot cover all the practical issues one may encounter in handling dependent data, including multiple testing and the misuse and misinterpretation of p-values. We refer the interested reader to specialized research articles (Aickin and Gensler, 1996; Altman and Bland, 1995; Benjamin and Berger, 2019 ; Benjamini and Hochberg, 1995; Gelman and Stern, 2006; Goodman, 2008; Holm, 1979; McHugh, 2011; Storey, 2002; Wasserstein and Lazar, 2016) or suggest consulting with experienced statisticians.
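For example, a quick random-effects diagnostic in base R might look like the following, again using the hypothetical model from above:

```r
library(lme4)
m <- lmer(amplitude ~ treatment + (1 | animal), data = dat)

# Q-Q plots: residuals and predicted random intercepts should look Gaussian
qqnorm(residuals(m)); qqline(residuals(m))
re <- ranef(m)$animal[, "(Intercept)"]
qqnorm(re); qqline(re)

# Residuals vs. fitted values: look for trends or unequal spread
plot(fitted(m), residuals(m)); abline(h = 0, lty = 2)
```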

We believe that proper use of linear and generalized mixed-effects models will help neuroscience researchers to improve their experimental design and leverage the advantages of more recently developed statistical methodologies. The recommended statistical approach introduced in this article will lead to data analyses with greater validity, and will enable accurate and informative interpretation of results toward higher reproducibility of experimental findings in the neurosciences.

Supplementary Material

Acknowledgements

This work was supported by US National Institutes of Health (NIH) grants (R01EY028212, R01MH105427 and R01NS104897). TCH was supported by the NIH grant R35GM127102. MG was supported by NSF grant SES 1659921.


Declaration of Interests

The authors declare no competing interests.

In this Primer article, Yu et al. introduce linear and generalized mixed-effects models for improved statistical analysis in neuroscience research, and provide clear instruction on how to recognize when they are needed and how to apply them.

Data availability statement

References

  • Aarts E, Verhage M, Veenvliet JV, Dolan CV, and van der Sluis S (2014). A solution to dependency: using multilevel analysis to accommodate nested data. Nature Neuroscience 17, 491–496.
  • Alberts B, Kirschner MW, Tilghman S, and Varmus H (2014). Rescuing US biomedical research from its systemic flaws. Proceedings of the National Academy of Sciences of the United States of America 111, 5773–5777.
  • Baayen RH, Davidson DJ, and Bates DM (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language 59, 390–412.
  • Barr DJ, Levy R, Scheepers C, and Tily HJ (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language 68, 255–278.
  • Bates D, Mächler M, Bolker B, and Walker S (2014). Fitting linear mixed-effects models using lme4. arXiv preprint arXiv:1406.5823.
  • Benjamin DJ, and Berger JO (2019). Three recommendations for improving the use of p-values. The American Statistician 73, 186–191.
  • Betancourt M (2017). A conceptual introduction to Hamiltonian Monte Carlo. arXiv preprint arXiv:1701.02434.
  • Boisgontier MP, and Cheval B (2016). The anova to mixed model transition. Neuroscience and Biobehavioral Reviews 68, 1004–1005.
  • Bolker BM, Brooks ME, Clark CJ, Geange SW, Poulsen JR, Stevens MHH, and White J-SS (2009). Generalized linear mixed models: a practical guide for ecology and evolution. Trends in Ecology & Evolution 24, 127–135.
  • Breslow NE, and Clayton DG (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 88, 9–25.
  • Bürkner P-C (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software 80, 1–28.
  • Bürkner P-C (2018). Advanced Bayesian multilevel modeling with the R package brms. The R Journal 10(1), 395.
  • Casella G, and George EI (1992). Explaining the Gibbs sampler. The American Statistician 46, 167–174.
  • Datta S, and Satten GA (2005). Rank-sum tests for clustered data. Journal of the American Statistical Association 100, 908–915.
  • Duane S, Kennedy AD, Pendleton BJ, and Roweth D (1987). Hybrid Monte Carlo. Physics Letters B 195, 216–222.
  • Dutta S, and Datta S (2016). A rank-sum test for clustered data when the number of subjects in a group within a cluster is informative. Biometrics 72, 432–440.
  • Ferron J, Dailey R, and Yi Q (2002). Effects of misspecifying the first-level error structure in two-level models of change. Multivariate Behavioral Research 37, 379–403.
  • Fiedler K (2011). Voodoo correlations are everywhere - not only in neuroscience. Perspectives on Psychological Science 6, 163–171.
  • Fisher RA (1944). Statistical Methods for Research Workers, 1925 (Edinburgh: Oliver and Boyd).
  • Fisher RA (1919). XV.—The correlation between relatives on the supposition of Mendelian inheritance. Earth and Environmental Science Transactions of the Royal Society of Edinburgh 52, 399–433.
  • Fitzmaurice GM, Laird NM, and Ware JH (2012). Applied Longitudinal Analysis, Vol. 998 (John Wiley & Sons).
  • Fong Y, Rue H, and Wakefield J (2010). Bayesian inference for generalized linear mixed models. Biostatistics 11, 397–412.
  • Fox J, and Weisberg S (2018). An R Companion to Applied Regression (Sage Publications).
  • Freedman LP, Cockburn IM, and Simcoe TS (2015). The economics of reproducibility in preclinical research. PLoS Biology 13, e1002165.
  • Galbraith S, Daniel JA, and Vissel B (2010). A study of clustered data and approaches to its analysis. Journal of Neuroscience 30, 10601–10608.
  • Gelfand AE, and Smith AF (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 85, 398–409.
  • Gelman A (2005). Analysis of variance—why it is more important than ever. Annals of Statistics 33, 1–53.
  • Gelman A, and Hill J (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models (Cambridge University Press).
  • Green P, and MacLeod CJ (2016). SIMR: an R package for power analysis of generalized linear mixed models by simulation. Methods in Ecology and Evolution 7, 493–498.
  • Grieco SF, Qiao X, Zheng XT, Liu YJ, Chen LJ, Zhang H, Yu ZX, Gavornik JP, Lai CR, Gandhi SP, et al. (2020). Subanesthetic ketamine reactivates adult cortical plasticity to restore vision from amblyopia. Current Biology 30, 3591-+.
  • Hadfield JD (2010). MCMC methods for multi-response generalized linear mixed models: the MCMCglmm R package. Journal of Statistical Software 33, 1–22.
  • Halekoh U, and Højsgaard S (2014). A Kenward-Roger approximation and parametric bootstrap methods for tests in linear mixed models - the R package pbkrtest. Journal of Statistical Software 59, 1–30.
  • Hastings WK (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109.
  • Heisig JP, and Schaeffer M (2019). Why you should always include a random slope for the lower-level variable involved in a cross-level interaction. European Sociological Review 35, 258–279.
  • Held L, and Ott M (2018). On p-values and Bayes factors. Annual Review of Statistics and Its Application 5, 393–419.
  • Henderson CR (1949). Estimation of changes in herd environment. Journal of Dairy Science 32, 706.
  • Henderson CR, Kempthorne O, Searle SR, and Von Krosigk C (1959). The estimation of environmental and genetic trends from records subject to culling. Biometrics 15, 192–218.
  • Hoffman MD, and Gelman A (2014). The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research 15, 1593–1623.
  • Jiang J, and Nguyen T (2021). Linear and Generalized Linear Mixed Models and Their Applications, 2nd edn (Springer).
  • Kass RE, and Raftery AE (1995). Bayes factors. Journal of the American Statistical Association 90, 773–795.
  • Kilkenny C, Parsons N, Kadyszewski E, Festing MF, Cuthill IC, Fry D, Hutton J, and Altman DG (2009). Survey of the quality of experimental design, statistical analysis and reporting of research using animals. PLoS ONE 4, e7824.
  • Kish L (1965). Survey Sampling (Wiley).
  • Kuznetsova A, Brockhoff PB, and Christensen RHB (2017). lmerTest package: tests in linear mixed effects models. Journal of Statistical Software 82, 1–26.
  • Kwok O-M, West SG, and Green SB (2007). The impact of misspecifying the within-subject covariance structure in multiwave longitudinal multilevel models: A Monte Carlo study. Multivariate Behavioral Research 42, 557–592.
  • Laird NM, and Ware JH (1982). Random-effects models for longitudinal data. Biometrics 38, 963–974.
  • Landis SC, Amara SG, Asadullah K, Austin CP, Blumenstein R, Bradley EW, Crystal RG, Darnell RB, Ferrante RJ, and Fillit H (2012). A call for transparent reporting to optimize the predictive value of preclinical research. Nature 490, 187–191.
  • Lazic SE, Clarke-Williams CJ, and Munafo MR (2018). What exactly is 'N' in cell culture and animal experiments? PLoS Biology 16, e2005282.
  • Lenth R, Singmann H, Love J, Buerkner P, and Herve M (2019). emmeans: Estimated marginal means, aka least-squares means. R package version 1.3.2.
  • Liang K-Y, and Zeger SL (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22.
  • Lüdecke D (2018). sjPlot: Data visualization for statistics in social science. R package version 2.
  • Macleod MR, Michie S, Roberts I, Dirnagl U, Chalmers I, Ioannidis JPA, Salman RA, Chan AW, and Glasziou P (2014). Biomedical research: increasing value, reducing waste. Lancet 383, 101–104.
  • Margolis R, Derr L, Dunn M, Huerta M, Larkin J, Sheehan J, Guyer M, and Green ED (2014). The National Institutes of Health's Big Data to Knowledge (BD2K) initiative: capitalizing on biomedical big data. Journal of the American Medical Informatics Association 21, 957–958.
  • Matuschek H, Kliegl R, Vasishth S, Baayen H, and Bates D (2017). Balancing Type I error and power in linear mixed models. Journal of Memory and Language 94, 305–315.
  • McCullagh P, and Nelder JA (2019). Generalized Linear Models (Routledge).
  • Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, and Teller E (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics 21, 1087–1092.
  • Neal RM (2011). MCMC using Hamiltonian dynamics. In Handbook of Markov Chain Monte Carlo (Chapman and Hall/CRC), pp. 113–162.
  • Nelder JA, and Wedderburn RW (1972). Generalized linear models. Journal of the Royal Statistical Society: Series A (General) 135, 370–384.
  • Pinheiro J, Bates D, DebRoy S, Sarkar D, and R Core Team (2007). Linear and nonlinear mixed effects models. R package version 3.1-89.
  • Prinz F, Schlange T, and Asadullah K (2011). Believe it or not: how much can we rely on published data on potential drug targets? Nature Reviews Drug Discovery 10, 712.
  • R Development Core Team (2020). R: A Language and Environment for Statistical Computing (Vienna, Austria: R Foundation for Statistical Computing).
  • Rosner B, Glynn RJ, and Lee MLT (2006). Extension of the rank sum test for clustered data: two-group comparisons with group membership defined at the subunit level. Biometrics 62, 1251–1259.
  • Rosner B, and Grove D (1999). Use of the Mann-Whitney U-test for clustered data. Statistics in Medicine 18, 1387–1400.
  • Shahbaba B, Lan S, Johnson WO, and Neal RM (2014). Split Hamiltonian Monte Carlo. Statistics and Computing 24, 339–349.
  • Steward O, and Balice-Gordon R (2014). Rigor or mortis: best practices for preclinical research in neuroscience. Neuron 84, 572–581.
  • Stiratelli R, Laird N, and Ware JH (1984). Random-effects models for serial observations with binary response. Biometrics 40, 961–971.
  • Student (1908). The probable error of a mean. Biometrika 6, 1–25.
  • Wedderburn RW (1974). Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika 61, 439–447.
  • Wei Z, Lin B-J, Chen T-W, Daie K, Svoboda K, and Druckmann S (2020). A comparison of neuronal population dynamics measured with calcium imaging and electrophysiology. PLoS Computational Biology 16, e1008198.
  • Wilson MD, Sethi S, Lein PJ, and Keil KP (2017). Valid statistical approaches for analyzing Sholl data: mixed effects versus simple linear models. Journal of Neuroscience Methods 279, 33–43.
  • Wolak M, and Wolak M (2015a). R package 'ICC': Facilitating estimation of the intraclass correlation coefficient. R Documentation.
  • Wolak M, and Wolak MM (2015b). Package 'ICC': Facilitating estimation of the intraclass correlation coefficient.
  • Wolfinger R, and O'Connell M (1993). Generalized linear mixed models: a pseudo-likelihood approach. Journal of Statistical Computation and Simulation 48, 233–243.
  • Zeger SL, and Karim MR (1991). Generalized linear models with random effects; a Gibbs sampling approach. Journal of the American Statistical Association 86, 79–86.
  • Zeger SL, and Liang K-Y (1986). Longitudinal data analysis for discrete and continuous outcomes. Biometrics 42, 121–130.
  • Zeger SL, Liang K-Y, and Albert PS (1988). Models for longitudinal data: a generalized estimating equation approach. Biometrics 44, 1049–1060.

Teach yourself statistics

Experimental Design for Repeated Measures

This lesson begins our discussion of repeated measures designs. The purpose of this lesson is to provide background knowledge that can help you decide whether a repeated measures design is the right design for your study. Specifically, we will answer four questions:

  • What is a repeated measures design?
  • How does an experimenter implement a repeated measures experiment?
  • What are the data requirements for analysis of variance with a repeated measures experiment?
  • What are advantages and disadvantages of a repeated measures experiment?

What about data analysis? We will explain how to analyze data from a repeated measures experiment in the following lessons:

  • One-Factor Repeated Measures: Example .
  • Repeated Measures ANOVA With Excel .

Prerequisites: The lesson assumes familiarity with randomized block designs. If you are unfamiliar with terms like blocks , blocking , and blocking variables , review the following previous lesson: Randomized Block Designs .

What is a Repeated Measures Design?

A repeated measures design is a type of randomized block design. It is a randomized block design in which each experimental unit serves as a block.

Consider a single-factor experiment - one independent variable and one dependent variable. If the independent variable has k treatment levels, a repeated measures design requires k observations on each experimental unit. Because multiple measurements are obtained from each experimental unit, this type of design is called a repeated measures design or a within subjects design.

How to Implement a Repeated Measures Experiment

A repeated measures experiment is distinguished by the following attributes:

  • The design has one or more factors (i.e., one or more independent variables ), each with two or more levels .
  • Treatment groups are defined by a unique combination of non-overlapping factor levels.
  • Experimental units are randomly selected from a known population .
  • Each experimental unit is measured on each level of at least one independent variable.

The table below shows the layout for a typical repeated measures experiment with one independent variable.

        T1      T2      T3      T4
S1      X1,1    X1,2    X1,3    X1,4
S2      X2,1    X2,2    X2,3    X2,4
S3      X3,1    X3,2    X3,3    X3,4
S4      X4,1    X4,2    X4,3    X4,4
S5      X5,1    X5,2    X5,3    X5,4

In this experiment, there are five subjects ( S1 through S5 ) and one independent variable with four treatment levels ( T1 through T4 ). Dependent variable scores are represented by Xi,j , where Xi,j is the score for subject i under treatment j .

Consider the sample size requirements for this repeated measures design, compared to an independent groups design .

                Repeated measures    Independent groups
Sample size             5                    20
Scores                 20                    20

This repeated measures design uses five subjects to produce 20 dependent variable scores. To produce 20 dependent variable scores with an independent groups design, the experiment would require 20 subjects, because an independent groups experiment collects only one dependent variable score from each subject.
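In R, this layout maps naturally onto a long-format data frame with one row per subject-by-treatment cell; the following sketch, with placeholder scores and variable names of our choosing, shows the layout and the classic one-factor repeated measures ANOVA call:

```r
# Long format: one row per subject-by-treatment cell (placeholder scores)
dat <- expand.grid(subject = factor(1:5), treatment = factor(paste0("T", 1:4)))
set.seed(1)
dat$score <- rnorm(nrow(dat), mean = 10, sd = 2)

# One-factor repeated measures ANOVA: subjects act as blocks, and the
# Error(subject/treatment) term tells aov() that treatment varies within subject
summary(aov(score ~ treatment + Error(subject/treatment), data = dat))
```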

Data Requirements for Repeated Measures

The data requirements for analysis of variance are similar to the requirements for the independent groups designs that we've covered previously (e.g., see One-Way Analysis of Variance and ANOVA With Full Factorial Experiments ). Like an independent groups design, a repeated measures design requires that the dependent variable be measured on an interval scale or a ratio scale . And, like an independent groups design, a repeated measures design makes three assumptions about dependent variable scores:

  • Independence . The dependent variable score for each experimental unit is independent of the score for any other unit.
  • Normality . In the population, dependent variable scores are normally distributed within treatment groups.
  • Equality of variance . In the population, the variance of dependent variable scores in each treatment group is equal. (Equality of variance is also known as homogeneity of variance or homoscedasticity.)

In addition to the requirements listed above, a repeated measures design requires one additional assumption that is not required by an independent groups design. That assumption is sphericity .

Sphericity exists when the variance of the difference between scores for any two levels of a repeated measures variable is constant. Lack of sphericity is a potential problem for repeated measures designs when a repeated measures treatment variable has more than two levels. If a repeated measures treatment variable has only two levels, you don't have to worry about sphericity.

If a repeated measures treatment variable has three or more levels, the sphericity assumption should be satisfied for any main effect or interaction effect based on the treatment variable. If the sphericity assumption is violated, your hypothesis test will be positively biased; that is, you will be more likely to make a Type I error (i.e., reject the null hypothesis when it is, in fact, true).

So, how do you deal with potential violations of sphericity? Luckily, it is possible to estimate the degree to which the sphericity assumption is violated in your data and use that estimate to make a correction in the analysis. Many software packages (e.g., SAS, SPSS) will do this for you; so if your analytical software includes an option to adjust for sphericity, use that option.

If you don't have access to software that can deal with sphericity, you may have to make a sphericity adjustment yourself. We will show you how to do this in a future lesson: see Sphericity Lesson .
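One way to check and correct for sphericity in R (an illustration, not the lesson's own method) is the ez package, reusing the data frame dat from the sketch above:

```r
library(ez)

# ezANOVA reports Mauchly's test for sphericity together with
# Greenhouse-Geisser and Huynh-Feldt corrected p-values
ezANOVA(data = dat, dv = .(score), wid = .(subject), within = .(treatment))
```

If Mauchly's test flags a violation, report the Greenhouse-Geisser (or Huynh-Feldt) corrected p-value rather than the uncorrected one.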

Advantages and Disadvantages

Compared to an independent groups experiment, a repeated measures experiment has advantages and disadvantages. Advantages include the following:

  • A repeated measures experiment is almost always more powerful than an independent groups experiment of comparable size.
  • Because a repeated measures experiment requires fewer experimental units than a comparable independent groups experiment, it may be cheaper, quicker, or easier to implement.

Disadvantages include the following:

  • The repeated measures experiment makes a sphericity assumption that is not required by an independent groups experiment.
  • Results from a repeated measures experiment may be affected by order effects (e.g., learning, fatigue) that do not exist with an independent groups experiment.
  • To control for order effects, researchers must vary the order in which treatment levels are administered (e.g., randomizing or reversing the order of treatments among experimental units).

Test Your Understanding

Which of the following statements is true for a repeated measures design?

(A) Each subject provides a single dependent variable score.
(B) Each subject provides two or more scores on the dependent variable.
(C) A repeated measures design is a type of independent groups design.
(D) None of the above.
(E) All of the above.

The correct answer is (B).

In a repeated measures experiment, each subject provides two or more dependent variable scores; so option A is incorrect. And a repeated measures design is a type of randomized blocks design, not a type of independent groups design; so option C is incorrect.

Why would an experimenter choose to use a repeated measures design?

(A) To avoid potential problems caused by a violation of the sphericity assumption.
(B) To avoid potential order effects (e.g., fatigue, learning).
(C) To minimize sample size requirements.
(D) None of the above.
(E) All of the above.

The correct answer is (C).

A violation of the sphericity assumption is a problem for a repeated measures design, but not for an independent groups design. So using a repeated measures design would not help an experimenter avoid problems associated with violations of sphericity. Similarly, a repeated measures design is vulnerable to potential order effects. So a repeated measures design would not help an experimenter avoid order effects. Instead, the experimenter who uses a repeated measures design has to implement additional steps (e.g., counterbalancing, randomizing treatment order) to control for order effects.


Published: 09 September 2024

Rejuvenation of aged oocyte through exposure to young follicular microenvironment

  • HaiYang Wang   ORCID: orcid.org/0000-0002-6362-6022 1 ,
  • Zhongwei Huang 2 , 3 , 4 ,
  • Xingyu Shen 1 ,
  • Yaelim Lee 1 ,
  • XinJie Song   ORCID: orcid.org/0009-0001-4833-9957 1 ,
  • Chang Shu   ORCID: orcid.org/0009-0000-1096-6771 1 ,
  • Lik Hang Wu   ORCID: orcid.org/0000-0002-9642-2552 5 , 6 , 7 ,
  • Leroy Sivappiragasam Pakkiri 5 , 6 ,
  • Poh Leong Lim 5 , 6 ,
  • Xi Zhang 8 ,
  • Chester Lee Drum   ORCID: orcid.org/0000-0001-6327-4584 5 , 6 ,
  • Jin Zhu 1 &
  • Rong Li   ORCID: orcid.org/0000-0002-0540-6566 1 , 8 , 9  

Nature Aging (2024)

Subjects: Germline development; Reproductive disorders

Reproductive aging is a major cause of fertility decline, attributed to decreased oocyte quantity and developmental potential. A possible cause is aging of the surrounding follicular somatic cells that support oocyte growth and development by providing nutrients and regulatory factors. Here, by creating chimeric follicles, whereby an oocyte from one follicle was transplanted into and cultured within another follicle whose native oocyte was removed, we show that young oocytes cultured in aged follicles exhibited impeded meiotic maturation and developmental potential, whereas aged oocytes cultured within young follicles were significantly improved in rates of maturation, blastocyst formation and live birth after in vitro fertilization and embryo implantation. This rejuvenation of aged oocytes was associated with enhanced interaction with somatic cells, transcriptomic and metabolomic remodeling, improved mitochondrial function and higher fidelity of meiotic chromosome segregation. These findings provide the basis for a future follicular somatic cell-based therapy to treat female infertility.



Data availability.

All raw RNA-seq data, as well as processed datasets, can be found in the Gene Expression Omnibus database under accession number GSE270016 . Metabolomics data are available in Supplementary Table 5 . The rest of the data generated or analyzed during this study are all included in the published article and its Supplementary Information files. Source data are provided with this paper. All other data are available from the corresponding authors upon reasonable request.



Acknowledgements

We thank S. Xiao (Rutgers University) for helpful discussions on the mouse follicle in vitro culture method. We thank M. Lampson (University of Pennsylvania) for providing the Rec8 antibody. We thank T. S. Kitajima (RIKEN Center for Developmental Biology) for providing the pGEMHE–2mEGFP–CENP-C plasmid. Graphics from Figs. 2a,f , 3a , 4a,e and 8g and Extended Data Figs. 3b , 6a and 7a were created with BioRender. This work was supported by a grant from the National University of Singapore Bia-Echo Asia Centre for Reproductive Longevity and Equality and by the National Research Foundation, Singapore, under its mid-sized grant (NRF-MSG-2023-0001) to R.L. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Authors and affiliations.

Mechanobiology Institute, National University of Singapore, Singapore, Singapore

HaiYang Wang, Xingyu Shen, Yaelim Lee, XinJie Song, Chang Shu, Jin Zhu & Rong Li

NUS Bia Echo Asia Centre for Reproductive Longevity and Equality, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore

Zhongwei Huang

Department of Physiology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore

Department of Obstetrics and Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore

Cardiovascular Research Institute, National University Health System, Singapore, Singapore

Lik Hang Wu, Leroy Sivappiragasam Pakkiri, Poh Leong Lim & Chester Lee Drum

Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore

Department of Pharmacy, Faculty of Science, National University of Singapore, Singapore, Singapore

Lik Hang Wu

Center for Cell Dynamics and Department of Cell Biology, Johns Hopkins University School of Medicine, Baltimore, MD, USA

Xi Zhang & Rong Li

Department of Biological Sciences, National University of Singapore, Singapore, Singapore


Contributions

H.W. and R.L. conceived the study. H.W. and R.L. designed the experiments and methods for data analysis. H.W. performed experiments and analyzed the data with assistance from Z.H., X.J.S., X.S. and C.S., with the following exceptions: L.H.W., P.L.L. and L.S.P. performed the MS experiments and data analysis; C.S. measured the distance between sister kinetochores; X.Z. generated MTS–mCherry–GFP 1–10 mice strain; J.Z. supervised the RNA-seq experiments and analyzed the data with Y.L.; and C.L.D. and L.S.P. supervised the MS analysis. H.W. and R.L. wrote the paper and prepared the figures with input from all authors. R.L. supervised the study.

Corresponding authors

Correspondence to HaiYang Wang or Rong Li .

Ethics declarations

Competing interests.

We disclose that we have filed a patent for this study. The applicants and inventors for this patent are R.L. and H.W. The patent application, titled ‘Somatic Cell-Based Therapy to Treat Female Infertility’, was filed under number PCT/SG2023/050339 and has been published with the publication number WO 2023/224556 A1. The other authors declare no competing interests.

Peer review

Peer review information.

Nature Aging thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Follicles accumulate age-related abnormalities.

a,b , Representative images of Ki-67 staining in ovarian sections ( a ). F-actin was stained with phalloidin. Scale bar, 50 μm. Quantitative analysis of the percentage of Ki-67-positive cells per follicle is shown in ( b) . n = 31 (young), 25 (aged) follicles. c , Quantification of γH2AX foci in GCs from follicles in ovarian sections. n = 37 (young), 38 (aged) follicles. d-f , CM-H2DCFDA staining in isolated oocyte-GC complexes ( d ). Scale bar, 30 μm. Scatter plots ( e ) show the correlation between ROS levels in GCs and oocytes (simple linear regression and two-tailed analysis). Gray areas around fit lines indicate 95% confidence intervals, with Pearson’s correlation coefficient (r). Comparison of ROS intensity in young and aged oocytes or GCs is shown in ( f ). n = 76 (young), 43 (aged). 2-month-old (young) and 14-month-old (aged) mice were used in ( b, c, e, f ). Box plots in (b, c, f) show mean (black square), median (center line), quartiles (box limits), and 1.5× interquartile range (whiskers). Box plots inside the violins in ( f ) show mean (black circle), quartiles (box limits), and 1.5× interquartile range (whiskers). Two-tailed unpaired t-tests for ( b, c, f ). P value: **** P  < 0.0001, *** P  < 0.001. Exact P values are in the Source Data. Data are from at least three independent experiments.

Source data

Extended Data Fig. 2 Comparison of in vivo and in vitro grown oocytes.

a , Diameter of oocytes grown in vivo or in vitro . n = 109 ( in vivo ), 90 ( in vitro ). b , Quantification of oocyte maturation rate. Sample sizes: n = 126 ( in vivo ) and n = 98 ( in vitro ) oocytes, with 4 biological replicates in each group. Data are shown as mean ± SD. c , Analysis of embryo development potential. n = 83 ( in vivo ) and 75 ( in vitro ). d-f , Transcriptome analysis of oocytes grown in vitro and in vivo . Volcano plot ( d) of DEGs (p.adjust < 0.05 and log 2 fold change > 0.5 or < −0.5) between in vitro and in vivo oocytes. Two-sided Wald-test adjusted with Benjamini-Hochberg method. Correlation heatmap ( e) with hierarchical clustering to show the sample-to-sample distances. PCA analysis (f) of the normalized gene expression data. Ellipses fit a multivariate t-distribution at confidence level of 0.8. n = 8 in vivo and 8 in vitro . g , Dot plots illustrating follicle size changes over time during 3D ex vivo culture. Color bar and circle size represent follicle size. 2-month-old (young, n = 18) and 14-month-old (aged, n = 18) mice were used. h,i , Representative images ( h ) of 3D ex vivo cultured young and aged follicles. Follicles were considered atretic if there was disruption of contact between the oocyte (red asterisk) and GCs, leading to the release of oocytes from the follicles (bottom left), or if the follicles contained apoptotic or dead oocytes (bottom right). Antrum is indicated by the white arrowhead. Scale bar, 100 μm. Atresia rate was quantified (i) in young (2-3 months) and aged (14-15 months) follicles after 3D ex vivo culture. The median is represented by the center line, with individual dots representing biological replicates for each group. Sample sizes: n = 166 (young), 199 (aged) follicles, with 5 biological replicates in each group. 2-3 month-old mice were used in ( a - f ). Box plots inside the violins in ( a ) show mean (black circle), quartiles (box limits), and 1.5× interquartile range (whiskers). Two-tailed unpaired t-tests for ( a, b, i ). Two-tailed Fisher’s exact test for (c) . P value: ** P  < 0.01, ns, not significant ( P  > 0.05). Exact P values are in the Source Data. All data are from at least three independent experiments.

Extended Data Fig. 3 Growth and maturation of oocytes from RCFs in 3D ex vivo culture.

a , Procedure for generating reconstituted chimeric follicles. Red arrow points to the oocyte used for transplantation. Red asterisk indicates the oocyte within the r-follicle that will be replaced. Refer to Supplementary Video 1 and Methods for further details. b , To distinguish between the donor oocyte and the r-follicle, we employed oocytes from mTmG transgenic mice exhibiting membrane-localized tdTomato (pseudo-colored yellow). In contrast, the r-follicles were sourced from non-fluorescent wild-type mice. The mTmG oocytes served as donors as referenced in Fig. 2g and Extended Data Fig. 3c–e . c , RCF size increased during 3D ex vivo culture. Oocytes from transgenic mTmG mice and follicular somatic cells from wild-type mice, as shown in ( b ). Scale bars, 50 μm. d , Cumulus-oocyte complexes (COCs) isolated from antral RCFs were induced for oocyte maturation with hCG for 16 hours in vitro . Note that cumulus cells surrounding the oocytes (from mTmG mice) expanded, and oocytes resumed meiosis, extruded the PB1 as shown in ( e ). Scale bars, 200 μm. e , Representative image of mature eggs derived from RCFs as shown in ( c and d ). The cumulus cells were removed after maturation to visualize mature eggs with the first polar body (PB1, arrows). Scale bars, 40 μm. All images are representative of at least three independent experiments.

Extended Data Fig. 4 Aged follicular somatic cells elevate ROS levels and reduce mitochondrial membrane potential in young oocytes.

a . Representative confocal images of cellular ROS stained with CM-H2DCFDA in oocytes from YY and YA RCFs. Scale bar, 100 μm. b . Quantification of CM-H2DCFDA fluorescence intensity in oocytes from YY and YA RCFs, as well as Y. n = 104 (Y), 122 (YY), 97 (YA). 2-month-old (young) and 14-month-old (aged) wild-type ICR mice were used. c . Fluorescence images of oocytes stained with MitoTracker Green (MTG, cyan) and the mitochondrial membrane potential-sensitive dye TMRM (red). Scale bar, 100 μm. d . Quantification of the fluorescence intensity ratio of TMRM to MTG in oocytes from YY and YA RCFs, as well as Y. n = 110 (Y), 94 (YY), 72 (YA). 2-month-old (young) and 14-month-old (aged) wild-type ICR mice were used. Box plots in (b, d) show mean (black square), median (center line), quartiles (box limits), and 1.5× interquartile range (whiskers). One-way ANOVA, Tukey's multiple comparisons test for ( b , d ). P value: **** P  < 0.0001, ns, not significant ( P  > 0.05). The exact P values are presented in the Source Data. All data are from at least three independent experiments.

Extended Data Fig. 5 Impact of young follicular somatic cells on aged oocyte quality.

a , Quantification of oocyte death rates. Data are presented as mean ± SD. Sample sizes: n = 77 (YY), 65 (AA), 142 (AY) oocytes. Individual dots represent biological replicates for each group. Young: 2-3 months old, aged: 14-15 months old. b-e , Representative live-cell images ( b ) showing spindle and chromosomes in MII oocytes. Scale bar, 10 µm. Quantification of the percentage of chromosomal misalignment ( c ) and spindle abnormalities ( d ). Panel ( e ) presents a separate quantitative analysis of various classes of spindle abnormalities. Young: 2-3 months old, aged: 14-17 months old. f,g , Representative image of DAPI-stained blastocysts ( f ). Scale bars, 20 μm. Cell numbers per blastocyst were quantified in (g) . n = 57 (YY), 43 (AY), 26 (AA). h-m , Comparison of various parameters between AA and AY RCFs and A: ( h ) cellular ROS levels, ( i ) oocyte maturation rates, ( j ) chromosomal misalignment, ( k ) spindle abnormalities, ( l ) blastocyst formation rate, and ( m ) blastocyst size. For ( h ), n = 72 (A), 72 (AA), 72 (AY); for ( m ), n = 25 (A), 28 (AA), 31 (AY). In ( c, d, i, j, k, l ), the oocyte numbers are specified in brackets. 2-month-old (young) and 14-month-old (aged) mice were used in ( g-m ). Box plots in ( g, m ) show mean (black square), median (center line), quartiles (box limits), and 1.5× interquartile range (whiskers). Box plots inside the violins in ( h ) show mean (black circle), quartiles (box limits), and 1.5× interquartile range (whiskers). One-way ANOVA with Tukey’s multiple comparisons test was used for ( a, g, h, m ). Two-tailed Fisher’s exact test for ( c, d, i-l ). P value: **** P  < 0.0001, *** P  < 0.001, ** P  < 0.01, * P  < 0.05, ns, not significant ( P  > 0.05). Exact P values are in the Source Data. All data are from at least three independent experiments.

Extended Data Fig. 6 TZP regeneration and oocyte transcriptomic remodeling in RCFs.

a, Schematic demonstrating TZPs from GCs that pass through the zona pellucida, forming either adherens junctions or gap junctions on the oocyte surface. b, TZPs regenerated within 3 h of RCF culture. RCFs containing follicular somatic cells from mTmG mice and wild-type oocytes were cultured within Alginate-rBM beads for 3 h. Somatic cells were then removed to visualize TZP regeneration. Scale bars, 20 μm. c, Histogram displaying the number of upregulated or downregulated DEGs between oocytes from YY and AA, AY and AA, or YY and AY RCFs. d,e, Representative GO terms associated with the genes that were downregulated (d) and upregulated (e) in aged oocytes from AA RCFs when compared to young oocytes from YY RCFs. One-sided hypergeometric test with FDR adjustment for multiple comparisons.
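The GO-term analysis in (d, e) is an over-representation test: for each term, a one-sided hypergeometric test asks whether more differentially expressed genes carry the annotation than chance predicts, and the resulting P values are FDR-adjusted across terms. A sketch of that calculation; every count here is hypothetical:

```python
from scipy.stats import hypergeom
from statsmodels.stats.multitest import multipletests

# One-sided hypergeometric (over-representation) test for a single GO term
M = 20000   # genes in the background set
n = 150     # background genes annotated to this GO term
N = 500     # differentially expressed genes (DEGs) tested
k = 12      # DEGs carrying the annotation
p = hypergeom.sf(k - 1, M, n, N)   # P(X >= k)

# Benjamini-Hochberg FDR adjustment across many terms (other p values invented)
pvals = [p, 0.003, 0.04, 0.2, 0.7]
rejected, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(f"raw p = {p:.3g}, adjusted = {p_adj[0]:.3g}")
```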

Extended Data Fig. 7 Investigating possible GC-to-oocyte mitochondrial transport in RCFs.

a, Experimental design to study mitochondrial transport within RCFs. RCFs were created using somatic cells from transgenic MTS-mCherry-GFP 1-11 mice, which express mitochondria-targeted mCherry, and unlabelled oocytes from wild-type mice. b, Confocal microscopy images of mCherry-labelled mitochondria in oocytes. Top panel: positive control, an RCF formed by transplanting an MTS-mCherry-GFP 1-11 oocyte into an MTS-mCherry-GFP 1-10 r-follicle. Middle panel: RCF generated by transplanting a wild-type oocyte into an MTS-mCherry-GFP 1-10 r-follicle. Bottom panel: negative control, an RCF generated by transplanting a wild-type oocyte into a wild-type r-follicle. Rightmost panel of each row: overexposed images corresponding to the second column (mCherry). Somatic cells were partially removed before imaging to better observe oocyte fluorescence. Scale bar, 20 µm. All images are representative of at least three independent experiments.

Extended Data Fig. 8 Comparative analysis of oocytes from YY, YA, AY, AA RCFs.

This analysis examines various parameters of oocyte quality and developmental potential across four different RCF groups (YY, YA, AY, AA): (a) oocyte maturation rates (n = 12 YY, 4 YA, 10 AY, 9 AA), (b) chromosome misalignment (n = 9 YY, 4 YA, 5 AY, 5 AA), (c) spindle abnormalities (n = 9 YY, 4 YA, 5 AY, 5 AA), (d) blastocyst formation rates (n = 9 YY, 5 YA, 7 AY, 7 AA), (e) cellular ROS accumulation (n = 150 YY, 97 YA, 38 AY, 26 AA), (f) mitochondrial membrane potential (n = 153 YY, 72 YA, 57 AY, 53 AA). All metrics were normalized to those of the YY group in the same experiments, using the non-normalized data as in Figs. 2h,j,k,l, 3b,d and 6f,j and Extended Data Figs. 4b,d and 5c,d. The data were analyzed by one-way ANOVA with Tukey's multiple comparisons test. P value: ****P < 0.0001, ***P < 0.001, **P < 0.01, *P < 0.05; ns, not significant (P > 0.05). The exact P values are presented in the Source Data. All results are presented as mean ± SD. All data are from at least three independent experiments.
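Normalizing each metric to the YY group measured in the same experiment removes batch-to-batch variation before the groups are pooled for the ANOVA. A sketch of that step, with simulated values standing in for the real measurements:

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated raw measurements from one experiment (arbitrary units)
groups = {
    "YY": rng.normal(100, 10, 12),
    "YA": rng.normal(115, 10, 4),
    "AY": rng.normal(108, 10, 10),
    "AA": rng.normal(130, 10, 9),
}

# Express every value relative to the mean of the same experiment's YY control
yy_mean = groups["YY"].mean()
normalized = {name: vals / yy_mean for name, vals in groups.items()}
for name, vals in normalized.items():
    print(f"{name}: {vals.mean():.2f} ± {vals.std(ddof=1):.2f} (relative to YY)")
```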

Supplementary information

Reporting Summary

Supplementary Video 1

Chimeric follicle generation process. This video demonstrates a step-by-step example of creating an RCF, highlighting the process of transplanting an oocyte into an r-follicle.

Supplementary Video 2

An example of sister kinetochore pair distance measurement. This video demonstrates the measurement of sister kinetochore pair distances in oocytes expressing 2mEGFP–CENP-C (green) and H2B–mCherry (red) to label kinetochores and chromosomes, respectively. See Methods for a detailed description of the measurement protocol.
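Once the kinetochore signals are segmented, each sister-pair distance reduces to the Euclidean distance between two 3D centroids. A toy calculation (the coordinates are invented; the paper's actual segmentation and measurement protocol is described in its Methods):

```python
import numpy as np

# Invented 3D centroids (µm) of one sister kinetochore pair, e.g. from
# segmented 2mEGFP-CENP-C spots
k1 = np.array([10.2, 8.7, 3.1])
k2 = np.array([10.9, 9.1, 3.4])

# Euclidean inter-kinetochore distance
distance = np.linalg.norm(k1 - k2)
print(f"{distance:.2f} µm")
```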

Supplementary Tables 1–5

Supplementary Table 1. Differential gene expression analysis of in vitro oocytes versus in vivo oocytes. Two-sided Wald test adjusted with the Benjamini–Hochberg method.
Supplementary Table 2. Differential gene expression analysis of oocytes from AA RCFs versus YY RCFs. Two-sided Wald test adjusted with the Benjamini–Hochberg method.
Supplementary Table 3. Differential gene expression analysis of oocytes from AA RCFs versus AY RCFs. Two-sided Wald test adjusted with the Benjamini–Hochberg method.
Supplementary Table 4. Differential gene expression analysis of oocytes from AY RCFs versus AA RCFs. Two-sided Wald test adjusted with the Benjamini–Hochberg method.
Supplementary Table 5. Metabolomic profiling of oocytes from YY, AA, and AY RCFs. Two-sided Wald test.
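All five tables report two-sided Wald tests with Benjamini–Hochberg adjustment, the standard output of differential-expression tools such as DESeq2 (the specific tool is described in the paper's Methods, not here). In generic form, a Wald test divides a coefficient estimate — here, a log fold change — by its standard error and refers the statistic to a normal distribution. A sketch with invented per-gene estimates:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

# Invented per-gene log2 fold changes and their standard errors
beta = np.array([1.40, -0.20, 0.85, 0.05, -1.10])
se   = np.array([0.35,  0.30, 0.40, 0.25,  0.33])

z = beta / se                       # Wald statistics
p = 2 * stats.norm.sf(np.abs(z))    # two-sided p values

# Benjamini-Hochberg adjustment across all genes tested
rejected, p_adj, _, _ = multipletests(p, alpha=0.05, method="fdr_bh")
print(np.round(p_adj, 4), rejected)
```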

Source data

Statistical source data are provided for Figs. 1–4 and 6–8 and for Extended Data Figs. 1, 2, 4–6 and 8.


About this article

Cite this article

Wang, H., Huang, Z., Shen, X. et al. Rejuvenation of aged oocyte through exposure to young follicular microenvironment. Nat Aging (2024). https://doi.org/10.1038/s43587-024-00697-x


Received: 23 April 2024

Accepted: 30 July 2024

Published: 09 September 2024

DOI: https://doi.org/10.1038/s43587-024-00697-x



