Experiments with Google

A note to our community

Experiments with Google was born out of a simple idea, but you all turned it into something beyond anything we could have ever imagined. You filled it with thousands of experiments that inspired people everywhere - from the classroom to the surface of Mars.

When it comes to the internet, 14 years is a long time. So in the spirit of experimentation we’re trying something new.

This site will continue as a rich archival gallery for all existing experiments. But the action will live on at labs.google, a new place filled with new tools and toys for you to play with. And together we can continue to experiment with the future of technology.

public experiments

Since 2009, coders have created thousands of amazing experiments using Chrome, Android, AI, AR and more. We're showcasing projects here, along with helpful tools and resources, to inspire others to create new experiments. Here are collections of experiments to explore, with new ones added every week. Have fun.

Featured Collections


WebXR Experiments

AR and VR made for the web

AI Experiments

Celebrating Creativity and AI

Arts & Culture Experiments

See what happens at the crossroads of art and technology

Experiments for Learning

A collection of experiments that teachers, students, and families are using to learn from home.

Start With One

A collection of experiments that started by working with one person to make something impactful for them and their community

Chrome Experiments

Creative code for the web

Recent Experiments

Recent experiments include Passage of Water, Instrument Playground, Cultural Icons, Say What You See, Don’t Touch the Art, and What’s Happening.


Experiments

A range of experiments at CERN investigate physics from cosmic rays to supersymmetry

CMS experiment

Diverse experiments at CERN

CERN is home to a wide range of experiments. Scientists from institutes all over the world form experimental collaborations to carry out a diverse research programme, ensuring that CERN covers a wealth of topics in physics, from the Standard Model to supersymmetry and from exotic isotopes to cosmic rays.

Several collaborations run experiments using the Large Hadron Collider (LHC), the most powerful accelerator in the world. In addition, fixed-target experiments, antimatter experiments and experimental facilities make use of the LHC injector chain.

LHC experiments

Nine experiments at the Large Hadron Collider (LHC) use detectors to analyse the myriad of particles produced by collisions in the accelerator. These experiments are run by collaborations of scientists from institutes all over the world. Each experiment is distinct and characterised by its detectors.


The biggest of these experiments, ATLAS and CMS, use general-purpose detectors to investigate the largest range of physics possible. Having two independently designed detectors is vital for cross-confirmation of any new discoveries made. ALICE and LHCb have detectors specialised for focussing on specific phenomena. These four detectors sit underground in huge caverns on the LHC ring.

The smallest experiments on the LHC are TOTEM and LHCf, which focus on "forward particles" – protons or heavy ions that brush past each other rather than meeting head on when the beams collide. TOTEM uses detectors positioned on either side of the CMS interaction point, while LHCf is made up of two detectors which sit along the LHC beamline, at 140 metres either side of the ATLAS collision point. MoEDAL-MAPP uses detectors deployed near LHCb to search for a hypothetical particle called the magnetic monopole. FASER and SND@LHC, the two newest LHC experiments, are situated close to the ATLAS collision point in order to search for light new particles and to study neutrinos.

MoEDAL-MAPP

Fixed-target experiments

In “fixed-target” experiments, a beam of accelerated particles is directed at a solid, liquid or gas target, which itself can be part of the detection system.

COMPASS, which looks at the structure of hadrons – particles made of quarks – uses beams from the Super Proton Synchrotron (SPS).

The SPS also feeds the North Area (NA), which houses a number of experiments. NA61/SHINE studies a phase transition between hadrons and quark-gluon plasma, and conducts measurements for experiments involving cosmic rays and long-baseline neutrino oscillations. NA62 uses protons from the SPS to study rare decays of kaons. NA63 directs beams of electrons and positrons onto a variety of targets to study radiation processes in strong electromagnetic fields. NA64 is looking for new particles that would mediate an unknown interaction between visible matter and dark matter. NA65 studies the production of tau neutrinos. UA9 is investigating how crystals could help to steer particle beams in high-energy colliders.

The CLOUD experiment uses beams from the Proton Synchrotron (PS) to investigate a possible link between cosmic rays and cloud formation. DIRAC, which is now analysing data, is investigating the strong force between quarks.

Antimatter experiments

Currently the Antiproton Decelerator and ELENA serve several experiments that are studying antimatter and its properties: AEGIS, ALPHA, ASACUSA, BASE and GBAR. PUMA is designed to carry antiprotons to ISOLDE. Earlier experiments (ATHENA, ATRAP and ACE) are now completed.

Experimental facilities

Experimental facilities at CERN include ISOLDE, MEDICIS, the neutron time-of-flight facility (n_TOF) and the CERN Neutrino Platform.

CERN Neutrino Platform

Non-accelerator experiments

Not all experiments rely on CERN’s accelerator complex. AMS, for example, is a CERN-recognised experiment located on the International Space Station, which has its control centre at CERN. The CAST and OSQAR experiments are both looking for hypothetical dark matter particles called axions.

Past experiments

CERN’s experimental programme has consisted of hundreds of experiments spanning decades.

Among these were pioneering experiments for electroweak physics, a branch of physics that unifies the electromagnetic and weak fundamental forces. In 1958, an experiment at the Synchrocyclotron discovered a rare pion decay that spread CERN’s name around the world. Then in 1973, the Gargamelle bubble chamber presented the first direct evidence of the weak neutral current. Ten years later, CERN physicists working on the UA1 and UA2 detectors announced the discovery of the W boson in January and Z boson in June – the two carriers of the electroweak force. Two key scientists behind the discoveries – Carlo Rubbia and Simon van der Meer – received the Nobel prize in physics in 1984.

From 1989, the Large Electron-Positron collider (LEP) enabled the ALEPH, DELPHI, L3 and OPAL experiments to put the Standard Model of particle physics on a strong experimental basis. In 2000, LEP made way for the construction of the Large Hadron Collider (LHC) in the same tunnel.

CERN’s huge contributions to electroweak physics are just some of the highlights of the experiments over the years.


72 Easy Science Experiments Using Materials You Already Have On Hand

Because science doesn’t have to be complicated.

Easy science experiments including a "naked" egg and "leakproof" bag

If there is one thing that is guaranteed to get your students excited, it’s a good science experiment! While some experiments require expensive lab equipment or dangerous chemicals, there are plenty of cool projects you can do with regular household items. We’ve rounded up a big collection of easy science experiments that anybody can try, and kids are going to love them!

Easy Chemistry Science Experiments

Easy Physics Science Experiments, Easy Biology and Environmental Science Experiments, Easy Engineering Experiments and STEM Challenges.

Skittles form a circle around a plate. The colors are bleeding toward the center of the plate.

1. Taste the Rainbow

Teach your students about diffusion while creating a beautiful and tasty rainbow! Tip: Have extra Skittles on hand so your class can eat a few!

Learn more: Skittles Diffusion

Colorful rock candy on wooden sticks

2. Crystallize sweet treats

Crystal science experiments teach kids about supersaturated solutions. This one is easy to do at home, and the results are absolutely delicious!

Learn more: Candy Crystals

3. Make a volcano erupt

This classic experiment demonstrates a chemical reaction between baking soda (sodium bicarbonate) and vinegar (acetic acid), which produces carbon dioxide gas, water, and sodium acetate.

Learn more: Best Volcano Experiments
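
The volcano reaction named above can be written as a balanced equation. This is standard textbook chemistry, not something taken from the linked activity, and the state symbols are added for clarity:

```latex
\[
\mathrm{NaHCO_3\,(s) + CH_3COOH\,(aq) \longrightarrow CH_3COONa\,(aq) + H_2O\,(l) + CO_2\,(g)}
\]
```

The carbon dioxide gas on the right-hand side is what makes the "lava" foam up and overflow.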

4. Make elephant toothpaste

This fun project uses yeast and a hydrogen peroxide solution to create overflowing “elephant toothpaste.” Tip: Add an extra fun layer by having kids create toothpaste wrappers for plastic bottles.

Girl making an enormous bubble with string and wire

5. Blow the biggest bubbles you can

Add a few simple ingredients to dish soap solution to create the largest bubbles you’ve ever seen! Kids learn about surface tension as they engineer these bubble-blowing wands.

Learn more: Giant Soap Bubbles

Plastic bag full of water with pencils stuck through it

6. Demonstrate the “magic” leakproof bag

All you need is a zip-top plastic bag, sharp pencils, and water to blow your kids’ minds. Once they’re suitably impressed, teach them how the “trick” works by explaining the chemistry of polymers.

Learn more: Leakproof Bag

Several apple slices are shown on a clear plate. There are cards that label what they have been immersed in (including salt water, sugar water, etc.).

7. Use apple slices to learn about oxidation

Have students make predictions about what will happen to apple slices when immersed in different liquids, then put those predictions to the test. Have them record their observations.

Learn more: Apple Oxidation

8. Float a marker man

Their eyes will pop out of their heads when you “levitate” a stick figure right off the table! This experiment works because dry-erase marker ink is insoluble in water and less dense than water, so the drawing lifts off and floats.

Learn more: Floating Marker Man

Mason jars stacked with their mouths together, with one color of water on the bottom and another color on top

9. Discover density with hot and cold water

There are a lot of easy science experiments you can do with density. This one is extremely simple, involving only hot and cold water and food coloring, but the visuals make it appealing and fun.

Learn more: Layered Water

Clear cylinder layered with various liquids in different colors

10. Layer more liquids

This density demo is a little more complicated, but the effects are spectacular. Slowly layer liquids like honey, dish soap, water, and rubbing alcohol in a glass. Kids will be amazed when the liquids float one on top of the other like magic (except it is really science).

Learn more: Layered Liquids
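
The layering order follows from density: denser liquids settle lower. Here is a minimal Python sketch, assuming approximate room-temperature densities. The figures are typical ballpark values, not measurements from the linked activity, and the `APPROX_DENSITY_G_PER_ML` table and `pouring_order` helper are hypothetical names chosen for illustration.

```python
# Approximate densities in g/mL at room temperature. These are assumed,
# typical values; real products vary by brand and concentration.
APPROX_DENSITY_G_PER_ML = {
    "water": 1.00,
    "rubbing alcohol": 0.87,  # roughly 70% isopropyl alcohol
    "honey": 1.42,
    "dish soap": 1.06,
}


def pouring_order(densities):
    """Return liquid names sorted from densest (bottom layer) to least dense (top)."""
    return sorted(densities, key=densities.get, reverse=True)


if __name__ == "__main__":
    for layer, name in enumerate(pouring_order(APPROX_DENSITY_G_PER_ML), start=1):
        print(f"Layer {layer} (counting from the bottom): {name}")
```

Students can add other liquids to the table, predict where each will sit, and then check the prediction in the glass.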

Giant carbon snake growing out of a tin pan full of sand

11. Grow a carbon sugar snake

Easy science experiments can still have impressive results! This eye-popping chemical reaction demonstration only requires simple supplies like sugar, baking soda, and sand.

Learn more: Carbon Sugar Snake

12. Mix up some slime

Tell kids you’re going to make slime at home, and watch their eyes light up! There are a variety of ways to make slime, so try a few different recipes to find the one you like best.

Two children are shown (without faces) bouncing balls on a white table

13. Make homemade bouncy balls

These homemade bouncy balls are easy to make since all you need is glue, food coloring, borax powder, cornstarch, and warm water. You’ll want to store them inside a container like a plastic egg because they will flatten out over time.

Learn more: Make Your Own Bouncy Balls

Pink sidewalk chalk stick sitting on a paper towel

14. Create eggshell chalk

Eggshells are made mostly of calcium carbonate, the same compound found in natural chalk. Grind them up and mix them with flour, water, and food coloring to make your very own sidewalk chalk.

Learn more: Eggshell Chalk

Science student holding a raw egg without a shell

15. Make naked eggs

This is so cool! Use vinegar to dissolve the calcium carbonate in an eggshell to discover the membrane underneath that holds the egg together. Then, use the “naked” egg for another easy science experiment that demonstrates osmosis.

Learn more: Naked Egg Experiment

16. Turn milk into plastic

This sounds a lot more complicated than it is, but don’t be afraid to give it a try. Use simple kitchen supplies to create plastic polymers from plain old milk. Sculpt them into cool shapes when you’re done!

Student using a series of test tubes filled with pink liquid

17. Test pH using cabbage

Teach kids about acids and bases without needing pH test strips! Simply boil some red cabbage and use the resulting water to test various substances—acids turn red and bases turn green.

Learn more: Cabbage pH

Pennies in small cups of liquid labeled coca cola, vinegar + salt, apple juice, water, catsup, and vinegar. Text reads Cleaning Coins Science Experiment. Step by step procedure and explanation.

18. Clean some old coins

Use common household items to make old oxidized coins clean and shiny again in this simple chemistry experiment. Ask kids to predict (hypothesize) which will work best, then expand the learning by doing some research to explain the results.

Learn more: Cleaning Coins

Glass bottle with bowl holding three eggs, small glass with matches sitting on a box of matches, and a yellow plastic straw, against a blue background

19. Pull an egg into a bottle

This classic easy science experiment never fails to delight. Use the power of air pressure to suck a hard-boiled egg into a jar, no hands required.

Learn more: Egg in a Bottle

20. Blow up a balloon (without blowing)

Chances are good you probably did easy science experiments like this when you were in school. The baking soda and vinegar balloon experiment demonstrates the reactions between acids and bases when you fill a bottle with vinegar and a balloon with baking soda.

21. Assemble a DIY lava lamp

This 1970s trend is back—as an easy science experiment! This activity combines acid-base reactions with density for a totally groovy result.

Four colored cups containing different liquids, with an egg in each

22. Explore how sugary drinks affect teeth

The calcium content of eggshells makes them a great stand-in for teeth. Use eggs to explore how soda and juice can stain teeth and wear down the enamel. Expand your learning by trying different toothpaste-and-toothbrush combinations to see how effective they are.

Learn more: Sugar and Teeth Experiment

23. Mummify a hot dog

If your kids are fascinated by the Egyptians, they’ll love learning to mummify a hot dog! No need for canopic jars, just grab some baking soda and get started.

24. Extinguish flames with carbon dioxide

This is a fiery twist on acid-base experiments. Light a candle and talk about what fire needs in order to survive. Then, create an acid-base reaction and “pour” the carbon dioxide to extinguish the flame. Because CO2 gas is denser than air, it pours like a liquid and displaces the oxygen around the flame, suffocating the fire.

I Love You written in lemon juice on a piece of white paper, with lemon half and cotton swabs

25. Send secret messages with invisible ink

Turn your kids into secret agents! Write messages with a paintbrush dipped in lemon juice, then hold the paper over a heat source and watch the invisible become visible as oxidation goes to work.

Learn more: Invisible Ink

26. Create dancing popcorn

This is a fun version of the classic baking soda and vinegar experiment, perfect for the younger crowd. The bubbly mixture causes popcorn to dance around in the water.

Students looking surprised as foamy liquid shoots up out of diet soda bottles

27. Shoot a soda geyser sky-high

You’ve always wondered if this really works, so it’s time to find out for yourself! Kids will marvel at the rapid release of carbon dioxide bubbles that sends diet soda shooting high in the air when Mentos are added.

Learn more: Soda Explosion

Empty tea bags burning into ashes

28. Send a teabag flying

Hot air rises, and this experiment can prove it! You’ll want to supervise kids with fire, of course. For more safety, try this one outside.

Learn more: Flying Tea Bags

Magic Milk Experiment How to Plus Free Worksheet

29. Create magic milk

This fun and easy science experiment demonstrates principles related to surface tension, molecular interactions, and fluid dynamics.

Learn more: Magic Milk Experiment

Two side-by-side shots of an upside-down glass over a candle in a bowl of water, with water pulled up into the glass in the second picture

30. Watch the water rise

Learn about Charles’s Law with this simple experiment. As the candle burns, using up oxygen and heating the air in the glass, the water rises as if by magic.

Learn more: Rising Water
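
Charles’s Law says that, at constant pressure, the volume of a gas is proportional to its absolute temperature: V1/T1 = V2/T2. The sketch below applies that relationship. The function name, the starting volume, and the two temperatures are illustrative assumptions, not measurements from the experiment.

```python
KELVIN_OFFSET = 273.15  # 0 degrees Celsius expressed in kelvin


def volume_after_temperature_change(v1_ml, t1_celsius, t2_celsius):
    """Apply Charles's Law (V1/T1 = V2/T2) at constant pressure.

    Temperatures are converted to kelvin because the law uses absolute temperature.
    """
    t1_kelvin = t1_celsius + KELVIN_OFFSET
    t2_kelvin = t2_celsius + KELVIN_OFFSET
    if t1_kelvin <= 0 or t2_kelvin <= 0:
        raise ValueError("temperatures must be above absolute zero")
    return v1_ml * t2_kelvin / t1_kelvin


if __name__ == "__main__":
    # Hypothetical numbers: 300 mL of trapped air cooling from 60 C to 20 C.
    cooled = volume_after_temperature_change(300.0, 60.0, 20.0)
    print(f"Air volume after cooling: {cooled:.1f} mL")
```

As the trapped air cools it takes up less room, and water moves into the glass to fill the difference.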

Glasses filled with colored water, with paper towels running from one to the next

31. Learn about capillary action

Kids will be amazed as they watch the colored water move from glass to glass, and you’ll love the easy and inexpensive setup. Gather some water, paper towels, and food coloring to teach the scientific magic of capillary action.

Learn more: Capillary Action

A pink balloon has a face drawn on it. It is hovering over a plate with salt and pepper on it

32. Give a balloon a beard

Equally educational and fun, this experiment will teach kids about static electricity using everyday materials. Kids will undoubtedly get a kick out of creating beards on their balloon person!

Learn more: Static Electricity

DIY compass made from a needle floating in water

33. Find your way with a DIY compass

Here’s an old classic that never fails to impress. Magnetize a needle, float it on the water’s surface, and it will always point north.

Learn more: DIY Compass

34. Crush a can using air pressure

Sure, it’s easy to crush a soda can with your bare hands, but what if you could do it without touching it at all? That’s the power of air pressure!

A large piece of cardboard has a white circle in the center with a pencil standing upright in the middle of the circle. Rocks are on all four corners holding it down.

35. Tell time using the sun

While people use clocks or even phones to tell time today, there was a time when a sundial was the best means to do that. Kids will certainly get a kick out of creating their own sundials using everyday materials like cardboard and pencils.

Learn more: Make Your Own Sundial

36. Launch a balloon rocket

Grab balloons, string, straws, and tape, and launch rockets to learn about the laws of motion.

Steel wool sitting in an aluminum tray. The steel wool appears to be on fire.

37. Make sparks with steel wool

All you need is steel wool and a 9-volt battery to perform this science demo that’s bound to make their eyes light up! Kids learn about chain reactions, chemical changes, and more.

Learn more: Steel Wool Electricity

38. Levitate a Ping-Pong ball

Kids will get a kick out of this experiment, which is really all about Bernoulli’s principle. You only need plastic bottles, bendy straws, and Ping-Pong balls to make the science magic happen.

Colored water in a vortex in a plastic bottle

39. Whip up a tornado in a bottle

There are plenty of versions of this classic experiment out there, but we love this one because it sparkles! Kids learn about a vortex and what it takes to create one.

Learn more: Tornado in a Bottle

Homemade barometer using a tin can, rubber band, and ruler

40. Monitor air pressure with a DIY barometer

This simple but effective DIY science project teaches kids about air pressure and meteorology. They’ll have fun tracking and predicting the weather with their very own barometer.

Learn more: DIY Barometer

A child holds up a piece of ice to their eye as if it is a magnifying glass.

41. Peer through an ice magnifying glass

Students will certainly get a thrill out of seeing how an everyday object like a piece of ice can be used as a magnifying glass. Be sure to use purified or distilled water since tap water will have impurities in it that will cause distortion.

Learn more: Ice Magnifying Glass

Piece of twine stuck to an ice cube

42. String up some sticky ice

Can you lift an ice cube using just a piece of string? This quick experiment teaches you how. Use a little salt to melt the ice and then refreeze the ice with the string attached.

Learn more: Sticky Ice

Drawing of a hand with the thumb up and a glass of water

43. “Flip” a drawing with water

Light refraction causes some really cool effects, and there are multiple easy science experiments you can do with it. This one uses refraction to “flip” a drawing; you can also try the famous “disappearing penny” trick.

Learn more: Light Refraction With Water

44. Color some flowers

We love how simple this project is to re-create since all you’ll need are some white carnations, food coloring, glasses, and water. The end result is just so beautiful!

Square dish filled with water and glitter, showing how a drop of dish soap repels the glitter

45. Use glitter to fight germs

Everyone knows that glitter is just like germs—it gets everywhere and is so hard to get rid of! Use that to your advantage and show kids how soap fights glitter and germs.

Learn more: Glitter Germs

Plastic bag with clouds and sun drawn on it, with a small amount of blue liquid at the bottom

46. Re-create the water cycle in a bag

You can do so many easy science experiments with a simple zip-top bag. Fill one partway with water and set it on a sunny windowsill to see how the water evaporates up and eventually “rains” down.

Learn more: Water Cycle

Plastic zipper bag tied around leaves on a tree

47. Learn about plant transpiration

Your backyard is a terrific place for easy science experiments. Grab a plastic bag and rubber band to learn how plants get rid of excess water they don’t need, a process known as transpiration.

Learn more: Plant Transpiration

Students sit around a table that has a tin pan filled with blue liquid with a feather floating in it.

48. Clean up an oil spill

Before conducting this experiment, teach your students about engineers who solve environmental problems like oil spills. Then, have your students use provided materials to clean the oil spill from their oceans.

Learn more: Oil Spill

Sixth grade student holding model lungs and diaphragm made from a plastic bottle, duct tape, and balloons

49. Construct a pair of model lungs

Kids get a better understanding of the respiratory system when they build model lungs using a plastic water bottle and some balloons. You can modify the experiment to demonstrate the effects of smoking too.

Learn more: Model Lungs

Child pouring vinegar over a large rock in a bowl

50. Experiment with limestone rocks

Kids love to collect rocks, and there are plenty of easy science experiments you can do with them. In this one, pour vinegar over a rock to see if it bubbles. If it does, you’ve found limestone!

Learn more: Limestone Experiments

Plastic bottle converted to a homemade rain gauge

51. Turn a bottle into a rain gauge

All you need is a plastic bottle, a ruler, and a permanent marker to make your own rain gauge. Monitor your measurements and see how they stack up against meteorology reports in your area.

Learn more: DIY Rain Gauge

Pile of different colored towels pushed together to create folds like mountains

52. Build up towel mountains

This clever demonstration helps kids understand how some landforms are created. Use layers of towels to represent rock layers and boxes for continents. Then pu-u-u-sh and see what happens!

Learn more: Towel Mountains

Layers of differently colored playdough with straw holes punched throughout all the layers

53. Take a play dough core sample

Learn about the layers of the earth by building them out of Play-Doh, then take a core sample with a straw.

Learn more: Play Dough Core Sampling

Science student poking holes in the bottom of a paper cup in the shape of a constellation

54. Project the stars on your ceiling

Use the video lesson in the link below to learn why stars are only visible at night. Then create a DIY star projector to explore the concept hands-on.

Learn more: DIY Star Projector

Glass jar of water with shaving cream floating on top, with blue food coloring dripping through, next to a can of shaving cream

55. Make it rain

Use shaving cream and food coloring to simulate clouds and rain. This is an easy science experiment little ones will beg to do over and over.

Learn more: Shaving Cream Rain

56. Blow up your fingerprint

This is such a cool (and easy!) way to look at fingerprint patterns. Inflate a balloon a bit, use some ink to put a fingerprint on it, then blow it up big to see your fingerprint in detail.

Edible DNA model made with Twizzlers, gumdrops, and toothpicks

57. Snack on a DNA model

Twizzlers, gumdrops, and a few toothpicks are all you need to make this super-fun (and yummy!) DNA model.

Learn more: Edible DNA Model

58. Dissect a flower

Take a nature walk and find a flower or two. Then bring them home and take them apart to discover all the different parts of flowers.

DIY smartphone amplifier made from paper cups

59. Craft smartphone speakers

No Bluetooth speaker? No problem! Put together your own from paper cups and toilet paper tubes.

Learn more: Smartphone Speakers

Car made from cardboard with bottlecap wheels and powered by a blue balloon

60. Race a balloon-powered car

Kids will be amazed when they learn they can put together this awesome racer using cardboard and bottle-cap wheels. The balloon-powered “engine” is so much fun too.

Learn more: Balloon-Powered Car

Miniature Ferris Wheel built out of colorful wood craft sticks

61. Build a Ferris wheel

You’ve probably ridden on a Ferris wheel, but can you build one? Stock up on wood craft sticks and find out! Play around with different designs to see which one works best.

Learn more: Craft Stick Ferris Wheel

62. Design a phone stand

There are lots of ways to craft a DIY phone stand, which makes this a perfect creative-thinking STEM challenge.

63. Conduct an egg drop

Put all their engineering skills to the test with an egg drop! Challenge kids to build a container from stuff they find around the house that will protect an egg from a long fall (this is especially fun to do from upper-story windows).

Learn more: Egg Drop Challenge Ideas

Student building a roller coaster of drinking straws for a ping pong ball

64. Engineer a drinking-straw roller coaster

STEM challenges are always a hit with kids. We love this one, which only requires basic supplies like drinking straws.

Learn more: Straw Roller Coaster

Outside Science Solar Oven Desert Chica

65. Build a solar oven

Explore the power of the sun when you build your own solar ovens and use them to cook some yummy treats. This experiment takes a little more time and effort, but the results are always impressive. The link below has complete instructions.

Learn more: Solar Oven

Mini Da Vinci bridge made of pencils and rubber bands

66. Build a Da Vinci bridge

There are plenty of bridge-building experiments out there, but this one is unique. It’s inspired by Leonardo da Vinci’s 500-year-old self-supporting wooden bridge. Learn how to build it at the link, and expand your learning by exploring more about Da Vinci himself.

Learn more: Da Vinci Bridge

67. Step through an index card

This is one easy science experiment that never fails to astonish. With carefully placed scissor cuts on an index card, you can make a loop large enough to fit a (small) human body through! Kids will be wowed as they learn about perimeter and area.

Student standing on top of a structure built from cardboard sheets and paper cups

68. Stand on a pile of paper cups

Combine physics and engineering and challenge kids to create a paper cup structure that can support their weight. This is a cool project for aspiring architects.

Learn more: Paper Cup Stack

Child standing on a stepladder dropping a toy attached to a paper parachute

69. Test out parachutes

Gather a variety of materials (try tissues, handkerchiefs, plastic bags, etc.) and see which ones make the best parachutes. You can also find out how they’re affected by windy days or find out which ones work in the rain.

Learn more: Parachute Drop

Students balancing a textbook on top of a pyramid of rolled up newspaper

70. Recycle newspapers into an engineering challenge

It’s amazing how a stack of newspapers can spark such creative engineering. Challenge kids to build a tower, support a book, or even build a chair using only newspaper and tape!

Learn more: Newspaper STEM Challenge

Plastic cup with rubber bands stretched across the opening

71. Use rubber bands to sound out acoustics

Explore the ways that sound waves are affected by what’s around them using a simple rubber band “guitar.” (Kids absolutely love playing with these!)

Learn more: Rubber Band Guitar

Science student pouring water over a cupcake wrapper propped on wood craft sticks

72. Assemble a better umbrella

Challenge students to engineer the best possible umbrella from various household supplies. Encourage them to plan, draw blueprints, and test their creations using the scientific method.

Learn more: Umbrella STEM Challenge


Copyright © 2024. All rights reserved. 5335 Gate Parkway, Jacksonville, FL 32256


Inside a new experiment to find the climate-proof coffee of the future

An international public-private partnership is supercharging coffee breeding to save your morning brew.

hands hold dried coffee beans

David Ngibuini is a second-generation coffee farmer in Kenya’s central highlands, an area of cool temperatures and rich volcanic soil that’s long been one of the best places to grow coffee on Earth. On an afternoon in May, after a couple of months of rain, his 11-acre plot is lush. Six thousand trees — nearly all of them varieties of Coffea arabica, the most widely consumed and best-tasting coffee species — sit in neatly planted rows, their waxy, deep green leaves shimmering in the sun. Workers sort a pile of freshly-picked cherries — the red fruit that contains the beans that will be fermented, dried, and shipped to roasters around the world.

The vigor of this year’s harvest masks a deeper, existential struggle. Arabica coffee, which has been farmed in Kenya since the 19th century, is especially vulnerable to climate change. One 2022 study, from the Zurich University of Applied Sciences, projects the amount of land most suitable to growing it will fall more than 50 percent by 2050.

Ngibuini’s farm, Maguta Estate, is already feeling the impact. Rising temperatures have inhibited the growth of cherries and made trees more vulnerable to diseases and pests. Rains, which used to come reliably twice a year, are increasingly erratic, which leads to wide swings in volume and quality. In his best year, spanning 2020 and 2021, Ngibuini processed nearly 50,000 pounds of beans, sourced from his farm as well as others in the area. The next year, following a prolonged drought, output was down almost 80 percent. 

“We didn’t even have a major pest attack,” he said. “The drop was just because of the climate.”

A man wearing a black baseball cap and black t-shirt looks directly at the camera as he stands among lush green shrubs, his hand cradling some of their red berries

As coffee’s precarity is rising, so is demand: According to some estimates, global consumption, currently 2.3 billion cups per day, could double by mid-century. The projected supply gap has left the industry scrambling for possible fixes, including non-arabica coffee species and caffeine-infused alternatives made from substances like chickpeas and date seeds.

For coffee purists, though, and millions of farming families like Ngibuini’s, the most promising solution might be a newfound push to improve adaptability, and yields, of arabica itself. That’s the idea behind Innovea, a new project led by the nonprofit World Coffee Research, that seeks to supercharge the breeding of improved arabica varieties — unique variations of a given species that have been selected for certain characteristics. In an industry that has long neglected to fund research and development, Innovea, a collaboration with government-affiliated research institutions in nine partner countries, including Kenya, is widely considered to be the most sweeping coffee breeding initiative in decades.

According to Vern Long, CEO of World Coffee Research, or WCR, which is based in the United States and funded by the coffee industry, new varieties are one of the best ways to “improve a crop’s productivity and reduce risk.” Innovea’s goal, she said, is to develop trees that are optimized for a range of production environments — and ultimately give farmers more climate-resilient options.

Although nearly every commodity faces threats from a warming climate, arabica is especially picky. Its trees perform best in areas with moderate rainfall and temperatures that stay between 59 and 82 degrees Fahrenheit. This typically means regions of the tropics at least 3,000 feet above sea level; Ngibuini’s farm near Mount Kenya, Africa’s second-highest peak, sits at a cool 5,700. As temperatures warm, many expect cultivation to shift to even higher altitudes. This, however, has its limits. “The higher up you go, the less land there is available,” said Roman Grüter, an environmental scientist who led the Zurich University of Applied Sciences study. Farmers shifting upwards, he added, are more likely to encounter slopes that are too steep, or protected conservation areas.

Arabica is so fragile in part because its gene pool is surprisingly narrow. The 58 varieties that are widely grown today are all derived from a subset of wild forest coffee native to Ethiopia, which was brought by Arab traders to Yemen in the 15th century and later spread by European colonizers across Asia, Africa, and Latin America. Because it is a slow-maturing tree crop, new variety development, which involves breeding over several generations, can take decades. Coffee R&D, like much crop innovation, is largely state financed, and in the low- and middle-income countries where arabica is grown, governments are often strapped for cash. While Brazil and Colombia, the two largest arabica producers, have a history of strong government support for coffee research, many of their counterparts have long lacked sufficient resources for variety development. A study commissioned by WCR in 2023 estimates that just $115 million is invested in coffee R&D each year, less than one-tenth of one percent of coffee’s $200 billion retail value.

Cup of coffee with ceramic pourover surrounded by chickpeas, date and date seeds, and chicory root

“If you’re a low-income country, and you need to pay for roads and clinics and teacher’s salaries, there’s a strong pull to put revenue from coffee into those things instead of research,” Long said. 

For much of coffee’s history, the importers, roasters, and retailers of the rich world haven’t put much money into crop improvement either: As long as they had a reliable supply of beans, they didn’t have to. A wakeup call came in 2012, when shifts in temperature and rainfall linked to climate change triggered an outbreak of coffee leaf rust, a debilitating fungus, that would affect Latin America for years. A group of coffee businesses established WCR that year as a way to facilitate collaborative R&D; the organization today is funded by 177 member companies. 

WCR began by conducting a trial of existing varieties, planting 31 of them from around the world in a range of climate zones in 15 countries. It also established a project to develop and trial new “F1 hybrids,” varieties created from genetically distant parents that tend to be higher yielding but are also more expensive to cultivate.

Three people, one in a dark shirt and the other two in white lab coats, organize seeds in large plastic tubs in a laboratory

Innovea, which launched in 2022, builds upon both efforts. To start, WCR breeders created 30 novel crosses from 16 parent varieties chosen based on their performance in prior trials. WCR then shipped 5,000 resulting seeds — each of them genetically distinct — to government researchers in Kenya, Rwanda, Uganda, India, Indonesia, Costa Rica, Mexico, Peru, and Hawai‘i. Planting on experimental sites began this year and will continue into 2025.

After six years, when the new trees have matured and produced several harvests of their own, many will have traits that are undesirable, Long said. Some, though, will be “high yielding, disease resistant, and taste good,” and will be moved to further trials or used to make new crosses that could result in even better trait combinations. While the breeding is done using traditional methods, it’s being aided by low-cost genetic sequencing technology, which allows WCR and partner breeders to correlate observed traits with plant DNA and make new crosses faster.

“The idea is to identify the genes we’re looking for and move on with those plants instead of others,” said Jane Cheserek, lead breeder at Kenya’s government-run Coffee Research Institute, WCR’s Kenyan partner. 

Innovea is not the only private sector-funded coffee breeding effort: At least two big industry players, Nestlé and Starbucks, have variety-development programs in-house.

What makes Innovea stand out is its scale and its collaborative approach. Although coffee-exporting countries are natural competitors, Long said, partner governments have accepted that it’s in their best interest to cooperate on R&D and allow their genetic material to move across borders. WCR expects to make 100 new pre-commercial varieties available for trials by 2030 and will then work with partner governments to release a subset of those to farmers as soon as 2036. Ultimately, these “finished varieties” will be owned by governments, rather than by WCR or its financial backers. 

The effort “amps collaboration up to a new level,” said Stuart McCook, a historian at the University of Guelph in Ontario who studies coffee and other tropical commodities and who is not involved in Innovea. The program, he added, represents the first coffee breeding project of such a global scope since a Portugal-led effort to develop and circulate leaf rust-resistant coffees in the 1960s. 

A close-up of a hand brushing a paintbrush against small white protuberances sprouting out of a lush green branch

While McCook believes that new variety development is vital to the quest to make coffee more resilient, he and many other experts argue it’s not a panacea. As coffee growing regions warm, he said, innovations in breeding will need to be combined with adaptations in farming practices, like the introduction of “shade trees” — other types of trees to block the sun — and efforts to regenerate depleted soils. Coffee growers around the world, especially at the 12.5 million smallholder farms that produce 60 percent of the world’s supply, will continue to face a global market defined by wild swings in price that at times mean selling harvests for below the cost of production — which in turn makes investing in these adaptations even harder. One 2018 study by the Kenya Coffee Platform, an industry association, estimated that only 49 percent of Kenya’s coffee smallholders earned a “living wage” from the crop. Kenya’s coffee output today is less than half that of its peak in the 1980s, in part because younger generations are turning to more profitable crops, like macadamia nuts or avocados, or selling land to developers. On the outskirts of Nairobi, Kenya’s capital, many areas that once brimmed with arabica have been paved over for housing estates or shopping malls.  

Ngibuini, 32, is somewhat insulated from the market’s excesses: he sells most of his beans, which have won awards for quality, to a specialty buyer at a premium. In recent years he’s planted shade trees, which have also boosted soil nutrients and led to improved cherry quality. 

What he cannot do, at least for now, is plant the perfect variety of coffee. While he has several on his farm, all of them come with tradeoffs: One Kenya-developed F1 hybrid, for example, which he chose for its disease resistance, struggled more than other varieties in the recent drought. Ideally, he’d plant a variety that could resist the coffee berry borer, a beetle that feasts on coffee cherries, and that would ripen with greater uniformity. The erratic rains, he said, mean cherries are ripening less consistently than ever, which makes harvesting and processing less efficient.    

This variety, today, remains hypothetical. Yet in the years ahead, if Innovea lives up to its promise, Ngibuini will have more control over the types of coffee trees he cultivates — so he can better play his part in saving the morning brew for all of us.

  • Open access
  • Published: 24 July 2024

AI models collapse when trained on recursively generated data

  • Ilia Shumailov,
  • Zakhar Shumaylov,
  • Yiren Zhao (ORCID: orcid.org/0000-0002-3727-7463),
  • Nicolas Papernot,
  • Ross Anderson (ORCID: orcid.org/0000-0001-8697-5682) &
  • Yarin Gal (ORCID: orcid.org/0000-0002-2733-2078)

Nature volume 631, pages 755–759 (2024)

  • Computational science
  • Computer science

Stable Diffusion revolutionized image creation from descriptive text. GPT-2 (ref. 1), GPT-3(.5) (ref. 2) and GPT-4 (ref. 3) demonstrated high performance across a variety of language tasks. ChatGPT introduced such language models to the public. It is now clear that generative artificial intelligence (AI) such as large language models (LLMs) is here to stay and will substantially change the ecosystem of online text and images. Here we consider what may happen to GPT-{n} once LLMs contribute much of the text found online. We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear. We refer to this effect as ‘model collapse’ and show that it can occur in LLMs as well as in variational autoencoders (VAEs) and Gaussian mixture models (GMMs). We build theoretical intuition behind the phenomenon and portray its ubiquity among all learned generative models. We demonstrate that it must be taken seriously if we are to sustain the benefits of training from large-scale data scraped from the web. Indeed, data collected about genuine human interactions with systems will be increasingly valuable in the presence of LLM-generated content in data crawled from the Internet.

The development of LLMs is very involved and requires large quantities of training data. Yet, although current LLMs (refs. 2, 4, 5, 6), including GPT-3, were trained on predominantly human-generated text, this may change. If the training data of most future models are also scraped from the web, then they will inevitably train on data produced by their predecessors. In this paper, we investigate what happens when text produced by, for example, a version of GPT forms most of the training dataset of following models. What happens to GPT generations GPT-{n} as n increases? We discover that indiscriminately learning from data produced by other models causes ‘model collapse’, a degenerative process whereby, over time, models forget the true underlying data distribution, even in the absence of a shift in the distribution over time. We give examples of model collapse for GMMs, VAEs and LLMs. We show that, over time, models start losing information about the true distribution, which first starts with tails disappearing, and learned behaviours converge over the generations to a point estimate with very small variance. Furthermore, we show that this process is inevitable, even for cases with almost ideal conditions for long-term learning, that is, no function estimation error. We also briefly mention two close concepts to model collapse from the existing literature: catastrophic forgetting arising in the framework of task-free continual learning (ref. 7) and data poisoning (refs. 8, 9) maliciously leading to unintended behaviour. Neither is able to explain the phenomenon of model collapse fully, as the setting is fundamentally different, but they provide another perspective on the observed phenomenon and are discussed in more depth in the Supplementary Materials. Finally, we discuss the broader implications of model collapse. We note that access to the original data distribution is crucial: in learning tasks in which the tails of the underlying distribution matter, one needs access to real human-produced data. In other words, the use of LLMs at scale to publish content on the Internet will pollute the collection of data to train their successors: data about human interactions with LLMs will be increasingly valuable.

What is model collapse?

Definition 2.1 (model collapse).

Model collapse is a degenerative process affecting generations of learned generative models, in which the data they generate end up polluting the training set of the next generation. Being trained on polluted data, they then mis-perceive reality. The process is depicted in Fig. 1a. We separate two special cases: early model collapse and late model collapse. In early model collapse, the model begins losing information about the tails of the distribution; in late model collapse, the model converges to a distribution that carries little resemblance to the original one, often with substantially reduced variance.

This process occurs owing to three specific sources of error compounding over generations and causing deviation from the original model:

Statistical approximation error. This is the primary type of error, which arises owing to the number of samples being finite, and disappears as the number of samples tends to infinity. This occurs because of a non-zero probability that information can get lost at every step of resampling.

Functional expressivity error. This is a secondary type of error, arising owing to limited function approximator expressiveness. In particular, neural networks are only universal approximators as their size goes to infinity. As a result, a neural network can introduce non-zero likelihood outside the support of the original distribution or zero likelihood inside the support of the original distribution. A simple example of the expressivity error is if we tried fitting a mixture of two Gaussians with a single Gaussian. Even if we have perfect information about the data distribution (that is, infinite number of samples), model errors will be inevitable. However, in the absence of the other two types of error, this can only occur at the first generation.

Functional approximation error. This is a secondary type of error, arising primarily from the limitations of learning procedures, for example, structural bias of stochastic gradient descent (refs. 10, 11) or choice of objective (ref. 12). This error can be viewed as one arising in the limit of infinite data and perfect expressivity at each generation.

Each of the above can cause model collapse to get worse or better. More approximation power can even be a double-edged sword—better expressiveness may counteract statistical noise, resulting in a good approximation of the true distribution, but it can equally compound the noise. More often than not, we get a cascading effect, in which individual inaccuracies combine to cause the overall error to grow. For example, overfitting the density model causes the model to extrapolate incorrectly and assigns high-density regions to low-density regions not covered in the training set support; these will then be sampled with arbitrary frequency. It is worth noting that other types of error exist. For example, computers have limited precision in practice. We now turn to mathematical intuition to explain how the above give rise to the errors observed, how different sources can compound and how we can quantify the average model divergence.
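The functional expressivity example mentioned above (fitting a mixture of two Gaussians with a single Gaussian) can be sketched numerically. This is a minimal illustration, not the paper's code: the mixture components N(-3, 1) and N(3, 1), the sample count and all function names are assumptions chosen for the demonstration.

```python
import math
import random
import statistics

def normal_pdf(x, mu, sigma):
    """Density of a univariate Gaussian N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x):
    """True density: equal-weight mixture of N(-3, 1) and N(3, 1) (illustrative choice)."""
    return 0.5 * normal_pdf(x, -3.0, 1.0) + 0.5 * normal_pdf(x, 3.0, 1.0)

rng = random.Random(0)
# Many samples, so that statistical error is small and expressivity error dominates.
samples = [rng.gauss(-3.0 if rng.random() < 0.5 else 3.0, 1.0) for _ in range(100_000)]

# The best single Gaussian (by moment matching) for bimodal data.
mu_hat = statistics.fmean(samples)
sigma_hat = statistics.stdev(samples)

true_at_zero = mixture_pdf(0.0)                        # valley between the two modes
fitted_at_zero = normal_pdf(0.0, mu_hat, sigma_hat)    # peak of the fitted Gaussian
print(f"true density at 0: {true_at_zero:.4f}, fitted density at 0: {fitted_at_zero:.4f}")
```

Even with abundant data, the single Gaussian places its highest density in the valley between the two modes, which is the kind of misplaced likelihood the expressivity error describes.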

Theoretical intuition

Here we provide a theoretical intuition for the phenomenon of model collapse. We argue that the process of model collapse is universal among generative models that recursively train on data generated by previous generations. We quantify the sources of errors discussed in the previous section by examining two mathematical models, which prove to be simple enough to provide analytical expressions for quantities of interest, but also portray the phenomenon of model collapse: a discrete distribution in the absence of functional expressivity and approximation errors, and a multidimensional Gaussian approximation, portraying joint functional expressivity and statistical errors. We further illustrate the impact of all three jointly for a more complex setting of density estimation in Hilbert spaces in the Supplementary Materials.

The overall stochastic process we consider, which we call learning with generational data, is the following. The dataset at generation \(i\) is \(\mathcal{D}_i\), comprising independent and identically distributed random variables \(X_j^i\) with distribution \(p_i\), in which \(j \in \{1, \ldots, M_i\}\) and \(M_i\) denotes the size of the dataset. Going from generation \(i\) to generation \(i+1\), we aim to estimate the distribution of samples in \(\mathcal{D}_i\) with an approximation \(p_{\theta_{i+1}}\). This step is what we refer to as functional approximation, \(p_{\theta_{i+1}} = \mathcal{F}_\theta(p_i)\). The dataset \(\mathcal{D}_{i+1}\) is then generated by sampling from \(p_{i+1} = \alpha_i p_{\theta_{i+1}} + \beta_i p_i + \gamma_i p_0\), with non-negative parameters \(\alpha_i, \beta_i, \gamma_i\) summing to 1, that is, they represent proportions of data used from different generations. This corresponds to a mixing of data coming from the original distribution (\(\gamma_i\)), data used by the previous generation (\(\beta_i\)) and data generated by the new model (\(\alpha_i\)). We refer to this as the sampling step. For the mathematical models to come, we consider \(\alpha_i = \gamma_i = 0\), that is, data only from a single step are used, whereas numerical experiments are performed on more realistic choices of parameters.
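The two steps of learning with generational data (functional approximation, then sampling from the three-way mixture) can be sketched as follows. This is a toy illustration under stated assumptions: the model family is a univariate Gaussian, the distributions of the previous and original data are approximated by resampling the stored datasets, and the names `fit_gaussian` and `next_dataset` are hypothetical.

```python
import random
import statistics

def fit_gaussian(data):
    """Functional approximation step: estimate (mean, std) from the current dataset."""
    return statistics.fmean(data), statistics.stdev(data)

def next_dataset(data_prev, data_orig, alpha, beta, gamma, size, rng):
    """Sampling step: draw `size` points from alpha*p_theta + beta*p_i + gamma*p_0.

    Assumption: p_i and p_0 are represented empirically by resampling
    data_prev and data_orig, respectively.
    """
    assert abs(alpha + beta + gamma - 1.0) < 1e-9, "mixture weights must sum to 1"
    mu, sigma = fit_gaussian(data_prev)
    out = []
    for _ in range(size):
        u = rng.random()
        if u < alpha:
            out.append(rng.gauss(mu, sigma))      # data generated by the new model
        elif u < alpha + beta:
            out.append(rng.choice(data_prev))     # data used by the previous generation
        else:
            out.append(rng.choice(data_orig))     # data from the original distribution
    return out

rng = random.Random(0)
data_orig = [rng.gauss(0.0, 1.0) for _ in range(200)]
data = data_orig
for generation in range(5):
    # Illustrative weights: mostly model-generated data, 10% original data.
    data = next_dataset(data, data_orig, alpha=0.9, beta=0.0, gamma=0.1, size=200, rng=rng)
```

Setting the weights differently in each call corresponds to choosing different proportions of model-generated, previous-generation and original data.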

Discrete distributions with exact approximation

In this subsection, we consider a discrete probability distribution in absence of functional approximation and expressivity errors, that is, \(\mathcal{F}(p) = p\). In this case, model collapse arises only because of statistical errors from the sampling step. At first, the tails (low-probability events) begin to disappear as a result of the low probability of sampling them and, over time, support of the distribution shrinks. Denoting the sample size as \(M\), if we consider state \(i\) with probability \(q \le \frac{1}{M}\), the expected number of samples with value \(i\) coming from those events will be less than 1. In practice, this would mean that we lose information about them. Considering more generally some state \(i\) with probability \(q\), using standard conditional probability, we can show that the probability of losing information (that is, sampling no data at some generation) is equal to \(1 - q\), implying that the distribution must converge to a delta function positioned at some state, with the probability of ending up at a certain state equal to the probability of sampling said state from the original distribution.
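As a rough worked illustration of the single-generation effect, assuming \(M\) independent draws per generation, a state with probability \(q\) receives no samples at all with probability

\[\Pr(\text{state unseen in one generation}) = (1 - q)^{M} \approx e^{-qM}.\]

For \(q = 1/M\) and \(M = 100\), this gives \((0.99)^{100} \approx 0.37\), so such a state has roughly a one-in-three chance of vanishing at each resampling step. Under exact functional approximation, a state that has vanished has probability zero in the next generation and cannot reappear.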

This can be shown directly by considering the process \(\mathbf{X}^{i} \to \mathcal{F} \to p_{i+1} \to \mathbf{X}^{i+1}\) as a Markov chain, as \(\mathbf{X}^{i+1}\) only depends on \(\mathbf{X}^{i}\). Furthermore, if all the \(X_j^i\) have the same value, then at the next generation, the approximated distribution will be exactly a delta function and therefore all of \(X_j^{i+1}\) will also have the same value. This implies that the Markov chain contains at least one absorbing state and therefore, with probability 1, it will converge to one of the absorbing states. This is a well-known fact, of which a proof is provided in the Supplementary Materials. For this chain, the only absorbing states are those corresponding to delta functions. As a result, as we follow the progress of model collapse, we are guaranteed to end up in a constant state, having lost all the information of the original distribution when the chain is absorbed. This argument also works in general owing to floating-point representations being discrete, making the Markov chain over the parameters of the model discrete. Thus, as long as the model parameterization allows for delta functions, we will get to it, because, owing to sampling errors, the only possible absorbing states are delta functions. On the basis of the discussion above, we see how both early model collapse, in which only the low-probability events get cut off, and late stage model collapse, in which the process begins to collapse into a single mode, must arise in the case of discrete distributions with perfect functional approximation.
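The absorbing behaviour described above can be observed in a small simulation. The sketch below assumes exact functional approximation (the next distribution is simply the empirical frequencies of the current sample); the five-state starting distribution, the sample size of 20 and the function names are illustrative choices.

```python
import random
from collections import Counter

def resample_generation(probs, m, rng):
    """Draw m samples from `probs` and return their empirical frequencies (F(p) = p)."""
    states = list(probs)
    draws = rng.choices(states, weights=[probs[s] for s in states], k=m)
    counts = Counter(draws)
    return {state: count / m for state, count in counts.items()}

def run_until_collapse(probs, m, rng, max_generations=10_000):
    """Iterate resampling until only one state remains; record the support size per generation."""
    history = [len(probs)]
    for _ in range(max_generations):
        if len(probs) == 1:
            break  # delta function reached: an absorbing state of the chain
        probs = resample_generation(probs, m, rng)
        history.append(len(probs))
    return probs, history

initial = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.07, "e": 0.03}
final, history = run_until_collapse(initial, m=20, rng=random.Random(0))
print("support size per generation:", history)
print("absorbed at state:", next(iter(final)))
```

The support size never grows, because a state that receives no samples has zero probability in every later generation. Low-probability states tend to drop out first, and the chain eventually settles on a single state.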

Multidimensional Gaussian

Following the discussion about discrete distributions, we now present a more generic result, which can be shown in the Gaussian approximation setting, in which each generation is approximated using the unbiased estimates of the mean and the variance. A similar result holds more generally, which we detail in the Supplementary Materials.

Theorem 3.1 (Gaussian model collapse)

Assume the original data are sampled from distribution \(\mathcal{D}_0\) (not necessarily Gaussian), with non-zero sample variance. Assume \(X^n\) are fit recursively using the unbiased sample mean and variance estimators from the previous generation, \(X_j^n \mid \mu_n, \Sigma_n \sim \mathcal{N}(\mu_n, \Sigma_n)\), with a fixed sample size. Then,

\[\mathbb{E}\left[\mathbb{W}_2^2\left(\mathcal{N}(\mu_n, \Sigma_n), \mathcal{D}_0\right)\right] \to \infty; \quad \Sigma_n \xrightarrow{\text{a.s.}} 0 \quad \text{as } n \to \infty,\]

in which \(\mathbb{W}_2\) denotes the Wasserstein-2 distance between the true distribution and its approximation at generation \(n\).

In words, this implies that not only does the nth generation approximation diverge arbitrarily far from the original one but it also collapses to zero variance as the number of generations increases, with probability 1. The results are closely analogous to those seen in the discrete case, with this theorem illustrating the effect of late-stage model collapse, in which the process begins to collapse to zero variance. Early-stage model collapse can also be seen, and the interested reader is referred to the Supplementary Materials for a more in-depth discussion.
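A one-dimensional simulation gives a feel for the variance collapse. This is a hedged sketch rather than a reproduction of the theorem's setting: it uses a univariate Gaussian, a fixed sample size of 10 and 2,000 generations, all of which are illustrative assumptions, as is the function name.

```python
import math
import random
import statistics

def recursive_gaussian_fit(initial_data, sample_size, generations, rng):
    """Refit a Gaussian each generation on samples drawn from the previous fit.

    Uses the sample mean and the unbiased sample variance (statistics.variance
    divides by n - 1). Returns the variance estimate at every generation.
    """
    mu = statistics.fmean(initial_data)
    var = statistics.variance(initial_data)
    variances = [var]
    for _ in range(generations):
        samples = [rng.gauss(mu, math.sqrt(var)) for _ in range(sample_size)]
        mu = statistics.fmean(samples)
        var = statistics.variance(samples)
        variances.append(var)
    return variances

rng = random.Random(0)
initial = [rng.gauss(0.0, 1.0) for _ in range(10)]
variances = recursive_gaussian_fit(initial, sample_size=10, generations=2000, rng=rng)
print("variance at generation 0:", variances[0])
print("variance at generation 2000:", variances[-1])
```

Although each variance estimate is unbiased, the logarithm of the variance drifts downwards on average, so a typical trajectory shrinks towards zero, and the fitted mean performs a random walk away from its starting value while the variance is still non-negligible.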

Model collapse in language models

In this section, we evaluate the effect of model collapse on language models. We cover more interpretable machine learning models (VAEs and GMMs) in the Supplementary Materials. Code is publicly available in ref. 13.

Model collapse is universal across various families of machine learning models. Yet, whereas small models such as GMMs and VAEs are normally trained from scratch, LLMs are different. They are so expensive to retrain from scratch that they are typically initialized with pre-trained models such as BERT (ref. 4), RoBERTa (ref. 5) or GPT-2 (ref. 2), which are trained on large text corpora. They are then fine-tuned to various downstream tasks (ref. 14).

Here we explore what happens with language models when they are sequentially fine-tuned with data generated by other models. We can easily replicate all experiments covered in this paper with larger language models in non-fine-tuning settings to demonstrate model collapse. Given that training a single moderately large model produces twice the American lifetime’s worth of CO2 (ref. 15), we opted to not run such an experiment and instead focus on a more realistic setting for a proof of concept. Note that even the language experiments described in this paper took weeks to run. We evaluate the most common setting of training a language model, a fine-tuning setting for which each of the training cycles starts from a pre-trained model with recent data. The data here come from another fine-tuned pre-trained model. Because training is restricted to produce models that are close to the original pre-trained model, and data points generated by the models will generally produce very small gradients, the expectation here may be that the model should only change moderately after fine-tuning. We fine-tune the OPT-125m causal language model made available by Meta through Hugging Face (ref. 6).

We fine-tune it on the wikitext2 dataset (ref. 16). For data generation from the trained models, we use a five-way beam search. We block training sequences to be 64 tokens long; then, for each token sequence in the training set, we ask the model to predict the next 64 tokens. We go through all of the original training dataset and produce an artificial dataset of the same size. Because we go through all of the original dataset and predict all of the blocks, if the model had 0 error, it would produce the original wikitext2 dataset. Training for each generation starts with generation from the original training data. Each experiment is run five times and the results are shown as five separate runs with different random seeds. The original model fine-tuned with real wikitext2 data obtains 34 mean perplexity, from the zero-shot baseline of 115, that is, it successfully learns the task. Finally, to be as realistic as possible, we use the best-performing model on the original task, evaluated using the original wikitext2 validation set, as the base model for the subsequent generations, meaning that, in practice, observed model collapse can be even more pronounced. Here we consider two different settings:

Five epochs, no original training data. Here the model is trained for five epochs starting on the original dataset but with no original data retained for subsequent runs. The overall original task performance is presented in Fig. 1b. We find that training with generated data allows us to adapt to the underlying task, losing some performance, from 20 to 28 perplexity points.

Ten epochs, 10% of original training data preserved. Here the model is trained for ten epochs on the original dataset and with every new generation of training, a random 10% of the original data points is sampled. The overall original task performance is presented in Fig. 1c. We find that preservation of the original data allows for better model fine-tuning and leads to only minor degradation of performance.
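The second setting mixes a random 10% of the original data points into each generation's training set. The text does not specify how the remainder is assembled, so the sketch below makes an explicit assumption: the total training-set size stays equal to the original, with the rest filled by model-generated examples. The function name `mix_with_original` is hypothetical.

```python
import random

def mix_with_original(generated, original, keep_fraction, rng):
    """Build a training set containing a random `keep_fraction` of the original data.

    Assumption (not stated in the source): the mixed set has the same size as
    the original dataset, and the remainder is drawn from generated examples.
    """
    n_keep = round(keep_fraction * len(original))
    n_generated = len(original) - n_keep
    kept = rng.sample(original, n_keep)            # original examples, without replacement
    filler = rng.sample(generated, n_generated)    # generated examples, without replacement
    return kept + filler

# Toy stand-ins for training sequences, tagged by their source.
original = [("real", i) for i in range(50)]
generated = [("gen", i) for i in range(50)]
mixed = mix_with_original(generated, original, keep_fraction=0.1, rng=random.Random(0))
n_real = sum(1 for source, _ in mixed if source == "real")
print(f"{n_real} of {len(mixed)} training examples come from the original data")
```

A fresh random subset is drawn on every call, which matches the description of sampling the preserved data anew for each generation.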

Both training regimes lead to degraded performance in our models, yet we do find that learning with generated data is possible and models can successfully learn (some of) the underlying task. In particular, from Fig. 1 and its 3D versions in the Supplementary Materials, we see that model collapse occurs, as the density of samples with low perplexity begins to accumulate over the generations. This in turn makes it likely that, over the generations, the sampled data will similarly collapse to a delta function.
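The data-generation procedure used for both regimes (64-token blocks, each continued by 64 predicted tokens) can be sketched independently of any particular model. In this illustration `continue_fn` is a hypothetical placeholder for the model call; in practice it might wrap a five-way beam search that returns 64 new tokens. Dropping the trailing partial block is also an assumption.

```python
BLOCK_SIZE = 64  # from the text: training sequences are blocked to 64 tokens

def make_blocks(token_ids, block_size=BLOCK_SIZE):
    """Split a token stream into consecutive full blocks (trailing remainder dropped)."""
    n_full = len(token_ids) // block_size
    return [list(token_ids[i * block_size:(i + 1) * block_size]) for i in range(n_full)]

def generate_synthetic_dataset(blocks, continue_fn, block_size=BLOCK_SIZE):
    """For every prompt block, ask the model for the next `block_size` tokens.

    `continue_fn(prompt, n)` is a stand-in for the model and must return n token ids.
    The result has as many blocks as the input, so the dataset size is preserved.
    """
    synthetic = []
    for prompt in blocks:
        continuation = list(continue_fn(prompt, block_size))
        if len(continuation) != block_size:
            raise ValueError("continue_fn must return exactly block_size tokens")
        synthetic.append(continuation)
    return synthetic

# Toy check with an error-free "oracle" on the stream 0, 1, 2, ...
stream = list(range(BLOCK_SIZE * 4 + 10))
blocks = make_blocks(stream)

def oracle(prompt, n):
    """Perfect continuation for the counting stream: the next n integers."""
    return [prompt[-1] + 1 + k for k in range(n)]

synthetic = generate_synthetic_dataset(blocks, oracle)
```

With the error-free oracle, each generated block reproduces the block that follows its prompt in the original stream, which mirrors the remark that a model with zero error would reproduce the original dataset.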

Figure 1

a, Model collapse refers to a degenerative learning process in which models start forgetting improbable events over time, as the model becomes poisoned with its own projection of reality. Here data are assumed to be human-curated and start off clean; then model 0 is trained and data are sampled from it; at step n, data are added to the overall data from step n − 1 and this combination is used to train model n. Data obtained with Monte Carlo sampling should ideally be statistically close to the original, provided that fitting and sampling procedures are perfect. This process depicts what happens in real life with the Internet: model-generated data become pervasive. b, c, Performance of OPT-125m models of different generations evaluated using the original wikitext2 test dataset. Shown on the left are the histograms of perplexities of each individual data training sequence produced by different generations as evaluated by the very first model trained with the real data. Over the generations, models tend to produce samples that the original model trained with real data is more likely to produce. At the same time, a much longer tail appears for later generations. Later generations start producing samples that would never be produced by the original model, that is, they start misperceiving reality based on errors introduced by their ancestors. The same plots are shown in 3D in the Supplementary Materials. On the right, average perplexity and its standard deviation are shown for each independent run. The x axis refers to the generation of the model. ‘Real’ refers to the ‘model 0’ trained on the original wikitext2 dataset; model 1 was trained on the data produced by model 0, model 2 was trained on data produced by model 1 and so on, with all generated datasets equal in size. We find that models trained on generated data are able to learn some of the original task, but with errors, as seen from the increase in perplexity.

It is important to note here that the observed behaviour is in line with the general intuition established in the section ‘Theoretical intuition’. To be precise, in all experiments, generational learning is only performed on a finite (usually small) number of generations, whereas claims of the section ‘Theoretical intuition’ are mostly presented in the limit of generations going to infinity. However, as seen from experiments on VAEs and GMMs in the Supplementary Materials, convergence to delta functions and specific rates of such convergence are highly related to the specifics of the problem considered, and complete collapse may or may not occur, even after a small number of steps. This is further illustrated theoretically in the Supplementary Materials, in which potentially notable divergence from the original model can occur even after a few generations.

Figure 1b,c on the left shows histograms of individual data-point perplexities generated by the models of different generations, as evaluated by the first model developed with real wikitext2 training data. Here, over the generations, models tend to produce more sequences that the original model would produce with higher likelihood. The observed effect is similar to that described for VAEs and GMMs in the Supplementary Materials, in which, over the generations, models started to produce samples that would be produced with higher probabilities by the original model. At the same time, we discover that generated data have much longer tails, suggesting that some of the data would never be produced by the original model; these are the errors that accumulate because of the learning with generational data.
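For readers less familiar with the metric, per-sequence perplexity is conventionally the exponential of the average negative log-likelihood per token. The helper below is an illustrative sketch of that standard definition; the function name and the toy inputs are assumptions, and in the experiments the log-probabilities would come from the first model trained on real data.

```python
import math

def perplexity(token_log_probs):
    """Perplexity of one sequence from its per-token natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# Sanity check: a model that spreads probability uniformly over 50 tokens
# has perplexity 50 on any sequence.
uniform_sequence = [math.log(1 / 50)] * 64
uniform_ppl = perplexity(uniform_sequence)

# A sequence the evaluating model finds more likely gets a lower perplexity.
likely_ppl = perplexity([math.log(0.5)] * 64)
```

Lower perplexity under the original model therefore means a sequence that model considers more probable, and the long right tail in the histograms corresponds to sequences it considers very unlikely.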

Example 1. Example of text outputs of an OPT-125m model affected by model collapse: models degrade over generations, in which each new generation is trained on data produced by the previous generation.

Input: some started before 1360 — was typically accomplished by a master mason and a small team of itinerant masons, supplemented by local parish labourers, according to Poyntz Wright. But other authors reject this model, suggesting instead that leading architects designed the parish church towers based on early examples of Perpendicular.

Gen 0: Revival architecture such as St. John’s Cathedral in London. The earliest surviving example of Perpendicular Revival architecture is found in the 18th @-@ century Church of Our Lady of Guernsey, which dates from the late 19th century. There are two types of perpendicular churches : those.

Gen 1: architecture such as St. Peter’s Basilica in Rome or St. Peter’s Basilica in Buenos Aires. There is no evidence that any of these buildings were built during the reign of Pope Innocent III, but it is possible that they may have been built during the reign of his successor, Pope Innocent.

Gen 5: ism, which had been translated into more than 100 languages including English, French, German, Italian, Spanish, Portuguese, Dutch, Swedish, Norwegian, Polish, Hungarian, Slovak, Lithuanian, Estonian, Finnish, Romanian, Bulgarian, Turkish, Croatian, Serbian, Ukrainian, Russian, Kazakh, Kyrgyz.

Gen 9: architecture. In addition to being home to some of the world’s largest populations of black @-@ tailed jackrabbits, white @-@ tailed jackrabbits, blue @-@ tailed jackrabbits, red @-@ tailed jackrabbits, yellow @-.

Ablation: Repetitions

We find that data generated by language models in our experiments end up containing a large number of repeating phrases, as in Example 1. The repeating problem has been observed in nearly all text-generation models 17, 18 and, to rule this out as the cause of model collapse, we further provide numerical experiments in which models are explicitly encouraged to produce non-repeating sequences with a repeating penalty of 2.0. We find that this causes the models to produce lower-scoring continuations to avoid using repeats, which, as a result, causes the subsequent models to perform even worse. Model perplexities shift across the generations towards more probable token sequences, as measured using the model trained on the original real data distribution. Further illustrations are provided in the Supplementary Materials. In particular, enforcing this for the LLM experiments causes the perplexity to double compared with the original. Models remain as susceptible to model collapse, if not more.
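To make the repeating penalty concrete, here is a minimal sketch of one widely used formulation, in which the scores of already-generated tokens are pushed down before sampling. The paper does not specify its implementation, so the function below and its name are assumptions for illustration only.

```python
def apply_repetition_penalty(logits, generated_token_ids, penalty=2.0):
    """Return a copy of `logits` with previously generated tokens penalised.

    Positive scores are divided by `penalty` and non-positive scores are
    multiplied by it, so a penalty above 1 always lowers the score of a
    token that has already been produced.
    """
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    penalised = list(logits)
    for token_id in set(generated_token_ids):
        score = penalised[token_id]
        penalised[token_id] = score / penalty if score > 0 else score * penalty
    return penalised


if __name__ == "__main__":
    # Tokens 0 and 1 were already generated; token 2 is untouched.
    print(apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1, 0]))
```

Because the penalty lowers the scores of tokens the model would otherwise prefer, the sampler is pushed towards lower-scoring continuations, which is consistent with the degradation reported above.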

The described process demonstrates that fine-tuning of language models does not curb the effects of model collapse and models that are being fine-tuned are also vulnerable. We find that, over the generations, models tend to produce more probable sequences from the original data and start introducing their own improbable sequences, that is, errors.

We now discuss the implications of model collapse on the underlying learning dynamics of LLMs. Long-term poisoning attacks on language models are not new. For example, we saw the creation of click, content and troll farms, a form of human ‘language models’, whose job is to misguide social networks and search algorithms. The negative effect that these poisoning attacks had on search results led to changes in search algorithms. For example, Google downgraded farmed articles 19 , putting more emphasis on content produced by trustworthy sources, such as education domains, whereas DuckDuckGo removed them altogether 20 . What is different with the arrival of LLMs is the scale at which such poisoning can happen once it is automated. Preserving the ability of LLMs to model low-probability events is essential to the fairness of their predictions: such events are often relevant to marginalized groups. Low-probability events are also vital to understand complex systems 21 .

Our evaluation suggests a ‘first mover advantage’ when it comes to training models such as LLMs. In our work, we demonstrate that training on samples from another generative model can induce a distribution shift, which, over time, causes model collapse. This in turn causes the model to misperceive the underlying learning task. To sustain learning over a long period of time, we need to make sure that access to the original data source is preserved and that further data not generated by LLMs remain available over time. The need to distinguish data generated by LLMs from other data raises questions about the provenance of content that is crawled from the Internet: it is unclear how content generated by LLMs can be tracked at scale. One option is community-wide coordination to ensure that different parties involved in LLM creation and deployment share the information needed to resolve questions of provenance. Otherwise, it may become increasingly difficult to train newer versions of LLMs without access to data that were crawled from the Internet before the mass adoption of the technology or direct access to data generated by humans at scale.

Data availability

Data generation code for GMM experiments is available in ref.  13 . Data used for VAE experiments are available in ref.  22 . Data used for LLM experiments are available in ref.  16 .

Code availability

Code for all experiments is publicly available in ref. 13.

Radford, A. et al. Language models are unsupervised multitask learners. OpenAI blog 1 , 9 (2019).

Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33 , 1877–1901 (2020).

OpenAI. GPT-4 Technical Report. https://cdn.openai.com/papers/gpt-4.pdf (2023).

Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. in Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (eds Burstein, J., Doran, C. & Solorio, T.) 4171–4186 (Association for Computational Linguistics, 2019).

Liu, Y. et al. RoBERTa: a Robustly Optimized BERT Pretraining Approach. Preprint at https://arxiv.org/abs/1907.11692 (2019).

Zhang, S. et al. OPT: open pre-trained transformer language models. Preprint at https://arxiv.org/abs/2205.01068 (2022).

Aljundi, R., Kelchtermans, K. & Tuytelaars, T. Task-free continual learning. In Proc. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 11254–11263 (IEEE, 2019).

Carlini, N. & Terzis, A. in Proc. Tenth International Conference on Learning Representations (ICLR, 2022).

Carlini, N. et al. in Proc. 2024 IEEE Symposium on Security and Privacy (SP) 179 (IEEE, 2024).

Mousavi-Hosseini, A., Park, S., Girotti, M., Mitliagkas, I. & Erdogdu, M. A. in Proc. Eleventh International Conference on Learning Representations (ICLR, 2023).

Soudry, D., Hoffer, E., Nacson, M. S., Gunasekar, S. & Srebro, N. The implicit bias of gradient descent on separable data. J. Mach. Learn. Res. 19 , 1–57 (2018).

Gu, Y., Dong, L., Wei, F. & Huang, M. in Proc. Twelfth International Conference on Learning Representations (ICLR, 2024).

Shumailov, I. & Shumaylov, Z. Public code for Model Collapse (0.1). Zenodo https://doi.org/10.5281/zenodo.10866595 (2024).

Bommasani, R. et al. On the opportunities and risks of foundation models. Preprint at https://arxiv.org/abs/2108.07258 (2022).

Strubell, E., Ganesh, A. & McCallum, A. in Proc. 57th Annual Meeting of the Association for Computational Linguistics (eds Korhonen, A., Traum, D. & Màrquez, L.) 3645–3650 (Association for Computational Linguistics, 2019).

Merity, S., Xiong, C., Bradbury, J. & Socher, R. in Proc. 5th International Conference on Learning Representations (ICLR, 2017).

Keskar, N. S., McCann, B., Varshney, L. R., Xiong, C. & Socher, R. CTRL: a conditional transformer language model for controllable generation. Preprint at https://arxiv.org/abs/1909.05858 (2019).

Shumailov, I. et al. in Proc. 2021 IEEE European Symposium on Security and Privacy (EuroS&P) 212–231 (IEEE, 2021).

Google. Finding more high-quality sites in search. Google https://googleblog.blogspot.com/2011/02/finding-more-high-quality-sites-in.html (2011).

Mims, C. The search engine backlash against ‘content mills’. MIT Technology Review https://www.technologyreview.com/2010/07/26/26327/the-search-engine-backlash-against-content-mills/ (2010).

Taleb, N. N. Black swans and the domains of statistics. Am. Stat. 61 , 198–200 (2007).

LeCun, Y., Cortes, C. & Burges, C. J. C. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/ (1998).

Acknowledgements

This paper is dedicated to the memory of Professor Ross J. Anderson, our colleague and friend, who contributed much to this and other works we have produced over the years. We thank A. Thudi, D. Glukhov, P. Zaika, and D. Barak for useful discussions and feedback.

Author information

These authors contributed equally: Ilia Shumailov, Zakhar Shumaylov

Deceased: Ross Anderson

Authors and Affiliations

OATML, Department of Computer Science, University of Oxford, Oxford, UK

Ilia Shumailov & Yarin Gal

Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK

Zakhar Shumaylov

Department of Electrical and Electronic Engineering, Imperial College London, London, UK

University of Toronto, Toronto, Ontario, Canada

Nicolas Papernot

Vector Institute, Toronto, Ontario, Canada

Department of Computer Science and Technology, University of Cambridge, Cambridge, UK

Ross Anderson

School of Informatics, University of Edinburgh, Edinburgh, UK

Contributions

I.S. and Z.S. proposed and developed the idea, led the research and mathematical modelling and developed the GMM and VAE experiments. I.S. and Y.Z. developed the language-model experiments. N.P., Y.G. and R.A. supervised and guided the project. All authors contributed to writing of the manuscript. Y.G. is supported by a Turing AI Fellowship financed by the UK government’s Office for Artificial Intelligence, through UK Research and Innovation (grant reference EP/V030302/1) and delivered by the Alan Turing Institute.

Corresponding authors

Correspondence to Ilia Shumailov , Zakhar Shumaylov or Yarin Gal .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Nature thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Shumailov, I., Shumaylov, Z., Zhao, Y. et al. AI models collapse when trained on recursively generated data. Nature 631 , 755–759 (2024). https://doi.org/10.1038/s41586-024-07566-y

Received : 20 October 2023

Accepted : 14 May 2024

Published : 24 July 2024

Issue Date : 25 July 2024

DOI : https://doi.org/10.1038/s41586-024-07566-y

  • Technical advance
  • Open access
  • Published: 11 February 2021

Conceptualising natural and quasi experiments in public health

  • Frank de Vocht   ORCID: orcid.org/0000-0003-3631-627X 1 , 2 , 3 ,
  • Srinivasa Vittal Katikireddi 4 ,
  • Cheryl McQuire 1 , 2 ,
  • Kate Tilling 1 , 5 ,
  • Matthew Hickman 1 &
  • Peter Craig 4  

BMC Medical Research Methodology volume 21, Article number: 32 (2021)

Natural or quasi experiments are appealing for public health research because they enable the evaluation of events or interventions that are difficult or impossible to manipulate experimentally, such as many policy and health system reforms. However, there remains ambiguity in the literature about their definition and how they differ from randomized controlled experiments and from other observational designs. We conceptualise natural experiments in the context of public health evaluations and align the study design to the Target Trial Framework.

A literature search was conducted, and key methodological papers were used to develop this work. Peer-reviewed papers were supplemented by grey literature.

Natural experiment studies (NES) combine features of experiments and non-experiments. They differ from planned experiments, such as randomized controlled trials, in that exposure allocation is not controlled by researchers. They differ from other observational designs in that they evaluate the impact of events or processes that lead to differences in exposure. As a result they are, in theory, less susceptible to bias than other observational study designs. Importantly, causal inference relies heavily on the assumption that exposure allocation can be considered ‘as-if randomized’. The target trial framework provides a systematic basis for evaluating this assumption and the other design elements that underpin the causal claims that can be made from NES.

Conclusions

NES should be considered a type of study design rather than a set of tools for analyses of non-randomized interventions. Alignment of NES to the Target Trial framework will clarify the strength of evidence underpinning claims about the effectiveness of public health interventions.

When designing a study to estimate the causal effect of an intervention, the experiment (particularly the randomised controlled trial (RCT)) is generally considered to be the least susceptible to bias. A defining feature of the experiment is that the researcher controls the assignment of the treatment or exposure. If properly conducted, random assignment balances unmeasured confounders in expectation between the intervention and control groups. In many evaluations of public health interventions, however, it is not possible to conduct randomised experiments. Instead, standard observational epidemiological study designs have traditionally been used. These are known to be susceptible to unmeasured confounding.

Natural experimental studies (NES) have become popular as an alternative evaluation design in public health research, as they have distinct benefits over traditional designs [1]. In NES, although the allocation and dosage of treatment or exposure are not under the control of the researcher, they are expected to be unrelated to other factors that cause the outcome of interest [2, 3, 4, 5]. Such studies can provide strong causal information in complex real-world situations, and can generate effect sizes close to the causal estimates from RCTs [6, 7, 8]. The term natural experiment study is sometimes used synonymously with quasi-experiment, a much broader term that can also refer to researcher-led but non-randomised experiments. In this paper we argue for a clearer conceptualisation of natural experiment studies in public health research, and present a framework to improve their design and reporting and facilitate assessment of causal claims.

Natural and quasi-experiments have a long history of use for evaluations of public health interventions. One of the earliest and best-known examples is the case of ‘Dr John Snow and the Broad Street pump’ [9]. In this study, cholera deaths were significantly lower among residents served by the Lambeth water company, which had moved its intake pipe to an upstream location on the Thames following an earlier outbreak, than among those served by the Southwark and Vauxhall water company, which had not moved its intake pipe. Since houses in the study area were serviced by either company in an essentially random manner, this natural experiment provided strong evidence that cholera was transmitted through water [10].

Natural and quasi experiments

Natural and quasi experiments are appealing because they enable the evaluation of changes to a system that are difficult or impossible to manipulate experimentally. These include, for example, large events, pandemics and policy changes [7, 11]. They also allow for retrospective evaluation when the opportunity for a trial has passed [12]. They offer benefits over standard observational studies because they exploit variation in exposure that arises from an exogenous (i.e. not caused by other factors in the analytic model [1]) event or intervention. This aligns them to the ‘do-operator’ in the work of Pearl [13]. Quasi experiments (QES) and NES thus combine features of experiments (exogenous exposure) and non-experiments (observations without a researcher-controlled intervention). As a result, they are generally less susceptible to confounding than many other observational study designs [14]. However, a common critique of QES and NES is that because the processes producing variation in exposure are outside the control of the research team, there is uncertainty as to whether confounding has been sufficiently minimized or avoided [7]. An example is a QES of the impact of a fast food chain’s voluntary decision to label its menus with calorie information on subsequent purchasing of calories [15]. Unmeasured differences between the populations that visit that particular chain and those that visit other fast-food outlets could lead to residual confounding.

A distinction is sometimes made between QES and NES. The term ‘natural experiment’ has traditionally referred to the occurrence of an event with a natural cause; a ‘force of nature’ (Fig. 1a) [1]. These make for some of the most compelling studies of causation from non-randomised experiments. For example, the Canterbury earthquakes in 2010–2011 have been used to study the causal impact of such disasters because about half of an established birth cohort lived in the affected area with the remainder of the cohort living elsewhere [16]. More recently, the use of the term ‘natural’ has been understood more broadly as an event which did not involve the deliberate manipulation of exposure for research purposes (for example a policy change), even if human agency was involved [17]. In contrast to natural experiments, in QES the research team may be able to influence exposure allocation, even if the event or exposure itself is not under their full control; for example in a phased roll out of a policy [18]. A well-known example of a natural experiment is the “Dutch Hunger Winter” summarised by Lumey et al. [19]. During this period in the Second World War the German authorities blocked all food supplies to the occupied West of the Netherlands, which resulted in widespread starvation. Food supplies were restored immediately after the country was liberated, so the exposure was sharply defined by time as well as place. Because there was sufficient food in the occupied and liberated areas of the Netherlands before and after the Hunger Winter, exposure to famine occurred based on an individual’s time and place (of birth) only. Similar examples of such ‘political’ natural experiment studies are the study of the impact of China’s Great Famine [20] and the ‘special period’ in Cuba’s history following the collapse of the Soviet Union and the imposition of a US blockade [21]. NES that describe the evaluation of an event which did not involve the deliberate manipulation of an exposure but involved human agency, such as the impact of a new policy, are the mainstay of ‘natural experimental research’ in public health, and the term NES has become increasingly popular to indicate any quasi-experimental design (although it has not completely replaced the term QES).

Fig. 1: Different conceptualisations of natural and quasi experiments within wider evaluation frameworks

Dunning takes the distinction of a NES further. He defines a NES as a QES where knowledge about the exposure allocation process provides a strong argument that allocation, although not deliberately manipulated by the researcher, is essentially random. This concept is referred to as ‘as-if randomization’ (Fig. 1 b) [ 4 , 8 , 10 ]. Under this definition, NES differ from QES in which the allocation of exposure, whether partly controlled by the researcher or not, does not clearly resemble a random process.

A third distinction between QES and NES has been made that argues that NES describe the study of unplanned events whereas QES describe evaluations of events that are planned (but not controlled by the researcher), such as policies or programmes specifically aimed at influencing an outcome (Fig. 1 c) [ 17 ]. In practice however, the distinction between these can be ambiguous.

When the assignment of exposure is not controlled by the researcher, with rare exceptions (for example lottery-system [ 22 ] or military draft [ 23 ] allocations), it is typically very difficult to prove that true (as-if) randomization occurred. Because of the ambiguity of ‘as-if randomization’ and the fact that the tools to assess this are the same as those used for assessment of internal validity in any observational study [ 12 ], the UK Medical Research Council (MRC) guidance advocates a broader conceptualisation of a NES. Under the MRC guidance, a NES is defined as any study that investigates an event that is not under the control of the research team, and which divides a population into exposed and unexposed groups, or into groups with different levels of exposure (Fig. 1 d).

Here, while acknowledging the remaining ambiguity regarding the precise definition of a NES, in consideration of the definitions above [ 24 ], we argue that:

what distinguishes NES from RCTs is that allocation is not controlled by the researchers; and

what distinguishes NES from other observational designs is that they specifically evaluate the impact of a clearly defined event or process which results in differences in exposure between groups.

A detailed assessment of the allocation mechanism (which determines exposure status) is essential. If we can demonstrate that the allocation process approximates a randomization process, any causal claims from NES will be substantially strengthened. The plausibility of the ‘as-if random’ assumption strongly depends on detailed knowledge of why and how individuals or groups of individuals were assigned to conditions and how the assignment process was implemented [10]. This plausibility can be assessed quantitatively for observed factors using standard tools for assessment of internal validity of a study [12], and should ideally be supplemented by a qualitative description of the assignment process. In common with contemporary public health practice, we will use the term ‘natural experiment study’, or NES, to refer to both NES and QES from here on.
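One way to make the quantitative assessment for observed factors concrete is a covariate balance check between exposed and unexposed groups. The sketch below computes a standardized mean difference for a single covariate; the choice of this statistic and the function name are assumptions for illustration, since the paper refers only to standard tools in general.

```python
import statistics


def standardized_mean_difference(exposed, unexposed):
    """Difference in group means divided by the pooled standard deviation.

    Values near zero suggest the covariate is balanced between groups, as
    would be expected if allocation were 'as-if random'.
    """
    mean_diff = statistics.fmean(exposed) - statistics.fmean(unexposed)
    pooled_sd = (
        (statistics.variance(exposed) + statistics.variance(unexposed)) / 2
    ) ** 0.5
    if pooled_sd == 0:
        raise ValueError("covariate has no variation in either group")
    return mean_diff / pooled_sd


if __name__ == "__main__":
    # Hypothetical ages (in years) for two small groups.
    print(standardized_mean_difference([34, 41, 29, 50], [36, 39, 31, 47]))
```

Balance on observed covariates supports, but cannot prove, the ‘as-if random’ assumption, because unmeasured factors may still differ between groups. A commonly quoted rule of thumb treats absolute values above about 0.1 as worth a closer look, but that threshold is a convention and is not taken from this paper.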

Medline, Embase and Google Scholar were searched using search terms including quasi-experiment, natural experiment, policy evaluation and public health evaluation and key methodological papers were used to develop this work. Peer-reviewed papers were supplemented by grey literature.

Part 1. Conceptualisations of natural experiments

An analytic approach

Some conceptualisations of NES place their emphasis on the analytic tools that are used to evaluate natural experiments [ 25 , 26 ]. In this conceptualisation NES are understood as being defined by the way in which they are analysed, rather than by their design. An array of different statistical methods is available to analyse natural experiments, including regression adjustments, propensity scores, difference-in-differences, interrupted time series, regression discontinuity, synthetic controls, and instrumental variables. Overviews including strengths and limitations of the different methods are provided in [ 12 , 27 ]. However, an important drawback of this conceptualisation is that it suggests that there is a distinct set of methods for the analysis of NES.
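As a minimal illustration of one of the methods listed above, the following sketch computes a basic difference-in-differences estimate from group means before and after an event. The function name and the numbers in the usage example are hypothetical, and the sketch ignores standard errors and the parallel-trends assumption on which the method relies.

```python
from statistics import fmean


def difference_in_differences(exposed_pre, exposed_post, control_pre, control_post):
    """Change in the exposed group minus the change in the control group."""
    exposed_change = fmean(exposed_post) - fmean(exposed_pre)
    control_change = fmean(control_post) - fmean(control_pre)
    return exposed_change - control_change


if __name__ == "__main__":
    # Hypothetical outcome scores, for illustration only.
    estimate = difference_in_differences(
        exposed_pre=[10, 12], exposed_post=[15, 17],
        control_pre=[9, 11], control_post=[11, 13],
    )
    print(estimate)  # (16 - 11) - (12 - 10) = 3.0
```

The same data could equally be analysed with other methods from the list, which is part of why defining NES by their analysis rather than their design is problematic.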

A study design

The popularity of NES has resulted in some conceptual stretching, where the label is applied to a research design that only implausibly meets the definitional features of a NES [ 10 ]. For example, observational studies exploring variation in exposures (rather than the study of an event or change in exposure) have sometimes also been badged as NES. A more stringent classification of NES as a type of study design, rather than a collection of analytic tools, is important because it prevents attempts to incorrectly cover observational studies with a ‘glow of experimental legitimacy’ [ 10 ]. If the design rather than the statistical methodology defines a NES, this allows an open-ended array of statistical tools. These tools are not necessarily constrained by those mentioned above, but could also, for example, include new methods such as synthetic controls that can be utilised to analyse the natural experiments. The choice of appropriate evaluation method should be based on what is most suitable for each particular study, and then depends on the knowledge about the event, the availability of data, and design elements such as its allocation process.

Dunning argues that it is the overall research design, rather than just the statistical methods, that compels conviction when making causal claims. He proposes an evaluation framework for NES along the three dimensions of (1) the plausibility of as-if randomization of treatment, (2) the credibility of causal and statistical models, and (3) the substantive relevance of the treatment. Here, the first dimension is considered key for distinguishing NES from other QES [ 4 ]. NES can be divided into those where a plausible case for ‘as-if random’ assignment can be made (which he defines as NES), and those where confounding from observed factors is directly adjusted for through statistical means. The validity of the latter (which Dunning defines as ‘other quasi experiments’, and we define as ‘weaker NES’) relies on the assumption that unmeasured confounding is absent [ 8 ], and is considered less credible in theory for making causal claims [ 4 ]. In this framework, the ‘as-if-randomised’ NES can be viewed as offering stronger causal evidence than other quasi-experiments. In principle, they offer an opportunity for direct estimates of effects (akin to RCTs) where control for confounding factors would not necessarily be required [ 4 ], rather than relying on adjustment to derive conditional effect estimates [ 10 ]. Of course, the latter may well reach valid and compelling conclusions as well, but causal claims suffer to a higher degree from the familiar threats of bias and unmeasured confounding.

Part 2. A target trial framework for natural experiment studies

In this section, we provide recommendations for evaluation of the ‘as if random’ assumption and provide a unifying Target Trial Framework for NES, which brings together key sets of criteria that can be used to appraise the strength of causal claims from NES and assist with study design and reporting.

In public health, there is considerable overlap between analytic and design-based uses of the term NES. Nevertheless, we argue that if we consider NES a type of study design, causal inference can be strengthened by clear appraisal of the likelihood of ‘as-if’ random allocation of exposure. This should be demonstrated by both empirical evidence and by knowledge and reasoning about the causal question and substantive domain under question [8, 10]. Because the concept of ‘as-if’ randomization is difficult, if not impossible, to prove, it should be thought of along a ‘continuum of plausibility’ [10]. Specifically, for claims of ‘as-if’ randomization to be plausible, it must be demonstrated that the variables that determine treatment assignment are exogenous. This means that they are: i) strongly correlated with treatment status but are not caused by the outcome of interest (i.e. no reverse causality) and ii) independent of any other (measured or unmeasured) causes of the outcome of interest [8].

Given this additional layer of justification, especially with respect to the qualitative knowledge of the assignment process and domain knowledge from practitioners more broadly, we argue where feasible for the involvement of practitioners. This could, for example, be formalized through co-production in which members of the public and policy makers are involved in the development of the evaluation. If we appraise NES as a type of study design, which distinguish themselves from other designs because i) there is a particular change in exposure that is evaluated and ii) causal claims are supported by an argument of the plausibility of as-if randomization, then we guard against conflating NES with other observational designs [ 10 , 28 ].

There is a range of ways of dealing with the problems of selection on measured and unmeasured confounders in NES [8, 10], which can be understood in terms of a ‘target trial’ we are trying to emulate, had randomization been possible [29]. The protocol of a target trial describes seven components common to RCTs (‘eligibility criteria’, ‘treatment strategies’, ‘assignment procedures’, ‘follow-up period’, ‘outcome’, ‘causal contrasts of interest’, and the ‘analysis plan’), and provides a systematic way of improving, reporting and appraising NES relative to a ‘gold standard’ (but often not feasible in practice) trial. In the design phase of a NES, deviations from the target trial in each domain can be used to evaluate where improvements can be made and where concessions will have to be made. This same approach can be used to appraise existing NES. The target trial framework also provides a structured way for reporting NES, which will facilitate evaluation of the strength of NES, improve consistency and completeness of reporting, and benefit evidence syntheses.
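The seven protocol components listed above could be captured in a simple reporting checklist. The sketch below is one hypothetical way of doing so; the class, its field names and the helper method are assumptions for illustration and are not part of the framework itself.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class TargetTrialProtocol:
    """One free-text entry per target trial component; None means unreported."""

    eligibility_criteria: Optional[str] = None
    treatment_strategies: Optional[str] = None
    assignment_procedures: Optional[str] = None
    follow_up_period: Optional[str] = None
    outcome: Optional[str] = None
    causal_contrasts: Optional[str] = None
    analysis_plan: Optional[str] = None

    def unspecified_components(self) -> List[str]:
        """Names of components that are missing or left empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


if __name__ == "__main__":
    protocol = TargetTrialProtocol(
        outcome="hypothetical outcome measure",
        assignment_procedures="hypothetical description of allocation",
    )
    print(protocol.unspecified_components())
```

Listing the components that remain unspecified gives a quick view of where a NES report deviates from the notional target trial.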

In Table  1 , we bring together elements of the Target Trial framework and conceptualisations of NES to derive a framework to describe the Target Trial for NES [ 12 ]. By encouraging researchers to address the questions in Table 1 , the framework provides a structured approach to the design, reporting and evaluation of NES across the seven target trial domains. Table 1 also provides recommendations to improve the strength of causal claims from NES, focussing primarily on sensitivity analyses to improve internal validity.

An illustrative example of a well-developed NES based on the criteria outlined in Table 1 is the study by Reeves et al. [ 39 ]. The NES evaluates the impact of the introduction of a National Minimum Wage on mental health. The study compared a clearly defined intervention group of recipients of a wage increase up to 110% of pre-intervention wage with clearly defined control groups of (1) people ineligible for the intervention because their wage at baseline was just above (100–110%) minimum wage and (2) people who were eligible, but whose companies did not comply and did not increase wages. This study also included several sensitivity tests to strengthen causal arguments. We have aligned this study to the Target Trial framework in Additional file  1 .
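The group definitions in this example can be made concrete with a small classification rule. This is only a sketch under stated assumptions, not the procedure used by Reeves et al.: it assumes that eligibility means a baseline wage below the minimum wage, it does not model the upper limit on the size of the wage increase, and the function name `classify_worker` and its parameters are hypothetical.

```python
from typing import Optional


def classify_worker(baseline_wage: float, minimum_wage: float,
                    wage_increased: bool) -> Optional[str]:
    """Assign a worker to a comparison group based on baseline wage.

    Assumption (not from the source): a worker is eligible when the baseline
    wage is below the minimum wage.
    """
    ratio = baseline_wage / minimum_wage
    if ratio < 1.0:
        # Eligible: either received the increase, or the employer did not comply.
        return "intervention" if wage_increased else "control_noncompliant_employer"
    if ratio <= 1.10:
        # Ineligible, but baseline wage at or just above the minimum (100-110%).
        return "control_just_above_minimum"
    # Baseline wage too far above the minimum to serve as a comparison.
    return None


print(classify_worker(3.0, 3.6, wage_increased=True))
print(classify_worker(3.8, 3.6, wage_increased=False))
```

Making the thresholds explicit like this is one way to show that both the intervention and control groups are clearly defined before outcomes are examined.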

The Target Trial Approach for NES (outlined in Table 1 ) provides a straightforward approach to improve, report, and appraise existing NES and to assist in the design of future studies. It focusses on structural design elements and goes beyond the use of quantitative tools alone to assess internal validity [ 12 ]. This work complements the ROBINS-I tool for assessing risk of bias in non-randomised studies of interventions, which similarly adopted the Target Trial framework [ 40 ]. Our approach focusses on the internal validity of a NES, with issues of construct and external validity being outside the scope of this work (guidelines for these are provided in, for example, [ 41 ]). It should be acknowledged that less methodologically robust studies can still reach valid and compelling conclusions, even without resembling the notional target trial. However, we believe that drawing on the target trial framework helps highlight occasions when causal inference can be made more confidently.

Finally, the framework explicitly excludes observational studies that aim to investigate the effects of changes in behaviour without an externally forced driver to do so. For example, although a cohort study can in principle be the basis for the evaluation of a NES, a change of diet by some participants (compared to those who did not change their diet) is not an external (i.e. exogenous) cause and does not fall within the definition of an experiment [ 11 ]. However, such studies are likely to be more convincing than those which do not study within-person changes, and we note that the statistical methods used may be similar to those used in NES.

Despite their advantages, NES remain based on observational data and thus biases in assignment of the intervention can never be completely excluded (although for plausibly ‘as if randomised’ natural experiments these should be minimal). It is therefore important that a robust assessment of different potential sources of bias is reported. It has additionally been argued that sensitivity analyses are required to assess whether a pattern of small biases could explain away any ostensible effect of the intervention, because confidence intervals and statistical tests do not do this [ 14 ]. Recommendations that would improve the confidence with which we can make causal claims from NES, derived from work by Rosenbaum [ 14 ], have been outlined in Table 1 . Although sensitivity analyses can place plausible limits on the size of the effects of hidden biases, because such analyses are susceptible to assumptions about the maximum size of omitted biases, they cannot completely rule out residual bias [ 34 ]. Of importance for the strength of causal claims therefore, is the triangulation of NES with other evaluations using different data or study designs susceptible to different sources of bias [ 5 , 42 ].
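To illustrate what it means for a pattern of small biases to explain away an ostensible effect, the sketch below computes a sensitivity bound for a sign test on matched pairs, in the spirit of the sensitivity analyses associated with Rosenbaum. It is a generic textbook-style example, not necessarily one of the analyses recommended in Table 1. The function name `sign_test_upper_p` and the numbers are hypothetical; `gamma` is the assumed maximum factor by which hidden bias could change the odds of exposure within a pair.

```python
from math import comb


def sign_test_upper_p(n_pairs: int, n_treated_higher: int, gamma: float) -> float:
    """Upper bound on the one-sided sign-test p-value under hidden bias gamma.

    With gamma = 1 (no hidden bias) this is the ordinary sign test. For
    gamma > 1, the chance that the exposed unit has the higher outcome in a
    pair, absent any true effect, is taken to be at most gamma / (1 + gamma).
    """
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    p_plus = gamma / (1.0 + gamma)
    # Binomial tail probability P(X >= n_treated_higher) with success chance p_plus.
    return sum(
        comb(n_pairs, k) * p_plus**k * (1.0 - p_plus) ** (n_pairs - k)
        for k in range(n_treated_higher, n_pairs + 1)
    )


# Hypothetical data: in 9 of 10 matched pairs the exposed unit had the higher outcome.
for gamma in (1.0, 1.5, 2.0):
    print(gamma, round(sign_test_upper_p(10, 9, gamma), 4))
```

In this toy example the bound grows as `gamma` increases, which shows how a result that looks convincing under no hidden bias can become inconclusive once a moderate amount of unmeasured bias is allowed for.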

None of the recommendations outlined in Table 1 will by themselves eliminate bias in a NES, but neither is it required to implement all of them to be able to make a causal claim with some confidence. Instead, a continuum of confidence in the causal claims based on the study design and the data is a more appropriate and practical approach [ 43 ]. Each sensitivity analysis aims to minimise ambiguity of a particular potential bias or biases, and as such a combination of selected sensitivity analyses can strengthen causal claims [ 14 ]. We would generally, but not strictly, consider a well conducted RCT as the design where we are most confident about such claims, followed by natural experiments, and then other observational studies; this would be an extension of the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) framework [ 44 ]. GRADE provides a system for rating the quality (or certainty) of a body of evidence and grading the strength of recommendations for use in systematic reviews, health technology assessments (HTAs), and clinical practice guidelines. It typically only distinguishes between trials and observational studies when making these judgments (note however, that recent guidance does not make this explicit distinction when using ROBINS-I [ 45 ]). Given the increased contribution of NES in public health, especially those based on routine data [ 37 ], the specific inclusion of NES in this system might improve the rating of the evidence from these study designs.

Our recommendations are of particular importance for ensuring rigour in the context of (public) health research, where natural experiments have become increasingly popular for a variety of reasons, including the availability of large routinely collected datasets [ 37 ]. Such datasets invite the discovery of natural experiments, even where the data may not be particularly well suited to this design, but they also enable many of the sensitivity analyses to be conducted within the same dataset or through linkage to other routine datasets.

Finally, alignment to the Target Trial Framework also links natural experiment studies directly to other measures of trial validity, including pre-registration, reporting checklists, and evaluation through risk-of-bias-tools [ 40 ]. This aligns with previous recommendations to use established reporting guidelines such as STROBE, TREND [ 12 ], and TIDieR-PHP [ 46 ] for the reporting of natural experiment studies. These reporting guidelines could be customized to specific research areas (for example, as developed for a systematic review of quasi-experimental studies of prenatal alcohol use and birthweight and neurodevelopment [ 47 ]).

We provide a conceptualisation of natural experiment studies as they apply to public health. We argue for the appreciation of natural experiments as a type of study design rather than a set of tools for the analyses of non-randomised interventions. Although there will always remain some ambiguity about the strength of causal claims, there are clear benefits to harnessing NES rather than relying purely on observational studies. This includes the fact that NES can be based on routinely available data and that timely evidence of real-world relevance can be generated. The inclusion of a discussion of the plausibility of as-if randomization of exposure allocation will provide further confidence in the strength of causal claims.

Aligning NES to the Target Trial framework will guard against conceptual stretching of these evaluations and ensure that the causal claims about whether public health interventions ‘work’ are based on evidence that is considered ‘good enough’ to inform public health action within a ‘practice-based evidence’ framework. This framework describes how evaluations can help reduce critical uncertainties and adjust the compass bearing of existing policy (in contrast to the ‘evidence-based practice’ framework in which RCTs are used to generate ‘definitive’ evidence for particular interventions) [ 48 ].

Availability of data and materials

Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.

Abbreviations

RCT: Randomised Controlled Trial

NE: Natural Experiment

SUTVA: Stable Unit Treatment Value Assumption

ITT: Intention-To-Treat

References

Shadish WR, Cook TD, Campbell DT. Experimental and Quasi-Experimental Designs. 2nd ed. Wadsworth, Cengage Learning: Belmont; 2002.

King G, Keohane RO, Verba S. The importance of research design in political science. Am Polit Sci Rev. 1995;89:475–81.

Meyer BD. Natural and quasi-experiments in economics. J Bus Econ Stat. 1995;13:151–61.

Dunning T. Natural experiments in the social sciences. A design-based approach. 6th edition. Cambridge: Cambridge University Press; 2012.

Craig P, Cooper C, Gunnell D, Haw S, Lawson K, Macintyre S, et al. Using natural experiments to evaluate population health interventions: new medical research council guidance. J Epidemiol Community Health. 2012;66:1182–6.

Cook TD, Shadish WR, Wong VC. Three conditions under which experiments and observational studies produce comparable causal estimates: new findings from within-study comparisons. J Policy Anal Manag. 2008;27:724–50.

Bärnighausen T, Røttingen JA, Rockers P, Shemilt I, Tugwell P. Quasi-experimental study designs series—paper 1: introduction: two historical lineages. J Clin Epidemiol. 2017;89:4–11.

Waddington H, Aloe AM, Becker BJ, Djimeu EW, Hombrados JG, Tugwell P, et al. Quasi-experimental study designs series—paper 6: risk of bias assessment. J Clin Epidemiol. 2017;89:43–52.

Saeed S, Moodie EEM, Strumpf EC, Klein MB. Evaluating the impact of health policies: using a difference-in-differences approach. Int J Public Health. 2019;64:637–42.

Dunning T. Improving causal inference: strengths and limitations of natural experiments. Polit Res Q. 2008;61:282–93.

Bärnighausen T, Tugwell P, Røttingen JA, Shemilt I, Rockers P, Geldsetzer P, et al. Quasi-experimental study designs series—paper 4: uses and value. J Clin Epidemiol. 2017;89:21–9.

Craig P, Katikireddi SV, Leyland A, Popham F. Natural experiments: an overview of methods, approaches, and contributions to public health intervention research. Annu Rev Public Health. 2017;38:39–56.

Pearl J, Mackenzie D. The book of why: the new science of cause and effect. London: Allen Lane; 2018.

Rosenbaum PR. How to see more in observational studies: some new quasi-experimental devices. Annu Rev Stat Its Appl. 2015;2:21–48.

Petimar J, Ramirez M, Rifas-Shiman SL, Linakis S, Mullen J, Roberto CA, et al. Evaluation of the impact of calorie labeling on McDonald’s restaurant menus: a natural experiment. Int J Behav Nutr Phys Act. 2019;16. Article no: 99.

Fergusson DM, Horwood LJ, Boden JM, Mulder RT. Impact of a major disaster on the mental health of a well-studied cohort. JAMA Psychiatry. 2014;71:1025–31.

Remler DK, Van Ryzin GG. Natural and quasi experiments. In: Research methods in practice: strategies for description and causation. 2nd ed. Thousand Oaks: SAGE Publication Inc.; 2014. p. 467–500.

Cook PA, Hargreaves SC, Burns EJ, De Vocht F, Parrott S, Coffey M, et al. Communities in charge of alcohol (CICA): a protocol for a stepped-wedge randomised control trial of an alcohol health champions programme. BMC Public Health. 2018;18. Article no: 522.

Lumey LH, Stein AD, Kahn HS, Van der Pal-de Bruin KM, Blauw GJ, Zybert PA, et al. Cohort profile: the Dutch hunger winter families study. Int J Epidemiol. 2007;36:1196–204.

Meng X, Qian N. The Long Term Consequences of Famine on Survivors: Evidence from a Unique Natural Experiment using China’s Great Famine. Natl Bur Econ Res Work Pap Ser. 2011;NBER Worki.

Franco M, Bilal U, Orduñez P, Benet M, Morejón A, Caballero B, et al. Population-wide weight loss and regain in relation to diabetes burden and cardiovascular mortality in Cuba 1980-2010: repeated cross sectional surveys and ecological comparison of secular trends. BMJ. 2013;346:f1515.

Angrist J, Bettinger E, Bloom E, King E, Kremer M. Vouchers for private schooling in Colombia: evidence from a randomized natural experiment. Am Econ Rev. 2002;92:1535–58.

Angrist JD. Lifetime earnings and the Vietnam era draft lottery: evidence from social security administrative records. Am Econ Rev. 1990;80:313–36.

Dawson A, Sim J. The nature and ethics of natural experiments. J Med Ethics. 2015;41:848–53.

Bärnighausen T, Oldenburg C, Tugwell P, Bommer C, Ebert C, Barreto M, et al. Quasi-experimental study designs series—paper 7: assessing the assumptions. J Clin Epidemiol. 2017;89:53-66.

Tugwell P, Knottnerus JA, McGowan J, Tricco A. Big-5 Quasi-Experimental designs. J Clin Epidemiol. 2017;89:1–3.

Reeves BC, Wells GA, Waddington H. Quasi-experimental study designs series—paper 5: a checklist for classifying studies evaluating the effects on health interventions—a taxonomy without labels. J Clin Epidemiol. 2017;89:30–42.

Rubin DB. For objective causal inference, design trumps analysis. Ann Appl Stat. 2008;2:808–40.

Hernán MA, Robins JM. Using big data to emulate a target trial when a randomized trial is not available. Am J Epidemiol. 2016;183:758–64.

Benjamin-Chung J, Arnold BF, Berger D, Luby SP, Miguel E, Colford JM, et al. Spillover effects in epidemiology: parameters, study designs and methodological considerations. Int J Epidemiol. 2018;47:332–47.

Munafò MR, Tilling K, Taylor AE, Evans DM, Smith GD. Collider scope: when selection bias can substantially influence observed associations. Int J Epidemiol. 2018;47:226–35.

Schwartz S, Gatto NM, Campbell UB. Extending the sufficient component cause model to describe the stable unit treatment value assumption (SUTVA). Epidemiol Perspect Innov. 2012;9:3.

Cawley J, Thow AM, Wen K, Frisvold D. The economics of taxes on sugar-sweetened beverages: a review of the effects on prices, sales, cross-border shopping, and consumption. Annu Rev Nutr. 2019;39:317–38.

Reichardt CS. Nonequivalent Group Designs. In: Quasi-Experimentation. A Guide to Design and Analysis. 1st ed. New York: The Guilford Press; 2019. p. 112–162.

Denzin N. Sociological methods: a sourcebook. 5th ed. New York: Routledge; 2006.

Matthay EC, Hagan E, Gottlieb LM, Tan ML, Vlahov D, Adler NE, et al. Alternative causal inference methods in population health research: evaluating tradeoffs and triangulating evidence. SSM - Popul Heal. 2020;10:10052.

Leatherdale ST. Natural experiment methodology for research: a review of how different methods can support real-world research. Int J Soc Res Methodol. 2019;22:19–35.

Reichardt CS. Quasi-experimentation. A guide to design and analysis. 1st ed. New York: The Guilford Press; 2019.

Reeves A, McKee M, Mackenbach J, Whitehead M, Stuckler D. Introduction of a National Minimum Wage Reduced Depressive Symptoms in Low-Wage Workers: A Quasi-Natural Experiment in the UK. Heal Econ (United Kingdom). 2017;26:639–55.

Sterne JA, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ. 2016;355:i4919.

Shadish WR, Cook TD, Campbell DT. Generalized Causal Inference: A Grounded Theory. In: Experimental and Quasi-Experimental Designs for Generalized Causal Inference. 2nd ed. Belmont: Wadsworth, Cengage Learning; 2002. p. 341–73.

Lawlor DA, Tilling K, Smith GD. Triangulation in aetiological epidemiology. Int J Epidemiol. 2016;45:1866–86.

Hernán MA. The C-word: scientific euphemisms do not improve causal inference from observational data. Am J Public Health. 2018;108:616–9.

Guyatt G, Oxman AD, Akl EA, Kunz R, Vist G, Brozek J, et al. GRADE guidelines: 1. Introduction - GRADE evidence profiles and summary of findings tables. J Clin Epidemiol. 2011;64:383–94.

Schünemann HJ, Cuello C, Akl EA, Mustafa RA, Meerpohl JJ, Thayer K, et al. GRADE guidelines: 18. How ROBINS-I and other tools to assess risk of bias in nonrandomized studies should be used to rate the certainty of a body of evidence. J Clin Epidemiol. 2019;111:105–14.

Campbell M, Katikireddi SV, Hoffmann T, Armstrong R, Waters E, Craig P. TIDieR-PHP: a reporting guideline for population health and policy interventions. BMJ. 2018;361:k1079.

Mamluk L, Jones T, Ijaz S, Edwards HB, Savović J, Leach V, et al. Evidence of detrimental effects of prenatal alcohol exposure on offspring birthweight and neurodevelopment from a systematic review of quasi-experimental studies. Int J Epidemiol. 2021;49(6):1972-95.

Ogilvie D, Adams J, Bauman A, Gregg EW, Panter J, Siegel KR, et al. Using natural experimental studies to guide public health action: turning the evidence-based medicine paradigm on its head. J Epidemiol Community Health. 2019;74:203–8.

Acknowledgements

This study is funded by the National Institute for Health Research (NIHR) School for Public Health Research (Grant Reference Number PD-SPH-2015). The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care. The funder had no input in the writing of the manuscript or decision to submit for publication. The NIHR School for Public Health Research is a partnership between the Universities of Sheffield; Bristol; Cambridge; Imperial; and University College London; the London School of Hygiene and Tropical Medicine (LSHTM); LiLaC – a collaboration between the Universities of Liverpool and Lancaster; and Fuse - The Centre for Translational Research in Public Health, a collaboration between Newcastle, Durham, Northumbria, Sunderland and Teesside Universities. FdV is partly funded by National Institute for Health Research Applied Research Collaboration West (NIHR ARC West) at University Hospitals Bristol NHS Foundation Trust. SVK and PC acknowledge funding from the Medical Research Council (MC_UU_12017/13) and Scottish Government Chief Scientist Office (SPHSU13). SVK acknowledges funding from an NRS Senior Clinical Fellowship (SCAF/15/02). KT works in the MRC Integrative Epidemiology Unit, which is supported by the Medical Research Council (MRC) and the University of Bristol [MC_UU_00011/3].

Author information

Authors and affiliations

Population Health Sciences, Bristol Medical School, University of Bristol, Canynge Hall, 39 Whatley Road, Bristol, BS8 2PS, UK

Frank de Vocht, Cheryl McQuire, Kate Tilling & Matthew Hickman

NIHR School for Public Health Research, Newcastle, UK

Frank de Vocht & Cheryl McQuire

NIHR Applied Research Collaboration West, Bristol, UK

Frank de Vocht

MRC/CSO Social and Public Health Sciences Unit, University of Glasgow, Glasgow, UK

Srinivasa Vittal Katikireddi & Peter Craig

MRC IEU, University of Bristol, Bristol, UK

Kate Tilling

Contributions

FdV conceived of the study. FdV, SVK, CMQ, KT, MH and PC interpreted the evidence and theory. FdV wrote the first version of the manuscript. SVK, CMQ, KT, MH and PC provided substantive revisions to subsequent versions. All authors have read and approved the manuscript. FdV, SVK, CMQ, KT, MH and PC agreed to be personally accountable for their own contributions and will ensure that questions related to the accuracy or integrity of any part of the work, even ones in which the author was not personally involved, are appropriately investigated, resolved, and the resolution documented in the literature.

Corresponding author

Correspondence to Frank de Vocht .

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Online Supplementary Material. Table 1. The Target Trial for Natural Experiments and Reeves et al. [ 28 ]. Alignment of Reeves et al. (Introduction of a National Minimum Wage Reduced Depressive Symptoms in Low-Wage Workers: A Quasi-Natural Experiment in the UK. Heal Econ. 2017;26:639–55) to the Target Trial framework.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

de Vocht, F., Katikireddi, S.V., McQuire, C. et al. Conceptualising natural and quasi experiments in public health. BMC Med Res Methodol 21 , 32 (2021). https://doi.org/10.1186/s12874-021-01224-x

Received : 14 July 2020

Accepted : 28 January 2021

Published : 11 February 2021

DOI : https://doi.org/10.1186/s12874-021-01224-x

  • Public health
  • Public health policy
  • Natural experiments
  • Quasi experiments
  • Evaluations

BMC Medical Research Methodology

ISSN: 1471-2288

Social and public experiments and new figurations of science and politics in postcolonial Africa

Richard Rottenburg

Related Papers

Vivekanand Jha

Ilja Pavone

Roland Clark

Jason Bales

Based on concerns regarding lenient research restrictions, cultural and linguistic misunderstanding, disadvantageous research potentiality, coercion, and exploitation, I argue that the ethicality of clinical trials in developing countries is dependent upon a global organization committed to creating an environment which fosters the protection of vulnerable populations, enforces stringent research regulation and review, and ensures complete diversity among its members. A few predictable objections are then carefully considered.

Health & Place

Edison Bicudo

Paul Wenzel Geissler

Cadernos de Saúde Pública

Douglas Lackey

Allen Herman

Tanya Lyons

Pharmaceutical Colonialism is the term used to describe the activities of some of the big pharmaceutical companies and their contract research organizations (CROs) that involve exploiting the sickness and poverty of citizens of weak and/or developing states. This is enabled by the failure or lack of ethical policies and rules within those states that are designed and implemented to protect against unethical clinical drug trials. It is also caused by CROs that can justify their study designs through various ethical loopholes in current international ethical guidelines. This paper will examine the issues of research with vulnerable populations and their ability to give informed consent, and the use of placebo-controlled clinical trials in which international best practice or standard of care is denied simply because it would not be available in the local African context, where health care systems are non-existent or not functional.

Bioética Cátedra UNESCO

The academic literature in research ethics has been marked in the past decade by a much broader focus on the need for the protection of developing communities subjected to international clinical trials. Because of the proximity of the revision of the Declaration of Helsinki, completed in October 2008, most papers have addressed the issue of a double standard of care following the use of placebo. However, other no less important issues, such as interactions between the lifestyle structures of low-income communities and the efficiency of risk-minimising procedures, also deserve attention. The purpose of this paper is to discuss forms of uncertainty involved in clinical trials in poor and low-income countries that are not addressed by conventional methods of risk assessment. Furthermore, the increase in the size of risks that are identified by conventional assessment methods will be addressed, and the difficulty in properly applying risk-minimising procedures will be discussed. Finally, this paper proposes the involvement of research ethics committees in the risk evaluation process and the establishment of national ethics evaluation systems.

RELATED PAPERS

Sylvester Chima

The Journal of Infectious Diseases

David Wendler

Michael Makanga

BMJ Global Health

Joseph Millum

BMJ (Clinical research ed.)

Paul Ndebele

Erik Malmqvist

Liliana Chocarro

BMC Medical Ethics

Phaik Cheah, Deborah Zion, K M. Lwin

Developing World Bioethics

Christophe Perrey

Political Studies

Roland Pierik

JCO Global Oncology

Bodour Salhia

Journal of Applied Philosophy

Pakistan Journal of Public Health

Inayat Memon

BMJ: British Medical Journal

Foundations of Global Health & Human Rights (ed. Lawrence O. Gostin and Benjamin Mason Meier), Oxford University Press

Roberto Andorno, Andrés Constantin

Maria Del Pilar Estevez Diz

Sergio Sismondo

The American Journal of Bioethics

aisha Y malik

American Ethnologist 38(3): 589-0. Impact factor: 1.40

Roberto Abadie

udo schuklenk

Contemporary Clinical Trials Communications

Efe Egharevba

African Journal of Reproductive Health

Ogundokun Olusegun, Bridget Haire

MIT News | Massachusetts Institute of Technology

Large language models don’t behave like people, even though we may expect them to

A hand touches an array of lines and nodes, and a fizzle appears.

One thing that makes large language models (LLMs) so powerful is the diversity of tasks to which they can be applied. The same machine-learning model that can help a graduate student draft an email could also aid a clinician in diagnosing cancer.

However, the wide applicability of these models also makes them challenging to evaluate in a systematic way. It would be impossible to create a benchmark dataset to test a model on every type of question it can be asked.

In a new paper, MIT researchers took a different approach. They argue that, because humans decide when to deploy large language models, evaluating a model requires an understanding of how people form beliefs about its capabilities.

For example, the graduate student must decide whether the model could be helpful in drafting a particular email, and the clinician must determine which cases would be best to consult the model on.

Building off this idea, the researchers created a framework to evaluate an LLM based on its alignment with a human’s beliefs about how it will perform on a certain task.

They introduce a human generalization function — a model of how people update their beliefs about an LLM’s capabilities after interacting with it. Then, they evaluate how aligned LLMs are with this human generalization function.

Their results indicate that when models are misaligned with the human generalization function, a user could be overconfident or underconfident about where to deploy it, which might cause the model to fail unexpectedly. Furthermore, due to this misalignment, more capable models tend to perform worse than smaller models in high-stakes situations.

“These tools are exciting because they are general-purpose, but because they are general-purpose, they will be collaborating with people, so we have to take the human in the loop into account,” says study co-author Ashesh Rambachan, assistant professor of economics and a principal investigator in the Laboratory for Information and Decision Systems (LIDS).

Rambachan is joined on the paper by lead author Keyon Vafa, a postdoc at Harvard University; and Sendhil Mullainathan, an MIT professor in the departments of Electrical Engineering and Computer Science and of Economics, and a member of LIDS. The research will be presented at the International Conference on Machine Learning.

Human generalization

As we interact with other people, we form beliefs about what we think they do and do not know. For instance, if your friend is finicky about correcting people’s grammar, you might generalize and think they would also excel at sentence construction, even though you’ve never asked them questions about sentence construction.

“Language models often seem so human. We wanted to illustrate that this force of human generalization is also present in how people form beliefs about language models,” Rambachan says.

As a starting point, the researchers formally defined the human generalization function, which involves asking questions, observing how a person or LLM responds, and then making inferences about how that person or model would respond to related questions.

If someone sees that an LLM can correctly answer questions about matrix inversion, they might also assume it can ace questions about simple arithmetic. A model that is misaligned with this function — one that doesn’t perform well on questions a human expects it to answer correctly — could fail when deployed.

With that formal definition in hand, the researchers designed a survey to measure how people generalize when they interact with LLMs and other people.

They showed survey participants questions that a person or LLM got right or wrong and then asked if they thought that person or LLM would answer a related question correctly. Through the survey, they generated a dataset of nearly 19,000 examples of how humans generalize about LLM performance across 79 diverse tasks.
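One way to picture a single survey example, and a very simple measure of how well people's generalizations match what a model actually does, is sketched below. This is only an illustration: the record layout `GeneralizationExample`, the function `agreement_rate`, and the toy data are hypothetical, and the agreement rate is a stand-in rather than the metric or data format used in the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class GeneralizationExample:
    observed_correct: bool        # did the LLM answer the question shown to the participant correctly?
    human_predicts_correct: bool  # participant's belief about the related question
    related_correct: bool         # did the LLM actually answer the related question correctly?


def agreement_rate(examples: Sequence[GeneralizationExample]) -> float:
    """Fraction of examples in which the participant's prediction matched the actual outcome."""
    if not examples:
        raise ValueError("need at least one example")
    hits = sum(e.human_predicts_correct == e.related_correct for e in examples)
    return hits / len(examples)


# Toy data, made up for illustration only.
toy = [
    GeneralizationExample(True, True, True),     # belief matched the outcome
    GeneralizationExample(True, True, False),    # participant overestimated the model
    GeneralizationExample(False, False, False),  # belief matched the outcome
    GeneralizationExample(False, False, False),  # belief matched the outcome
]

print(agreement_rate(toy))
```

A low agreement rate on such records would correspond to the situation the article describes, where a model does not perform as a person expects on related questions.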

Measuring misalignment

They found that participants did quite well when asked whether a human who got one question right would answer a related question right, but they were much worse at generalizing about the performance of LLMs.

“Human generalization gets applied to language models, but that breaks down because these language models don’t actually show patterns of expertise like people would,” Rambachan says.

People were also more likely to update their beliefs about an LLM when it answered questions incorrectly than when it got questions right. They also tended to believe that LLM performance on simple questions would have little bearing on its performance on more complex questions.

In situations where people put more weight on incorrect responses, simpler models outperformed very large models like GPT-4.

“Language models that get better can almost trick people into thinking they will perform well on related questions when, in actuality, they don’t,” he says.

One possible explanation for why humans are worse at generalizing for LLMs could come from their novelty — people have far less experience interacting with LLMs than with other people.

“Moving forward, it is possible that we may get better just by virtue of interacting with language models more,” he says.

To this end, the researchers want to conduct additional studies of how people’s beliefs about LLMs evolve over time as they interact with a model. They also want to explore how human generalization could be incorporated into the development of LLMs.

“When we are training these algorithms in the first place, or trying to update them with human feedback, we need to account for the human generalization function in how we think about measuring performance,” he says.

In the meantime, the researchers hope their dataset could be used as a benchmark to compare how LLMs perform relative to the human generalization function, which could help improve the performance of models deployed in real-world situations.

“To me, the contribution of the paper is twofold. The first is practical: The paper uncovers a critical issue with deploying LLMs for general consumer use. If people don’t have the right understanding of when LLMs will be accurate and when they will fail, then they will be more likely to see mistakes and perhaps be discouraged from further use. This highlights the issue of aligning the models with people's understanding of generalization,” says Alex Imas, professor of behavioral science and economics at the University of Chicago’s Booth School of Business, who was not involved with this work. “The second contribution is more fundamental: The lack of generalization to expected problems and domains helps in getting a better picture of what the models are doing when they get a problem ‘correct.’ It provides a test of whether LLMs ‘understand’ the problem they are solving.”

This research was funded, in part, by the Harvard Data Science Initiative and the Center for Applied AI at the University of Chicago Booth School of Business.

Published on 29.7.2024 in Vol 10 (2024)

This is a member publication of University College London (Jisc)

Preferences for COVID-19 Vaccines: Systematic Literature Review of Discrete Choice Experiments

Authors of this article:

  • Yiting Huang 1,2*, MPH
  • Shuaixin Feng 3*, MPH
  • Yuyan Zhao 1*, BMed
  • Haode Wang 4, PhD
  • Hongbo Jiang 1,5, PhD

1 Department of Epidemiology and Biostatistics, School of Public Health, Guangdong Pharmaceutical University, Guangzhou, China

2 Department of Medical Statistics, School of Basic Medicine and Public Health, Jinan University, Guangzhou, China

3 Outpatient department of Baogang, the First Affiliated Hospital of Guangdong Pharmaceutical University, Guangzhou, China

4 School of Health and Related Research, University of Sheffield, Sheffield, United Kingdom

5 Institute for Global Health, University College London, London, United Kingdom

*these authors contributed equally

Corresponding Author:

Hongbo Jiang, PhD

Department of Epidemiology and Biostatistics, School of Public Health

Guangdong Pharmaceutical University

No. 283 Jianghai Road, Haizhu District

Guangzhou, 510310

Phone: 86 0 203 405 5355

Fax: 86 0 203 405 5355

Email: [email protected]

Background: Vaccination is one of the most important defensive barriers protecting susceptible groups from infection. However, COVID-19 vaccine hesitancy is widespread worldwide.

Objective: We aimed to systematically review studies that elicited COVID-19 vaccine preferences using discrete choice experiments.

Methods: A literature search was conducted in the PubMed, Embase, Web of Science, Scopus, and CINAHL Plus platforms in April 2023. Search terms included discrete choice experiments, COVID-19, and vaccines, along with related synonyms. Descriptive statistics were used to summarize the study characteristics. Subgroup analyses were performed by country income level (high-income countries vs low- and middle-income countries) and study period (before, during, and after the pandemic wave). Quality appraisal was performed using the 5-item Purpose, Respondents, Explanation, Findings, and Significance checklist.

Results: The search yielded a total of 623 records, and 47 studies with 53 data points were finally included. Attributes were grouped into 4 categories: outcome, process, cost, and others. Vaccine effectiveness (21/53, 40%) and safety (7/53, 13%) were the most frequently reported and important attributes. Subgroup analyses showed that vaccine effectiveness was the most important attribute, although the preference varied by subgroups. Compared to high-income countries (3/29, 10%), a higher proportion of low- and middle-income countries (4/24, 17%) prioritized safety. As the pandemic progressed, the duration of protection (2/24, 8%) during the pandemic wave and COVID-19 mortality risk (5/25, 20%) after the pandemic wave emerged as 2 of the most important attributes.

Conclusions: Our review revealed the critical role of vaccine effectiveness and safety in COVID-19 vaccine preference. However, it should be noted that preference heterogeneity was observed across subpopulations and that preferences may change over time.

Trial Registration: PROSPERO CRD42023422720; https://tinyurl.com/2etf7ny7

Introduction

Although the World Health Organization has declared the end of COVID-19 as a public health emergency [ 1 ], the persistence of this disease as a global threat should not be overlooked or underestimated [ 2 ]. Vaccination has been regarded as one of the most effective strategies against COVID-19 and has reduced global COVID-19 mortality, severe disease, symptomatic cases, and COVID-19 infections [ 2 , 3 ]. Furthermore, studies have shown that COVID-19 vaccines also have a preventive effect against post–COVID-19 condition [ 4 - 6 ].

Despite significant progress made with vaccination efforts, achieving high vaccination coverage remains a challenge due to disparities in vaccine distribution and vaccine hesitancy [ 7 - 9 ]. Disparities in vaccine distribution have been observed between different countries, with vaccination rates varying markedly between high- and low-income countries [ 10 ]. In addition, COVID-19 vaccine hesitancy has been reported across countries [ 11 ], and booster hesitancy has also become a growing concern for public health officials [ 12 ]. Vaccine hesitancy can change over time and in response to different circumstances. Notably, vaccine hesitancy tends to increase when population-level side-effect studies are released after emergency approvals [ 13 ]. These challenges underline the need for well-designed vaccination programs to ensure equitable access and high uptake.

Designing a successful vaccination program, including vaccine selection, rollout, and accessibility, is crucial [ 14 , 15 ]. A thorough understanding of individual needs and preferences will allow us to better tailor vaccination programs, which will facilitate the appeal and uptake of COVID-19 vaccines [ 16 , 17 ]. One approach increasingly used to elicit preferences for vaccines and vaccination programs is the discrete choice experiment (DCE) [ 18 , 19 ]. A DCE assesses preferences by presenting respondents with a series of hypothetical scenarios. In these scenarios, individuals choose among different alternatives, each characterized by specific attributes. By analyzing these choices, researchers can identify the relative importance of each attribute and estimate utility functions [ 20 , 21 ]. DCEs provide valuable insights into decision-making processes and allow for objective evaluation of attribute-based benefits [ 22 - 24 ]. Previous reviews have identified and synthesized choice-based experiments that assess vaccine preferences [ 18 , 19 ]. However, vaccines differ in nature, and preferences for COVID-19 vaccines were not specifically covered in these reviews.
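To make the mechanics of a DCE concrete, the following is a minimal sketch of the conditional (multinomial) logit model, one of the models reported by many of the included studies, together with a common range-based measure of relative attribute importance. The attribute names, levels, and part-worth values are invented for illustration and are not taken from any reviewed study; in practice, part-worths are estimated from respondents' observed choices rather than fixed in advance.

```python
import math

# Hypothetical part-worth utilities (illustrative values only).
PART_WORTHS = {
    "effectiveness": {"50%": 0.0, "70%": 0.6, "90%": 1.4},
    "side_effects": {"severe": 0.0, "mild": 0.8},
    "cost": {"paid": 0.0, "free": 0.4},
}


def utility(profile):
    """Sum the part-worths of the levels describing one vaccine profile."""
    return sum(PART_WORTHS[attr][level] for attr, level in profile.items())


def choice_probabilities(profiles):
    """Conditional logit: P(j) = exp(V_j) / sum_k exp(V_k)."""
    utilities = [utility(p) for p in profiles]
    max_u = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - max_u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def relative_importance(part_worths):
    """Each attribute's utility range as a share of the sum of all ranges."""
    ranges = {
        attr: max(levels.values()) - min(levels.values())
        for attr, levels in part_worths.items()
    }
    total = sum(ranges.values())
    return {attr: r / total for attr, r in ranges.items()}


# One choice set with two hypothetical vaccine alternatives.
vaccine_a = {"effectiveness": "90%", "side_effects": "severe", "cost": "paid"}
vaccine_b = {"effectiveness": "70%", "side_effects": "mild", "cost": "free"}

probs = choice_probabilities([vaccine_a, vaccine_b])
importance = relative_importance(PART_WORTHS)
```

With these made-up numbers, vaccine B has the higher total utility and therefore the higher predicted choice share, while effectiveness has the widest utility range and so the largest relative importance.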

The COVID-19 vaccines were developed under emergency conditions where there were no peer-reviewed systematic reviews of DCEs on COVID-19 vaccine preference data to inform global decision-making. The diversity in COVID-19 vaccine preferences may be attributed to disparities in vaccine development and production, vaccination scheduling and management, public trust and uptake, as well as vaccine prioritization strategies across various countries and regions [ 25 ]. Moreover, new mutant variants are more likely to infect new individuals, highlighting the need for more effective booster vaccines [ 26 , 27 ]. This study provides empirical evidence on the development, implementation, and follow-up of the COVID-19 vaccine and provides references for vaccine decision-making of other infectious diseases.

We conducted our review following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines ( Multimedia Appendix 1 ) [ 28 ]. This study was registered in the international prospective register of systematic reviews (PROSPERO CRD42023422720).

Search Strategy

A literature search was conducted in the PubMed, Embase, Web of Science, Scopus, and CINAHL Plus platforms in April 2023. Search terms included discrete choice experiments, COVID-19, and vaccines, along with related synonyms. Further details are provided in Multimedia Appendix 2.

Eligibility Criteria

The inclusion and exclusion criteria are detailed in Textbox 1 .

Inclusion criteria

  • Study focus: Focused on preferences for COVID-19 vaccine (product, service and distribution, policy intervention, etc)
  • Article or study type: First-hand discrete choice experiment (DCE) data analysis research

Exclusion criteria

  • Study focus: No preferences for COVID-19 vaccine reported
  • Article or study type: Not DCE research; nonoriginal research (including secondary reports, systematic reviews, conference abstracts and presentations, correspondence, editorials, and commentaries); theoretical articles; protocols; book chapters; and duplicates

Data Screening and Extraction

Two reviewers (YH and SF) independently performed a 2-stage screening process to identify eligible studies. In the first stage, titles and abstracts were screened to exclude irrelevant studies using the web-based tool Rayyan (Rayyan Systems, Inc [ 29 ]). In the second stage, full-text versions of selected papers were assessed to ensure that the inclusion criteria were met. Both reviewers compared the selected papers at each stage to ensure agreement. Any discrepancy or uncertainty between the reviewers was addressed through discussion until a consensus was reached. If consensus could not be reached, a third (senior) reviewer (HJ) was consulted to resolve the disagreement.

The extracted data were recorded and managed in Microsoft Excel (Microsoft Corp) software. Full texts were extracted and reviewed independently by 2 authors (YH and YZ), and any disagreements were resolved by a third reviewer (HJ). Data extraction was performed for 3 specific aspects, focusing on their relevance and importance for the analysis of the DCE: (1) study information (author, publication year, study period, country, population, and sample size); (2) information on the DCE methodology (survey administration, attribute and level selection, pilot-tested, experimental study design, choice sets per respondent, options per choice set, inclusion of an opt-out option, and statistical models); and (3) information on the DCE results (number of attributes, included attributes classified into 4 categories [outcome, process, cost, and other], and the most important attribute).

Choice-based experiments use different definitions for similar attributes [ 19 ]. To address this issue, the attributes were initially grouped into 4 main categories: outcomes, process, cost, and other. The outcomes category encompassed the outcomes or consequences of vaccine administration, such as safety and effectiveness. The process category included activities related to the delivery and administration of vaccines, such as service delivery, dosing, and visits. The cost category focused on the financial aspects of vaccines. Any attributes that did not fit into these 3 categories were classified as other, such as disease risk, incentives or penalties for vaccination, and vaccine advice or support. The classification of outcome, process, cost, and other attributes depended on the aim and design of the studies. It should be noted that vaccine effectiveness and safety were phrased differently in different studies. To facilitate a comparison between studies, efficacy [ 11 , 30 - 41 ], protection rate [ 42 , 43 ], and decreased deaths [ 44 ] were summarized as vaccine effectiveness, whereas side effects [ 11 , 26 , 31 , 35 , 37 , 40 , 41 , 43 , 45 - 61 ], rare but serious risks [ 62 ], and the likelihood of having a flare [ 62 ] were summarized as vaccine safety ( Multimedia Appendix 3 [ 11 , 26 , 30 - 74 ]).

High-income countries (HICs) and low- and middle-income countries (LMICs) were classified according to the World Bank [ 75 ]. LMICs encompass low-income, lower-middle–income, and upper-middle–income countries. On the basis of previous literature [ 63 , 76 , 77 ], we hypothesized that individuals’ preferences for vaccines may vary depending on the status of the pandemic. Therefore, we sought to explore how COVID-19 vaccine preferences differed across study periods. To do this, we used data from the surveillance website [ 78 ] to define the pandemic periods based on daily COVID-19 cases. The first group, before the pandemic wave, referred to the period before the outbreak of the pandemic, when the number of incident cases was low. The second group, during the pandemic wave, represented the peak of the pandemic or a period of rapid increase in the number of incident cases. The third group, after the pandemic wave, was when the number of incident cases decreased and remained low ( Multimedia Appendix 4 [ 11 , 26 , 30 - 74 ]).

Quality Appraisal

The 5-item Purpose, Respondents, Explanation, Findings, and Significance (PREFS) checklist, developed by Joy et al [ 79 ], is widely accepted and used to assess the reporting quality of preference studies [ 18 , 80 - 84 ]. It evaluates studies based on criteria such as the study’s purpose, respondent sampling, explanation of assessment methods, inclusion of complete response sets in the findings, and use of significance testing.

Data Synthesis and Analysis

This review used a combination of text and summary tables to effectively convey information about the characteristics and results of the included studies. Descriptive statistics were used to summarize the study characteristics. The findings were synthesized in a narrative format, providing an overview of the included studies, highlighting the key features of the study designs, and presenting the main findings of the COVID-19 vaccine preference studies. Subgroup analyses were performed by independent factors such as HICs or LMICs and study period (before, during, and after the pandemic wave).

Study Selection

The search yielded a total of 623 records. After title and abstract screening, 513 (82.3%) records were excluded. An additional 63 (10.1%) studies were excluded after full-text assessment. Finally, 47 (7.5%) studies met the eligibility criteria and were included in the review ( Figure 1 ).
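The screening flow above is simple arithmetic, and the short check below reproduces it. The variable names are illustrative; the counts are those reported in the paragraph, and each percentage is taken relative to the 623 records identified.

```python
# Counts reported in the Study Selection paragraph.
records_identified = 623
excluded_title_abstract = 513
excluded_full_text = 63

# Studies remaining after both screening stages.
included = records_identified - excluded_title_abstract - excluded_full_text


def pct(part, whole):
    """Percentage of `whole`, rounded to 1 decimal place."""
    return round(100 * part / whole, 1)


flow = {
    "excluded at title/abstract": (excluded_title_abstract,
                                   pct(excluded_title_abstract, records_identified)),
    "excluded at full text": (excluded_full_text,
                              pct(excluded_full_text, records_identified)),
    "included": (included, pct(included, records_identified)),
}

for stage, (n, share) in flow.items():
    print(f"{stage}: {n} ({share}%)")
```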

Study and Sample Characteristics

We included 47 studies from 29 countries. Among them, 5 (11%) studies were conducted in multiple countries, with 4 studies conducted in both HICs and LMICs and 1 study conducted in >1 HIC. In addition, 22 (47%) studies were conducted in HICs, while 21 (45%) studies were conducted in LMICs. China had the highest number of preference-based DCEs for COVID-19 vaccines, with 19 (40%) studies. The United States followed with 9 (19%) studies, then France (n=5, 11%), the United Kingdom (n=4, 9%), Germany (n=4, 9%), and Spain (n=3, 6%). Australia, Canada, India, Italy, Japan, the Netherlands, and South Africa had 2 (4%) studies each. All other countries had only 1 (2%) study ( Figure 2 ). The studies were published between 2020 and 2023, with sample sizes ranging from 194 to 13,128 participants. The median number of participants per study was 1456 (IQR 872-2109).

Most participants were adults, although the specific focus varied. Most studies (36/47, 77%) involved general population samples, whereas some studies (11/47, 23%) included specific groups of participants. These included 5 studies conducted in universities using web-based tools, including 3 studies with university students and 2 studies with both students and staff. In addition, 3 studies involved health care workers (Chinese intensive care unit clinicians, health care workers, and health care and welfare workers); 2 studies involved parents with children aged <18 years, and 1 study involved people with chronic immune-mediated inflammatory diseases ( Table 1 ).

| Author, year | Study period | Country | Population | Sample size, n |
| --- | --- | --- | --- | --- |
| Asim et al [ ], 2023 | February 26 to April 26, 2021 | China | Adults | 208 |
| Bansal et al [ ], 2022 | May to June, 2021 | India | Adults | 1371 |
| Blaga et al [ ], 2023 | March to September, 2021 | Hungary | General population | 1011 |
| Borriello et al [ ], 2021 | March 27 to 31, 2020 | Australia | General population | 2136 |
| Bughin et al [ ], 2023 | January 25 to 28, 2021 | Germany | General population | 1556 |
| Chen et al [ ], 2023 | January 24 to March 10, 2021 | China | Middle-aged and older adults aged ≥50 years | 293 |
| Chen et al [ ], 2021 | January 5 to 12, 2021 | China | Adults | 1066 |
| Craig [ ], 2021 | November 9 to 11, 2020 | The United States | Adults | 1153 |
| Darrudi et al [ ], 2022 | March 21 to July 6, 2021 | Iran | Adults | 685 |
| Daziano [ ], 2022 | October 22 to November 24, 2020 | The United States | Adults | 2723 |
| Díaz Luévano et al [ ], 2021 | December 18, 2020, to February 1, 2021 | France | Health care and welfare workers | 4346 |
| Dong et al [ ], 2020 | June to July, 2020 | China | Adults | 1236 |
| Dong et al [ ], 2022 | January 29 to February 13, 2021 | India, the United Kingdom, Germany, Italy, and Spain | Adults | 812 |
| Donin et al [ ], 2022 | March 22 to May 3, 2021 | Czech Republic | University students | 445 |
| Eshun-Wilson et al [ ], 2021 | March 15 to March 22, 2021 | United States | General population | 2985 |
| Fu et al [ ], 2020 | March 17 to 18, 2020 | China | Health care workers | 541 |
| Fung et al [ ], 2022 | July 20 to September 21, 2021 | China | University students and staff members | 3423 |
| George et al [ ], 2022 | November 18 to December 24, 2021 | South Africa | University students and staff members | 1836 |
| Hazlewood et al [ ], 2023 | May to August, 2021 | Canada | People with chronic immune-mediated inflammatory diseases | 551 |
| Hess et al [ ], 2022 | Summer 2020 to the start of March 2021 | Africa: Namibia, South Africa; Asia: China, Japan, and South Korea; Europe: Denmark, France, Germany, Spain, and the United Kingdom; North America: the United States; Oceania: Australia and New Zealand; and South America: Brazil, Chile, Colombia, and Ecuador | General population | 13,128 |
| Huang et al [ ], 2021 | March 24 to April 10, 2021 | China | Chinese ICU clinicians | 11,951 |
| Igarashi et al [ ], 2022 | November 19 to 27, 2020 | Japan | General population | 2155 |
| Krueger and Daziano [ ], 2022 | March 4 to 10, 2021 | The United States | General population | 1421 |
| Leng et al [ ], 2021 | NR | China | Adults | 1883 |
| Li et al [ ], 2021 | January 25 to February 25, 2021 | China | University students | 194 |
| Li et al [ ], 2023 | January 28 to February 27, 2021 | China and the United States | Middle-aged and older adult population (aged ≥41 years) | 3444 |
| Liu et al [ ], 2021 | January 29 to February 13, 2021 | China and the United States | General population | 2480 |
| Luyten et al [ ], 2022 | October 6 to 16, 2020 | Belgium | Adults | 1944 |
| McPhedran et al [ ], 2022 | March 25 to April 2, 2021 | The United Kingdom | Adults | 2012 |
| McPhedran et al [ ], 2021 | August 27 to September 3, 2020 | The United Kingdom | General population | 1501 |
| Morillon and Poder [ ], 2022 | October 19 to November 17, 2020 | Canada | Adults | 1599 |
| Mouter et al [ ], 2022 | November 4 to 10, 2020 | The Netherlands | General population | 895 |
| Mouter et al [ ], 2022 | December 1 to 4, 2020 | The Netherlands | Adults | 747 |
| Panchalingam and Shi [ ], 2022 | October to November, 2021 | United States | Parents with children aged <18 years | 1456 |
| Prosser et al [ ], 2023 | May 21 to June 9, 2021 | The United States | Adults | 1040 |
| Schwarzinger et al [ ], 2021 | June 22 to July 3, 2020 | France | Working-age population (aged 18-64 years) | 1942 |
| Steinert et al [ ], 2022 | Germany in April 2021; France, Italy, Poland, Spain, and Sweden in June 2021 | France, Germany, Italy, Poland, Spain, and Sweden | Adults | 6030 |
| Teh et al [ ], 2022 | March 2021 | Malaysia | Adults | 2028 |
| Tran et al [ ], 2023 | April to August, 2022 | Vietnam | Adults | 871 |
| Velardo et al [ ], 2021 | November 30 to December 16, 2020 | France | Working-age population (aged 18-64 years) | 5519 |
| Wang et al [ ], 2022 | August 2020 | China | Adults | 873 |
| Wang et al [ ], 2021 | February 26 to 28, 2021 | China | Working-age population (aged 18-64 years) | 1773 |
| Wang et al [ ], 2022 | Mid-September to the end of October, 2021 | China | Parents with children <18 years old | 298 |
| Wang et al [ ], 2022 | May 2021 | China | University students | 1138 |
| Wang et al [ ], 2022 | May to June, 2021 | China | Adults | 849 |
| Xiao et al [ ], 2022 | January 28 to 31, 2021 | China | Adults | 1576 |
| Zhang et al [ ], 2022 | July 15 to August 10, 2021 | China | Adults | 1200 |

a ICU: intensive care unit.

b NR: not reported.
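The summary statistics quoted earlier (median 1456, IQR 872-2109) can be checked against the 47 sample sizes listed in Table 1. The review does not say which quantile rule was used, so the sketch below assumes the Hazen rule (fractional rank n·p + 0.5), chosen only because it reproduces the reported values; other common rules give slightly different quartiles. The function name `hazen_quantile` is illustrative.

```python
import math

# Sample sizes in the order they appear in Table 1.
SAMPLE_SIZES = [
    208, 1371, 1011, 2136, 1556, 293, 1066, 1153, 685, 2723, 4346, 1236,
    812, 445, 2985, 541, 3423, 1836, 551, 13128, 11951, 2155, 1421, 1883,
    194, 3444, 2480, 1944, 2012, 1501, 1599, 895, 747, 1456, 1040, 1942,
    6030, 2028, 871, 5519, 873, 1773, 298, 1138, 849, 1576, 1200,
]


def hazen_quantile(values, p):
    """Quantile by the Hazen rule (an assumption, not stated in the review).

    The 1-based fractional rank is n*p + 0.5; the result is linearly
    interpolated between the two neighbouring order statistics.
    """
    xs = sorted(values)
    n = len(xs)
    rank = min(max(n * p + 0.5, 1), n)  # clamp to the valid rank range
    lower = math.floor(rank)
    upper = min(lower + 1, n)
    frac = rank - lower
    return xs[lower - 1] + frac * (xs[upper - 1] - xs[lower - 1])


median = hazen_quantile(SAMPLE_SIZES, 0.50)
q1 = hazen_quantile(SAMPLE_SIZES, 0.25)
q3 = hazen_quantile(SAMPLE_SIZES, 0.75)
print(f"n={len(SAMPLE_SIZES)}, median={median}, Q1={q1}, Q3={q3}")
```

Under this rule the first quartile comes out at 871.5, which rounds to the reported 872.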

The Implementation of DCEs

Across the 47 studies, researchers commonly used a multifaceted approach to identify and select attributes and levels: 23 (49%) studies reported combining a literature review with qualitative assessments such as expert interviews and public surveys. A total of 25 (53%) studies reported a pilot DCE survey. In terms of survey administration, most studies (40/47, 85%) reported that the DCE was conducted through web-based surveys ( Table 2 ).

| Author, year | Survey administration | Attributes and levels selection | Pilot-tested DCE | Experimental study design | Choice sets per respondent | Options per choice set | Statistical models |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Asim et al [ ], 2023 | Web based | Focus group | Yes | D-optimal algorithm design | 8 | 2+opt out | Latent class logit model and nested logistic model |
| Bansal et al [ ], 2022 | Web based | Literature review | NR | D-efficient design | 6 | 2 | Conditional logit model and nonparametric logit mixed logit model |
| Blaga et al [ ], 2023 | NR | Focus group and expert interviews | Yes | D-efficient design | 8 | 3+opt out | Latent variable models, random parameters logit model, and hybrid random parameters logit model |
| Borriello et al [ ], 2021 | Web based | Literature review and judgment of respondent understanding and plausibility | NR | Bayesian d-efficient design | 8 | 3+opt out | Latent class model |
| Bughin et al [ ], 2023 | Web based | On the basis of the purpose of the research and necessary calibration of the conjoint | NR | NR | 10 | 3 | Hierarchical multinomial logit model |
| Chen et al [ ], 2023 | NR | Literature review, expert interviews, and current COVID-19 vaccine development progress | Yes | Orthogonal design | 12 | 2 | Multinomial logistic regression model |
| Chen et al [ ], 2021 | Web based | Literature review | NR | D-efficient design | 16 | 2 | Conditional logit model and panel mixed logit model |
| Craig [ ], 2021 | Web based | Literature review, expert interviews, and the CDC interim playbook version 2.0 | Yes | NR | 8 | 3+opt out | Conditional logit model, latent class model, and opt-out inflated logit model |
| Darrudi et al [ ], 2022 | Web based | Literature review and expert interviews | Yes | D-efficient design | Group 1: 9 and group 2: 10 | Group 1: 2 and group 2: 2 | Conditional logit model |
| Daziano [ ], 2022 | Web based | Literature review and focus group | Yes | Bayesian efficient design | 7 | 2+opt out | Latent class logit model, conditional logit model, and random effects logit model |
| Díaz Luévano et al [ ], 2021 | Web based | Literature review | Yes | Efficient design | 8 | 1+opt out | Random intercept logit models |
| Dong et al [ ], 2020 | Web based | Literature review, expert interviews, and public interviews | Yes | D-optimal algorithm design | 10+validity | 2 | Mixed logit regression model |
| Dong et al [ ], 2022 | Web based | NR | Yes | NR | NR | NR | Conditional logit model |
| Donin et al [ ], 2022 | Web based | Literature review | Yes | D-efficient design | NR | 2+opt out | Hierarchical Bayes |
| Eshun-Wilson et al [ ], 2021 | Web based | Expert interviews, expert discussion, and literature review | Yes | Fractional factorial design | 10 | 2+opt out | Mixed logit model and latent class model |
| Fu et al [ ], 2020 | Web based | Literature review, focus group, and expert interviews | Yes | Fractional factorial design | 8+validity | 2 | Binary logistic regression model |
| Fung et al [ ], 2022 | Web based | Literature review and expert interviews | NR | Orthogonal design | 8 | 2+opt out | Mixed logit model |
| George et al [ ], 2022 | Web based | Literature review and a series of meetings and discussions with the study team and key stakeholders at UKZN | NR | Fractional factorial design | 8 | 2 | Mixed effects logit model |
| Hazlewood et al [ ], 2023 | Web based | Guideline panel discussion | Yes | Fractional factorial design | 10 | 2+opt out | Main-effects multinomial logit model |
| Hess et al [ ], 2022 | Web based | NR | NR | D-efficient design | 6 | 4+opt out | Ordered logit model, latent class model, and nested logit |
| Huang et al [ ], 2021 | Web based | Expert interviews | Yes | Fractional factorial design | 4 | 2 | Multivariable conditional logistic regression model |
| Igarashi et al [ ], 2022 | Web based | Literature review | NR | Orthogonal design | 12 | 2+opt out | Panel logit model |
| Krueger and Daziano [ ], 2022 | NR | Literature review and focus group | NR | Bayesian efficient design | 7 | 2+opt out | Normal error components mixed logit model |
| Leng et al [ ], 2021 | Face to face | Literature review | Yes | D-efficient partial profile design | 8 | 2 | Conditional logit model |
| Li et al [ ], 2021 | Web based | NR | NR | Orthogonal design | 6 | 2 | Conditional logit model |
| Li et al [ ], 2023 | Web based | Literature review and expert interviews | NR | Fractional factorial design | 13 | 2+opt out | Conditional logit model |
| Liu et al [ ], 2021 | Web based | Literature review and expert interviews | Yes | NR | NR | 2 | Conditional logit model |
| Luyten et al [ ], 2022 | Web based | Literature review | Yes | Bayesian d-optimal design | 10+validity | 2 | Panel mixed logit model |
| McPhedran et al [ ], 2022 | Web based | Literature review | NR | D-optimal fractional factorial design | 6 | 2+opt out | Mixed logit model |
| McPhedran et al [ ], 2021 | Web based | Literature review | NR | Rotation design | 6 | 2+opt out | Clustered conditional logit model and hybrid logit model |
| Morillon and Poder [ ], 2022 | Web based | Literature review, expert interviews, and public interviews | NR | Orthogonal design | 11+validity | 2+opt out | Mixed logit model, latent class logit model, and multinomial logistic regression |
| Mouter et al [ ], 2022 | Web based | Literature review, expert consultations, and feedback | Yes | Bayesian d-efficient design | 8 | 2 | Panel mixed logit model |
| Mouter et al [ ], 2022 | Web based | Literature review, expert discussion, and pretest | Yes | Bayesian d-optimal design | 9 | 2 | Panel mixed logit model |
| Panchalingam and Shi [ ], 2022 | Web based | Literature review | NR | D-efficient design | 10+validity | 2+opt out | Logistic regression model and random parameter logit regression model |
| Prosser et al [ ], 2023 | Web based | Literature review and public interviews | NR | Fractional factorial design | 6 | 2+opt out | Bayesian logit regression and latent class analyses |
| Schwarzinger et al [ ], 2021 | Web based | Literature review and expert interviews | NR | D-efficient design | 8 | 2+opt out | Conditional logit model |
| Steinert et al [ ], 2022 | Web based | NR | NR | D-efficient design | 8 | 2 | Conditional logit model and fixed-effects model |
| Teh et al [ ], 2022 | Web based | Literature review, expert interviews, and focus group | Yes | Bayesian d-optimal design | 10+validity | 2+opt out | Mixed logit model and nested logit model |
| Tran et al [ ], 2023 | Web based | Literature review and expert interviews | NR | NR | 7 | 2 | Hierarchical Bayes |
| Velardo et al [ ], 2021 | Web based | NR | NR | D-efficient design | 8 | 2+opt out | Conditional logit model |
| Wang et al [ ], 2022 | Web based | Expert interviews and public interviews | Yes | D-efficient design | 6 | 2+opt out | Multinomial mixed effects logit model |
| Wang et al [ ], 2021 | Web based | Individual interviews | Yes | D-optimal algorithm design | 8 | 2+opt out | Multiple logistic regression model, nested logistic model, and separate logistic model |
| Wang et al [ ], 2022 | Web based | Literature review, qualitative interview and background information, and levels of the attributes | Yes | D-efficient design | 8 | 2+opt out | Multiple logistic model and mixed logit model |
| Wang et al [ ], 2022 | Face to face | Literature review | NR | D-efficient partial profile design | 8+validity | 2 | Conditional logit model |
| Wang et al [ ], 2022 | Face to face | Literature review and expert interviews | Yes | D-efficient partial profile design | 8 | 2 | Conditional logit model, mixed logit model, and latent class model |
| Xiao et al [ ], 2022 | Web based | Literature review, research team discussions, official report, expert discussion, and pretest | Yes | Full factorial design | 4 | 2+opt out | Random parameter logit model and constrained latent class model |
| Zhang et al [ ], 2022 | NR | Literature review, expert interviews, and several vaccines on the market | NR | Fractional factorial design | 11 | 2+opt out | Conditional logit model |

a NR: not reported.

b CDC: Center for Disease Control and Prevention.

c UKZN: the University of KwaZulu-Natal.

Attributes in DCE Studies

Of the 286 attributes identified in the 47 studies, 126 (44.1%) were categorized as outcome attributes, followed by 82 (28.7%) as process attributes, and 22 (7.7%) as cost attributes. The remaining 55 (19.2%) attributes were categorized as other attributes ( Table 3 and Multimedia Appendix 3 ).

Author, year | Attributes, n | Outcome | Process | Cost | Other | Most important attribute
Asim et al [ ], 2023 | 7 | Efficacy and safety | Venue for vaccination and vaccine brand | | Exemption of quarantine for vaccinated travelers, uptake of recommendations from professionals, and vaccine by people around | Brand
Bansal et al [ ], 2022 | 7 | Effectiveness of vaccine, side effects, and duration of protection offered by the vaccine | Developer and place where vaccination is administered | Out-of-pocket cost | The proportion of friends and family members who have taken the vaccine | Vaccinated friends or family
Blaga et al [ ], 2023 | 4 | Effectiveness of the vaccine, type of possible side effects, and duration of protection provided by the vaccine | Country of origin | | | Duration of protection
Borriello et al [ ], 2021 | 7 | Effectiveness, mild side effects, and major side effects | Mode of administration, location, and time period when the vaccine was available | Cost | | Safety
Bughin et al [ ], 2023 | 5 | Effectiveness | Time of COVID-19 vaccination | | Work site, restriction level, choices to get vaccinated, and advantages or penalties | Time of COVID-19 vaccination
Chen et al [ ], 2023 | 5 | Risk of adverse effects, protective duration, and effectiveness | Injection doses and injection period | | | Safety
Chen et al [ ], 2021 | 5 | Protection rate, adverse effect, and protection duration | Convenience of vaccination | Cost of the vaccine | | Safety
Craig [ ], 2021 | 5 | Duration of immunity, risk of severe side effects, and vaccine effectiveness | Vaccination setting | | Proof of vaccination | Effectiveness
Darrudi et al [ ], 2022 | 6 | Group 1: effectiveness, risk of severe complications, and duration of protection | Group 1: location of vaccine production; group 2: age | Group 1: price; group 2: cost to the community | Group 1: underlying disease, employment in the health sector, potential capacity to spread the virus (virus spread), and the necessary job for society | Group 1: effectiveness; group 2: potential capacity to spread the virus
Daziano [ ], 2022 | 9 | Effectiveness, days for antibodies to develop, duration of protection, number of people out of 10 with mild side effects, and the number of people out of 1,000,000 with severe side effects | Country where vaccine was developed and introduced (months) | Out-of-pocket cost | Who recommends this specific vaccine | Recommenders
Díaz Luévano et al [ ], 2021 | 5 | Efficacy, indirect protection, safety, and protection duration | | | Recommendation or incentive source | Effectiveness
Dong et al [ ], 2020 | 6 | Effectiveness, duration of protection, and adverse event | The total number of injections and origin of the product | Price (Chinese Yuan) | | Effectiveness
Dong et al [ ], 2022 | 6 | Adverse effects, efficacy, duration of the vaccine, and time taken for the vaccine to work | Vaccine types | The cost of vaccination | | Effectiveness
Donin et al [ ], 2022 | 6 | Protection duration, efficacy, and risk of mild side effects | Route of vaccination and travel time to vaccination site | | Recommender of the vaccine | Protection duration
Eshun-Wilson et al [ ], 2021 | 7 | | Vaccine frequency, waiting time at vaccination site, vaccination location, number of doses required per vaccination episode, and vaccination appointment scheduling | | Vaccination enforcement and who has already received the vaccine in your community? | Vaccine frequency
Fu et al [ ], 2020 | 7 | Vaccine safety and vaccine efficacy | | Out-of-pocket costs | Infection probability, case fatality ratio, possible trends of the epidemic, and acceptance of social contacts | Possible trends of the epidemic
Fung et al [ ], 2022 | 7 | Risk of a mild or moderate adverse event after vaccination, risk of a severe adverse event after vaccination, efficacy against COVID-19 infection, efficacy against severe manifestation of COVID-19 infection, and duration of protection after vaccination | | Out-of-pocket costs | Incentives for completing vaccination | Quarantine-free travel
George et al [ ], 2022 | 7 | Effectiveness | Vaccination location, waiting time at the vaccination site, number of doses, boosters required, and vaccine origin | | Incentives for vaccination | Effectiveness
Hazlewood et al [ ], 2023 | 4 | Effectiveness, rare but serious risks, and likelihood of having a flare | Dosing | | | Effectiveness
Hess et al [ ], 2022 | 9 | Estimated protection duration, risk of mild side effects, and risk of severe side effects | | Fee | Exemption from international travel restrictions, risk of infection, and risk of serious illness, and population coverage | Effectiveness
Huang et al [ ], 2021 | 4 | Effectiveness, risk of adverse reactions, and duration of immunity | | | Whether coworkers have been vaccinated | Effectiveness
Igarashi et al [ ], 2022 | 5 | Safety, efficacy, and immunity duration | | Price | Disease prevalence | Effectiveness
Krueger and Daziano [ ], 2022 | 9 | Effectiveness, protection period, risk of severe side effects, risk of mild side effects, and incubation period | Origin of the vaccine, number of required doses, and whether the vaccine has a booster against variants | Out-of-pocket cost | | Effectiveness
Leng et al [ ], 2021 | 7 | Vaccine effectiveness, side effects, and duration of vaccine protection | Accessibility, number of doses, and vaccination sites | | Proportion of acquaintances vaccinated | Effectiveness
Luyten et al [ ], 2022 | 5 | | Age, essential profession, and medical risk group | Cost to society | Virus spreader | Medical risk group
Li et al [ ], 2021 | 6 | Nonsevere adverse reactions, efficacy, and protection duration | Required number of doses and origin of the vaccine | Out-of-pocket price | | Safety
Li et al [ ], 2023 | 6 | Adverse effect, efficacy, duration of vaccine effect, and time for the vaccine to start working | Vaccine varieties | Cost of vaccination | | China: cost; the United States: effectiveness
Liu et al [ ], 2021 | 6 | Adverse effect, efficacy, duration of vaccine effect, and time for the vaccine to start working | Vaccine varieties | Cost of vaccination | | China: cost; the United States: effectiveness
McPhedran et al [ ], 2022 | 4 | | Delivery mode, appointment timing, and proximity | | Sender | SMS text message invitation sender
McPhedran et al [ ], 2021 | 5 | Level of protection offered | Location in which the vaccine is administered and the number of doses needed for full protection | | Recommender of the vaccine and coverage in the media | Effectiveness
Morillon and Poder [ ], 2022 | 7 | Effectiveness, safety, and duration | Waiting time, priority population, and origin | | Recommendation | Effectiveness
Mouter et al [ ] | 4 | The percentage of vaccinated individuals protected against COVID-19, the number of cases of mild side effects, and the number of cases of severe side effects | The month when the vaccine would become available to the respondent | | | Safety
Mouter et al [ ], 2022 | 6 | Decrease in deaths, decrease in health damage, and decrease in households with income loss | Vaccination at home and vaccination when and where convenient | One-time tax increase | Vaccination ambassadors, pay €250 (US $280.75) if does not get vaccinated, receive €100 (US $113) if gets vaccinated, vaccination passport daily activities during outbreak, vaccination passport large events, counseling if does not get vaccinated, and mandatory testing at own cost if does not get vaccinated | Mandatory testing at own cost if does not get vaccinated
Panchalingam and Shi [ ], 2022 | 5 | Risk of severe side effects, effectiveness, and duration of vaccine-induced protection | | | Risk of unvaccinated children requiring hospitalization for COVID-19 and local coverage | Safety
Prosser et al [ ], 2023 | 6 | Effectiveness, mild common side effects, and rare adverse events | Number of doses, total time required to get vaccinated, and regulatory approval | | | Effectiveness
Schwarzinger et al [ ], 2021 | 4 | Safety and efficacy | Place to be vaccinated and country of vaccine manufacturer | | | Region of vaccine manufacturer
Steinert et al [ ], 2022 | 4 | | Age | | Employment status, country of residence and health care system capacity, and mortality risk | Mortality risk
Teh et al [ ], 2022 | 5 | Effectiveness and risk of developing severe side effects | Vaccination schedule during office hours, distance from home to vaccination center, and halal content | | | Halal content
Tran et al [ ], 2023 | 6 | Immunity duration, effectiveness, and side effects | | Cost of the vaccine | Limitations if not vaccinated and COVID-19 mortality rate | Mortality rate
Velardo et al [ ], 2021 | 5 | Efficacy, risk of serious side effects per 100,000, and duration of vaccine immunity | Place of vaccine administration and location of vaccine manufacturer | | | Effectiveness
Wang et al [ ], 2022 | 6 | Probability of fever, side effects and effectiveness | Location of vaccination, number of doses, and origin of vaccine | Price (CNY) | | Effectiveness
Wang et al [ ], 2021 | 7 | Probability of COVID-19 infection and probability of serious adverse event | Brand and venue for vaccination | | Recommendations from professionals, quarantine for vaccinated travelers, and vaccine uptake of people around | Effectiveness
Wang et al [ ], 2022 | 7 | Efficacy and probability of serious adverse event | Venue for vaccination and brand | | Recommendations from professionals, vaccination coverage among all children aged <18 years, and vaccine uptake among acquaintances’ minor children | Effectiveness
Wang et al [ ], 2022 | 6 | Self-assessed vaccine-related side effects, duration of vaccine protection, and effectiveness | Vaccination sites | | Risk perception and acquaintances vaccinated | Safety
Wang et al [ ], 2022 | 6 | Effectiveness, side effects, and duration of protection | Vaccination sites | | Perceived probability of infection of individuals or acquaintances and percentage of acquaintances vaccinated | Effectiveness
Xiao et al [ ], 2022 | 4 | Effectiveness, adverse reactions, and protection period | | Price | | Effectiveness
Zhang et al [ ], 2022 | 6 | Efficacy, duration, adverse effect, and time period when the vaccine starts working | Varieties | Cost | | Cost

a Attribute is significant (P<.05).

b Not available.

c The corresponding coefficients and P values are not provided.

The Most Important Attribute Reported in DCE Studies

In total, 2 of the 5 multicountry studies did not report preferences for each country and were therefore excluded from the synthesis of the most important attribute. A total of 53 data points on COVID-19 vaccine preferences were collected from the study population of the corresponding country. In the outcome category, among the 30 attributes examined, effectiveness emerged as the most prominent, accounting for 40% (21/53) of the studies [ 31 , 35 , 36 , 38 - 42 , 48 , 50 - 52 , 57 , 58 , 60 - 62 , 64 - 67 ]. Safety was addressed in 13% (7/53) of the studies [ 33 , 43 , 47 , 56 , 59 , 68 , 69 ], while protection duration was mentioned in 4% (2/53) [ 11 , 50 ]. In the process category, 13 attributes were identified. Brand (1/53, 2%) [ 32 ], region of vaccine manufacturer (1/53, 2%) [ 34 ], and halal content (1/53, 2%) [ 53 ] were associated with vaccine production. In addition, waiting time for COVID-19 vaccination (1/53, 2%) [ 70 ] and vaccine frequency (1/53, 2%) [ 71 ] were considered. Furthermore, 3 (6%) studies on vaccine distribution prioritized vaccination for the medical risk group (1/53, 2%) [ 72 ], those who had a higher COVID-19 mortality risk (6/53, 11%) [ 63 ], and those who had the potential capacity to spread the virus (1/53, 2%) [ 72 ]. In the cost category, personal vaccination cost accounted for 6% (3/53) [ 31 , 37 , 41 ]. Among the other attributes (7/53, 13%), disease risk threat was of particular importance, including possible trends of the epidemic (1/53, 2%) [ 30 ] and COVID-19 mortality rate (1/53, 2%) [ 55 ]. In addition, incentives and penalties for vaccination were identified, including quarantine-free travel (1/53, 2%) [ 33 ] and mandatory testing at own expense if not vaccinated (1/53, 2%) [ 44 ]. Vaccine advice or support included vaccination invitation sender (1/53, 2%) [ 73 ] and recommenders (1/53, 2%) [ 46 ]. The proportion of friends and family members who had received the vaccine (1/53, 2%) [ 26 ] was also among the other attributes influencing decision-making ( Table 2 ).
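
The counts quoted in this paragraph can be cross-checked with a short tally. The sketch below is illustrative only: the nested dictionary and the variable names are assumptions made here, while the counts themselves are copied from the text above. It sums the per-attribute counts within each category and expresses each category as a share of the 53 data points.

```python
# Counts of "most important attribute" data points, copied from the paragraph
# above. The grouping into a nested dict is an assumption made for this sketch.
most_important = {
    "outcome": {
        "effectiveness": 21,
        "safety": 7,
        "protection duration": 2,
    },
    "process": {
        "brand": 1,
        "region of vaccine manufacturer": 1,
        "halal content": 1,
        "waiting time for vaccination": 1,
        "vaccine frequency": 1,
        "medical risk group": 1,
        "higher mortality risk": 6,
        "capacity to spread the virus": 1,
    },
    "cost": {
        "personal vaccination cost": 3,
    },
    "other": {
        "possible trends of the epidemic": 1,
        "COVID-19 mortality rate": 1,
        "quarantine-free travel": 1,
        "mandatory testing at own expense": 1,
        "vaccination invitation sender": 1,
        "recommenders": 1,
        "vaccinated friends and family": 1,
    },
}

# Sum the data points within each category, then across all categories.
category_totals = {
    category: sum(counts.values()) for category, counts in most_important.items()
}
total_points = sum(category_totals.values())

# Share of all data points held by each category, in percent.
category_share = {
    category: round(100 * n / total_points, 1)
    for category, n in category_totals.items()
}

for category, n in category_totals.items():
    print(f"{category}: {n}/{total_points} ({category_share[category]}%)")
```

With these inputs the category totals come to 30, 13, 3, and 7, which matches the figures quoted for the outcome, process, cost, and other categories and adds up to the 53 data points.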

Although effectiveness remained the most important attribute, preferences varied among subgroups. A higher proportion of studies conducted in LMICs (4/24, 17%) than in HICs (3/29, 10%) prioritized safety ( Multimedia Appendix 5 ). In addition, COVID-19 mortality risk was the second most important attribute (6/29, 21%) after effectiveness in HICs, whereas cost was among the most important attributes (3/24, 13%) in LMICs. Interestingly, other attributes also became more important as the pandemic progressed. Protection duration (2/24, 8%) emerged as one of the most important attributes during the pandemic wave. COVID-19 mortality risk (5/25, 20%) and cost (3/25, 12%) were among the most important attributes after the pandemic wave ( Multimedia Appendix 6 ).

Study Quality

The overall reporting quality was deemed acceptable, although there is room for improvement. The PREFS scores of the 47 studies ranged from 2 to 4, with a mean of 3.23 (SD 0.52). No study scored 5. Most studies scored 3 (32/47, 68%) or 4 (13/47, 28%), while 2 studies (2/47, 4%) scored 2 ( Multimedia Appendix 7 [ 11 , 26 , 30 - 74 ]).
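
The summary statistics above can be reproduced from the score distribution alone. The following is a minimal sketch, assuming the SD is the sample standard deviation (n-1 denominator), which is not stated in the text; the variable names are chosen here for illustration.

```python
import statistics

# Number of studies at each PREFS score, as reported in the text above.
score_counts = {2: 2, 3: 32, 4: 13}

# Expand the distribution into one score per study.
scores = [score for score, n in score_counts.items() for _ in range(n)]
n_studies = len(scores)

mean_score = statistics.mean(scores)
# Assumption: sample standard deviation (divides by n - 1).
sd_score = statistics.stdev(scores)

# Percentage of studies at each score, rounded to whole percent.
score_share = {score: round(100 * n / n_studies) for score, n in score_counts.items()}

print(f"n = {n_studies}, mean = {mean_score:.2f}, SD = {sd_score:.2f}")
print(score_share)
```

Under that assumption the result agrees with the reported mean of 3.23 and SD of 0.52; the population formula (dividing by n) would instead round to 0.51.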

Principal Findings

This systematic review synthesizes existing data on preferences for COVID-19 vaccines elicited using DCEs, with the aim of informing improvements in vaccine coverage and vaccine policy development. We identified 47 studies conducted in 29 countries, including 21 HICs and 8 LMICs. HICs have had an adequate vaccine supply since COVID-19 vaccines first became available for emergency use, and by September 2023 they had 1.5 times more COVID-19 vaccine doses than LMICs [ 85 ]. In total, 19 (40%) studies were conducted in China and 9 (19%) in the United States, demonstrating these countries’ significant contribution to the research and their leadership in vaccine research and development. Vaccine effectiveness and safety were the most important attributes in DCEs, although preferences differed among subgroups.

Recent years have seen new trends in the design, implementation, and validation of the DCE. For example, most studies (40/47, 85%) reported that the DCE was administered through web-based surveys, which have become a quick and cost-effective way to collect DCE data [ 66 ]. More than half of the studies (25/47, 53%) did not report a pilot test. However, piloting in multiple stages throughout the development of a DCE is conducive to identifying appropriate and understandable attributes, considering whether participants can effectively evaluate the full profiles, and producing an efficient design [ 21 , 86 , 87 ].

Overall, vaccine effectiveness and safety have emerged as the most commonly investigated attributes in the outcome category. Despite heterogeneity in preferences across subpopulations, effectiveness remains the primary driver for COVID-19 vaccination across the studies [ 31 , 35 , 36 , 38 - 42 , 48 , 50 , 51 , 57 , 58 , 60 - 62 , 64 - 67 ], similar to the previous findings [ 18 ]. A study conducted in India and Europe found that respondents’ preference for the COVID-19 vaccine increased with effectiveness and peaked at 95% effectiveness [ 45 ]. Another study conducted among university staff and students in South Africa found that vaccine effectiveness not only was a concern but also significantly influenced vaccine choice behavior [ 64 ]. Interestingly, a nationwide stated choice survey in the United States found a strong interaction between effectiveness and other attributes [ 58 ]. These findings support the ongoing efforts to maximize vaccine effectiveness while emphasizing the importance of communicating information on vaccine effectiveness to the target population for promotion [ 62 ].

Safety has also been identified as a crucial factor influencing the acceptance of COVID-19 vaccines [ 33 , 43 , 47 , 56 , 59 , 68 , 69 ]. One study indicated that the likelihood of the general public choosing vaccines with low or moderate side effects increased by 75% and 63%, respectively, compared with vaccines with high side effects, whereas the likelihood changed within a 30% range when most attributes other than effectiveness and safety were varied [ 69 ]. In addition, respondents in Australia expressed a willingness to wait an additional 0.04 and 1.2 months to reduce the incidence of mild and severe adverse events by 1/10,000, respectively [ 56 ].
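
Willingness-to-wait figures of this kind are commonly obtained as a marginal rate of substitution between two attributes of a fitted choice model. The following is a minimal sketch under a linear utility assumption. The coefficients are hypothetical values chosen only so that the ratios reproduce the magnitudes quoted above; they are not taken from the cited study [ 56 ], whose estimation details are not given here.

```python
def willingness_to_wait(beta_attribute: float, beta_wait: float) -> float:
    """Extra months of waiting that offset a one-unit reduction in an attribute.

    Assumes a linear utility U = beta_wait * months + beta_attribute * level + ...
    A one-unit reduction in the attribute changes utility by -beta_attribute,
    which is offset by waiting beta_attribute / beta_wait additional months.
    """
    if beta_wait == 0:
        raise ValueError("beta_wait must be nonzero")
    return beta_attribute / beta_wait


# Hypothetical coefficients (illustrative only, not estimates from any study).
BETA_WAIT = -0.5      # utility per additional month of waiting
BETA_MILD = -0.02     # utility per 1/10,000 increase in mild adverse events
BETA_SEVERE = -0.6    # utility per 1/10,000 increase in severe adverse events

wtw_mild = willingness_to_wait(BETA_MILD, BETA_WAIT)
wtw_severe = willingness_to_wait(BETA_SEVERE, BETA_WAIT)

print(f"Mild adverse events:   {wtw_mild:.2f} months per 1/10,000 reduction")
print(f"Severe adverse events: {wtw_severe:.2f} months per 1/10,000 reduction")
```

Because the measure is a ratio of coefficients, it does not depend on the scale of the utility function, which is one reason such ratios are easier to compare across studies than raw coefficients.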

Similar to the results of previous systematic reviews of DCEs for various vaccines [ 18 , 19 ], the most common predictors of COVID-19 vaccine acceptance are effectiveness and safety, particularly during the rapid development and rollout of COVID-19 vaccines, which essentially boils down to trust in the vaccine [ 31 ]. Respondents expressed the importance of having a safe and effective COVID-19 vaccine available as soon as possible, but the majority preferred to wait a few months to observe the experience of others rather than be the first in line [ 43 ]. Therefore, collaborating to enhance vaccine effectiveness while reducing the risk of severe side effects could be a highly effective strategy to address vaccine hesitancy and augment vaccine desirability. Dissemination of this important vaccine-related information by governments and health care institutions, along with effective communication by health care professionals, can help build public trust and ultimately increase vaccination rates [ 69 ]. However, these inherent vaccine attributes are typically beyond the control of a vaccination program, and given the ongoing mutations of SARS-CoV-2, it is challenging to predict the effectiveness of the vaccines currently in development [ 66 ]. Global collaboration between scientists and pharmaceutical companies is therefore essential to improve vaccine effectiveness and minimize side effects [ 41 ].

Attributes of vaccine production, including origin, brand, frequency, and content, are key considerations in the process category. Vaccine brand has a significant impact on vaccine choice [ 32 ], independent of effectiveness and safety, due to factors such as reputation, country of origin, technological advances, and reported side effects associated with the brands [ 35 ]. For vaccine origin, some studies found that participants preferred domestic vaccines to imported vaccines, which may depend on the availability or the approval of vaccines in different countries [ 31 , 41 , 50 ] or the incidence of side effects among different types of COVID-19 vaccines [ 37 ]. However, some studies found that imported vaccines were more likely to be accepted than domestically produced vaccines, which may be attributed to less trust in domestically produced vaccines [ 57 , 66 ]. A study on vaccine preferences among the Malaysian population found that whether the composition and production process of the COVID-19 vaccine complied with Islamic dietary requirements (ie, halal content) was an important factor for many Malaysians when deciding whether to be vaccinated. This underscores the substantial influence of religion on vaccine choice [ 53 ].

Vaccine frequency played an important role in the choice of COVID-19 vaccine among the US public in a scenario in which vaccine efficacy was set at 90% with a low rate of side effects. The prospect of a single vaccination conferring lifelong immunity was very attractive, reflecting the fact that people were effort minimizers [ 71 ]. This is consistent with the 2 studies noted under the outcome attributes, in which protection duration was prioritized. Given the threat of COVID-19, people expect the protection duration to be as long as possible [ 11 , 50 ].

When vaccine supply is limited, people tend to prioritize vaccination for those who are more susceptible to the disease, have higher mortality rates from infectious diseases, or have greater potential to spread the virus. A study in Iran found that individuals tend to prioritize vaccination for those in the community with higher potential for virus transmission [ 57 ]. In addition, results from a study in 6 European countries revealed unanimous agreement among respondents that candidates with higher mortality and infection risks should be prioritized for vaccination [ 63 ]. Another study, conducted among Belgians, also found that respondents would prioritize populations at higher medical risk [ 72 ].

Cost was another important factor influencing COVID-19 vaccine preferences, mostly related to out-of-pocket costs [ 31 , 37 , 41 ]. In 2 studies comparing public preferences for COVID-19 vaccines in China and the United States, vaccine efficacy emerged as the most important driver for the American public, whereas the cost of vaccination had the greatest impact on the Chinese public. This difference was likely due to the relatively stable pandemic situation in China at the time and the lower perceived risk of COVID-19. As a result, the Chinese population was more price sensitive and reluctant to pay for vaccination [ 31 , 37 , 41 ].

For the other category, several different attributes were highlighted, depending on the specific population or situation. When people perceive the threat of a disease, their desire to be vaccinated becomes more urgent. In a study among health care workers in China, participants’ expectations about the future development of COVID-19 had a greater impact on their decision to be vaccinated than their perceived risk of infection or actual case rates, which may have been influenced by their previous experience with seasonal influenza vaccination [ 30 ]. The mortality rate of COVID-19 was considered the most influential factor in the uptake of COVID-19 booster shots in Vietnam. This study was conducted during a pandemic wave in Vietnam, which may have led to an increased perception of public health risks and a greater inclination toward COVID-19 vaccination [ 55 ]. To achieve herd immunity, government authorities can implement policies of incentives and penalties for vaccination to encourage population-wide uptake. A study conducted in the Netherlands revealed that respondents particularly disliked policies that penalized those who were not vaccinated, such as mandatory testing at their own expense if they were not vaccinated [ 44 ]. Instead, they favored policies that rewarded vaccination, such as giving vaccinated individuals additional privileges through a vaccination passport. This finding is consistent with a study in Hong Kong, which found that quarantine-free travel was considered the most important motivator among university students and staff, given their frequent engagement in international travel [ 33 ].

The source of vaccine information also influences vaccine decision-making [ 30 ]. Variation in the sender of SMS text message vaccination invitations and in who recommends the vaccine may influence the public’s willingness to be vaccinated [ 30 , 46 , 73 ]. Furthermore, in India, vaccine acceptance was observed to change when firsthand information about vaccine side effects and effectiveness was provided by friends and family [ 26 ].

In HICs, COVID-19 mortality risk was the second most important attribute after effectiveness, as respondents in all 6 high-income European countries from a study of public preferences for COVID-19 vaccine distribution prioritized candidates with higher mortality risks [ 63 ]. However, individuals from LMICs appeared to be more concerned about vaccine safety than those from HICs. This may be related to greater confidence in vaccine safety in HICs due to the earlier initiation and higher rates of COVID-19 vaccination [ 85 ]. In contrast, in some LMICs, vaccine safety was reported as the main reason influencing the willingness to vaccinate due to the rapid development of the COVID-19 vaccines [ 26 , 43 , 47 , 59 , 68 , 69 , 74 , 88 ].

Interestingly, the preference for COVID-19 vaccines may also have changed as the pandemic progressed [ 63 ]. Nevertheless, effectiveness remained the most important attribute in all periods, possibly due to the continuing severity of the pandemic and the fear of the possible emergence of new coronavirus strains [ 43 ]. Before the pandemic wave, information on vaccine effectiveness was limited [ 26 ], but people still considered vaccine effectiveness to be the most important driver of vaccination. During the pandemic wave, however, the public’s perception of the health risk increased. As vaccines were introduced and used, people seemed to become more concerned about the duration of vaccine protection and preferred longer-lasting protection [ 11 , 50 ]. After the pandemic wave, as the pandemic situation gradually stabilized, cost, combined with people’s perception of their susceptibility, became more important in their preferences. However, despite this shift, most of the public still believed that people who are at higher risk of infection or death should be vaccinated first [ 63 ].

Limitations

Our study had several limitations. First, not all studies used the same attributes and levels, which limited our ability to perform a quantitative synthesis and directly compare the estimates of model parameters. Instead, we qualitatively synthesized and summarized the range of attributes that may be useful in the formative stage of attribute selection in future DCE surveys investigating the preference for COVID-19 vaccines. Second, although DCEs have been shown to be a valid method for eliciting preferences, the experiment may not represent real market choices but rather hypothetical scenarios with plausible and realistic attributes. However, it offers opportunities to evaluate vaccines that are not yet available on the market or to specific populations [ 68 ]. Third, the commonly used classification of outcome, cost, and process was used in order to better explain the public’s preference for vaccine attributes. However, several attributes could not be properly classified, and a fourth category (ie, other attributes) had to be added [ 19 ]. Meanwhile, the variety of attributes included may make it difficult to appropriately name and interpret this category as a whole. Fourth, the PREFS checklist is limited to 5 questions and fails to elicit several criteria that should be reported in DCE studies. Also, it does not provide sufficient tools to assess the biases in a DCE, such as selection bias and nonresponse bias [ 79 , 89 ]. Finally, although there was no specific theoretical framework to structure our qualitative analysis into the 4 identified categories, our classification was based on previous studies [ 18 , 19 , 82 , 90 , 91 ] and our own findings. This synthesis led us to categorize attributes into 4 main classes, providing a clear structure for analyzing and presenting participants’ vaccine preferences and making it easier to compare their preferences across different studies.

Conclusions

In conclusion, this systematic review synthesized the global evidence on preferences for COVID-19 vaccines using the DCE methodology. Vaccine effectiveness and safety were found to be the main drivers for COVID-19 vaccination, highlighting the importance of global collaboration to improve vaccine effectiveness and minimize side effects, as well as the importance of communicating this vaccine-related information to the public to maximize the uptake of COVID-19 vaccines. The subgroup analyses emphasized the importance of differences in vaccine preference of specific populations and time periods in optimizing the acceptance of COVID-19 vaccines. These findings may serve as valuable insights for government agencies involved in the social mobilization process for COVID-19 vaccination. However, the response to the pandemic is a continuous learning process [ 92 ]. It is crucial for policy makers to consider preference evidence when designing policies to promote vaccination.

Acknowledgments

The authors have not received a specific grant for this research from any funding agency in the public, commercial, or not-for-profit sectors.

Data Availability

All data relevant to the study are included in the article or uploaded as supplemental information. Data sets of this study are available upon reasonable request to the corresponding author.

Authors' Contributions

YH, SF, and YZ are joint first authors. HJ conceived the study and its methodology. YH, SF, and YZ designed, refined, and implemented the search strategy; screened articles for inclusion; and extracted and curated the data. All authors contributed to the interpretation of the results. YH, SF, and YZ wrote the initial draft of the manuscript. HJ and HW critically reviewed the manuscript. HJ supervised the study design and provided overall guidance. All authors approved the final draft of the manuscript. HJ had full access to all the data used in this study, and all authors had final responsibility for the decision to submit for publication.

Conflicts of Interest

None declared.

Multimedia Appendix 1: PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 checklist.

Multimedia Appendix 2: Search strategies.

Multimedia Appendix 3: Attributes included in each category.

Multimedia Appendix 4: The detailed distribution of the study period across countries.

Multimedia Appendix 5: Preference for COVID-19 vaccines among high-income countries and low- and middle-income countries (n=53).

Multimedia Appendix 6: Preference for COVID-19 vaccines in the different study periods (n=53).

Multimedia Appendix 7: Assessment of the quality of the 47 included studies using the Purpose, Respondents, Explanation, Findings, and Significance checklist.

  • WHO chief declares end to COVID-19 as a global health emergency. United Nations. 2023. URL: https://news.un.org/en/story/2023/05/1136367 [accessed 2023-09-11]
  • Lam IC, Zhang R, Man KK, Wong CK, Chui CS, Lai FT, et al. Persistence in risk and effect of COVID-19 vaccination on long-term health consequences after SARS-CoV-2 infection. Nat Commun. Feb 26, 2024;15(1):1716.
  • Al Kaabi N, Zhang Y, Xia S, Yang Y, Al Qahtani MM, Abdulrazzaq N, et al. Effect of 2 inactivated SARS-CoV-2 vaccines on symptomatic COVID-19 infection in adults: a randomized clinical trial. JAMA. Jul 06, 2021;326(1):35-45.
  • Azzolini E, Levi R, Sarti R, Pozzi C, Mollura M, Mantovani A, et al. Association between BNT162b2 vaccination and long COVID after infections not requiring hospitalization in health care workers. JAMA. Aug 16, 2022;328(7):676-678.
  • Català M, Mercadé-Besora N, Kolde R, Trinh NT, Roel E, Burn E, et al. The effectiveness of COVID-19 vaccines to prevent long COVID symptoms: staggered cohort study of data from the UK, Spain, and Estonia. Lancet Respir Med. Mar 2024;12(3):225-236.
  • Trinh NT, Jödicke AM, Català M, Mercadé-Besora N, Hayati S, Lupattelli A, et al. Effectiveness of COVID-19 vaccines to prevent long COVID: data from Norway. Lancet Respir Med. May 2024;12(5):e33-e34.
  • Coronavirus (COVID-19) vaccinations. Our World in Data. URL: https://ourworldindata.org/covid-vaccinations [accessed 2024-04-29]
  • Asundi A, O'Leary C, Bhadelia N. Global COVID-19 vaccine inequity: the scope, the impact, and the challenges. Cell Host Microbe. Jul 14, 2021;29(7):1036-1039.
  • Rydland HT, Friedman J, Stringhini S, Link BG, Eikemo TA. The radically unequal distribution of COVID-19 vaccinations: a predictable yet avoidable symptom of the fundamental causes of inequality. Humanit Soc Sci Commun. Feb 23, 2022;9(1):61.
  • Levin AT, Owusu-Boaitey N, Pugh S, Fosdick BK, Zwi AB, Malani A, et al. Assessing the burden of COVID-19 in developing countries: systematic review, meta-analysis and public policy implications. BMJ Glob Health. May 2022;7(5):e008477.
  • Donin G, Erfányuková A, Ivlev I. Factors affecting young adults' decision making to undergo COVID-19 vaccination: a patient preference study. Vaccines (Basel). Feb 09, 2022;10(2):265.
  • Stamm TA, Partheymüller J, Mosor E, Ritschl V, Kritzinger S, Alunno A, et al. Determinants of COVID-19 vaccine fatigue. Nat Med. May 2023;29(5):1164-1171.
  • Larson HJ, Gakidou E, Murray CJ. The vaccine-hesitant moment. N Engl J Med. Jul 07, 2022;387(1):58-65.
  • Williams V, Edem B, Calnan M, Otwombe K, Okeahalam C. Considerations for establishing successful coronavirus disease vaccination programs in Africa. Emerg Infect Dis. Aug 2021;27(8):2009-2016.
  • Attwell K, Lake J, Sneddon J, Gerrans P, Blyth C, Lee J. Converting the maybes: crucial for a successful COVID-19 vaccination strategy. PLoS One. Jan 20, 2021;16(1):e0245907.
  • Kreps S, Dasgupta N, Brownstein JS, Hswen Y, Kriner DL. Public attitudes toward COVID-19 vaccination: the role of vaccine attributes, incentives, and misinformation. NPJ Vaccines. May 14, 2021;6(1):73.
  • Kreps S, Prasad S, Brownstein JS, Hswen Y, Garibaldi BT, Zhang B, et al. Factors associated with US adults' likelihood of accepting COVID-19 vaccination. JAMA Netw Open. Oct 01, 2020;3(10):e2025594.
  • Lack A, Hiligsmann M, Bloem P, Tünneßen M, Hutubessy R. Parent, provider and vaccinee preferences for HPV vaccination: a systematic review of discrete choice experiments. Vaccine. Oct 27, 2020;38(46):7226-7238.
  • Diks ME, Hiligsmann M, van der Putten IM. Vaccine preferences driving vaccine-decision making of different target groups: a systematic review of choice-based experiments. BMC Infect Dis. Aug 28, 2021;21(1):879.
  • Louviere JJ, Pihlens D, Carson R. Design of discrete choice experiments: a discussion of issues that matter in future applied research. J Choice Model. 2011;4(1):1-8. [ FREE Full text ] [ CrossRef ]
  • Lancsar E, Louviere J. Conducting discrete choice experiments to inform healthcare decision making: a user's guide. Pharmacoeconomics. 2008;26(8):661-677. [ CrossRef ] [ Medline ]
  • Viney R, Lancsar E, Louviere J. Discrete choice experiments to measure consumer preferences for health and healthcare. Expert Rev Pharmacoecon Outcomes Res. Aug 09, 2002;2(4):319-326. [ CrossRef ] [ Medline ]
  • Buckell J, Mitchell CA, Fryer K, Newbert C, Brennan A, Joyce J, et al. Identifying preferred features of weight loss programs for adults with or at risk of type 2 diabetes: a discrete choice experiment with 3,960 adults in the U.K. Diabetes Care. Apr 01, 2024;47(4):739-746. [ CrossRef ] [ Medline ]
  • Reed Johnson F, Lancsar E, Marshall D, Kilambi V, Mühlbacher A, Regier DA, et al. Constructing experimental designs for discrete-choice experiments: report of the ISPOR Conjoint Analysis Experimental Design Good Research Practices Task Force. Value Health. 2013;16(1):3-13. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Mathieu E, Ritchie H, Ortiz-Ospina E, Roser M, Hasell J, Appel C, et al. A global database of COVID-19 vaccinations. Nat Hum Behav. Jul 2021;5(7):947-953. [ CrossRef ] [ Medline ]
  • Bansal P, Raj A, Mani Shukla D, Sunder N. COVID-19 vaccine preferences in India. Vaccine. Apr 01, 2022;40(15):2242-2246. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Abbasi J. What to know about EG.5, the latest SARS-CoV-2 "variant of interest". JAMA. Sep 12, 2023;330(10):900-901. [ CrossRef ] [ Medline ]
  • Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JP, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ. Jul 21, 2009;339:b2700. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Collaborate on your reviews with anyone, anywhere, anytime. rayyan. URL: https://www.rayyan.ai/ [accessed 2024-04-29]
  • Fu C, Wei Z, Zhu F, Pei S, Li S, Zhang L, et al. Acceptance of and preference for COVID-19 vaccination in healthcare workers: a comparative analysis and discrete choice experiment. medRxiv. Preprint posted online April 11, 2022. [ FREE Full text ] [ CrossRef ]
  • Liu T, He Z, Huang J, Yan N, Chen Q, Huang F, et al. A comparison of vaccine hesitancy of COVID-19 vaccination in China and the United States. Vaccines (Basel). Jun 14, 2021;9(6):649. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Asim S, Wang K, Nichini E, Yip FF, Zhu L, Fung HC, et al. COVID-19 vaccination preferences among non-Chinese migrants in Hong Kong: discrete choice experiment. JMIR Public Health Surveill. Mar 27, 2023;9:e40587. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Fung LW, Zhao J, Yan VK, Blais JE, Chan JC, Li ST, et al. COVID-19 vaccination preferences of university students and staff in Hong Kong. JAMA Netw Open. May 02, 2022;5(5):e2212681. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Schwarzinger M, Watson V, Arwidson P, Alla F, Luchini S. COVID-19 vaccine hesitancy in a representative working-age population in France: a survey experiment based on vaccine characteristics. Lancet Public Health. Apr 2021;6(4):e210-e221. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Wang K, Wong EL, Cheung AW, Chung VC, Wong CH, Dong D, et al. Impact of information framing and vaccination characteristics on parental COVID-19 vaccine acceptance for children: a discrete choice experiment. Eur J Pediatr. Nov 2022;181(11):3839-3849. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Wang K, Wong EL, Cheung AW, Yau PS, Chung VC, Wong CH, et al. Influence of vaccination characteristics on COVID-19 vaccine acceptance among working-age people in Hong Kong, China: a discrete choice experiment. Front Public Health. 2021;9:793533. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Zhang J, Ge P, Li X, Yin M, Wang Y, Ming W, et al. Personality effects on Chinese public preference for the COVID-19 vaccination: discrete choice experiment and latent profile analysis study. Int J Environ Res Public Health. Apr 15, 2022;19(8):4842. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Igarashi A, Nakano Y, Yoneyama-Hirozane M. Public preferences and willingness to accept a hypothetical vaccine to prevent a pandemic in Japan: a conjoint analysis. Expert Rev Vaccines. Feb 2022;21(2):241-248. [ CrossRef ] [ Medline ]
  • Díaz Luévano C, Sicsic J, Pellissier G, Chyderiotis S, Arwidson P, Olivier C, et al. Quantifying healthcare and welfare sector workers' preferences around COVID-19 vaccination: a cross-sectional, single-profile discrete-choice experiment in France. BMJ Open. Oct 04, 2021;11(10):e055148. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Velardo F, Watson V, Arwidson P, Alla F, Luchini S, Schwarzinger M, et al. CoVaMax Study Group. Regional differences in COVID-19 vaccine hesitancy in December 2020: a natural experiment in the French working-age population. Vaccines (Basel). Nov 20, 2021;9(11):1364. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Li X, Yang L, Tian G, Feng B, Jia X, He Z, et al. Understanding influencing attributes of COVID-19 vaccine preference and willingness-to-pay among Chinese and American middle-aged and elderly adults: a discrete choice experiment and propensity score matching study. Front Public Health. 2023;11:1067218. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • McPhedran R, Toombs B. Efficacy or delivery? an online discrete choice experiment to explore preferences for COVID-19 vaccines in the UK. Econ Lett. Mar 2021;200:109747. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Mouter N, de Ruijter A, Ardine de Wit G, Lambooij MS, van Wijhe M, van Exel J, et al. "Please, you go first!" preferences for a COVID-19 vaccine among adults in the Netherlands. Soc Sci Med. Jan 2022;292:114626. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Mouter N, Boxebeld S, Kessels R, van Wijhe M, de Wit A, Lambooij M, et al. Public preferences for policies to promote COVID-19 vaccination uptake: a discrete choice experiment in the Netherlands. Value Health. Aug 2022;25(8):1290-1297. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Dong Y, He Z, Liu T, Huang J, Zhang CJ, Akinwunmi B, et al. Acceptance of and preference for COVID-19 vaccination in India, the United Kingdom, Germany, Italy, and Spain: an international cross-sectional study. Vaccines (Basel). May 24, 2022;10(6):832. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Daziano RA. A choice experiment assessment of stated early response to COVID-19 vaccines in the USA. Health Econ Rev. Mar 31, 2022;12(1):23. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Chen Y, Wang J, Yi M, Xu H, Liang H. The COVID-19 vaccination decision-making preferences of elderly people: a discrete choice experiment. Sci Rep. Mar 31, 2023;13(1):5242. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Huang W, Shao X, Wagner AL, Chen Y, Guan B, Boulton ML, et al. COVID-19 vaccine coverage, concerns, and preferences among Chinese ICU clinicians: a nationwide online survey. Expert Rev Vaccines. Oct 2021;20(10):1361-1367. [ CrossRef ] [ Medline ]
  • Prosser LA, Wagner AL, Wittenberg E, Zikmund-Fisher BJ, Rose AM, Pike J. A discrete choice analysis comparing COVID-19 vaccination decisions for children and adults. JAMA Netw Open. Jan 03, 2023;6(1):e2253582. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Blaga Z, Czine P, Takacs B, Szilagyi A, Szekeres R, Wachal Z, et al. Examination of preferences for COVID-19 vaccines in Hungary based on their properties-examining the impact of pandemic awareness with a hybrid choice approach. Int J Environ Res Public Health. Jan 10, 2023;20(2):1270. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Leng A, Maitland E, Wang S, Nicholas S, Liu R, Wang J. Individual preferences for COVID-19 vaccination in China. Vaccine. Jan 08, 2021;39(2):247-254. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Wang S, Nicholas S, Maitland E, Leng A. Individual preferences for COVID-19 vaccination under the China's 2021 national vaccination policy: a discrete choice experiment study. Vaccines (Basel). Mar 31, 2022;10(4):543. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Teh HS, Woon YL, Leong CT, Hing NY, Mien TY, Roope LS, et al. Malaysian public preferences and decision making for COVID-19 vaccination: a discrete choice experiment. Lancet Reg Health West Pac. Oct 2022;27:100534. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Hess S, Lancsar E, Mariel P, Meyerhoff J, Song F, van den Broek-Altenburg E, et al. The path towards herd immunity: predicting COVID-19 vaccination uptake through results from a stated choice study across six continents. Soc Sci Med. Apr 2022;298:114800. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Tran BX, Do AL, Boyer L, Auquier P, Le HT, Le Vu MN, et al. Preference and willingness to pay for the regular COVID-19 booster shot in the Vietnamese population: theory-driven discrete choice experiment. JMIR Public Health Surveill. Jan 31, 2023;9:e43055. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Borriello A, Master D, Pellegrini A, Rose JM. Preferences for a COVID-19 vaccine in Australia. Vaccine. Jan 15, 2021;39(3):473-479. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Darrudi A, Daroudi R, Yunesian M, Akbari Sari A. Public preferences and willingness to pay for a COVID-19 vaccine in Iran: a discrete choice experiment. Pharmacoecon Open. Sep 2022;6(5):669-679. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Krueger R, Daziano RA. Stated choice analysis of preferences for COVID-19 vaccines using the Choquet integral. J Choice Model. Dec 2022;45:100385. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Wang S, Maitland E, Wang T, Nicholas S, Leng A. Student COVID-19 vaccination preferences in China: a discrete choice experiment. Front Public Health. 2022;10:997900. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Craig BM. United States COVID-19 vaccination preferences (CVP): 2020 hindsight. Patient. May 2021;14(3):309-318. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Wang J, Wagner AL, Chen Y, Jaime E, Hu X, Wu S, et al. Would COVID-19 vaccination willingness increase if mobile technologies prohibit unvaccinated individuals from public spaces? a nationwide discrete choice experiment from China. Vaccine. Dec 05, 2022;40(51):7466-7475. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Hazlewood GS, Colmegna I, Hitchon C, Fortin PR, Bernatsky S, Clarke AE, et al. Preferences for COVID-19 vaccination in people with chronic immune-mediated inflammatory diseases. J Rheumatol. Jul 2023;50(7):949-957. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Steinert JI, Sternberg H, Veltri GA, Büthe T. How should COVID-19 vaccines be distributed between the Global North and South: a discrete choice experiment in six European countries. Elife. Oct 18, 2022;11:e79819. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • George G, Strauss M, Lansdell E, Nadesan-Reddy N, Moroe N, Reddy T, et al. South African university staff and students' perspectives, preferences, and drivers of hesitancy regarding COVID-19 vaccines: a multi-methods study. Vaccines (Basel). Aug 04, 2022;10(8):1250. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Morillon GF, Poder TG. Public preferences for a COVID-19 vaccination program in Quebec: a discrete choice experiment. Pharmacoeconomics. Mar 20, 2022;40(3):341-354. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Dong D, Xu RH, Wong EL, Hung CT, Feng D, Feng Z, et al. Public preference for COVID-19 vaccines in China: a discrete choice experiment. Health Expect. Dec 2020;23(6):1543-1578. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Xiao J, Wang F, Wang M, Ma Z. Attribute nonattendance in COVID-19 vaccine choice: a discrete choice experiment based on Chinese public preference. Health Expect. Jun 2022;25(3):959-970. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Panchalingam T, Shi Y. Parental refusal and hesitancy of vaccinating children against COVID-19: findings from a nationally representative sample of parents in the U.S. Prev Med. Nov 2022;164:107288. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Chen YW, Xu JX, Wang Y, Yan HH, Gao JI. Public preference and vaccination willingness for COVID-19 vaccine in China. Fudan Univ J Med Sci. 2021;48(05):617-685. [ FREE Full text ] [ CrossRef ]
  • Bughin J, Cincera M, Kiepfer E, Reykowska D, Philippi F, Żyszkiewicz M, et al. Vaccination or NPI? a conjoint analysis of German citizens' preferences in the context of the COVID-19 pandemic. Eur J Health Econ. Feb 2023;24(1):39-52. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Eshun-Wilson I, Mody A, Tram KH, Bradley C, Sheve A, Fox B, et al. Preferences for COVID-19 vaccine distribution strategies in the US: a discrete choice survey. PLoS One. 2021;16(8):e0256394. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Luyten J, Tubeuf S, Kessels R. Rationing of a scarce life-saving resource: public preferences for prioritizing COVID-19 vaccination. Health Econ. Feb 2022;31(2):342-362. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • McPhedran R, Gold N, Bemand C, Weston D, Rosen R, Scott R, et al. Location, location, location: a discrete choice experiment to inform COVID-19 vaccination programme delivery in the UK. BMC Public Health. Mar 04, 2022;22(1):431. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Li X, Chong MY, Chan CY, Chan VW, Tong X. COVID-19 vaccine preferences among university students in Hong Kong: a discrete choice experiment. BMC Res Notes. Nov 22, 2021;14(1):421. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • New World Bank country classifications by income level: 2022-2023. The World Bank. 2023. URL: https://blogs.worldbank.org/opendata/new-world-bank-country-classifications-income-level-2022-2023 [accessed 2022-07-01]
  • Wang J, Jing R, Lai X, Zhang H, Lyu Y, Knoll MD, et al. Acceptance of COVID-19 vaccination during the COVID-19 pandemic in China. Vaccines (Basel). Aug 27, 2020;8(3):482. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Wang J, Lu X, Lai X, Lyu Y, Zhang H, Fenghuang Y, et al. The changing acceptance of COVID-19 vaccination in different epidemic phases in China: a longitudinal study. Vaccines (Basel). Feb 25, 2021;9(3):191. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • COVID-19 coronavirus pandemic. worldometer. URL: https://www.worldometers.info/coronavirus/ [accessed 2024-04-29]
  • Joy SM, Little E, Maruthur NM, Purnell TS, Bridges JF. Patient preferences for the treatment of type 2 diabetes: a scoping review. Pharmacoeconomics. Oct 01, 2013;31(10):877-892. [ CrossRef ] [ Medline ]
  • Hollin IL, Paskett J, Schuster AL, Crossnohere NL, Bridges JF. Best-worst scaling and the prioritization of objects in health: a systematic review. Pharmacoeconomics. Sep 2022;40(9):883-899. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Beckham SW, Crossnohere NL, Gross M, Bridges JF. Eliciting preferences for HIV prevention technologies: a systematic review. Patient. Mar 2021;14(2):151-174. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Bien DR, Danner M, Vennedey V, Civello D, Evers SM, Hiligsmann M. Patients' preferences for outcome, process and cost attributes in cancer treatment: a systematic review of discrete choice experiments. Patient. Oct 2017;10(5):553-565. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Wulandari LP, He SY, Fairley CK, Bavinton BR, Marie-Schmidt H, Wiseman V, et al. Preferences for pre-exposure prophylaxis for HIV: a systematic review of discrete choice experiments. EClinicalMedicine. Sep 2022;51:101507. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Russo S, Jongerius C, Faccio F, Pizzoli SF, Pinto CA, Veldwijk J, et al. Understanding patients' preferences: a systematic review of psychological instruments used in patients' preference and decision studies. Value Health. Apr 2019;22(4):491-501. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • COVID-19 data explorer. Our World in Data. URL: https://ourworldindata.org/explorers/coronavirus-data-explorer?tab=table&zoomToSelection=true&time=2020-03-01..latest&facet=none&country=High+income~Lower+middle+income~Low+income&pickerSort=asc&pickerMetric=location&Metric=Vaccine+doses&Interval=Cumulative&Relative+to+Population=true&Color+by+test+positivity=false [accessed 2023-09-11]
  • Street DJ, Viney R. Design of discrete choice experiments. In: Banerjee A, Dixit A, Edwards S, Judd K, editors. Oxford Research Encyclopedias: Economics and Finance. Oxfordshire, UK. Oxford University Press; 2019.
  • Pérez-Troncoso D. A step-by-step guide to design, implement, and analyze a discrete choice experiment. arXiv. Preprint posted online on September 23, 2020. [ FREE Full text ] [ CrossRef ]
  • Patwary MM, Alam MA, Bardhan M, Disha AS, Haque MZ, Billah SM, et al. COVID-19 vaccine acceptance among low- and lower-middle-income countries: a rapid systematic review and meta-analysis. Vaccines (Basel). Mar 11, 2022;10(3):427. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Al-Aqeel S, Alotaiwi R, Albugami B. Patient preferences for epilepsy treatment: a systematic review of discrete choice experimental studies. Health Econ Rev. Mar 18, 2023;13(1):17. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Schaarschmidt ML, Schmieder A, Umar N, Terris D, Goebeler M, Goerdt S, et al. Patient preferences for psoriasis treatments: process characteristics can outweigh outcome attributes. Arch Dermatol. Nov 01, 2011;147(11):1285-1294. [ CrossRef ] [ Medline ]
  • Jiang S, Ren R, Gu Y, Jeet V, Liu P, Li S. Patient preferences in targeted pharmacotherapy for cancers: a systematic review of discrete choice experiments. Pharmacoeconomics. Jan 2023;41(1):43-57. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Nabia S, Wonodi CB, Vilajeliu A, Sussman S, Olson K, Cooke R, et al. Experiences, enablers, and challenges in service delivery and integration of COVID-19 vaccines: a rapid systematic review. Vaccines (Basel). May 11, 2023;11(5):974. [ FREE Full text ] [ CrossRef ] [ Medline ]

Abbreviations

DCE: discrete choice experiment
HIC: high-income country
LMIC: low- and middle-income country
PREFS: Purpose, Respondents, Explanation, Findings, and Significance
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses

Edited by A Mavragani; submitted 19.01.24; peer-reviewed by T Ricks, I Saha; comments to author 11.04.24; revised version received 01.05.24; accepted 26.05.24; published 29.07.24.

©Yiting Huang, Shuaixin Feng, Yuyan Zhao, Haode Wang, Hongbo Jiang. Originally published in JMIR Public Health and Surveillance (https://publichealth.jmir.org), 29.07.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on https://publichealth.jmir.org, as well as this copyright and license information must be included.

Approved Experiments | Public Experiment List

The following is a list of approved experiments at FRIB.

Experiment Number Contact Spokesperson Co-Spokespersons Title Beam on Target Hours Approved Beam Tuning Hours Date Completed
23509 Baryshev, Sergey Studying Resilience of Power Load Switch to Heavy Ion Radiation 6.00 THU 22 FEB 2024
23508 Cortesi, Marco Dziubinski, Sean Performance test ELOSS detector 12.00 FRI 19 JUL 2024
23505 Lidia, Steve High-flux, short-pulse single event effects demonstration experiment 39.00 SAT 12 AUG 2023
23503 Grzywacz, Robert Kitamura, Noritaka; Neupane, Shree; Xu, Zhengyu Beta-delayed neutron spectroscopy of 24O 8.00 MON 15 MAY 2023
23502 Wrede, Christopher Twentieth Exotic Beam Summer School 24.00 SAT 15 JUL 2023
23084 Pain, Steven Balakrishnan, Sudarsan; Pain, Steven Informing the i process: constraining the As/Ge abundance ratio in a metal poor star via 75Ga(d,pg)76Ga 96.00 36.00 SUN 05 MAY 2024
23080 Moroch, Scott Garcia Ruiz, Ronald Fernando; Karthein, Jonas; Minamisono, Kei Precision Laser Spectroscopy of Atoms and Molecules Containing 229,232Th Isotopes TUE 02 JUL 2024 (partial)
23079 Pain, Steven Chipps, Kelly; Ong, Wei Jia Simultaneous high-precision spectroscopic measurements of (a,p) reactions on proton-rich mass 26 nuclides for X-Ray-Burst nucleosynthesis 192.00 48.00
23078 Korkulu Stuhl, Zeren Estrade, Alfredo Mass measurements on the r-process path relevant for the first r-process peak 156.00 36.00
23076 Grzywacz, Robert Allmond, Mitch; Fijalkowska, Aleksandra; Rasco, Bertis; Rykaczewski, Krzysztof Intersections of nuclear structure and statistical model in βn-decays of cobalt isotopes and isomers 74.00 0.00
23071 Seweryniak, Dariusz Allmond, Mitch; Clark, Roderick; Grzywacz, Robert; Liddick, Sean; Tarasov, Oleg The Study of Proton-Rich Isotopes Along the Proton Drip-Line above 100Sn 72.00 36.00
23070 Lubna, Rebeka Sultana Lubna, Rebeka Sultana; Tang, Tsz Leung Probing the N=28 shell gap migration via simultaneous measurements of the 40,42S(d, p) and 40,42S(d, t) reactions 48.00 48.00
23068 Monteagudo Godoy, Belen Revel, Aldric Study of possible p-wave halo in 34Na ground state 111.00 36.00
23066 Domnanich, Katharina Gaiser, Alyssa; Scielzo, Nicholas; Severin, Gregory; Shusterman, Jennifer Isotope Harvesting with the First 58Ni Primary Beam at FRIB 20.00 36.00
23065 Mitchell, Alan Kay, Benjamin Single-particle fragmentation near N=28 explored through direct reactions on 46Ca
23064 Allmond, Mitch Carpenter, Michael; Gray, Tim; Grzywacz, Robert; Liddick, Sean; Portillo, Mauricio; Rasco, Bertis; Rykaczewski, Krzysztof; Seweryniak, Dariusz; Tarasov, Oleg Seniority Isomers and Single-Particle Evolution in 218-222Pb Region: New Isotopes, Isomers, and Half Lives
23063 Garcia Ruiz, Ronald Fernando Minamisono, Kei; Wilkins, Shane Laser spectroscopy of neutron-rich silicon isotopes 62.00 84.00
23060 Wilkins, Shane Garcia Ruiz, Ronald Fernando; Minamisono, Kei Towards determining the rotational fingerprints of the radioactive molecules 26AlF and 32SiO for astronomical studies WED 20 SEP 2023 (partial)
23058 Brown, Kyle Chajecki, Zbigniew Measuring the isospin dependence of the nucleon effective mass at supersaturation density 116.00 36.00
23056 Richard, Andrea Zegers, Remco Indirect 99Nb(n,g)100Nb Constraint for the Astrophysical i-Process 96.00 36.00
23055 Crider, Ben Allmond, Mitch; Janssens, Robert; Liddick, Sean; Stuchbery, Andrew Decay Spectroscopy Near N = 40: toward the N = 50 island of inversion near 78Ni 60.00 36.00
23054 Williams, Matt Exploring the p-process with SECAR 92.00 48.00
23047 Estrade, Alfredo Crawford, Heather; Fallon, Paul; Schatz, Hendrik Mass measurement along the neutron dripline 180.00 36.00
23045 Montes, Fernando Garg, Ruchi Machine learning techniques to optimise and automate recoil separators performance and operation
23041 Chaple Gore, Ivis Domnanich, Katharina Production and separation of radioplatinum for a theragnostic approach to cancer 48.00 36.00
23038 Redshaw, Matt Ringle, Ryan; Xavier, Mougeot High-precision Penning trap determination of the 99Tc beta-decay Q value for the evaluation of precise beta-spectrum measurements
23037 Fougeres, Chloe de Oliveira Santos, Francois; de Séréville, Nicolas; Hammache, Faïrouz Angle-integrated measurement of the d(25Al, n gamma)26Si transfer reaction to probe resonance strengths in 25Al(p,gamma)26Si relevant for the production of 26Al in classical novae 132.00 36.00 FRI 10 NOV 2023
23035 Wrede, Christopher Is there a NiCu Cycle in X-ray Bursts? 120.00 36.00
23033 Revel, Aldric Monteagudo Godoy, Belen Investigating the halo structure of 37Mg 108.00 36.00
23031 Ayyad Limonge, Yassid Lay, José; Zamora, Juan Investigating the electric dipole response of halo nuclei using proton inelastic scattering 84.00 36.00 MON 15 JUL 2024
23030 Porzio, Carlotta Crawford, Heather; Porzio, Carlotta; Rice, Emma Quadrupole Collectivity at the Boundaries of the N=40 Island of Inversion 24.00 24.00 SAT 03 FEB 2024
23025 Brown, Kyle Cook, Kaitlin; McCormick, Caitlin Quasifission dynamics with neutron-rich calcium. 60.00 48.00
23023 Pfützner, Marek Grzywacz, Robert; Mazzocchi, Chiara Proton-proton momentum correlations in two-proton radioactivity of 54Zn 120.00 36.00
23017 Wu, Ching-Yen Gade, Alexandra; Henderson, Jack Shape coexistence in N = Z nucleus, 44Ti
23012 Minamisono, Kei Garcia Ruiz, Ronald Fernando; Nörtershäuser, Wilfried; Rossi, Dominic Symmetry-energy constraints using the charge-radius difference of 52Ni-52Cr mirror nuclei 108.00 54.00
23011 Kobayashi, Nobuyuki Iwasaki, Hironori Halo formation in neutron-rich carbon isotopes 114.00 36.00 SAT 09 DEC 2023
23009 Zegers, Remco Giraud, Simon Search for the Isovector Giant Monopole Resonance via the 90Zr(10Be,10B[0+,T=1]) reaction 108.00 36.00 SAT 17 FEB 2024
23006 Karthein, Jonas Ringle, Ryan Precision Binding Energies for Pioneering Astrophysical Studies 53.00 57.00
23005 deSouza, Romualdo Hudan, Sylvie Fusing light nuclei near the neutron drip-line 60.00 72.00
23004 Ronning, Eleanor Richard, Andrea; Ronning, Eleanor The Last Piece of the Generalized Brink Axel Hypothesis Puzzle 68.00 48.00 THU 08 FEB 2024
23003 Iwasaki, Hironori Zimba, George Collectivity at N=27 studied by heavy-ion inelastic scattering and lifetime measurements 64.00 36.00 SAT 09 MAR 2024
23002 Gade, Alexandra Henderson, Jack; Wu, Ching-Yen Collectivity and Shape North of Sn 72.00 FRI 10 MAY 2024
23001 Gade, Alexandra Tostevin, Jeffrey Single-neutron structure at the heart of the N=28 island of inversion 110.00 36.00
22511 Brown, Kyle Wuosmaa, Alan Transfer reactions with the doubly magic 56Ni 216.00
22510 Grinder, Mara Pain, Steven 80Ge(d,p gamma): Informing weak r-process neutron capture 144.00 TUE 30 APR 2024
22507 Cortesi, Marco Dziubinski, Sean Performance Test of the Energy Loss Optical Scintillation System (ELOSS) 12.00 FRI 14 JUL 2023
22505 Kyle, Alicia Spyrou, Artemis; Tsantiri, Artemis Nucleosynthesis of neutron‐deficient isotopes in the A=70 region 107.00 THU 27 JUL 2023
22503 Montes, Fernando Schatz, Hendrik SECAR Development in Preparation for First FRIB Experiments THU 20 JUN 2024 (partial)
22502 Ayyad Limonge, Yassid Mittig, Wolfgang Investigating new alpha-clustering observables in neutron-rich carbon nuclei 130.00 MON 18 DEC 2023
22501 Tarasov, Oleg Gade, Alexandra Commissioning with a high-Z primary beam 120.00 MON 06 FEB 2023
21080 Wu, Jin Estrade, Alfredo; Tarasov, Oleg Decay spectroscopy in the vicinity of the N=126 shell closure 128.00 24.00
21073 Spyrou, Artemis Constraining neutron capture rates for the r-process 100.00 24.00
21072 Wrede, Christopher Strength of the key 15O(a,g)19Ne resonance in X-ray bursts 54.00 24.00 MON 28 NOV 2022
21070 Marshall, Caleb Determining the Site of Globular Cluster Potassium Enrichment via the 38Ar(p, gamma)39K Reaction in Inverse Kinematics 158.00 24.00
21069 Ong, Wei Jia Allmond, Mitch; Grzywacz, Robert; Rasco, Bertis; Schatz, Hendrik; Sherrill, Bradley; Tarasov, Oleg Decay spectroscopy of the N=35 nuclei 55Ca,54K and 53Ar and the search for dripline nucleus 50S 174.00 24.00 TUE 30 JAN 2024
21067 Pain, Steven Simultaneous constraint of the 34g,mCl(p,g) reactions via a Spectroscopic Mirror Study using ORRUBA and SECAR 168.00 36.00
21066 Baumann, Thomas Frank, Nathan Neutron-Unbound Excited States in 53,55Ca 36.00 24.00
21062 Crawford, Heather Allmond, Mitch; Crider, Ben; Grzywacz, Robert; Tripathi, Vandana Decay Spectroscopy Near N=28: Shell Structure, Shapes and Weak Binding 124.00 24.00 MON 04 MAR 2024
21061 Severin, Gregory Shusterman, Jennifer First Isotope Harvesting at FRIB 0.00 0.00
21056 Randhawa, Jaspreet The Isoscalar Giant Monopole Resonance in 132Sn: Implications on the Nuclear Incompressibility 144.00 24.00
21055 Wuosmaa, Alan Macchiavelli, Augusto Evolution of intruder configurations in neutron-rich Mg isotopes 48.00 36.00
21049 Leistenschneider, Erich Ringle, Ryan Seeking the Holy Grail of Nuclear Structure: Precise Binding Energy Determination of 100Sn 72.00 60.00 FRI 22 MAR 2024 (partial)
21048 Ong, Wei Jia Avila, Melina Constraining Molybdenum and Ruthenium production in neutron-rich neutrino-driven winds. 52.00 36.00
21040 Ayyad Limonge, Yassid Studying np pairing in N=Z nuclei: The 52Fe(3He,p) reaction at ReA with the AT-TPC 48.00 48.00
21039 Redshaw, Matt Ringle, Ryan Exploration of Deformed Shell Closures and Pairing Correlations in N = Z Nuclei Around A = 80 96.00 60.00
21038 Pfützner, Marek Proton-proton momentum correlations in two-proton radioactivity 96.00 24.00
21035 Allmond, Mitch Correlation of Triaxial Deformation with Inertial Dynamics, Masses and r-Process Nucleosynthesis 76.00 24.00
21034 Bentley, Michael Wadsworth, Bob Evolution and isospin-dependence of quadrupole collectivity in the heaviest N=Z systems 84.00 24.00 TUE 18 APR 2023
21027 Rykaczewski, Krzysztof Decoding the doubly magic stronghold - decay spectroscopy of 78Ni 120.00 24.00
21026 Avila, Melina Direct measurement of the 59Cu(p,α)56Ni reaction 72.00 36.00
21024 Crawford, Heather Reaction Cross-Section Measurement in 40Mg: A Halo Candidate? 144.00 24.00
21023 Karthein, Jonas Precision laser spectroscopy of atoms and molecules containing exotic thorium isotopes 112.00 30.00
21021 Grzywacz, Robert Complete decay spectroscopy of 100Sn and its neighbors 124.00 24.00
21018 Giraud, Simon Zegers, Remco Constraining electron-capture rates in and near the N=20 island of inversion 96.00 24.00 MON 24 JUN 2024
21016 Hoffman, Calem First observation of neutron-unbound 30F 94.00 24.00
21015 Garcia Ruiz, Ronald Fernando Minamisono, Kei; Vernon, Adam Proton-halo and proton-skin structures in 22Al and 23Al 60.00 60.00 WED 22 MAY 2024
21014 Lotay, Gavin Randhawa, Jaspreet Constraining the Ni-Cu cycle in X-ray bursts and Core Collapse Supernovae: Spectroscopy of 60Zn 100.00 24.00 THU 15 JUN 2023
21010 Fynbo, Hans Wrede, Christopher Study of the beta-decays of 22Al and 26P 48.00 36.00 FRI 30 JUN 2023
21009 Gade, Alexandra Tostevin, Jeffrey; Wiedenhoever, Ingo Understanding shape and configuration coexistence at N=28 72.00 24.00 THU 20 JUL 2023
21007 Gade, Alexandra Janssens, Robert Shape coexistence at the heart of the N = 40 island of inversion 48.00 24.00 TUE 02 AUG 2022
21006 Charity, Robert Is 22Si a doubly-magic nucleus? 160.00 24.00
21004 Wu, Ching-Yen Shell evolution at the N=28 studied through sub-barrier Coulomb excitation 72.00 48.00
21003 Grzywacz-Jones, Kate Cerizza, Giordano; Gade, Alexandra; Grzywacz, Robert The structure of light tin isotopes 72.00 24.00 THU 15 DEC 2022
21001 Riley, Lew Cottle, Paul Measuring proton and neutron matrix elements for the 0^+_gs→2^+_1 transition in the deformed neutron-rich N=28 nucleus 42Si 56.00 24.00 FRI 29 MAR 2024

Using natural experimental studies to guide public health action: turning the evidence-based medicine paradigm on its head

Affiliations.

  • 1 MRC Epidemiology Unit and Centre for Diet and Activity Research (CEDAR), University of Cambridge, Cambridge, UK.
  • 2 MRC Epidemiology Unit and Centre for Diet and Activity Research (CEDAR), University of Cambridge, Cambridge, UK.
  • 3 Charles Perkins Centre and Prevention Research Collaboration, University of Sydney, Sydney, New South Wales, Australia.
  • 4 School of Public Health, Imperial College, London, UK.
  • 5 National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, Georgia, USA.
  • PMID: 31744848
  • PMCID: PMC6993029
  • DOI: 10.1136/jech-2019-213085

Despite smaller effect sizes, interventions delivered at population level to prevent non-communicable diseases generally have greater reach, impact and equity than those delivered to high-risk groups. Nevertheless, how to shift population behaviour patterns in this way remains one of the greatest uncertainties for research and policy. Evidence about behaviour change interventions that are easier to evaluate tends to overshadow that for population-wide and system-wide approaches that generate and sustain healthier behaviours. Population health interventions are often implemented as natural experiments, which makes their evaluation more complex and unpredictable than a typical randomised controlled trial (RCT). We discuss the growing importance of evaluating natural experiments and their distinctive contribution to the evidence for public health policy. We contrast the established evidence-based practice pathway, in which RCTs generate 'definitive' evidence for particular interventions, with a practice-based evidence pathway in which evaluation can help adjust the compass bearing of existing policy. We propose that intervention studies should focus on reducing critical uncertainties, that non-randomised study designs should be embraced rather than tolerated and that a more nuanced approach to appraising the utility of diverse types of evidence is required. The complex evidence needed to guide public health action is not necessarily the same as that which is needed to provide an unbiased effect size estimate. The practice-based evidence pathway is neither inferior nor merely the best available when all else fails. It is often the only way to generate meaningful evidence to address critical questions about investing in population health interventions.

Keywords: evaluation; natural experimental studies; non-randomised studies; practice-based evidence; public health policy.

© Author(s) (or their employer(s)) 2020. Re-use permitted under CC BY. Published by BMJ.

Conflict of interest statement

Competing interests: None declared.

Two complementary modes of evidence generation.

Grants and funding

  • MR/K023187/1/MRC_/Medical Research Council/United Kingdom
  • BHF_/British Heart Foundation/United Kingdom
  • MC_UU_12015/1/MRC_/Medical Research Council/United Kingdom
  • MC_UU_12015/6/MRC_/Medical Research Council/United Kingdom
  • PDF-2012-05-157/DH_/Department of Health/United Kingdom
  • WT_/Wellcome Trust/United Kingdom
