
Golden Rain Experiment


Lead Nitrate + Potassium Iodide

Lead nitrate reacts with potassium iodide to produce a beautiful precipitate, as we will show you. The reaction, known as the “Golden Rain” experiment, produces hexagonal crystals of lead iodide that resemble plates of gold, and it makes a great chemistry demonstration.

The golden rain reaction takes advantage of the increased solubility of lead iodide in hot water. Stoichiometric amounts of lead nitrate and potassium iodide are combined with enough water to dissolve all of the lead iodide precipitate at 80 degrees Celsius. When the solution cools, beautiful lead iodide crystals fall out of solution.

Lead iodide golden rain experiment requirements

  • Lead(II) nitrate, 1.65 grams (0.005 moles)
  • Potassium iodide, 1.66 grams (0.01 moles)
  • 1000 ml Erlenmeyer flask
  • Hotplate-stirrer

Golden Rain Procedure – Tips & Tricks

Lead nitrate and potassium iodide are both solid, soluble ionic compounds. We will combine them for some amazing results.

  1. Dissolve each salt in 400 ml of distilled water in separate beakers.
  2. Combine the liquids in the Erlenmeyer flask so you have 800 ml in total. If you wish to use a 500 ml flask instead, simply cut the amounts of compounds and water in half. You will see a yellow precipitate of lead iodide fall out of solution.

Mastering chemistry challenge: How would you calculate the amounts needed yourself? Leave your answer in the comments.

PbI₂ will immediately precipitate out, as it is insoluble in cold water.

3. Heat the solution until all of the lead iodide dissolves; you may need to heat it above 80 degrees Celsius. Heating the solution causes the solubility to increase just enough to dissolve all of the lead iodide.

Lead iodide precipitate – how to best view it

4. Let it cool. This time, the PbI₂ precipitates out in a much more beautiful fashion. This is best viewed in a dark room with bright sunlight shining onto the flask, for example through a garage window in the late afternoon. If the lead iodide settles too quickly, stir it with a long stirring rod or start magnetic stirring to keep the particles suspended – giving the “golden rain” effect.

The Golden Rain reaction

Here is the equation for this double-replacement reaction. Lead(II) nitrate reacts with potassium iodide, forming lead(II) iodide and potassium nitrate.

Pb(NO₃)₂ + 2KI → PbI₂ + 2KNO₃

Net ionic equation: Pb²⁺ + 2I⁻ → PbI₂(s)
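
As a quick check of the quantities listed above, here is a minimal Python sketch (not part of the original article) that computes the stoichiometric masses from approximate molar masses; any small differences from the figures quoted earlier come from rounding.

```python
# Stoichiometry check for Pb(NO3)2 + 2KI -> PbI2 + 2KNO3
# Approximate molar masses in g/mol.
M_PB_NITRATE = 331.2   # Pb(NO3)2
M_KI         = 166.0   # KI
M_PBI2       = 461.0   # PbI2

moles_pb_nitrate = 0.005                 # as in the requirements list
moles_ki = 2 * moles_pb_nitrate          # 2 mol KI per mol Pb(NO3)2

print(f"Pb(NO3)2 needed: {moles_pb_nitrate * M_PB_NITRATE:.2f} g")  # ~1.66 g
print(f"KI needed:       {moles_ki * M_KI:.2f} g")                  # ~1.66 g
print(f"PbI2 formed:     {moles_pb_nitrate * M_PBI2:.2f} g")        # ~2.31 g
```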

Interesting fact: Lead is in the +2 oxidation state in this reaction. Lead(IV) iodide does not exist, because lead(IV) can oxidize iodide to iodine.

Lead / iodine complexes

Don’t use too much iodide, or this reaction will occur, forming the soluble colorless tetraiodoplumbate(II) complex ion.

PbI₂ + 2I⁻ → [PbI₄]²⁻

Safety & Disposal

Lead nitrate is toxic; the lethal oral dose is approximately 8 grams for an 80 kg human. Do not ingest any, and avoid skin contact or breathing the dust.

The lead iodide should be filtered and stored in your compound collection. Lead salts should not be washed down the drain. The remaining lead in the solution can be precipitated out with sodium sulfide, as lead sulfide is extremely insoluble. PbS should be stored in a hazardous waste drawer until it can be disposed of properly. Sodium carbonate can be used if a sulfide compound is not available.

If you are doing a lab report, here is an example.

Lead Iodide / Golden Rain experiment video

Here’s the complete experiment video

We filmed this short clip on the golden rain reaction to show how beautiful the flakes of lead iodide look in the sun, when they precipitate in the cooled down solution. The video was taken in a dark garage, with sunlight coming through a window.

About Lead Iodide

Lead(II) iodide is a bright yellow solid that is slightly soluble in hot water. It is stable in air. The formula for lead iodide is PbI₂, and its molar mass is 461.01 grams/mole. The symbol for lead is Pb because its Latin name is plumbum.

Lead iodide is quite heavy from a molar mass perspective, because both lead and iodine are heavy atoms. It has a hexagonal close-packed crystal structure, which is why it crystallizes in thin hexagonal plates. If you love math and crystals, read this.

Lead iodide is used in the manufacturing of solar cells, and also as a photon-detector for x-rays and gamma rays.

Related Articles

If you enjoyed this article about lead iodide and the golden rain experiment, check out making tin crystals, and read about standard reduction potentials and how to name ionic compounds.


"Diffusion: mysterious movement” experiment

Amazing experiment with lead nitrate and potassium iodide

Mysterious movement and magical materialization, all in one experiment!

Safety precautions

Wear protective gloves and eye protection.

Warning! Don’t try to repeat this experiment without professional supervision!

Reagents and equipment:

  • lead nitrate (1 g);
  • potassium iodide (1 g);
  • water (200 mL);
  • glass dish (1);
  • plastic teaspoons (2).

Step-by-step instructions

Pour a cup of water into the glass dish. Introduce the lead nitrate and potassium iodide to the water on opposite sides of the dish from one another. In a minute, the salts will dissolve and a yellow precipitate will form a stripe in the center of the dish.

Process description

When lead nitrate and potassium iodide dissolve in water, they dissociate into ions. These ions don’t stay in one place – they gradually spread out through the whole dish in a process known as diffusion. When lead ions meet iodide ions, lead (II) iodide forms, appearing as an insoluble yellow precipitate.



Lemon Battery Experiment

The lemon battery experiment is a classic science project that illustrates an electrical circuit, electrolytes, the electrochemical series of metals, and oxidation-reduction (redox) reactions. The battery produces enough electricity to power an LED or other small device, but not enough to cause harm, even if you touch both electrodes. Here is how to construct a lemon battery, a look at how it works, and ways of turning the project into an experiment.

Lemon Battery Materials

You need a few basic materials for a lemon battery, which are available at a grocery store and hardware store.

  • Galvanized nail
  • Copper penny, strip, or wire
  • Wires or strips of aluminum foil
  • Alligator clips or electrical tape
  • An LED bulb, multimeter, digital clock, or calculator

If you don’t have a lemon, use any citrus fruit. A galvanized nail is a steel nail that is plated with zinc. The classic project uses copper and zinc because these two metals are inexpensive and readily available. However, you can use any two conductive metals, as long as they are different from each other.

Make a Lemon Battery

  • Gently squeeze the lemon or roll it on a table to soften it. This helps the juice flow within the fruit.
  • Insert the copper and zinc into the fruit. You want the maximum surface area in the juicy part of the fruit. The lemon peel helps support the metal, but if it is very thick and the metal does not reach the juice, scrape away part of the peel. Ideally, separate the metal pieces by about 2 inches (5 centimeters). Make sure the metals are not touching each other.
  • Connect a wire to the galvanized nail using an alligator clip or electrical tape. Repeat the process with the copper item.
  • Connect the free ends of the wire to an LED or other small electronic device. When you connect the second wire, the light turns on.

Increase the Power

The voltage of a lemon battery is around 1.3 V to 1.5 V, but it generates very little current. There are two easy ways of increasing the battery’s power.

  • Use two galvanized nails and two copper pieces in the lemon. You don’t want any of the metal pieces within the fruit to touch. As before, connect one zinc and one copper piece to the LED. But wire the other zinc and copper piece to each other.
  • Wire more lemons in series with each other. Insert a nail and a copper piece into each lemon. Connect the copper of one lemon to the zinc of the next lemon. Connect the nail at one end of the series to the LED and the copper at the other end of the series to the LED. If you don’t have lots of lemons, you can cut up one lemon into pieces.


How a Lemon Battery Works

A lemon battery is similar to Volta’s first battery, except he used salt water instead of lemon juice. The zinc and copper are electrodes. The lemon juice is an electrolyte. Lemon juice contains citric acid. While both salts and acids are examples of electrolytes, acids typically do a better job in batteries.

Connecting the zinc and copper electrodes using a wire (even with an LED or multimeter between them) completes an electrical circuit. The circuit is a loop through the zinc, the wire, the copper, and the electrolyte, back to the zinc.

Zinc dissolves in lemon juice, leaving zinc ions (Zn²⁺) in the juice, while the two electrons per atom move through the wire toward the copper. The following chemical equation represents this oxidation reaction:

Zn → Zn²⁺ + 2e⁻

Citric acid is a weak acid, but it partially dissociates and leaves some positively charged hydrogen ions (H⁺) in the juice. The copper electrode does not dissolve. Instead, the excess electrons arriving at the copper electrode combine with the hydrogen ions and form hydrogen gas there. This is a reduction reaction.

2H⁺ + 2e⁻ → H₂

If you perform the project using lemon juice instead of a lemon, you may observe tiny hydrogen gas bubbles forming on the copper electrode.
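
For a rough sense of the driving force behind these two half-reactions, here is a small Python sketch (not from the original article) that combines tabulated standard reduction potentials for the zinc and hydrogen couples. Real lemon cells are far from standard conditions, so measured voltages differ from this standard-state estimate.

```python
# Standard reduction potentials vs. the standard hydrogen electrode, in volts.
E_RED = {
    "Zn2+/Zn": -0.76,   # zinc is oxidized at the anode
    "2H+/H2":   0.00,   # hydrogen ions are reduced at the copper surface
}

# Cell potential = E(cathode) - E(anode) under standard conditions.
e_cell = E_RED["2H+/H2"] - E_RED["Zn2+/Zn"]
print(f"Standard cell potential: {e_cell:.2f} V")   # about 0.76 V
```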

Try Other Fruits and Vegetables

The key for using produce in a battery is choosing a fruit or vegetable high in acid (with a low pH). Citrus fruits (lemon, orange, lime, grapefruit) contain citric acid. You don’t need a whole fruit. Orange juice and lemonade work fine. Potatoes work well because they contain phosphoric acid. Boiling potatoes before using them increases their effectiveness. Sauerkraut contains lactic acid. Vinegar works because it contains acetic acid.

Experiment Ideas

Turn the lemon battery into an experiment by applying the scientific method. Make observations about the battery, ask questions, and design experiments to test predictions or a hypothesis.

  • Experiment with other materials for the electrodes besides a galvanized nail and copper item. Other common metals available in everyday life include iron, steel, aluminum, tin, and silver. Try using a nickel and a penny. What do you think will happen if you use two galvanized nails and no copper, or two pennies and no nails? What happens if you try to use plastic, wood, or glass as an electrode? Can you explain your results?
  • If you have a multimeter, explore whether the distance between the electrodes affects the voltage and current of your circuit.
  • How big is the effect of adding a second lemon to the circuit? Does it change the voltage? Does it change the current?
  • Try making batteries using other foods from the kitchen. Predict which ones you think will work and test them. Of course, try fruits and vegetables. Also consider liquids like water, salt water, milk and juice, and condiments, like ketchup, mustard, and salsa.

The lemon battery dates back at least 2,000 years. Archaeologists discovered a battery in Iraq made from a clay pot, lemon juice, copper, iron, and tar. Of course, the people using this battery did not know about electrochemistry or even what electricity was. The use of the ancient battery is unknown.

Credit for discovery of the battery goes to Italian scientists Luigi Galvani and Alessandro Volta. In 1780, Luigi Galvani demonstrated that copper, zinc, and frog legs (acting as an electrolyte) produced electricity. Galvani published his work in 1790. An electrochemical cell is called a galvanic cell in his honor.

Alessandro Volta proved electricity did not require an animal. He used brine-soaked paper as an electrolyte and invented the voltaic pile in 1799. A voltaic pile is a stack of galvanic cells, with each cell consisting of a metal disk, an electrolyte layer, and a disk of a different metal.




How to test for lead and nitrates in water

My son is doing a science fair project and I need to know what chemicals I can use to test water contaminants. What chemicals can test for lead and nitrates/nitrites?


  • 1 $\begingroup$ Welcome to SE.Chemistry! Have you done any online searching for your answer? You may find your answer pretty quickly that way. Also, if you can include ideas that you have found yourself, that shows a degree of effort on your part. This will make it more likely that people here will invest their time and effort to help you further. $\endgroup$ – airhuff Commented Jan 29, 2017 at 21:15
  • 2 $\begingroup$ Searching a prominent online store for "lead test kits" I found products for as little as $12 US that test drinking water for lead, nitrites/nitrates and more. This would be a semi-quantitative solution. Your question doesn't state what your specific goals and requirements are; is something like this what you are looking for? $\endgroup$ – airhuff Commented Jan 29, 2017 at 22:44

3 Answers

Since this is a science fair experiment, you will need only common chemicals to test the contaminants in water.

You can use hydrogen sulfide, $\ce{H2S}$, to test for lead. (Warning: it has a rotten-egg smell.) It is a common reagent and can be found in a laboratory. You can also use sodium sulfide instead, but it too has a rotten-egg smell.

$$\ce{Pb^2+ + H2S -> PbS + 2H+}$$

Due to the insolubility of lead sulfide in water ($4.9 \times 10^{-11}\ \mathrm{g\,L^{-1}}$), the hydrogen sulfide test is a very sensitive test for the detection of lead; lead can also be detected in the filtrate left after separating sparingly soluble lead chloride and other salts with hydrochloric acid.

You can perform the brown ring test because this test is very sensitive to nitrates in solution. You need concentrated sulfuric acid and ferrous sulfate. A brown ring is formed at the junction of the two layers, probably due to formation of $\ce{[Fe(NO)]^2+}$.

$$\ce{2NO3- + 4H2SO4 + 6Fe^2+ -> 6Fe^3+ + 2NO + 4SO4^2- + 4H2O}$$ $$\ce{Fe^2+ + NO -> [Fe(NO)]^2+}$$

Sensitivity: $2.5\ \mu\mathrm{g}$; concentration limit: 1 in 25,000.
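
As a back-of-the-envelope illustration (not part of the original answer), the quoted concentration limit can be converted to parts per million, assuming "1 in 25,000" means one part of nitrate per 25,000 parts of solution by mass; as the comments below note, typical drinking-water contaminant levels are much lower than this.

```python
# Convert the quoted brown ring concentration limit to parts per million.
limit_fraction = 1 / 25_000          # 1 part analyte per 25,000 parts solution
limit_ppm = limit_fraction * 1_000_000
print(f"Brown ring concentration limit ≈ {limit_ppm:.0f} ppm")   # ≈ 40 ppm
```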

The test for nitrites is similar to the brown ring test but uses dilute sulfuric acid.

Extra info.

The regulatory standard method for testing for lead in water uses an Atomic Absorption (AA) spectrophotometer or an XRF machine (costing around $30,000) (source).

For testing nitrate/nitrite in water, see here.

Other very specific and expensive tests used to detect these contaminants are the gallocyanin test and diphenylthiocarbazone test for lead, the diphenylamine test and nitron test for nitrate, and the sulfanilic acid test and indole test for nitrite, but I won't elaborate on them since this is a science project. :)


  • $\begingroup$ Thank you so much! How much of each chemical do I need to be able to test? $\endgroup$ –  Diego Zamora Commented Jan 31, 2017 at 5:34
  • $\begingroup$ The complex is $\ce{[Fe(H2O)5(NO)]^2+}$ $\endgroup$ –  Sid Commented Dec 14, 2023 at 9:46

If you're looking for characteristic reactions, two come to mind (unfortunately I do not know what chemicals you have access to).

Nitrates can be tested for with the "brown ring test", using sulfuric acid and iron(II) sulfate (see here).

For lead you can test with the "golden rain" reaction using any iodide salt that is soluble in water (see here; they're using lead(II) nitrate in this example).

Depending on what else is in your water solution, you might need to divide it into several solutions of ions from the 3 distinct analytical groups that are generally discussed in literature. You can do this using distinct group reagents for each of the groups. If that's what you're looking for, I suggest reading up on it on the internet, it's a big topic but there's plenty of resources covering it quite well.


  • 1 $\begingroup$ Drinking water contaminants are in the parts per million range. I highly doubt that either of these tests has enough sensitivity. $\endgroup$ – MaxW Commented Jan 29, 2017 at 23:46

If your son has to perform this experiment at the science fair, I'm sorry to disappoint you, but it will not be possible (unless his school has adequate instrumentation such as an Atomic Absorption Spectrophotometer or a Mass Chromatograph). Contamination of water by lead is on the order of parts per billion (1 part of lead per 1 billion parts of water), and although it is sometimes enough to cause health problems, it is indeed a very low amount of lead to detect by simple experiments such as the reaction with potassium iodide. And I wouldn't recommend leaving a teenager with a large amount of lead in hand because of its extreme toxicity.

To test for nitrates, the brown ring experiment cited by Diego Zamora can possibly provide a visual result in water, since the nitrate concentration in it is not that small. However, test it beforehand with your son so he doesn't get frustrated if the experiment goes wrong during the fair. Concentrated sulphuric acid can be found as car battery liquid in some specialized stores (it is not very difficult to find). Ferrous sulphate can be found in pool products stores. Handle it with extreme care, since sulphuric acid is corrosive and a powerful dehydrating compound that can cause severe burns if in contact with skin.



Probing the Skin of a Lead Nucleus

  • Physics Department, Duke University, Durham, NC, USA


Popular cartoon visualizations depict the protons and neutrons in a nucleus as colored marbles packed randomly into a sphere. In reality, heavy nuclei—in which neutrons tend to outnumber protons—are more differentiated, with the neutrons nudged radially outward. At the outer limits of such nuclei, the neutrons form a thin “skin” enclosing a core of mixed neutrons and protons (Fig. 1 ). Now, the Lead Radius Experiment (PREX) Collaboration at the Thomas Jefferson National Accelerator Facility in Virginia has determined the thickness of this neutron-rich skin in lead-208, a stable isotope with 44 more neutrons than protons [ 1 ]. The measurement, which addresses questions relating to all four fundamental forces of nature, yields insight into the structure of neutron stars, and it will have far-reaching implications for multimessenger astronomy and particle physics.

The spatial distribution of protons within nuclei is well understood as a result of decades of scattering experiments with electromagnetic probes. Obtaining a similar understanding for neutrons is more challenging, as these neutral particles are mostly invisible to electrically charged probes. Given that neutrons do interact via the strong force, their nuclear distributions are, in principle, discoverable using strongly charged hadronic probes. However, strong interactions, described by quantum chromodynamics, have large theoretical uncertainties, and precise measurements using such methods are therefore difficult to achieve in practice.

Another approach is to leverage weak scattering, and techniques based on this interaction are much more effective. The reason for that is twofold: Weak scattering is far better understood than the scattering caused by strong interactions, and the nucleus’s weak charge—in contrast to its electromagnetic one—is dominated by its neutron content. But the weak interaction is much weaker than the electromagnetic interaction, which means that its subtle effects on electron-nucleus scattering must be teased out carefully.

These subtle effects arise from a special feature of the weak interaction that gives it its signature parity violation: the strength of the interaction depends on a spatial direction. When electrons scatter from neutrons in the nucleus, they exchange Z bosons, the weak-force carrier particles. When the electrons are polarized, the scattering process is asymmetric, with left-handed electrons (whose spins are in antialignment with their momentum) scattering off nuclei slightly less often than right-handed ones. The size of the asymmetry is related to the distribution of neutrons in the nucleus. The effect is tiny—only about one part per two million for the PREX experiment—so its measurement requires a heroic control of systematic uncertainties.

It is this minuscule parity-violating asymmetry that the PREX Collaboration has succeeded in measuring for lead-208, using the Jefferson Lab’s high-resolution spectrometers. The experimenters scattered 953-MeV spin-polarized electrons off a lead foil sandwiched between thin layers of diamond. The polarization of the electrons was reversed hundreds of times per second, following a specific sequence whose details were hidden from the experimenters to reduce potential analysis bias. The measured excess in the right-handed electron-scattering cross section was 550 ± 18 parts per billion, with most of the uncertainty contributed by statistical error. From these data, the team inferred a “neutron radius” (the radius of the neutrons’ distribution within the nucleus; not that of individual neutrons) of 5.800 ± 0.075 fm. Given the proton radius value established by previous experiments, this measurement showed the neutron skin width to be 0.283 ± 0.071 fm—a twofold improvement in precision over the collaboration’s earlier estimate [2].
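
The quoted numbers can be tied together with some simple arithmetic; the short Python sketch below is an illustration added here (not part of the original article) and uses only the values stated in the preceding paragraphs.

```python
# Arithmetic with the values quoted above (lead-208, PREX-II).
R_N  = 5.800    # neutron-distribution radius, fm
SKIN = 0.283    # neutron skin thickness, fm
ASYM = 550e-9   # parity-violating asymmetry, 550 parts per billion

# Proton-distribution radius implied by the quoted skin thickness.
r_p = R_N - SKIN
print(f"Implied proton-distribution radius ≈ {r_p:.3f} fm")   # ≈ 5.517 fm

# The measured asymmetry expressed as "1 part in ...".
print(f"Asymmetry ≈ 1 part in {1 / ASYM:,.0f}")               # ~1 in 1.8 million
```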

As tiny as these dimensions are, their implications are astronomical: measuring the neutron skin in a single nucleus (0.2-fm scale) can inform our knowledge of neutron star structure (km scale). The link between these objects works through a quantity called the symmetry energy—a contribution to the nuclear binding energy that arises because of the Pauli principle in nuclei that have unequal numbers of neutrons and protons—and a related phenomenon called symmetry pressure [ 3 ]. If the symmetry energy increases rapidly as the nuclear density increases, the symmetry pressure will be larger. In a nucleus, a larger symmetry pressure means neutrons are pushed out farther, yielding a thicker neutron skin. Similarly, in a neutron star, a higher symmetry pressure correlates with a larger radius for a given mass.

These neutron star properties hinted at by the PREX measurements will affect how we interpret observations of binary neutron star mergers, which are now routinely detected by gravitational-wave interferometers. Gravitational-wave signals can reveal how matter is deformed during such collisions [ 4 ], but the details depend on how large a neutron star is for a given mass. Insight from the PREX result therefore informs our understanding of these cataclysmic events. A new study already shows that theoretical expectations for the symmetry pressure are systematically a bit low compared with the value inferred from PREX, although the experimental result is still consistent with predictions within uncertainties [ 5 ]. This study also finds that PREX is in mild tension with gravitational-wave determinations of neutron star deformability. Future gravitational-wave and x-ray observations will help clarify the picture [ 3 , 6 ].

Precise measurements of the neutron skin thickness could also lead to new discoveries in particle physics. Neutrinos interact rarely with nuclei, but when they do, they can coherently scatter off an entire nucleus via the exchange of a Z boson, giving the nucleus a gentle kick [ 7 ]. The distribution of nuclear recoil energies depends on the arrangement of the neutrons in the nucleus; any anomalies in the energy distribution can be used to test for new physics. Although the rarity of such neutrino interactions limits the effectiveness of present-generation experiments, for next-generation, high-statistics tests of beyond-the-standard-model neutrino interactions, precise understanding of neutron spatial distributions within the nucleus will reduce ambiguities [ 8 ].

There are several present and future prospects for nuclear-scattering-based neutron skin measurements that will complement the recent result from the PREX Collaboration. A similar polarized-electron-scattering measurement on calcium-48, called the Calcium Radius Experiment (CREX), was recently performed at the Jefferson Lab, and the data are in the process of being analyzed [ 9 ], while an improved measurement on lead-208 is planned at an accelerator facility in Mainz, Germany [ 10 ]. Strong-force-based measurements on rare nuclei are planned for the Facility for Rare Isotope Beams at Michigan State University [ 11 ]. And in a different approach, future observations using gravitational waves, x rays, and neutrinos have exciting potential to shine diverse kinds of “light” on this story of nuclear structure connections over vastly different scales.

  • D. Adhikari et al. (PREX Collaboration), “Accurate determination of the neutron skin thickness of 208Pb through parity-violation in electron scattering,” Phys. Rev. Lett. 126, 172502 (2021).
  • S. Abrahamyan et al. (PREX Collaboration), “Measurement of the neutron radius of 208Pb through parity violation in electron scattering,” Phys. Rev. Lett. 108, 112502 (2012).
  • J. Piekarewicz and F. J. Fattoyev, “Neutron-rich matter in heaven and on Earth,” Phys. Today 72, 30 (2019).
  • B. P. Abbott et al. (LIGO Scientific Collaboration and Virgo Collaboration), “GW170817: Observation of gravitational waves from a binary neutron star inspiral,” Phys. Rev. Lett. 119, 161101 (2017).
  • B. T. Reed et al., “Implications of PREX-II on the equation of state of neutron-rich matter,” Phys. Rev. Lett. 126, 172503 (2021).
  • G. Raaijmakers et al., “Constraining the dense matter equation of state with joint analysis of NICER and LIGO/Virgo measurements,” Astrophys. J. Lett. 893, L21 (2020).
  • D. Akimov et al. (COHERENT Collaboration), “Observation of coherent elastic neutrino-nucleus scattering,” Science 357, 1123 (2017).
  • D. Aristizabal Sierra et al., “Impact of form factor uncertainties on interpretations of coherent elastic neutrino-nucleus scattering data,” J. High Energy Phys. 6, 141 (2019).
  • J. Mammei et al. (CREX Collaboration), Proposal to Jefferson Lab PAC 40, CREX: Parity-violating measurement of the weak charge distribution of Ca to 0.02 fm accuracy (unpublished).
  • D. Becker et al., “The P2 experiment,” Eur. Phys. J. A 54, 208 (2018).
  • The 2015 Nuclear Science Advisory Committee, “Reaching for the horizon: The 2015 long range plan for nuclear science.”

About the Author


Kate Scholberg is the Arts and Sciences Distinguished Professor of Physics and Bass Fellow at Duke University, North Carolina. She received a Ph.D. in Physics from the California Institute of Technology in 1997. She is currently a member of the Super-Kamiokande, T2K, and Deep Underground Neutrino Experiment collaborations. She is spokesperson of COHERENT, a neutrino-scattering experiment at the Spallation Neutron Source at Oak Ridge National Laboratory, Tennessee. Her research primarily focuses on the physics of neutrinos, and it has broad intersections with particle physics, astrophysics, and nuclear physics.

Accurate Determination of the Neutron Skin Thickness of 208Pb through Parity-Violation in Electron Scattering

D. Adhikari et al. (PREX Collaboration)

Phys. Rev. Lett. 126, 172502 (2021)

Published April 27, 2021


John Dalton (1766-1844)

Who Was John Dalton?

During John Dalton's early career, he identified the hereditary nature of red-green color blindness. In 1803 he revealed the concept of Dalton’s Law of Partial Pressures. Also in the 1800s, he was the first scientist to explain the behavior of atoms in terms of the measurement of weight.

Early Life and Career

Dalton was born in Eaglesfield, England, on September 6, 1766, to a Quaker family. He had two surviving siblings. Both he and his brother were born color-blind. Dalton's father earned a modest income as a handloom weaver. As a child, Dalton longed for formal education, but his family was very poor. It was clear that he would need to help out with the family finances from a young age.

After attending a Quaker school in his village in Cumberland, when Dalton was just 12 years old he started teaching there. When he was 14, he spent a year working as a farmhand but decided to return to teaching — this time as an assistant at a Quaker boarding school in Kendal. Within four years, the shy young man was made principal of the school. He remained there until 1793, at which time he became a math and philosophy tutor at the New College in Manchester.

While at New College, Dalton joined the Manchester Literary and Philosophical Society. Membership granted Dalton access to laboratory facilities. For one of his first research projects, Dalton pursued his avid interest in meteorology. He started keeping daily logs of the weather, paying special attention to details such as wind velocity and barometric pressure—a habit Dalton would continue all of his life. His research findings on atmospheric pressure were published in his first book, Meteorological Observations and Essays, the year he arrived in Manchester.

During his early career as a scientist, Dalton also researched color blindness—a topic with which he was familiar through firsthand experience. Since the condition had affected both him and his brother since birth, Dalton theorized that it must be hereditary. His theory was later confirmed when genetic analysis of his preserved eye tissue revealed that he was missing the photoreceptor for perceiving the color green. As a result of his contributions to the understanding of red-green color blindness, the condition is still often referred to as "Daltonism."

Dalton's Law

Dalton's interest in atmospheric pressures eventually led him to a closer examination of gases. While studying the nature and chemical makeup of air in the early 1800s, Dalton learned that it was not a chemical solvent, as other scientists had believed. Instead, it was a mechanical system composed of small individual particles, with each gas exerting its pressure independently.

Dalton's experiments on gases led to his discovery that the total pressure of a mixture of gases amounted to the sum of the partial pressures that each individual gas exerted while occupying the same space. In 1803 this scientific principle officially came to be known as Dalton's Law of Partial Pressures. Dalton's Law primarily applies to ideal gases rather than real gases, due to the elasticity and low particle volume of molecules in ideal gases. Chemist Humphry Davy was skeptical about Dalton's Law until Dalton explained that the repelling forces previously believed to create pressure only acted between atoms of the same sort and that the atoms within a mixture varied in weight and complexity.

The principle of Dalton's Law can be demonstrated using a simple experiment involving a glass bottle and large bowl of water. When the bottle is submerged under water, the water it contains is displaced, but the bottle isn't empty; it's filled with the invisible gas hydrogen instead. The amount of pressure exerted by the hydrogen can be identified using a chart that lists the pressure of water vapors at different temperatures, also thanks to Dalton's discoveries. This knowledge has many useful practical applications today. For instance, scuba divers use Dalton's principles to gauge how pressure levels at different depths of the ocean will affect the air and nitrogen in their tanks.
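
A worked example may make the partial-pressure bookkeeping concrete. The numbers in the Python sketch below are illustrative (not from the original text), apart from the tabulated vapor pressure of water at 25 °C.

```python
# Dalton's Law applied to a gas collected over water:
# P_total = P_gas + P_water_vapor, so P_gas = P_total - P_water_vapor.
P_TOTAL     = 760.0   # measured total pressure in the bottle, mmHg (illustrative)
P_WATER_25C = 23.8    # vapor pressure of water at 25 degrees C, mmHg (tabulated)

p_hydrogen = P_TOTAL - P_WATER_25C
print(f"Partial pressure of the collected hydrogen ≈ {p_hydrogen:.1f} mmHg")
```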

During the early 1800s, Dalton also postulated a law of thermal expansion that illustrated the heating and cooling reaction of gases to expansion and compression. He garnered international fame for his additional study using a crudely fashioned dew point hygrometer to determine how temperature impacts the level of atmospheric water vapor.

Atomic Theory

Dalton's fascination with gases gradually led him to formally assert that every form of matter (whether solid, liquid or gas) was also made up of small individual particles. He referred to the Greek philosopher Democritus of Abdera's more abstract theory of matter, which had centuries ago fallen out of fashion, and borrowed the term "atomos" or "atoms" to label the particles. In an article he wrote for the Manchester Literary and Philosophical Society in 1803, Dalton created the first chart of atomic weights.

Seeking to expand on his theory, he readdressed the subject of atomic weight in his book A New System of Chemical Philosophy , published in 1808. In A New System of Chemical Philosophy , Dalton introduced his belief that atoms of different elements could be universally distinguished based on their varying atomic weights. In so doing, he became the first scientist to explain the behavior of atoms in terms of the measurement of weight. He also uncovered the fact that atoms couldn't be created or destroyed.

Dalton's theory additionally examined the compositions of compounds, explaining that the tiny particles (atoms) in a compound were compound atoms. Twenty years later, chemist Amedeo Avogadro would further detail the difference between atoms and compound atoms.

In A New System of Chemical Philosophy , Dalton also wrote about his experiments proving that atoms consistently combine in simple ratios. What that meant was that the molecules of a given compound are always made up of their elements in the same fixed proportions.

In 1810 Dalton published an appendix to A New System of Chemical Philosophy . In it he elaborated on some of the practical details of his theory: that the atoms within a given element are all exactly the same size and weight, while the atoms of different elements look—and are—different from one another. Dalton eventually composed a table listing the atomic weights of all known elements.

His atomic theories were quickly adopted by the scientific community at large with few objections. "Dalton made atoms scientifically useful," asserted Rajkumari Williamson Jones, a science historian at the University of Manchester Institute of Science and Technology. Nobel Laureate Professor Sir Harry Kroto, noted for co-discovering spherical carbon fullerenes, identified the revolutionary impact of Dalton's discoveries on the field of chemistry: "The crucial step was to write down elements in terms of their atoms...I don't know how they could do chemistry beforehand, it didn't make any sense."

From 1817 to the day he died, Dalton served as president of the Manchester Literary and Philosophical Society, the organization that first granted him access to a laboratory. A practitioner of Quaker modesty, he resisted public recognition; in 1822 he turned down elected membership to the Royal Society. In 1832 he did, however, begrudgingly accept an honorary Doctorate of Science degree from the prestigious Oxford University. Ironically, his graduation gown was red, a color he could not see. Fortunately for him, his color blindness was a convenient excuse for him to override the Quaker rule forbidding its subscribers to wear red.

In 1833 the government granted him a pension, which was doubled in 1836. Dalton was offered another degree, this time a Doctorate of Laws, by Edinburgh University in 1834. As if those honors were insufficient tribute to the revolutionary chemist, a statue was erected in Dalton's honor in London, also in 1834. "Dalton was very much an icon for Manchester," said Rajkumari Williams Jones. "He is probably the only scientist who got a statue in his lifetime."

In his later life, Dalton continued to teach and lecture at universities throughout the United Kingdom, although it is said that the scientist was an awkward lecturer with a gruff and jarring voice. Throughout his lifetime, Dalton managed to maintain his nearly impeccable reputation as a devout Quaker. He lived a humble, uncomplicated life focusing on his fascination with science, and never married.

In 1837 Dalton had a stroke. He had trouble with his speech for the next year.

Death and Legacy

After suffering a second stroke, Dalton died quietly on the evening of July 26, 1844, at his home in Manchester, England. He was provided a civic funeral and granted full honors. A reported 40,000 people attended the procession, honoring his contributions to science, manufacturing and the nation's commerce.

By finding a way to "weigh atoms," John Dalton's research not only changed the face of chemistry but also initiated its progression into a modern science. The splitting of the atom in the 20th century could most likely not have been accomplished without Dalton laying the foundation of knowledge about the atomic makeup of simple and complex molecules. Dalton's discoveries also allowed for the cost-efficient manufacturing of chemical compounds, since they essentially give manufacturers a recipe for determining the correct chemical proportions in a given compound.

The majority of conclusions that made up Dalton's atomic theory still stand today.

"Now with nanotechnology, atoms are the centerpiece," said Nottingham University Professor of Chemistry David Garner. "Atoms are manipulated directly to make new medicines, semiconductors and plastics." He went on to further explain, "He gave us the first understanding of the nature of materials. Now we can design molecules with a pretty good idea of their properties."

In 2003, on the bicentennial of Dalton's public announcement of his atomic theory, the Manchester Museum held a tribute to the man, his life and his groundbreaking scientific discoveries.

QUICK FACTS

  • Name: John Dalton
  • Birth Year: 1766
  • Birth date: September 6, 1766
  • Birth City: Eaglesfield
  • Birth Country: United Kingdom
  • Gender: Male
  • Best Known For: Chemist John Dalton is credited with pioneering modern atomic theory. He was also the first to study color blindness.
  • Astrological Sign: Virgo
  • Education: John Fletcher's Quaker grammar school
  • Death Year: 1844
  • Death date: July 26, 1844
  • Death City: Manchester
  • Death Country: United Kingdom


  • Berzelius' symbols are horrifying. A young student in chemistry might as soon learn Hebrew as make himself acquainted with them.
  • We might as well attempt to introduce a new planet into the solar system, or to annihilate one already in existence, as to create or destroy a particle of hydrogen.
  • The principal failing in [Sir Humphry Davy's] character as a philosopher is that he does not smoke.
  • I can now enter the lecture room with as little emotion nearly as I can smoke a pipe with you on Sunday or Wednesday evenings.
  • Matter, though divisible in an extreme degree, is nevertheless not infinitely divisible. That is, there must be some point beyond which we cannot go in the division of matter... I have chosen the word 'atom' to signify these ultimate particles.
  • Will it not be thought remarkable that in 1836 the British chemists are ignorant whether attraction, repulsion or indifference is marked when a mixture of any proportions of azote and oxygen are made.
  • In short, [London] is a most surprising place, and worth one's while to see once; but the most disagreeable place on earth for one of a contemplative turn to reside in constantly.
  • To ascertain the exact quantity of water in a given quantity of air is, I presume, an object not yet fully attained.
  • The cause of rain is now, I consider, no longer an object of doubt.



  • Open access
  • Published: 17 September 2024

Improving rigor and reproducibility in western blot experiments with the blotRig analysis

  • Cleopa Omondi 1 ,
  • Austin Chou 1 ,
  • Kenneth A. Fond 1 ,
  • Kazuhito Morioka 1 ,
  • Nadine R. Joseph 1 ,
  • Jeffrey A. Sacramento 1 ,
  • Emma Iorio 1 ,
  • Abel Torres-Espin 1 , 3 , 4 ,
  • Hannah L. Radabaugh 1 ,
  • Jacob A. Davis 1 ,
  • Jason H. Gumbel 1 ,
  • J. Russell Huie 1 , 2 &
  • Adam R. Ferguson 1 , 2  

Scientific Reports volume 14, Article number: 21644 (2024)


  • Biochemistry
  • Biological techniques
  • Computational biology and bioinformatics
  • Neuroscience

Western blot is a popular biomolecular analysis method for measuring the relative quantities of independent proteins in complex biological samples. However, variability in quantitative western blot data analysis poses a challenge in designing reproducible experiments. The lack of rigorous quantitative approaches in current western blot statistical methodology may result in irreproducible inferences. Here we describe best practices for the design and analysis of western blot experiments, with examples and demonstrations of how different analytical approaches can lead to widely varying outcomes. To facilitate best practices, we have developed the blotRig tool for designing and analyzing western blot experiments to improve their rigor and reproducibility. The blotRig application includes functions for counterbalancing experimental design by lane position, batch management across gels, and analytics with covariates and random effects.


Introduction

Proteomic technologies such as protein measurement with Folin phenol reagent were first introduced by Lowry et al. in 1951 1 . The resulting qualitative data are typically confirmed by a second, independent method such as western blot (WB) 2,3 . The WB method, first described by Towbin et al. 4 and Burnette 5 in 1979 and 1981, respectively, uses specific antibody-antigen interactions to confirm the protein present in the sample mixture. Quantitative WB (qWB assay) is a technique to measure protein concentrations in biological samples with four main steps: (1) protein separation by size, (2) protein transfer to a solid support, (3) marking a target protein using proper primary and secondary antibodies for visualization, and (4) semi-quantitative analysis 6 . Importantly, qWB data are considered semi-quantitative because methods to control for experimental variability ultimately yield relative comparisons of protein levels rather than absolute protein concentrations 2,3,7,8 . Similarly, western blotting applying ECL (enhanced chemiluminescence) is considered a semi-quantitative method because it lacks cumulative luminescence linearity and offers limited quantitative reproducibility 9 . However, the emergence of highly sensitive fluorescent labeling techniques, which exhibit a wider quantifiable linear range, greater sensitivity, and improved stability compared to conventional ECL detection, now permits the legitimate characterization of protein expression as linearly quantitative 10 . Current methodologies do not sufficiently account for diverse sources of variability, producing highly variable results between different laboratories and even within the same lab 11,12,13 . Indeed, qWB data exhibit more variability than other experimental techniques such as the enzyme-linked immunosorbent assay (ELISA) 14 . For example, results have shown that qWB can produce significant variability in detecting host cell proteins and lead researchers to miss or overestimate true biological effects 15 . This in turn results in publication of irreproducible qWB interpretations, which leads to a loss of credibility 13 . In serious cases, qWB results may even lead to clinical misdiagnosis 16 , a larger public health concern given the prevalence of WB in biomedical research, for example in the diagnosis of SARS-CoV-2 infection 17 .

The process of recognizing and accounting for variability in WB analyses will ultimately improve reproducibility between experiments. A growing body of studies has shown that this requires a fundamental shift in the experimental methodology across data acquisition, analysis, and interpretation to achieve precise and accurate results 2 , 3 , 11 , 12 , 13 .

Here we highlight experimental design practices that enable a statistics-driven approach to improve the reproducibility of qWBs. Specifically, we discuss major sources of variability in qWB including the non-linearity in antibody signal 2 , 3 ; imbalanced experimental design 13 ; lack of standardization in the treatment of technical replicates 3 , 18 ; and variability between protein loading, lanes, and blots 2 , 7 , 19 . To address these issues, we provide new comprehensive suggestions for quantitative evaluation of protein expression by combining linear range characterization for antibodies, appropriate counterbalancing during gel loading, running technical replicates across multiple gels, and by taking careful consideration of the analysis method. By applying these experimental practices, we can then account for more sources of variability by running analysis of covariance (ANCOVA) or generalized linear mixed models (LMM). Such approaches have been shown to successfully improve reproducibility compared to other methods 13 .

Good options are available to researchers for analyzing qWB protein bands using free, downloadable tools. Amongst others, LI-COR Image Studio Lite can be used to measure the intensity of protein bands in western blots and calculate their relative abundance. Likewise, ThermoFisher ImageQuant Lite offers features such as the ability to perform background subtraction and normalization. However, to date, no tool is freely available that provides a map for counterbalancing samples (to overcome imperfect, non-uniform protein electrophoresis/transfer) and performs statistical analysis. Here, we present blotRig, a tool that gives researchers functionality to counterbalance samples and perform statistical analysis.

To help improve WB rigor we developed the blotRig protocol and application, harnessing a database of 6,000+ western blots from N = 281 subjects (rats and mice) collected by multiple UCSF labs on core equipment. To demonstrate blotRig best practices in a real-world experiment, we carried out prospective multiplexed WB analysis of protein lysate from lumbar cord in rodent models of spinal cord injury (SCI) (N = 29 rats) in 2 groups (experimental group and control group). In order to show that these experimental suggestions could improve qWB reproducibility, we compared different statistical approaches to handling loading controls and technical replicates. Specifically, we applied two strategies to integrate loading controls: (i) normalizing the target protein levels by dividing by the loading control or (ii) treating the loading control as a covariate in an LMM. Additionally, we analyzed technical replicates in four ways: (1) assume each sample was only run once without replication, (2) treat each technical replicate as an independent sample, (3) use the mean of the three technical replicate values, and (4) treat the replicate as a random effect in an LMM. Altogether, we found that the statistical power of the experiment was significantly increased when we used the loading control as a covariate with technical replicates as a random effect during analysis. In addition, the effect size increased, and the p-value of our analysis decreased when using this LMM, suggesting the potential for greater sensitivity in our WB experiment when using this approach 20 . Through rigorous experimental design and statistical analysis we show that we can account for greater variability in the data and more clearly identify underlying biological effects.
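
To make the modeling choice concrete, here is a minimal Python sketch of the covariate-plus-random-effect approach using statsmodels. It is an illustration only (the column names and input file are hypothetical placeholders, not the blotRig implementation), but it shows the structure of a model in which the loading control enters as a covariate and repeated measurements from the same animal share a random intercept, in contrast to dividing by the loading control or averaging replicates.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per band measurement.
# Placeholder columns: subject (animal ID), group (SCI vs control),
# replicate (which technical-replicate gel), target (e.g., GluA2 band
# intensity), loading (e.g., beta-actin band intensity).
df = pd.read_csv("western_blot_long.csv")

# Linear mixed model: the loading control is a covariate rather than a
# divisor, and subject is a random effect so technical replicates from the
# same animal are not treated as independent samples.
model = smf.mixedlm("target ~ group + loading", data=df, groups=df["subject"])
result = model.fit()
print(result.summary())
```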

Materials and methods

All experimental protocols were approved by the University Laboratory Animal Care Committee at the University of California, San Francisco (UCSF, CA, USA) and followed the animal guidelines of the National Institutes of Health Guide for the Care and Use of Laboratory Animals (National Research Council (US) Committee for the Update of the Guide for the Care and Use of Laboratory Animals, 2011). We followed the ARRIVE guidelines (Animal Research: Reporting In Vivo Experiments) to describe our in vivo experiments.

Male Simonsen Long Evans rats (188–385 g; Gilroy, Santa Clara, CA, USA; N = 29), aged 3 weeks, were housed under standard conditions with a 12-h light–dark cycle (6:30 am to 6:30 pm) and were given food and water ad libitum. The animals were housed mostly in pairs in 30 × 30 × 19-cm isolator cages with solid floors covered with a 3 cm layer of wood chip bedding. The experimenters were blind to the identity of treatments and experimental conditions, and all experiments were designed to minimize suffering and limit the number of animals required.

Anesthesia and surgery

We performed non-survival spinal cord injury and spared nerve injury surgeries on the animals. Specifically, 3-week-old female rats were anesthetized with continuous inhalation of isoflurane (1–5%) while on oxygen (0.6–1 mg/kg) in accordance with the IACUC surgical and anesthesia guidelines. Preoperative 0.5% lidocaine local infiltration was applied once at the surgical site, avoiding injection into muscle. Fur over the T7–T9 thoracic level was shaved. The dorsal skin was aseptically prepared with surgical iodine or chlorhexidine and 70% ethanol. A small longitudinal incision was made along the spine through the skin, fascia, and muscle to expose the T7–T9 vertebrae. Animals undergoing the sham procedure did not undergo laminectomy and immediately proceeded to wound closure. Overlying muscle and subcutaneous tissue were sutured closed using an absorbable suture in a layered fashion. External skin was reinforced using monofilament suture or tissue glue as needed. Animals were euthanized after 30 min to extract spinal cord tissue through fluid expulsion.

Experimental methodology

In accordance with established quality standards for preclinical neurological research 21 , experimenters were kept blind to experimental group conditions throughout the entire study. Western blot loading order was determined a priori by a third-party coder, who ensured that a representative sample from each condition was included on each gel in a randomized block design. The number of subjects per condition was kept consistent across groups for each experiment to ensure that proper counterbalancing could be achieved across independent western runs. All representative western images presented in the figures represent lanes from the same gel. Sometimes, the analytical comparisons of interest were not available on adjacent lanes, even though they came from the same gel, because of our randomized counterbalancing procedure.
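
The randomized, counterbalanced loading scheme can be sketched in a few lines of Python; this is an illustration of the idea only (the sample names and the interleaving rule are assumptions, not the actual blotRig algorithm).

```python
import random
from collections import deque

# Hypothetical samples per condition (placeholder IDs).
samples = {
    "SCI":  ["sci_01", "sci_02", "sci_03", "sci_04"],
    "sham": ["sham_01", "sham_02", "sham_03", "sham_04"],
}

def counterbalanced_lanes(samples, gel_index):
    """Interleave conditions across lanes so no group clusters on one side,
    rotating which condition leads on each replicate gel."""
    conditions = deque(samples.keys())
    conditions.rotate(gel_index)                    # vary the leading condition per gel
    shuffled = {c: random.sample(ids, len(ids)) for c, ids in samples.items()}
    lanes = ["ladder"]                              # lane 1: molecular-weight ladder
    for i in range(max(len(ids) for ids in shuffled.values())):
        for c in conditions:                        # alternate conditions lane by lane
            if i < len(shuffled[c]):
                lanes.append(shuffled[c][i])
    return lanes

for gel in range(3):                                # e.g., three technical-replicate gels
    print(f"gel {gel + 1}:", counterbalanced_lanes(samples, gel))
```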

Western blot

The example western blot data used in this paper are taken from a model of spared nerve injury in animals with spinal cord injury. The nerve injury model used is based on models from the pain literature 22 , where two of the three branches of the sciatic nerve are transected, sparing the sural nerve (SNI) 23 . Two surgeons perform the procedure simultaneously, with injuries occurring 5 min apart. The spinal cord of each animal was obtained based on a fluid expulsion model 24 and a 1 cm section of the lumbar region was excised at the lumbar enlargement. The tissue was then preserved in a −80 °C freezer until it was needed for an experiment, at which point it was thawed and used to run a western blot. We conducted western blot analysis on 29 samples from animals using standard biochemical methods. We measured the protein levels of the AMPA receptor subunit GluA2 and used beta-actin as a loading control. The data from these experiments were then aggregated and used for statistical analysis.

Protein assay

We assayed sample protein concentration using a bicinchoninic acid (BCA) assay (Pierce) for reliable quantification of total protein, read on a plate reader (Tecan GENios) with triplicate samples (technical replicates) detected against a bovine serum albumin (BSA) standard curve. Technical replicates are multiple measurements performed under the same conditions in order to quantify and correct for technical variability and improve the accuracy and precision of the results 48 . We ran the same WB loading scheme three times (technical replicates of the entire gel) and measured the protein levels of AMPA receptors.
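To make the standard-curve step concrete, here is a minimal R sketch (not the authors' code; all values and object names are illustrative) showing how a BSA standard curve can be fit with a linear model and used to interpolate a sample concentration from triplicate readings:

```r
# Hypothetical sketch: interpolate sample protein concentration from a BSA standard curve.
# Concentrations and absorbances below are illustrative, not measured values.
standards <- data.frame(
  conc_ug_ml = c(0, 125, 250, 500, 1000, 2000),      # known BSA standards
  abs_562    = c(0.05, 0.17, 0.29, 0.55, 1.02, 1.95) # example plate-reader absorbances
)

std_fit <- lm(abs_562 ~ conc_ug_ml, data = standards)  # linear standard curve
summary(std_fit)$r.squared                             # quick check of curve linearity

# Average the triplicate (technical replicate) readings for one unknown sample,
# then back-calculate its concentration from the fitted line.
sample_abs  <- mean(c(0.61, 0.63, 0.60))
b           <- coef(std_fit)
sample_conc <- (sample_abs - b[1]) / b[2]
sample_conc
```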

Polyacrylamide gel electrophoresis and multiplexed near-infrared immunoblotting

The approach involved performing serial 1:2 dilutions in cold Laemmli sample buffer at room temperature; 15 μg of total protein per sample was loaded into separate lanes of a precast 10–20% Tris–HCl polyacrylamide electrophoresis gel (BioRad) to establish the linear range (Fig. 1). The blotRig software helps counterbalance sample positions across the gel by treatment condition (Fig. 2). A Kaleidoscope ladder was loaded in the first lane of each gel to confirm molecular weight (Fig. 2). The gel was electrophoresed for 30 min at 200 V in SDS running buffer (25 mM Tris, 192 mM glycine, 0.1% SDS, pH 8.3; BioRad). Protein was transferred to a nitrocellulose membrane in cold transfer buffer (25 mM Tris, 192 mM glycine, 20% ethanol, pH 8.3). Membrane transfer was confirmed using Ponceau S stain 67 , followed by a quick rinse and blocking in Odyssey blocking buffer (Li-Cor) containing Tween-20.

figure 1

Determining the linear range of antibodies to optimize parametric analysis of Western blot data. When very small or very large protein amounts are loaded, their representation in western blot band density may become non-linear. If there is a disconnect between observed and expected protein concentrations, results may be inaccurate. Determining the linear range, wherein a one-unit increase in protein is reflected in a proportional increase in band density for each western blot antibody, is therefore a crucial initial step to ensure confidence in the reproducibility of the linear models commonly applied to western blot data analysis.

figure 2

Counterbalancing to reduce bias. (A) Experimental design. A simple hypothetical experimental design for illustrating counterbalancing: two experimental groups (Wild Type vs Transgenic), with two treatments (Drug vs Vehicle) analyzed within each individual. This 2 (Experimental Group) × 2 (Treatment) design yields four groups. (B) Counterbalanced gel loading. The goal of appropriate counterbalancing is to optimize the sequence in which samples are loaded so that groups are represented evenly across the gel. Layouts marked with a red X have the experimental groups and treatment conditions clustered in the same area of the gel, so variability across the gel may be conflated with group differences. In contrast, layouts marked with a green check are arranged so that experimental group and treatment condition are distributed across the gel, reducing the possibility of any single group being over-represented in a particular area.

The membrane was blocked for 1 h in Odyssey Blocking Buffer (Li-Cor) containing 0.1% Tween-20, followed by overnight incubation in primary antibody solution at 4 °C. Membranes were incubated in a primary antibody solution containing Odyssey blocking buffer, Tween-20, and the appropriate primary antibody [1:2000 mouse PSD-95 (cat # MA1-046, Thermo Fisher), 1:200 rabbit GluA1 (cat # AB1504, Millipore), 1:200 rabbit GluA2 (cat # AB1766, Millipore), 1:200 rabbit pS831 (cat # 04-823, Millipore), 1:200 pS880 (cat # 07-294, Millipore), or 1:1500 mouse actin loading control (cat # 612857, BD Transduction)]. Following incubation, the membrane was washed 4 × 5 min with Tris-buffered saline containing 0.1% Tween 20 (TTBS) and incubated in fluorescent-labeled secondary antibody (1:30,000 appropriate Li-Cor IRDye goat anti-rabbit in Odyssey blocking buffer plus 0.2% Tween 20) for 1 h in the dark. This was followed by 4 × 5 min washes in TTBS and a 5 min wash in TBS.

Membrane incubation was used to detect the presence of a specific protein or antigen on the membrane. In this case, the membrane was incubated with a fluorescently labeled secondary antibody solution matched to the laser lines and detection channels of the Li-Cor Odyssey quantitative near-infrared imaging system, allowing specific detection of the protein of interest on the membrane. The membrane is then imaged using an infrared imaging system optimized for the specific wavelengths of light emitted by the fluorescent label. Additional rounds of incubation and imaging are performed to detect additional proteins using the multiplexing functionality of the Li-Cor instrument, with each round adding new bands at different molecular weight ranges. This allows detection of multiple proteins in the same sample, maximizing the information obtained from a single membrane.

Quantitative near-IR densitometric analysis

Using techniques optimized in our lab 25 , 26 , we established near-infrared labeling and detection techniques (Odyssey Infrared Imaging System, Li-Cor) for linear intensity detection of fluorescently labeled protein bands. The biochemistry was performed in a blinded, counterbalanced fashion, and three independent replications of the assay were run on different days 27 . Fluorescent Western blotting uses fluorescent-labeled secondary antibodies to detect the target protein, which allows more sensitive and specific detection compared with chemiluminescence 11 , 28 , 29 . Additionally, fluorescence imaging allows simultaneous detection of a target protein and an internal loading control on the same blot, which enables more accurate correction of sample-to-sample and lane-to-lane variation 11 , 30 , 31 . This provides a more accurate and reliable quantification of the target protein, making it a popular choice for quantitative analysis of WB data.

It is good practice for the pipetting experimenter to remain blind to experimental conditions during gel loading, transfer, and densitometric quantification. We achieved this using de-identified tube codes and a priori gel loading sequences that were developed by an outside experimenter using the method implemented in the blotRig software.

Statistical analyses

Statistical analyses were performed using the R statistical software. WB data were analyzed using parametric statistics. WBs were run in three independent replications, with the beta-actin loading control included as a covariate and replication statistically controlled as a random factor. Significance was assessed at p < 0.05 25 , 26 , 32 , 33 , 34 . We report estimated statistical power and standardized regression coefficient effect sizes in the results section.

All ANOVAs were run using the stats R package; standardized effect size was calculated using the parameters R package 35 . Linear mixed models were run using the lme4 R package. Observed power was calculated by Monte Carlo simulation (1000 simulations) run on the fitted model (either ANOVA or LMM) using the simr package 36 . For the development of the blotRig interface, the R packages used included shiny, tidyverse, DT, shinythemes, shinyjs, and sortable 37 , 38 , 39 , 40 , 41 , 42 . You can access the blotRig analysis software, which includes code for inputting experimental parameters for Western blot analysis, through the following link: https://atpspin.shinyapps.io/BlotRig/.
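As an illustration of the simulation-based power estimate described above, the following minimal sketch (hypothetical data frame and column names, not the authors' exact code) fits a mixed model and estimates observed power with simr:

```r
# Hypothetical sketch of observed-power estimation by Monte Carlo simulation.
# Assumes a data frame `wb` with columns: target, group, loading_control,
# subject, and replicate; names are placeholders, not the published dataset.
library(lme4)
library(simr)

fit <- lmer(target ~ group + loading_control + (1 | subject/replicate), data = wb)

# Refit the model on 1000 simulated datasets to estimate power for the group effect,
# here using a likelihood-ratio comparison against a model without `group`.
power_est <- powerSim(fit, test = fixed("group", method = "lr"), nsim = 1000)
print(power_est)
```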

Designing reproducible western blot experiments

Determining linear range for each primary antibody.

Most WB analyses assume, semi-quantitatively, that the relationship between qWB assay optical density data (i.e., western band signal) and protein abundance is linear 2 , 3 , 11 , 18 . Accordingly, most qWB analyses use statistical tests (t-test; ANOVA) that assume a linear effect. However, recent studies have shown that this relationship can be highly non-linear 19 . As Fig. 1 illustrates, the WB band signal can become non-linearly related to protein concentration at low and high values. This may result in inaccurate quantification of the relative amount of target protein and violates the assumptions of the linear model, which can lead to false inferences. To address the assumption of linearity, it is important to first determine the optimal linear range for each protein of interest so that one can be confident that a unit change in band density reflects a linear change in protein concentration. This enables an experimenter to accurately quantify the protein of interest and apply linear statistical methods appropriately for hypothesis testing.
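A minimal sketch of this linear-range check (illustrative values only; not the authors' data or code) is to fit a line to the dilution series and look for the sub-range where the fit holds:

```r
# Hypothetical sketch: identify the linear range of an antibody from a 1:2 dilution series.
dilution <- data.frame(
  loaded_ug    = c(1.875, 3.75, 7.5, 15, 30, 60),
  band_density = c(180, 410, 860, 1700, 3100, 4100)  # illustrative signal that saturates
)

fit_all <- lm(band_density ~ loaded_ug, data = dilution)
summary(fit_all)$r.squared            # fit degrades if the signal flattens at high loads

linear_part <- subset(dilution, loaded_ug <= 15)      # candidate linear sub-range
fit_lin <- lm(band_density ~ loaded_ug, data = linear_part)
summary(fit_lin)$r.squared            # should approach 1 within the true linear range

plot(band_density ~ loaded_ug, data = dilution)
abline(fit_lin, lty = 2)              # visual check of where signal departs from linearity
```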

Counterbalancing during experimental design

Counterbalancing is the practice of having each experimental condition represented on each gel and evenly distributed, to prevent overrepresentation of the same experimental groups in consecutive lanes. For example, imagine an experimental design in which we are studying two experimental groups (wild type and transgenic animals) across two treatment conditions (Drug and Vehicle). The best way to determine the effects and interactions between our experimental and treatment groups is to create a balanced factorial design, one in which all combinations of levels across factors are represented. For the current example, a balanced factorial design produces four groups, covering each possible combination (Drug-treated Wild Type, Vehicle-treated Wild Type, Drug-treated Transgenic, and Vehicle-treated Transgenic) (Fig. 2 A). During WB gel loading, experimenters often distribute their samples unevenly, such that certain experimental conditions are missing on some gels or samples from the same experimental condition are loaded adjacently. This is problematic because polyacrylamide gel electrophoresis (PAGE) gels are not perfectly uniform, which is a source of technical variability 43 ; in the worst case, if we have loaded only a single experimental group on a gel and found a significant effect of group, we cannot conclude whether the effect is due to the experimental condition or to a technical problem with the gel. At minimum, experimenters should ensure that every group in a factorial design is represented on each gel to avoid confounding technical gel effects with experimental differences. If the number of combinations is too large to represent on a single gel, because of the number of factors or the number of levels per factor, then a smaller "fractional factorial" design will provide maximal counterbalancing to ensure unbiased estimates of all factor effects and the most important interactions.

In addition, experimenters can further counter technical variability by arranging experimental groups within each gel to ensure an adequately counterbalanced design, assuming uniform protein concentration and fluid volume across all samples. This addresses variability due to physical effects within an individual gel. In our example, this means alternating treatment conditions and experimental groups as much as possible to avoid loading similar samples next to one another (Fig. 2 B); a minimal sketch of this principle follows. By spreading potential technical variability across all samples through counterbalancing across and within gels, we can mitigate technical effects that would otherwise bias our results. Proper counterbalancing also enables more rigorous statistical analysis to account for and remove more technical variability 25 , 26 , 32 , 33 . Overall, this helps ensure that experimenters can find the same result in the future and improves reproducibility.
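The sketch below illustrates the principle (it is not the blotRig algorithm): build the full 2 × 2 factorial sample list, then assign lanes so that every group appears once on each gel and lane order is shuffled within gels; group labels and sample counts are hypothetical.

```r
# Hypothetical sketch of counterbalanced gel loading for a 2 x 2 factorial design.
set.seed(1)

samples <- expand.grid(
  genotype  = c("WildType", "Transgenic"),
  treatment = c("Drug", "Vehicle"),
  subject   = 1:3                      # three samples per cell, 12 samples total
)
samples$group <- interaction(samples$genotype, samples$treatment)

lanes_per_gel <- 4
n_gels <- nrow(samples) / lanes_per_gel

# Each gel receives one representative of every group; lane order is shuffled per gel
# so that the same group does not repeatedly sit in the same position.
layout <- do.call(rbind, lapply(seq_len(n_gels), function(g) {
  one_per_group <- do.call(rbind, lapply(split(samples, samples$group), function(d) d[g, ]))
  one_per_group <- one_per_group[sample(nrow(one_per_group)), ]
  one_per_group$gel  <- g
  one_per_group$lane <- seq_len(nrow(one_per_group))
  one_per_group
}))

layout[, c("gel", "lane", "genotype", "treatment", "subject")]
```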

Technical replication

Technical replicates are used to measure the precision of an assay or method by repeating the measurement of the same sample multiple times. The results of these replicates can then be used to calculate the variability and error of the assay or method 13 . This is important to establish the reliability and accuracy of the results. Most experimenters acknowledge the importance of running technical replicates to avoid false positives and negatives due to technical error 13 . Even beyond extreme results, technical replicates can account for the differences in gel makeup, human variability in gel loading, and potential procedural discrepancies. In fact, most studies run at least duplicates; however, the experimental implementation of replicates (e.g., running replicates on the same gel or separate gels) as well as the statistical analysis of replicates (e.g., dropping “odd-man-out” or taking the mean or standard deviation) can differ greatly 44 , 45 . This experimental variability ultimately impedes our ability to meaningfully compare results. For experimenters to establish accuracy and advance reproducibility in WB experiments, it is important to implement standardized and rigorous protocols to handle technical replicates 11 , 13 . In doing so, we can further reduce the technical variability with statistical methods during analysis.

As underscored previously, we recommend that technical replicates be counterbalanced on separate gels to mitigate any possible gel effect. Additionally, by running triplicates, we can treat replication as a random effect in a LMM during statistical analysis. Importantly, triplicates provide more values than duplicates for estimating the distribution of technical variance, improving the robustness of the LMM. This approach isolates and removes technical variance from biological variation, which ultimately improves our sensitivity for true experimental effects 46 .
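Before model fitting, it can also be useful to summarize technical variability directly; a minimal sketch (hypothetical column names) computes the coefficient of variation across the triplicates for each sample:

```r
# Hypothetical sketch: coefficient of variation (CV) across technical replicates.
# Assumes a data frame `wb` with columns: subject, replicate, target.
cv_by_sample <- aggregate(
  target ~ subject,
  data = wb,
  FUN  = function(x) sd(x) / mean(x) * 100   # CV as a percentage
)
names(cv_by_sample)[2] <- "cv_percent"
summary(cv_by_sample$cv_percent)              # distribution of technical CV across samples
```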

In the following demonstration of statistical methods, we replicated all WB analyses in triplicate with a randomized counterbalanced design. We then explore how the way in which technical replicates and loading controls are incorporated into analysis can have a significant impact on both the sensitivity of our results and the interpretation of the findings. An example mockup of a dataset illustrating the various ways in which western blot data are typically prepared for analysis can be found in Fig.  3 .

figure 3

Western Blot Gel and Replication Strategies. (A) Illustration of a Western Blot Gel. This depiction of a typical multiplexed western blot gel highlights the antibody-labeled target protein bands of interest (green/yellow) and the housekeeping protein loading control that is always run and quantified in the same sample and lane as the target of interest. Total protein stain (fluorescent Ponceau stain), shown in red, can be used as an alternative loading control. Specifically, quantification is typically performed on a single antibody-labeled channel for the target protein and the housekeeping protein loading control (grayscale image). (B) Balanced Factorial Technical Replicate Strategy. Here we show the western blot data for the first three subjects from an example dataset. In a balanced factorial design, an equal number of samples from all possible experimental groups are represented on each gel. This table shows the subject number, the technical replicate, the experimental group, and the band quantifications for both the target protein and the loading control. A ratio of target protein to loading control is also calculated. (C) Other Common Technical Replicate Strategies. This example table shows two other ways western blot data are typically formatted. Some experimenters choose not to include technical replicates, with only one sample from each subject quantified. In another replication strategy, technical replicates are averaged. Averaging may bias or skew the data. We recommend running technical replicates on separate gels or batches, and using gel/batch as a random factor when analyzing western blot data.

Statistical methodology to improve western blot analysis

Loading control as a covariate.

Most qWB assay studies use loading controls (either a housekeeping protein or total protein within a lane) to ensure that there are no biases in the total protein loaded in a particular lane 2 , 11 , 27 . The most common way loading controls are used to account for variability between lanes is by normalizing the target protein expression values, dividing them by the loading control values (Fig. 3), resulting in a ratio of target protein to loading control 2 , 47 , 48 . However, ratios may violate the assumptions of the common statistical tests used to analyze qWB data (e.g., t-test, ANOVA) 49 . This ultimately hinders the ability to statistically account for the variance in qWB outcomes and to obtain reliable statistical estimates. An alternative approach that improves the parametric properties is to include the loading control values as a covariate, that is, a variable that is not one of our experimental factors but that may affect the outcome of interest and represents a source of variance we can account for 50 . For instance, we know the amount of protein loaded is a source of variability in WB quantification, so we can use the loading control as a covariate to adjust for that variance. In doing so, we extend the method of ANOVA into ANCOVA 51 . This approach accounts for the technical variability between lanes while meeting the assumptions required for parametric statistics, which helps curb bias and avert false discoveries.
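The contrast between the two approaches can be written in a few lines of R; the sketch below is illustrative only, with an assumed data frame `wb` containing group, target, and loading_control columns:

```r
# Hypothetical sketch contrasting the ratio approach with loading control as a covariate.

# Common approach: normalize by ratio, then test the ratio across groups
wb$ratio <- wb$target / wb$loading_control
anova(lm(ratio ~ group, data = wb))

# Alternative (ANCOVA): keep the raw target signal and adjust for loading control
anova(lm(target ~ loading_control + group, data = wb))
```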

Replication and subject as a random effect

Most WB studies use ANOVA, a test that allows comparison of the means of three or more independent samples, for quantitative analysis of WB data 49 . One of the assumptions in ANOVA is the independence of observations 49 . This is problematic because we often collect multiple observations from the same analytical unit, for example different tissue samples from a single subject, or technical replicates. As a result, those observations don’t qualify as independent and should be analyzed using models controlling for variability within units of observations (e.g., the animal) to mitigate inferential errors (false positives and negatives) 52 caused by what is known as pseudoreplication. This arises when the quantity of measured values or data points surpasses the number of actual replicates, and the statistical analysis treats all data points as independent, resulting in their full contribution to the final result 53 .

In addition, when conducting experiments, it is important to consider the randomness of the conditions being observed. Treating both subjects and conditions as fixed effects can lead to inaccurate p-values. Instead, subjects/animals should be treated as random effects, and the conditions should be considered a sample from a larger population 54 . This is especially important when collecting data from different replicates or gels, as separate technical replicate runs should be treated as random.

In Fig.  4 we use a simple experimental design comparing the difference in a target protein between two experimental groups to demonstrate four of the most common ways researchers tend to analyze western blot data: (1) running each sample once without replication, (2) treating each technical replicate as an independent sample, (3) taking the mean of technical replicate values, and (4) treating subject and replication as a random effect (Fig.  4 ). We then tested how effect size, power, and p value are affected by each of these strategies to get a sense of how much these estimates vary between analyses. For each of these strategies, we also tested the difference between using the ratio of target protein to loading controls versus using loading control as a statistical covariate. For further exploration of the way these data are prepared and analyzed, see the data workup in Supplementary Figs.  1 and 2 .

figure 4

Effect of different replication and loading control strategies on statistical outcomes. Eight possible strategies are shown, representing the most common ways in which replication and loading controls are treated in a typical western blot analysis. Four replication strategies: no replication at all, three technical replicate gels treated as independent, the mean of three replicates, or replicate treated as a random effect in a linear mixed model. These are crossed with two loading control strategies: either the target protein is divided by the loading control, or the loading control is treated as a covariate in a linear mixed model. (A) Effect size: the standardized effect size coefficient is generally improved when the loading control is treated as a covariate, compared with using a ratio of the target protein and loading control values. (B) Power: treating each replication as independent increases statistical power (due to the inaccurate assumption that technical replicates are unrelated, thus artificially tripling the n). Conversely, by including the variability inherent in technical replicates as part of the statistical model, we identify and account for a major source of variability, thus improving power in a more appropriate way. (C) P value: as expected, when each replication is inaccurately treated as independent, the p value is low (due to artificially inflated n). Using the mean of replications and loading controls as covariates also resulted in a p value below 0.05. The smallest p value was found when including replication as a random factor. Across each of these statistical measures, only when replication is included as a random factor and loading control as a covariate do we see a strong effect size, high power, and a low p value.

In the first scenario, we imagined that no technical replication was run at all (by using only the first replication). With this strategy, we found that standardized effect size is weak, power is low, and the p value was high (Fig.  4 ). Second, we demonstrate how analytical output would be different if we did run three technical replicates, but treated each as independent. As discussed above, this strategy does not take into account the fact that each sample is being run three times, and consequently the overall n of your experiment is artificially tripled! As one might expect, observed power is quite high, and our p value is low (< 0.05). Power is increased by an increase in sample size, so it is not surprising that the power is much higher if we erroneously report that we have a 3X larger sample size (i.e., pseudoreplication) 53 . In this case, the observed power is inflated and an artifact of inappropriate statistics, and the probability of a false positive is considerably increased with respect to the expected 5%.

So, what would be a more appropriate way to handle technical replicates? One method that researchers often use is to take the mean of their technical replicates. This does ensure that we are not artificially inflating our sample size, which is certainly an improvement over the previous strategy. With this strategy, we do find that our p value is less than 0.05 (when loading control is treated as a covariate). But we also see that our power is still low. We have effectively taken our replicates into account by collapsing across them within each sample, but this can be dangerous. If there is wide variation across replicates of a particular sample, then taking the mean of three replicates could produce an inaccurate estimate of the ‘true’ sample value. Ideally, we want to find a solution where instead of collapsing this variation, we add it to our statistical model so that we can better understand what amount of variation is randomly coming from within technical replicates, and in turn what amount of variation is actually due to potential differences in our experimental groups.

To achieve this, we need to model both the fixed effect of all groups in a full factorial design and the random effect of replication across western blot gels. A model with both fixed and random effects is referred to as a linear mixed model (LMM). When using this strategy, we find that our effect size remains strong and our p value is low. But importantly, we now have strong observed power (Fig. 4). This suggests that we can achieve greater sensitivity in our WB experiment using this approach. Specifically, if we implement careful counterbalancing while designing our experiments, then we can use the variability between gels to our advantage during analysis using a linear mixed effects model 55 .

LMM is recommended because it takes into account both the multiple observations within a single subject/animal in a given condition and differences across subjects observed in multiple conditions. This reduces chances of inaccurate p-values and improves reliability 56 . Further, treating both subjects and replication as random effects generalizes the results to the population of subjects and also to the population of conditions 57 .

Real world application of blotRig software for western blot experimental design, technical replication, and statistical analysis

We have designed a user interface that facilitates appropriate counterbalancing and technical replication for western blot experimental design. The 'blotRig' application is run through RStudio and can be found here: https://atpspin.shinyapps.io/BlotRig/. Upon starting the blotRig application, the user is prompted to upload a comma separated values (CSV) spreadsheet. This spreadsheet should include separate columns for subject ID and experimental group. The user is then prompted to enter the total number of lanes available on their particular western blot gel apparatus. The blotRig software first runs a quality check to confirm that each subject ID (unique sample or subject) is found in only one experimental group. If duplicates are found, a warning is shown specifying which subjects are repeated across groups. If no errors are found, a counterbalanced gel map is generated that illustrates the western blot gel lanes into which each subject should be loaded (Fig. 5 A). The lane assignments are based on two main principles outlined above: (1) each western blot gel should hold a representative sample of each experimental group, and (2) samples from the same experimental group are not loaded in adjacent lanes whenever possible. This ensures that proper counterbalancing is achieved, limiting the chance that the inherent variability within and across western blot gels is confounded with the experimental groups we are testing.
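The kind of quality check described can be expressed in a few lines of base R; this is an illustration with assumed column names, not the blotRig source code:

```r
# Hypothetical sketch: flag subject IDs that appear in more than one experimental group.
check_unique_groups <- function(df, id_col = "subject_id", group_col = "group") {
  groups_per_id <- tapply(df[[group_col]], df[[id_col]], function(x) length(unique(x)))
  dupes <- names(groups_per_id)[groups_per_id > 1]
  if (length(dupes) > 0) {
    warning("Subjects assigned to multiple groups: ", paste(dupes, collapse = ", "))
  }
  invisible(dupes)
}

# Example use with a hypothetical sample sheet:
# samples <- read.csv("sample_sheet.csv")
# check_unique_groups(samples, id_col = "subject_id", group_col = "group")
```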

figure 5

Example of the blotRig Gel Creator interface. (A) Illustration of the blotRig interface. The user has entered their sample IDs, experimental groups, and the number of lanes per western blot gel. (B) The blotRig system then creates a counterbalanced gel map that ensures each gel contains a representative from each experimental group. This illustration shows the exact lane of each gel in which each sample should be run.

Once the gel map has been generated, the user can export it to a CSV spreadsheet. This sheet is designed to show clearly which gel each sample is on, which lane of each gel a sample occupies, what experimental group each sample belongs to, and, importantly, a repetition of each of these values for three technical replicates (Fig. 5 B). Users will also see columns for Target Protein and Loading Control; these are the cells where users can input their densitometry values upon completing their western blot runs. Once this spreadsheet is filled out, it is ready for blotRig analysis.

To analyze western blot data, users can upload the completed template that was exported in the blotRig experimental design phase, or their own CSV file, under the 'Analysis' tab (Fig. 6). The blotRig software first asks the user to identify which columns of the spreadsheet represent Subject/Sample ID, Experimental Group, Protein Target, Loading Control, and Replication. The blotRig software again runs a quality check to confirm that no subject/sample IDs are duplicated across experimental groups. If no errors are found, the data are ready to analyze. The blotRig analysis is then run using the principles discussed above. Specifically, a linear mixed model is fit using the lmer function from the lme4 R package, with Experimental Group as a fixed effect, Loading Control as a covariate, and Replication (nested within Subject/Sample ID) as a random factor. Analytical output is then displayed, giving a variety of statistical results from the linear mixed model output table, including fixed and random effects and associated p values (Fig. 6). A bar graph of group means with 95% confidence interval error bars is also generated, along with a summary of the group means, standard errors of the mean, and upper/lower 95% confidence intervals. These outputs can be reported directly in the results sections of papers, improving the statistical rigor of published WB reports. In addition, since the entire pipeline is open source, the blotRig code itself can be reported to support transparency and reproducibility.
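For readers who want to reproduce this model structure outside the interface, a minimal sketch of the described specification is shown below; the file name and column names are assumptions based on the exported template, and this is an illustration rather than the blotRig source:

```r
# Hypothetical sketch: Experimental Group as a fixed effect, Loading Control as a
# covariate, and Replication nested within Subject as a random factor.
library(lme4)
library(lmerTest)    # adds p values to lmer summaries
library(parameters)  # standardized effect sizes

dat <- read.csv("blotRig_template_filled.csv")  # hypothetical completed template

fit <- lmer(
  target_protein ~ experimental_group + loading_control +
    (1 | subject_id/replication),
  data = dat
)

summary(fit)                  # fixed effects, random-effect variances, p values
standardize_parameters(fit)   # standardized regression coefficients (effect sizes)
```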

figure 6

Workflow for running statistical analysis of replicate western blot data using blotRig. First, fill out the spreadsheet with subject ID, experimental group assignment, technical replication number, and the densitometry values for your target proteins and loading controls. After saving this spreadsheet as a .csv file, it can be uploaded to blotRig. Tell blotRig the exact names of each of your variables, then click 'Run Analysis'. This produces statistical output from a linear mixed model testing for group differences, using loading control as a covariate and replication as a random effect. A bar graph with error bars and summary statistics can then be exported.

Although the western blot technique has proven to be a workhorse for biological research, the need to enhance its reproducibility is critical 13 , 19 , 27 . Current qWB assay methods still fall short of reproducibly identifying true biological effects 13 . We provide a systematic approach to generating quantitative data from western blot experiments that incorporates key technical and statistical recommendations to minimize sources of error and variability throughout the western blot process. First, our study shows that experimenters can improve the reproducibility of western blots by applying the experimental recommendations of determining the linear range for each primary antibody, counterbalancing during experimental design, and running technical triplicates 13 , 27 . Furthermore, these experimental implementations allow application of the statistical recommendations of incorporating loading controls as covariates and analyzing gel and subject as random effects 58 , 59 . Altogether, these enable more rigorous statistical analysis that accounts for more technical variability, which can improve the effect size, observed power, and p-value of our experiments and ultimately better identify true biological effects.

Biomedical research has continued to rely on p-values for determining and reporting differences between experimental groups, despite calls to retire the p-value 60 . Power (sensitivity) calculations have also become increasingly common. In brief, p-values and the related alpha value are associated with the Type I error rate, the probability of rejecting the null hypothesis (i.e., claiming there is an effect) when there is no true effect 61 . Power, on the other hand, measures the probability of correctly rejecting the null hypothesis (i.e., detecting an effect) when there is indeed a true underlying effect, a concept closely related to reducing the Type II error rate 59 , 62 . Critically, empirical evidence estimates that the median statistical power of studies in neuroscience is between ∼ 8% and ∼ 31%, yet best practices suggest that an experimenter should aim for a power of 80% at an alpha of 0.05 20 . Underpowered experiments are more likely to produce false inferences. If an underpowered experiment seeks to reproduce a previous observation, the resulting false negative may throw the original findings into question and directly exacerbate the reproducibility crisis 59 . Even more alarmingly, low power also increases the likelihood that a statistically significant result is actually a false positive, owing to small sample size problems 61 . In our analyses, we show that our technical and statistical recommendations lower the p-value (indicating that the observed relationship between variables is less likely to be due to chance) and increase the observed power of our experiments. This translates into the ability to better avoid false negatives when there is a true effect and to reduce the likelihood of false positives when there is not a true experimental effect, both of which will ultimately improve the reproducibility of qWB assay experiments.

Another useful component of statistical analyses that is not as commonly reported but is critically related to p-value and power is effect size. Effect size is a statistical measure that describes the magnitude of the difference between two groups in an experiment 63 . It is used to quantify the strength of the relationship between the variables being studied 63 . The estimated effect size is important because it answers the most frequent question that researchers ask: how big is the difference between experimental groups, or how strong is the relationship or association? 63 . The combination of standardized effect size, p-value and power reflect crucial experimental results that can be broadly understood and compared with findings from other studies 62 , thus improving comparability of qWB experiments 49 , 63 . In particular, studies with large effect sizes have more power: we are more likely to detect a true positive experimental effect and avoid the false negative if the underlying difference between experimental groups is large 46 . In some cases, the calculated effect size is greatly influenced by how sources of variance are handled during analysis 13 . Our results demonstrate that by reducing the residual variance (by modeling the random effect of replication) the estimated effect size of our experiment increases. This could mean that the magnitude of the difference between the groups in our experiment is larger than it was originally thought to be. This could be due to a variety of factors such as improving the experimental design, sample size, or the measurement of the variables 13 . Likewise, conducting a power analysis is an essential step in experimental design that should be done before collecting data to ensure that the study is adequately powered to detect an effect of a certain size 64 .

Increasingly, power analysis is becoming a requirement for publications and grant proposals 65 . This is because a study with low statistical power is more likely to produce false negative results, meaning that the study may fail to detect a real effect that actually exists. This can lead to the rejection of true hypotheses, wasted resources, and potentially harmful conclusions. In brief, given an experimental effect size and variance, we can calculate the sample size needed to achieve an alpha of 0.05 and a power of 0.8; an increased sample size reduces the standard error of the mean (SEM), which is the measured spread of sample means, and consequently increases the power of the experiment 66 . We have demonstrated that our experimental and statistical recommendations lead to a lower p value and a stronger effect size (Fig. 4) without changing the sample size. This may be of greatest interest to researchers: more rigorous analytics ultimately improve experimental sensitivity without relying solely on increasing the sample size.
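For a simple two-group comparison, this calculation can be done with base R; the numbers below are placeholders for an expected group difference and standard deviation, not values from this study:

```r
# Hypothetical sketch of an a priori sample-size calculation for a two-group design.
# delta = expected difference between group means; sd = expected standard deviation.
power.t.test(delta = 1.0, sd = 1.2, sig.level = 0.05, power = 0.80)

# The printed `n` is the required sample size per group. For mixed-model designs,
# simulation-based estimates (e.g., with the simr package) are more appropriate.
```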

Reducing the sample size of an experiment can be beneficial for several reasons, one of which is cost-effectiveness. A smaller sample size can reduce the number of animals or other resources needed for the study, lowering costs. It can also save time and shorten the duration of the experiment, as fewer subjects need to be recruited and data collection can be completed more quickly. However, reducing the sample size also decreases statistical power; reducing it too much increases the risk of a Type II error, failing to detect significance when there is a true effect 62 . Therefore, it is important to consider the trade-off between sample size and power when designing an experiment, and to use statistical techniques such as power analysis to ensure that the sample size is sufficient to detect an effect of a given size. Moreover, when using animals in research, it is always important to consider the ethical aspect and the 3Rs principles of reduction, refinement, and replacement 55 .

Despite our best efforts in creating a balanced, full factorial experimental design, there will always be random variation in biological experiments. Fixed effects such as experimental group differences are expected to be generalizable if the experiment is replicated. Random effects (such as gel variation) on the other hand are unpredictable across experiments. Western blot analyses are particularly susceptible to this random gel variation, as different values may be observed for technical replicates run on different gels. By using a linear mixed model paired with rigorous full factorial design, we can ensure that we account for as much of that random variation as possible. When we acknowledge, identify, and model random effects we enhance the possibility of discovering our fixed effect of experimental treatment, if one exists.

The linear mixed model framework discussed above assumes that our western blot outcome measures are on a linear scale. As described above, parametric work to identify the linear range of a protein of interest is critical for ensuring that the results of a LMM (or ANOVA and t-test) are accurate and interpretable. While we recommend using the loading control (or total protein control) as a covariate in a linear mixed model, many bench researchers may prefer to use the within-lane loading control (or total protein) to normalize target protein values. It is important to consider that doing so creates a ratio value that is multiplicative rather than linear, which has the side effect of artificially distorting the variance. To account for this non-linearity, we recommend semi-parametric models such as generalized estimating equations with a gamma distribution and log link, which more appropriately represent ratio data.
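One possible realization of that suggestion, sketched with the geepack package and assumed column names (illustrative only, not a blotRig feature), clusters observations by subject and models the ratio with a gamma family and log link:

```r
# Hypothetical sketch of a GEE for ratio-style outcomes (target / loading control).
library(geepack)

dat <- read.csv("blotRig_template_filled.csv")   # hypothetical completed template
dat <- dat[order(dat$subject_id), ]              # geeglm expects data ordered by cluster
dat$ratio <- dat$target_protein / dat$loading_control

gee_fit <- geeglm(
  ratio ~ experimental_group,
  id     = subject_id,
  family = Gamma(link = "log"),
  corstr = "exchangeable",
  data   = dat
)
summary(gee_fit)
```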

There has been recent recognition that an appropriate study design can be achieved by balancing sample size (n), effect size, and power 31 . The experimental and statistical approach presented in this study provides insight into how more rigorous planning of western blot experimental design, together with statistical analysis that does not depend on p-values alone, can yield precise data that reflect true biological effects. Using blotRig as a standardized, integrated western blot methodology, quantitative western blot can become a more reproducible, reliable, and less controversial protein measurement technique 18 , 28 , 67 .

Study reporting

This study is reported in accordance with ARRIVE guidelines.

Supporting information

This article contains supporting information. You can access the blotRig analysis software, which includes code for inputting experimental parameters for all Western blot analysis, through the following link: https://atpspin.shinyapps.io/BlotRig/

Data availability

The datasets and computer code generated or used in this study are accessible in a public, open-access repository at https://doi.org/10.34945/F51C7B and https://github.com/ucsf-ferguson-lab/blotRig/ respectively.

Abbreviations

AAALAC: American Association for Accreditation of Laboratory Animal Care

ARRIVE: Animal Research: Reporting of In Vivo Experiments

AVMA: American Veterinary Medical Association

IACUC: Institutional Animal Care and Use Committee

qWB: Quantitative western blot

ELISA: Enzyme-linked immunosorbent assay

SARS-CoV-2: Severe acute respiratory syndrome coronavirus 2

ANCOVA: Analysis of covariance

ANOVA: Analysis of variance

SCI: Spinal cord injury

SNI: Spared nerve injury

AMPA: α-Amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid

GluA1: Glutamate receptor 1

GluA2: Glutamate receptor 2

LMM: Linear mixed model

TTBS: Tris-buffered saline containing 0.1% Tween 20

PAGE: Polyacrylamide gel electrophoresis

SEM: Standard error of the mean

Lowry, O., Rosebrough, N., Farr, A. L. & Randall, R. Protein measurement with the Folin phenol reagent. J. Biol. Chem. 193 , 265–275. https://doi.org/10.1016/S0021-9258(19)52451-6 (1951).

Aldridge, G. M., Podrebarac, D. M., Greenough, W. T. & Weiler, I. J. The use of total protein stains as loading controls: An alternative to high-abundance single protein controls in semi-quantitative immunoblotting. J. Neurosci. Methods 172 , 250–254. https://doi.org/10.1016/j.jneumeth.2008.05.00 (2008).

McDonough, A. A., Veiras, L. C., Minas, J. N. & Ralph, D. L. Considerations when quantitating protein abundance by immunoblot. Am. J. Physiol. Cell Physiol. 308 , C426-433. https://doi.org/10.1152/ajpcell.00400.2014 (2015).

Towbin, H., Staehelin, T. & Gordon, J. Electrophoretic transfer of proteins from polyacrylamide gels to nitrocellulose sheets: Procedure and some applications. PNAS 76 , 4350–4354. https://doi.org/10.1073/pnas.76.9.4350 (1979).

Burnette, W. N. “Western blotting”: Electrophoretic transfer of proteins from sodium dodecyl sulfate-polyacrylamide gels to unmodified nitrocellulose and radiographic detection with antibody and radioiodinated protein A. Anal. Biochem. 112 , 195–203. https://doi.org/10.1016/0003-2697(81)90281-5 (1981).

Mahmood, T. & Yang, P.-C. Western blot: Technique, theory, and trouble shooting. N. Am. J. Med. Sci. 4 , 429–434. https://doi.org/10.4103/1947-2714.100998 (2012).

Alegria-Schaffer, A., Lodge, A. & Vattem, K. Performing and optimizing Western blots with an emphasis on chemiluminescent detection. Methods Enzymol. 463 , 573–599. https://doi.org/10.1016/S0076-6879(09)63033-0 (2009).

Khoury, M. K., Parker, I. & Aswad, D. W. Acquisition of chemiluminescent signals from immunoblots with a digital SLR camera. Anal. Biochem. 397 , 129–131. https://doi.org/10.1016/j.ab.2009.09.041 (2010).

Zellner, M. et al. Fluorescence-based western blotting for quantitation of protein biomarkers in clinical samples. Electrophoresis 29 , 3621–3627. https://doi.org/10.1002/elps.200700935 (2008).

Gingrich, J. C., Davis, D. R. & Nguyen, Q. Multiplex detection and quantitation of proteins on western blots using fluorescent probes. Biotechniques 29 , 636–642. https://doi.org/10.2144/00293pf02 (2000).

Janes, K. A. An analysis of critical factors for quantitative immunoblotting. Sci. Signal 8 , rs2. https://doi.org/10.1126/scisignal.2005966 (2015).

Mollica, J. P., Oakhill, J. S., Lamb, G. D. & Murphy, R. M. Are genuine changes in protein expression being overlooked? Reassessing western blotting. Anal. Biochem. 386 , 270–275. https://doi.org/10.1016/j.ab.2008.12.029 (2009).

Pillai-Kastoori, L., Schutz-Geschwender, A. R. & Harford, J. A. A systematic approach to quantitative western blot analysis. Anal. Biochem. 593 , 113608. https://doi.org/10.1016/j.ab.2020.113608 (2020).

Aydin, S. A short history, principles, and types of ELISA, and our laboratory experience with peptide/protein analyses using ELISA. Peptides 72 , 4–15. https://doi.org/10.1016/j.peptides.2015.04.012 (2015).

Seisenberger, C. et al. Questioning coverage values determined by 2D western blots: A critical study on the characterization of anti-HCP ELISA reagents. Biotechnol. Bioeng. 118 , 1116–1126. https://doi.org/10.1002/bit.27635 (2021).

Edwards, V. M. & Mosley, J. W. Reproducibility in quality control of protein (western) immunoblot assay for antibodies to human immunodeficiency virus. Am. J. Clin. Pathol. 91 , 75–78. https://doi.org/10.1093/ajcp/91.1.75 (1989).

Matschke, J. et al. Neuropathology of patients with COVID-19 in Germany: A post-mortem case series. Lancet Neurol. 19 , 919–929. https://doi.org/10.1016/S1474-4422(20)30308-2 (2020).

Murphy, R. M. & Lamb, G. D. Important considerations for protein analyses using antibody based techniques: Down-sizing western blotting up-sizes outcomes. J. Physiol. 591 , 5823–5831. https://doi.org/10.1113/jphysiol.2013.263251 (2013).

Butler, T. A. J., Paul, J. W., Chan, E.-C., Smith, R. & Tolosa, J. M. Misleading westerns: Common quantification mistakes in western blot densitometry and proposed corrective measures. Biomed. Res. Int. 2019 , 5214821. https://doi.org/10.1155/2019/5214821 (2019).

Button, K. S. et al. Power failure: Why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 14 , 365–376. https://doi.org/10.1038/nrn3475 (2013).

Landis, S. C. et al. A call for transparent reporting to optimize the predictive value of preclinical research. Nature 490 , 187–191. https://doi.org/10.1038/nature11556 (2012).

Shields, S. D., Eckert, W. A. & Basbaum, A. I. Spared nerve injury model of neuropathic pain in the mouse: A behavioral and anatomic analysis. J. Pain 4 , 465–470. https://doi.org/10.1067/s1526-5900(03)00781-8 (2003).

Decosterd, I. & Woolf, C. Spared nerve injury: An animal model of persistent peripheral neuropathic pain. Pain 87 , 149–158. https://doi.org/10.1016/S0304-3959(00)00276-1 (2000).

Richner, M., Jager, S. B., Siupka, P. & Vaegter, C. B. Hydraulic extrusion of the spinal cord and isolation of dorsal root ganglia in rodents. J. Vis. Exp. https://doi.org/10.3791/55226 (2017).

Ferguson, A. R. et al. Cell death after spinal cord injury is exacerbated by rapid TNFα-induced trafficking of GluR2-lacking AMPARS to the plasma membrane. J Neurosci 28 , 11391–11400. https://doi.org/10.1523/JNEUROSCI.3708-08.2008 (2008).

Ferguson, A. R., Huie, J. R., Crown, E. D. & Grau, J. W. Central nociceptive sensitization vs. spinal cord training: Opposing forms of plasticity that dictate function after complete spinal cord injury. Front. Physiol. 3 , 1. https://doi.org/10.3389/fphys.2012.00396 (2012).

Taylor, S. C., Berkelman, T., Yadav, G. & Hammond, M. A defined methodology for reliable quantification of western blot data. Mol. Biotechnol. 55 , 217–226. https://doi.org/10.1007/s12033-013-9672-6 (2013).

Bakkenist, C. J. et al. A quasi-quantitative dual multiplexed immunoblot method to simultaneously analyze ATM and H2AX phosphorylation in human peripheral blood mononuclear cells. Oncoscience 2 , 542–554. https://doi.org/10.18632/oncoscience.162 (2015).

Wang, Y. V. et al. Quantitative analyses reveal the importance of regulated Hdmx degradation for p53 activation. Proc. Natl. Acad. Sci. USA 104 , 12365–12370. https://doi.org/10.1073/pnas.0701497104 (2007).

Bass, J. et al. An overview of technical considerations for western blotting applications to physiological research. Scand. J. Med. Sci. Sports 27 , 4–25. https://doi.org/10.1111/sms.12702 (2017).

Lazzeroni, L. C. & Ray, A. The cost of large numbers of hypothesis tests on power, effect size and sample size. Mol. Psychiatry 17 , 108–114. https://doi.org/10.1038/mp.2010.117 (2012).

Huie, J. R. et al. AMPA receptor phosphorylation and synaptic colocalization on motor neurons drive maladaptive plasticity below complete spinal cord injury. eNeuro https://doi.org/10.1523/ENEURO.0091-15.2015 (2015).

Stück, E. D. et al. Tumor necrosis factor alpha mediates GABAA receptor trafficking to the plasma membrane of spinal cord neurons in vivo. Neural Plast https://doi.org/10.1155/2012/261345 (2012).

Krzywinski, M. & Altman, N. Points of significance: Power and sample size. Nat. Method. 10 , 1139–1140. https://doi.org/10.1038/nmeth.2738 (2013).

R Core Team R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/ (2021).

Green, P. & MacLeod C. J. “simr: An R package for power analysis of generalised linear mixed models by simulation.” Meth. Ecol. Evolut. 7 (4), 493–498. https://doi.org/10.1111/2041-210X.12504 , https://CRAN.R-project.org/package=simr (2016).

Attali, D. shinyjs: Easily Improve the User Experience of Your Shiny Apps in Seconds. R package version 2.1.0, https://deanattali.com/shinyjs/ (2022).

Chang, W. et al. shiny: Web Application Framework for R. R package version 1.9.1.9000, https://github.com/rstudio/shiny , https://shiny.posit.co/ (2024).

Chang, W. shinythemes: Themes for Shiny. R package version 1.2.0, https://github.com/rstudio/shinythemes (2024).

de Vries, A., Schloerke, B., Russell, K. sortable: Drag-and-Drop in ‘shiny’ Apps with ‘SortableJS’. R package version 0.5.0, https://github.com/rstudio/sortable (2024).

Wickham, H. et al. Welcome to the tidyverse. JOSS 4 (43), 1686. https://doi.org/10.21105/joss.01686 (2019).

Xie, Y., Cheng, J. & Tan, X. DT: A Wrapper of the JavaScript Library 'DataTables'. R package version 0.33.1, https://github.com/rstudio/DT (2024).

Krzywinski, M. & Altman, N. Points of significance: Analysis of variance and blocking. Nat Methods 11 , 699–700. https://doi.org/10.1038/nmeth.3005 (2014).

Heidebrecht, F., Heidebrecht, A., Schulz, I., Behrens, S.-E. & Bader, A. Improved semiquantitative western blot technique with increased quantification range. J. Immunol. Methods 345 , 40–48. https://doi.org/10.1016/j.jim.2009.03.018 (2009).

Huang, Y.-T. et al. Robust comparison of protein levels across tissues and throughout development using standardized quantitative western blotting. J. Vis. Exp. https://doi.org/10.3791/59438 (2019).

Krzywinski, M. & Altman, N. Points of view: Designing comparative experiments. Nat. Methods 11 , 597–598. https://doi.org/10.1038/nmeth.2974 (2014).

Thacker, J. S., Yeung, D. H., Staines, W. R. & Mielke, J. G. Total protein or high-abundance protein: Which offers the best loading control for western blotting?. Anal. Biochem. 496 , 76–78. https://doi.org/10.1016/j.ab.2015.11.022 (2016).

Zeng, L. et al. Direct blue 71 staining as a destaining-free alternative loading control method for western blotting. Electrophoresis 34 , 2234–2239. https://doi.org/10.1002/elps.201300140 (2013).

Jaeger, T. F. Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. J. Mem. Lang. 59 , 434–446. https://doi.org/10.1016/j.jml.2007.11.007 (2008).

Mefford, J. & Witte, J. S. The covariate’s dilemma. PLoS Genet. 8 , e1003096. https://doi.org/10.1371/journal.pgen.1003096 (2012).

Schneider, B. A., Avivi-Reich, M. & Mozuraitis, M. A cautionary note on the use of the analysis of covariance (ANCOVA) in classification designs with and without within-subject factors. Front. Psychol. 6 , 474. https://doi.org/10.3389/fpsyg.2015.00474 (2015).

Nieuwenhuis, S., Forstmann, B. U. & Wagenmakers, E.-J. Erroneous analyses of interactions in neuroscience: A problem of significance. Nat. Neurosci. 14 , 1105–1107. https://doi.org/10.1038/nn.2886 (2011).

Freeberg, T. M. & Lucas, J. R. Pseudoreplication is (still) a problem. J. Com. Psychol. 123 , 450–451. https://doi.org/10.1037/a0017031 (2009).

Judd, C. M., Westfall, J. & Kenny, D. A. Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. J. Pers. Soc. Psychol. 103 , 54–69. https://doi.org/10.1037/a0028347 (2012).

Lee, O. E. & Braun, T. M. Permutation tests for random effects in linear mixed models. Biometrics 68 , 486–493. https://doi.org/10.1111/j.1541-0420.2011.01675.x (2012).

Baayen, R. H., Davidson, D. J. & Bates, D. M. Mixed-effects modeling with crossed random effects for subjects and items. J. Mem. Lang 59 , 390–412. https://doi.org/10.1016/j.jml.2007.12.005 (2008).

Barr, D. J., Levy, R., Scheepers, C. & Tily, H. J. Random effects structure for confirmatory hypothesis testing: Keep it maximal. J. Mem. Lang. https://doi.org/10.1016/j.jml.2012.11.001 (2013).

Blainey, P., Krzywinski, M. & Altman, N. Points of significance: Replication. Nat. Methods 11 , 879–880. https://doi.org/10.1038/nmeth.3091 (2014).

Drubin, D. G. Great science inspires us to tackle the issue of data reproducibility. Mol. Biol. Cell 26 , 3679–3680. https://doi.org/10.1091/mbc.E15-09-0643 (2015).

Amrhein, V., Greenland, S. & McShane, B. Scientists rise up against statistical significance. Nature 567 , 305–307. https://doi.org/10.1038/d41586-019-00857-9 (2019).

Cohen, J. The earth is round (p <.05). Am. Psychol. 49 , 997–1003. https://doi.org/10.1037/0003-066X.49.12.997 (1994).

Ioannidis, J. P. A., Tarone, R. & McLaughlin, J. K. The false-positive to false-negative ratio in epidemiologic studies. Epidemiology 22 , 450–456. https://doi.org/10.1097/EDE.0b013e31821b506e (2011).

Sullivan, G. M. & Feinn, R. Using effect size-or why the P value Is not enough. J. Grad. Med. Educ. 4 , 279–282. https://doi.org/10.4300/JGME-D-12-00156.1 (2012).

Brysbaert, M. & Stevens, M. Power analysis and effect size in mixed effects models: A tutorial. J. Cogn. 1 , 9. https://doi.org/10.5334/joc.10 (2018).

Kline, R. B. Beyond significance testing: Reforming data analysis methods in behavioral research. Am. Psychol. Associat . https://doi.org/10.1037/10693-000 (2024).

Rosner, B. Fundamentals of Biostatistics (Brooks/Cole, Cengage Learning, Boston, 2011).

Bromage, E., Carpenter, L., Kaattari, S. & Patterson, M. Quantification of coral heat shock proteins from individual coral polyps. Mar. Ecol. Progress Ser. 376 , 123–132 (2009).

Acknowledgements

The authors would like to thank Alexys Maliga Davis for data librarian services.

This work was supported by a National Institutes of Health/National Institute of Neurological Disorders and Stroke grant (R01NS088475) to A. R. F., with additional support from NIH NINDS grants R01NS122888, UH3NS106899, and U24NS122732; US Department of Veterans Affairs (VA) awards I01RX002245, I01RX002787, and I50BX005878; the Wings for Life Foundation; and the Craig H. Neilsen Foundation. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Correspondence and requests for materials should be addressed to A.R.F.

Author information

Authors and Affiliations

Weill Institute for Neurosciences, University of California, San Francisco, CA, USA

Cleopa Omondi, Austin Chou, Kenneth A. Fond, Kazuhito Morioka, Nadine R. Joseph, Jeffrey A. Sacramento, Emma Iorio, Abel Torres-Espin, Hannah L. Radabaugh, Jacob A. Davis, Jason H. Gumbel, J. Russell Huie & Adam R. Ferguson

San Francisco Veterans Affairs Medical Center, San Francisco, CA, USA

J. Russell Huie & Adam R. Ferguson

School of Public Health Sciences, Faculty of Health Sciences, University of Waterloo, Waterloo, ON, Canada

Abel Torres-Espin

Department of Physical Therapy, Faculty of Rehabilitation Medicine, University of Alberta, Edmonton, AB, Canada

Contributions

C.O: Writing-original draft preparation, Investigation, Validation, Data Curation, Visualization, Formal Analysis, Writing-Review & Editing; A. C: Formal analysis, Writing-Review & Editing; K. A. F: Software, Writing-Review & Editing; K. M: Methodology, Writing-Review & Editing; N. R. J: Writing—Review & Editing; J. A. S: Investigation, Project Administration, Writing—Review & Editing; E. I: Resources, Writing-Review & Editing; A.T.E: Software, Writing—Review & Editing; H. L. R: Software, Writing—Review & Editing; J. A. D: Investigation, Writing—Review & Editing; J. H. G: Investigation, Writing – Review & Editing; J. R. H: Conceptualization, Methodology, Validation, Formal Analysis, Investigation, Data Curation, Writing- Review & Editing, Visualization, Supervision; A. R. F:Conceptualization, Methodology, Validation, Formal Analysis, Resources, Investigation, Data Curation, Writing-Review & Editing, Visualization, Supervision, Project Administration, Funding Acquisition.

Corresponding authors

Correspondence to J. Russell Huie or Adam R. Ferguson.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Figure 1.

Supplementary Figure 2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Omondi, C., Chou, A., Fond, K.A. et al. Improving rigor and reproducibility in western blot experiments with the blotRig analysis. Sci Rep 14 , 21644 (2024). https://doi.org/10.1038/s41598-024-70096-0


Received : 19 December 2023

Accepted : 13 August 2024

Published : 17 September 2024

DOI : https://doi.org/10.1038/s41598-024-70096-0





A woman closely examines the X-ray of a 2-year-old, which shows flecks of lead paint throughout the intestinal tract.

This is what you need to know about lead and your health

From popular reusable water bottles to aviation fuel, lead is ubiquitous. Should we be worried?

Invisible to the naked eye, odorless, and nearly impossible to detect by taste—traces of lead are in the products we use, the beverages we drink, and the homes we live in. It even shows up in our reusable water bottles, like the lead found lining the bottom of Stanley cups —a controversial discovery that recently reignited consumer attention toward an age-old issue .  

Although natural sources like volcanic eruptions have marginally contributed to lead concentrations on the planet’s surface, the primary culprit behind the global lead pollution problem —which prematurely kills an estimated 5.5 million people every year—is human activity.  

“Natural levels of lead air pollution don’t really exist unless you’re under a volcano. The lead you breathe is manmade,” says Alexander More, a climate and health scientist at the University of Massachusetts and Harvard University who has led studies on the subject.  

After mining operations and industrial processes like lead smelters and waste incinerators, common sources of lead pollution include additives to gasoline and paints, as well as the production of batteries and utilities.  

“We don’t know what a society without lead in our soil, in our water, in our air looks like,” says More.

What is lead used in?  

Lead was one of the first metals humans ever extracted from ores thousands of years ago, and it's been used in a variety of ways ever since. Ancient coins, cosmetics, ceramics, and bullets were once made from the malleable metal. It was even used by ancient Romans to distribute water, ferment wine and sweeten food.  

The dangerous health risks associated with lead exposure may have been identified as early as the Roman Empire , and yet the world continued to rely on the heavy metal for everything from the pursuit of alchemy in the Middle Ages to gasoline additives in the twentieth century . By the time the latter rolled around, the U.S. had emerged as both the foremost producer and consumer of refined lead, depositing millions of tons of lead in the environment through the fuel used to power America’s vehicles.  

It wasn’t until the late twentieth century, shortly after Congress established the Clean Air Act, that the U.S. began to limit its lead use. In 1973, the Environmental Protection Agency implemented the first regulations to phase down the amount of lead in gasoline, but it would take almost half a century before leaded fuel for cars and trucks was banned from being sold anywhere in the world, according to the World Health Organization. Removing lead from gasoline resulted in significant declines in blood lead levels worldwide, including in the U.S.

( Is tap water safe to drink? Here's what you need to know. )

But the use of leaded gasoline in transportation fuel was never regulated for aircraft engines, the largest remaining source of lead emissions nationwide. Last October, the EPA deemed the continued use of leaded gasoline by some smaller airplanes a danger to public health.  

Unlike many other chemicals, lead does not biodegrade over time—which is partly why lead exposure is a serious environmental justice issue, according to Tomás Guilarte, a neurotoxicologist and professor at Florida International University.  


Low-income communities and populations of color face the highest levels of lead exposure nationwide, primarily because of the environments and homes they live in, many of which are located closer to highways or areas where the soil is highly contaminated due to previous dispersion of lead in gasoline, he notes.  

How does lead affect children?  

“There is no safe level of lead,” says Olivia Halabicky, an environmental health scientist at the University of Michigan who studies how early childhood lead exposure influences development.  

In addition to asking a doctor to test your lead blood levels, she recommends everyone test their water sources, homes, and even nearby soil. Consumer products found laden with lead, like food, jewelry, and children’s toys , are another point of concern. “We don't want people to be exposed to this at all,” Halabicky adds.

( Microplastics are in our bodies. How much do they harm us? )

Children in particular are most vulnerable to the detrimental impacts of lead because of how the toxin disproportionately affects brains that are still developing, says Guilarte, who researches the impacts of lead on the human brain.  

High levels of lead exposure can cause serious damage to a child’s brain and central nervous system, which can result in a coma, convulsions, and death. Children who survive severe lead poisoning may end up with lifelong intellectual impairments and behavioral disorders. Even low levels of exposure are known to reduce IQ and produce learning deficits, as well as poor academic performance.

“Think about a store full of fine glass. You have vases and very expensive glassware,” Guilarte explains. “And all of a sudden, you let a bull in the door. That’s exactly what happens in the brain with lead.”

How else does lead harm the body?  

Lead doesn’t just harm the brain—researchers have also discovered that high levels of lead exposure can affect many other organs, like the heart.  

And it's not just elevated concentrations that can be detrimental.  

A 2018 study found that about 400,000 deaths in the U.S. can be attributed to “low-level” lead exposure annually—more than half from cardiovascular disease. Chronic exposure to low or moderate levels of lead is associated with an increased risk of cardiovascular disease, per a 2023 scientific statement by the American Heart Association.

Meanwhile, a 2022 study found that over 170 million American adults alive today—more than half of the population—were exposed to high lead levels in early childhood. Around 10 million Americans may have been exposed to levels that are seven times the current threshold of clinical concern.

Tackling a problem as ubiquitous as this one, according to Guilarte, would require us to rethink how products are tested and how people are screened for exposure.  

“There's regulation in the United States that every child, before two years of age, should be tested [for lead]. And many, many states don't do it ,” he said. “There’s a lot more that needs to be done.”


A standardized framework to test event-based experiments

  • Original Manuscript
  • Open access
  • Published: 16 September 2024


  • Alex Lepauvre   ORCID: orcid.org/0000-0002-4191-1578 1 , 2 ,
  • Rony Hirschhorn   ORCID: orcid.org/0000-0001-7710-5159 3 ,
  • Katarina Bendtz   ORCID: orcid.org/0000-0002-8262-3652 4 ,
  • Liad Mudrik   ORCID: orcid.org/0000-0003-3564-6445 3 , 5 , 7 &
  • Lucia Melloni   ORCID: orcid.org/0000-0001-8743-5071 1 , 6 , 7  

The replication crisis in experimental psychology and neuroscience has received much attention recently. This has led to wide acceptance of measures to improve scientific practices, such as preregistration and registered reports. Less effort has been devoted to performing and reporting the results of systematic tests of the functioning of the experimental setup itself. Yet, inaccuracies in the performance of the experimental setup may affect the results of a study, lead to replication failures, and importantly, impede the ability to integrate results across studies. Prompted by challenges we experienced when deploying studies across six laboratories collecting electroencephalography (EEG)/magnetoencephalography (MEG), functional magnetic resonance imaging (fMRI), and intracranial EEG (iEEG), here we describe a framework for both testing and reporting the performance of the experimental setup. In addition, 100 researchers were surveyed to provide a snapshot of current common practices and community standards concerning testing in published experiments’ setups. Most researchers reported testing their experimental setups. Almost none, however, published the tests performed or their results. Tests were diverse, targeting different aspects of the setup. Through simulations, we clearly demonstrate how even slight inaccuracies can impact the final results. We end with a standardized, open-source, step-by-step protocol for testing (visual) event-related experiments, shared via protocols.io. The protocol aims to provide researchers with a benchmark for future replications and insights into the research quality to help improve the reproducibility of results, accelerate multicenter studies, increase robustness, and enable integration across studies.


Introduction

The remarkable progress of experimental human neuroscience in recent decades, fueled by the development of technologies to survey the brain non-invasively, has been partly overshadowed by the many examples of replication failures (e.g., Hirschhorn & Schonberg, 2024 ). Replication failures may stem from a number of factors (Open Science Collaboration, 2015 ), for instance, low standards of power calculations (Button et al., 2013 ; Ioannidis, 2005 ), the use of questionable statistical methods (e.g., Cumming, 2014 ; Wicherts et al., 2016 ), publication biases in favor of positive findings (Fanelli, 2012 ), or scarce description of the methods (Poldrack et al., 2008 ; Simmons et al., 2011 ).

The community has responded to those challenges by promoting better scientific practices that address those issues (Munafò et al., 2017 ). By now, determining sample sizes based on power analysis has become a common practice (e.g., Mumford & Nichols, 2008 ), journals also more routinely publish negative or null results (e.g., Baxter & Burwell, 2017 ), and preregistering the planned design and analyses is on the rise (e.g., AsPredicted, Open Science Framework). These procedures have had a large impact on the scientific community (Logg & Dorison, 2021 ; Protzko et al., 2023 ). One aspect that has received less attention in the empirical sciences, however, is the functioning of the experiment itself: does the experiment run as expected? While at first glance the reader may assume that experiments always run as planned and that if errors occur, they do not significantly impact the results, here we show a large variance among researchers when it comes to testing the experimental framework. Furthermore, we demonstrate that errors in the functioning of the experiment (e.g., the timing of the stimuli) can impact the results and their interpretation. Despite its importance, no standardized procedure for testing and reporting the quality of the experimental setups currently exists.

Standardized procedures are becoming more relevant given the increased number of multi-lab studies (e.g., Frank et al., 2017; Melloni et al., 2021; Pavlov et al., 2021) and the increased availability of openly shared data. The large diversity of software and hardware poses a major challenge when integrating data across different laboratories, which often acquire data using different setup specifications. A similar issue arises when reusing openly shared data collected in various neuroscientific paradigms (Sejnowski et al., 2014). Without metadata describing the functioning of the experimental setup itself (variability in the presentation duration of the stimuli, reliability of the timestamps, etc.), integrating multiple datasets can pose problems, as it necessitates determining a priori how comparable the data are (Carp, 2012). For that, information about whether and how the experiment was tested is important.

To demonstrate the need for a standardized testing framework, we first surveyed current practices in the field when it comes to testing and reporting the functioning of experimental setups in neuroscience. Testing practices varied among researchers, and many acknowledged discovering malfunctioning upon data collection. We then used simulations to demonstrate that even minor inaccuracies in hardware and software can alter results. Finally, we propose a standardized framework for testing and reporting the functioning of experimental setups for event-based designs. We provide an easy-to-use protocol, openly available in protocols.io .

Common testing practices: A survey

To investigate current practices of testing experimental setups in behavioral and neural science, 100 psychologists and neuroscientists reported on studies they had recently carried out. The majority of respondents studied human participants (94/100), collected neural data (67/100), and were early-career researchers (40/100 graduate students, 36/100 postdoctoral/senior researchers, 16/100 principal investigators).

Almost all respondents reported testing the experimental setup prior to data acquisition in a large majority of experiments (91/100), while a few tested only some experiments (5/100) or never tested the setup before data collection (4/100). The aspects of the experimental setup tested varied greatly among researchers (Fig. 1). Most tested the overall duration of the experiment (84/96), while a smaller proportion tested the accuracy of the event timings (60/96; see Box 1). There was also considerable diversity in the methods used to test the experimental setup, both between researchers (e.g., manual checks [48/96], scripted checks [1/96], or both [47/96]) and within the same lab between experiments (when asked whether the tests were based on a protocol, 53/96 responded that each experiment was tested differently; see Supplementary Material 1 for the survey and the full set of results).

Figure 1

The type and frequency of pre-experimental tests, declared by researchers to have been conducted prior to their last published experiment ( N  = 96). Y -axis: percentage of respondents. X -axis: aspect of the experimental environment tested (selected from a list of options; see Supplementary Material 1 ). The terms "event timing" and "event content" are defined in Box 1

Strikingly, a large proportion of researchers (64/100) reported noticing an issue after data collection that could have been avoided through prior testing. This reinforces the need for a streamlined procedure to benchmark the experimental setup (or experimental environment , as defined in Box 1), which could prevent the collection of unusable data and facilitate replication.

Box 1 Definitions of terms

Event-based design: an experimental design in which the participant is presented with specific stimuli (e.g., images, sounds) at prespecified times, to measure the reaction to those stimuli (e.g., behavior, neural activity, physiological response).

Event: the presentation of a certain stimulus at a particular time in an event-based design.

Event timing: the time during the experiment when an event of interest takes place in an event-based design.

Event content: all aspects specifying an event, regardless of its timing during the experiment. For example, for a visual stimulus, its content comprises its identity (e.g., “a face”), location (e.g., “screen center”), and other features that differentiate it from the rest of the stimuli in the experiment and/or are relevant to the experimental conditions (e.g., orientation, color, luminance, size, presentation duration, belonging to a specific category of stimuli, task relevance, congruency with other events).

Experimental design: the desired scheme that dictates how many stimuli shall be presented from each category, the expected order in which they shall be presented, and their duration and timing. It specifies both the timing of the expected sequence of events and the event content.

Experimental environment: all hardware and software that is part of the experiment. This includes, but is not limited to, the software used to present stimuli and collect responses from the participant (the experimental software, ES), the computer on which the experimental software runs (the experimental computer, EC), and any device that the experimental software and the computer communicate with during the experiment (peripherals). For example, peripherals could include the screen on which a visual stimulus is presented, cameras that record the participant, and devices measuring neural and/or other physiological activity.

Experimental computer (EC): the computer on which the experimental software runs to present the participants with the events of the experiment.

Experimental software (ES): the software (e.g., Psychtoolbox, PsychoPy, Presentation) executing the experimental program on the experimental computer.

Log file: all the information written to the experimental computer’s disc through the experimental software during a single experimental run. This includes information about all events presented during the experiment (their content and timing) and all the measurements the experimental computer recorded directly (e.g., mouse click, keyboard response) and indirectly (e.g., information reaching the experimental computer from peripherals). When a peripheral runs on its own internal clock (see Fig. 2), its output is recorded in a separate file. Thus, one experiment could have several output files.

Peripherals: all the hardware (and the software used to operate it) that is connected to the EC or communicates with the ES in some way. We refer to two peripheral types: (1) peripheral devices with their own internal clock (e.g., neural imaging hardware and software), which communicate with the EC via triggers, and (2) devices that run on the EC’s internal clock (e.g., response box, keyboard, computer mouse). All peripherals are part of the experimental environment.

Experimental output: all output files produced by devices that are included in the experimental environment. This includes all the output files of both the ES and the peripherals.

Triggers: messages sent to/from the EC from/to peripherals, used to synchronize the peripherals and the EC.

Controlled events: any experimental events controlled by parameters predefined by the researcher, e.g., stimuli presented to the participant.

Uncontrolled events: any experimental events that depend on and are controlled by the participant (and not the researchers), e.g., participants’ motor responses.

Physical realization of an event: the actual occurrence of an event within the experimental environment (as opposed to the planning or logging of that event). The exact timing of the physical realization of an event can be determined by measuring changes in the physical properties of the experimental setup (e.g., changes in luminance or changes in decibels).

Delay: a constant temporal shift between the physical realization of an event and its recorded timestamp on an experimental device clock. For example, the timestamping of stimulus onsets recorded by EEG triggers may be systematically delayed by 32 ms relative to the actual stimulus onsets (which can be inferred from the photodiode signal). Such delays have been discussed extensively in the M/EEG literature (Farzan et al., 2017; Pernet et al.). By measuring these constant delays, they can easily be compensated for by shifting events’ timestamps by the measured delay before data analysis (dedicated functions have been developed to do so, e.g., mne: epochs.shift_time; fieldtrip).

Jitter: a varying temporal shift between the physical realization of an event and its recorded timestamp on an experimental device clock. Unlike delays, jitters are not constant across trials and therefore cannot be easily compensated for.

Figure 2

The image illustrates the connections between the various components and devices used in an example experiment. Top left : A participant sits in an experimental environment (see Box 1). They face a computer screen displaying a green star. Underneath is an eye-tracking device (black) and a response box. An EEG cap and amplifier are displayed as an example of neural measurement. Top right : The screen connects to the experimental computer which runs the experimental software (see Box 1). Bottom right : An example peripheral device recording neural data, connected to both the experimental computer (top right) and the EEG amplifier (top left). Bottom left : An example peripheral device recording eye-tracking data connected to both the eye-tracking camera (top left) and the experimental computer (top right). Clocks in each box indicate the device’s internal clock. Dashed arrows depict connections between components of independent devices. Solid black arrows represent connections sending triggers from the experimental computer to peripherals used for synchronization between recording devices.

Despite performing tests, most researchers did not report the results in publications (80/96), either because they considered reporting the results irrelevant (43/96), because they did not know where to present them (38/96), or both (15/96; Fig. 3). Reporting practices varied, ranging from reporting in the methods section, in the preregistration, in a lab book, or in the supplementary materials. A small proportion of researchers who never reported results (10/80) assumed that all published work had been thoroughly checked. However, as we have seen, the assumption that experimental tests and methods are consistent and error-free is not warranted, given the variations in testing procedures and the prevalence of errors identified retrospectively, during or after data collection. Thus, reporting test results is crucial. If widely adopted, this practice would encourage more thorough testing, reduce errors, and enhance data accuracy across experiments and datasets.

Figure 3

Reporting practices of researchers who declared testing their last published study ( N  = 96, since four respondents declared not testing their experiments at all). Outer circle: responses to “Did you report about the checks you performed and their results?” Red: respondents declaring not reporting the results of the tests ( N  = 80). Light blue: respondents declaring reporting the results of some tests ( N  = 15). Blue: one respondent who reported the results of all the performed tests. Inner circle: responses to “If you did not report the checks, why not?” Orange: because it was irrelevant ( N  = 3 out of the “some” category, N  = 40 out of the “no” category). Grayish-blue: because it wasn’t known where to report the results ( N  = 11 [“some” category], N  = 26 [“no” category], and 1 reporting the test). Tan: because it was irrelevant and not known where to report the results ( N  = 1 [“some” category], N  = 14 [“no” category])

Simulations of experimental environment malfunction

The survey demonstrated that research practices for testing the experimental setup vary widely, and that if tests are conducted, they are often not reported. Yet, researchers acknowledged discovering inaccuracies after data acquisition which could have been prevented. Are those inaccuracies severe enough to warrant extra testing and reporting? We simulated inaccuracies in event content and event timing to demonstrate whether and how they can affect experimental results (see Supplementary Material 2 for the full simulation). For this demonstration, we focused on reaction time data and the P1 event-related potential (ERP), as both are widely used measurements and highly susceptible to timing inaccuracies. Both behavioral and neural simulation results are generalizable to other ERP components and uncontrolled events (events evoked by participants’ responses; see Box 1). We simulated two experimental conditions, representing two stimulus groups (see Supplementary Material 2 for the complete procedure). The simulation followed a common experimental hypothesis of a difference between two conditions, in both the P1 average amplitude and mean reaction times. The difference between the two conditions was referred to as \(\theta\). To simulate inaccuracies in event contents, we shuffled stimulus labels on some trials (between 2% and 40%); for inaccuracies in event timings, we introduced a jitter of a predefined duration (from 2 to 40 ms) on some trials (between 2% and 40%). Simulations show that content and timing inaccuracies considerably diminish and sometimes even obliterate statistical differences between experimental conditions.
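The full simulation procedure and code are provided in Supplementary Material 2. As a rough, self-contained illustration of the label-shuffling case only, a minimal Python sketch (using hypothetical trial counts and normally distributed reaction times, not the actual simulation code) might look like this:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulate_t(theta=0.2, n_trials=200, prop_shuffled=0.0):
    """Simulate two conditions differing by effect size theta (in SD units),
    shuffle the recorded condition labels on a proportion of trials,
    and return the resulting two-sample t-statistic."""
    labels = np.repeat([0, 1], n_trials)              # true condition labels
    rt = rng.normal(loc=labels * theta, scale=1.0)    # reaction times (arbitrary units)

    # Content inaccuracy: the label recorded in the log file is wrong on some trials
    n_bad = int(prop_shuffled * labels.size)
    bad = rng.choice(labels.size, size=n_bad, replace=False)
    recorded = labels.copy()
    recorded[bad] = 1 - recorded[bad]

    t, _ = stats.ttest_ind(rt[recorded == 1], rt[recorded == 0])
    return t

for p in (0.0, 0.05, 0.20, 0.40):
    print(f"{p:.0%} shuffled labels -> t = {simulate_t(prop_shuffled=p):.2f}")
```

As the proportion of mislabeled trials grows, the measured condition difference is diluted and the t-statistic is, in expectation, pushed toward and below the significance threshold, mirroring the pattern shown in Figs. 4 and 5.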

Figure 4A shows that at small effect sizes (\(\theta\) = 0.2), a small proportion of shuffled-label trials (20%) was enough to abolish a statistical effect in P1 amplitude. For the reaction time, the same was observed when 5% or more labels were shuffled across trials (Fig. 4B). Jitter in stimulus timing also significantly affected the measured statistic: Figure 5B shows that a jitter of 16 ms (a frame on a 60 Hz monitor) on 15% of the trials was enough to render P1 effects insignificant. For a 32 ms jitter, the same outcome was observed when 5% of the trials were affected. The effect of jitter was comparable for the reaction time: at \(\theta\) = 0.2, a 16 ms jitter affecting 15% of trials was sufficient to abolish the effect, and the same was observed at larger effect sizes (\(\theta\) = 0.5) with a 32 ms jitter (two frames) in 5% of trials (Fig. 5D).

Figure 4

Effect of label shuffle on the P1 (A) and reaction time (B) t -statistic. The heatmap represents the observed t -statistic as a function of the simulated effect size ( x -axis) and proportion of trials for which the labels were shuffled ( y -axis). The color bar is centered on 1.96. Values below significance are colored in shades of blue, while values above significance are colored in shades of orange.

Figure 5

Effects of timing inaccuracies on t -statistic for ERPs (A, B) and reaction time (C, D). A, C 3D plot relating the proportion of jittered trials ( x -axis), jitter duration ( y -axis), and t -statistic ( z -axis) as a function of effect size ( \(\uptheta\) , color bar) for P1 amplitude and reaction times, respectively. Gray hyperplane depicts significance threshold of t  = 1.96. An example at \(\uptheta\)  = 0.2 for P1 amplitude ( B ) and reaction time ( D ). The color bar centered on 1.96, values below significance are colored in shades of blue, while values above significance are colored in shades of orange

Taken together, the simulation results reinforce the importance of testing the recorded stimulus contents, showcasing the effect that timing inaccuracies and imprecision of the hardware (the experimental computer [EC]: the computer on which the experiment runs; see Box 1) can have. Recent studies suggest that inaccuracies in modern experimental software (ES; e.g., PsychoPy: Peirce, 2007; Psychtoolbox: Brainard, 1997) are minimal (on the sub-millisecond level, Bridges et al., 2020). This is only the case when the experiment is run in an ideal experimental environment. Accordingly, proper testing is required to ensure that this is indeed the case. This is even more important given the variations in the interaction between the EC and the ES (for example, in Psychtoolbox, a screen flip can be missed if too many textures are open; see definition in Box 1). Thus, our simulation results highlight the need for standardized testing of the experimental environment. Here, we argue that performing a few basic tests can increase event-based experiment reproducibility, improve data integration across datasets, and ultimately minimize errors, increasing efficiency. Next, we describe a standardized framework of tests.

We describe a standardized framework to benchmark the experimental environment in event-based designs including a standardized reporting protocol (protocols.io). The framework is aimed at helping researchers without imposing additional burdens on their standard experimental procedures. As such, it strikes a balance between exhaustiveness and ease of use. As a starting point, the framework is best suited for studies involving visual stimuli while collecting responses from participants (neural responses as well as behavior and eye-tracking data). Extension to other modalities (e.g., auditory, tactile) and response devices (e.g., microphone) will be needed. To illustrate the framework, we programmed a simple experiment in which we conducted all the tests, described step by step in a Jupyter notebook (see Supplemental Material 3 ). The notebook aims to help researchers in understanding the implementation of the framework, serving as an accessible resource that can be adapted for testing future experiments.

Each section starts with the motivation for testing a given aspect of the experimental environment, followed by the testing guidelines and the standardized reporting protocol (see protocols.io). A successful visual event-based experiment necessitates thorough testing and validation of four key aspects: (1) the completeness and accuracy of the log file regarding event content, (2) the same for event timing, (3) the alignment between actual events and the planned experimental design, and (4) the reliability of peripheral triggers. This ensures experimental integrity and comparability across different studies and laboratories. Testing typically involves running the entire experiment at least once in the final experimental setup, as the hardware significantly influences the precision and accuracy of the experiment. While these aspects are discussed separately, they can generally be tested in a single experiment run, unless specified otherwise. We provide a visual representation of the implementation of all steps of the framework (see Fig. 6).

Figure 6

Flowchart of the implementation of the testing framework. (1) First, the experiment should be adjusted to present a photodiode square and record the sounds from a contact microphone measuring the sound produced by keyboard presses. In addition, a response sequence should be planned. (2) Then, a set of pre-run tests should be performed. The first consists of measuring the size of the stimuli in centimeters to compute the size of the observed stimuli in degrees of visual angles. In addition, several trials of the experiment should be run, manually annotating what was presented on the screen in each trial. These manual annotations should then be compared to the log file entries to ensure that the log file accurately records what was actually presented. (3) The experiment should then be run in full while recording the photodiode and microphone signals as well as the log file for offline analysis. (4) Then, the recorded photodiode signal should be compared to the log file to estimate event timing inaccuracies. The microphone signal should be compared to the log file response timestamping to assess the responses’ timestamping inaccuracies. In addition, the logged responses should be compared to the planned response sequence defined in step (1) to ensure that the log file accurately records the pressed buttons. Finally, the experimental design can be tested for correctness based on the log file and photodiode timing information

Validating the features of the controlled events

Why test it?

In most visual experiments, standardizing the visual angle and eccentricity across participants is crucial. The initial step involves correctly setting up the visual angle and eccentricity in the experimental environment. This standardization is foundational for all subsequent tests and ensures that experimental events are presented under consistent conditions. This step is critical for both multi-lab studies and single-lab experiments, facilitating accurate replication by ensuring visual equivalence between original and replication setups.

How to test it?

First, three measurements should be obtained: the screen’s height and width in pixels (i.e., the screen resolution), the screen’s height and width in metric units, and the distance between the eye and the screen (in metric units). From the screen dimensions in centimeters and pixels, a conversion factor between pixels and centimeters should be computed as follows:

\(c = \frac{\mathrm{screen\ height\ [px]}}{\mathrm{screen\ height\ [cm]}} = \frac{\mathrm{screen\ width\ [px]}}{\mathrm{screen\ width\ [cm]}}\)    (1)

Using either the right or the left equation (i.e., the height or the width) should yield the same result, except for displays in which the pixel aspect ratio is not 1:1. In all other cases, inequality indicates measurement issues. After obtaining the conversion factor \(c\), event sizes and offsets can be calculated in degrees of visual angle from screen pixels and size in metric units.

The next step is to measure the size and eccentricity of a specific experimental event. Once presented, the size and eccentricity of the visual event can be measured in metric units. Then, the inverse tangent can be used to calculate the vertical and horizontal visual angle of a stimulus using right-angle trigonometry:

\(\mathrm{visual\ angle} = 2 \times \arctan\left(\frac{\mathrm{size}/2}{\mathrm{distance}}\right)\)    (2)

where both size and distance are in metric units.

Similarly, the eccentricity should be calculated in visual angles by measuring the distance between the center of the screen and the center of the stimulus of interest (assuming fixation is in the middle of the screen) and applying the same formula ( 2 ).

Note that in the case that stimuli are off-centered, the measurement of the stimulus size in visual angles needs to be adjusted to account for the tilt between the screen and the eye. SR Research offers a free tool for visual angle calculation, for all types of stimuli (centered, one-sided, off-center; see www.sr-research.com/visual-angle-calculator/ ). When experimental events differ in size and eccentricity (e.g., stimuli displayed in different locations or differing in size), each such event should be measured at least once (per location, per size).
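As an illustration, here is a short Python sketch of these conversions, assuming a centered stimulus, the standard right-angle-trigonometry formula, and placeholder screen measurements:

```python
import math

def px_to_cm(size_px, screen_px, screen_cm):
    """Convert a size in pixels to centimeters using the measured screen dimensions."""
    return size_px * (screen_cm / screen_px)

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by a centered extent of size_cm at distance_cm."""
    return 2 * math.degrees(math.atan((size_cm / 2) / distance_cm))

# Placeholder measurements: 1920 x 1080 px screen, 53 cm wide, viewed from 60 cm
stim_w_cm = px_to_cm(200, screen_px=1920, screen_cm=53.0)
print(f"Stimulus width: {visual_angle_deg(stim_w_cm, 60.0):.2f} deg")

# Eccentricity of a stimulus centered 300 px to the right of fixation (screen center)
ecc_cm = px_to_cm(300, screen_px=1920, screen_cm=53.0)
print(f"Horizontal eccentricity: {math.degrees(math.atan(ecc_cm / 60.0)):.2f} deg")
```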

How to report it?

To report the relevant visual features in degrees of visual angle, the measured distance of the screen should be reported in centimeters (to describe the conditions under which the size of the stimuli was tested). For the stimuli, the measured height and width should be reported in degrees of visual angle. If different stimuli sizes are relevant to the experimental design, those should be measured and reported too. Similarly, if the stimuli are presented at a given eccentricity, both the expected and measured horizontal and vertical distance from the center of the screen should be reported (also in visual angles). Reporting both the horizontal and vertical visual angles from the expected center of the participant’s gaze provides the unique position of the stimuli. Thus, it is preferable to report the distance between the center of the stimulus and where participants are supposed to fixate (usually, the center of the screen).

Testing the reliability of the log file event content

Controlled events.

For accurate analysis, researchers must ensure that the content of events presented to participants is correctly recorded in the log files (see Box 1). This involves verifying that logged events match the actual events presented. Errors in the experimental software (ES) or hardware (EC) can lead to incorrect logging, such as mislabeling stimulus categories or identities, which can significantly alter experimental results. Any discrepancies require rectifying and retesting until the log file accurately reflects the presented content. While systematically checking log files, especially for experiments with complex stimuli like videos, can be challenging and time-consuming, it is crucial for ensuring the validity, interpretability, and reproducibility of the results.

Comparing the on-screen content with that of the log file where the content is documented requires running the experiment, ideally, from start to end (without participants), noting the event content presented on the screen (e.g., the stimulus identifier and relevant features such as orientation, location, color). Manually noting the content of each stimulus throughout the entire experiment might not be feasible, especially for experiments containing hundreds of trials. The compromise recommended here is to minimally check the content of each event condition at least once (though exhaustive testing is, of course, preferred). For example, suppose an experiment presents two stimulus groups (e.g., faces and objects) at four possible locations. In that case, the manual recording of event content during the experiment should at least cover a stimulus from each group appearing at each location once. By event condition, we refer to any feature relevant to the experimental design (e.g., category, location, task relevance, color, congruence). For designs with nested conditions (e.g., stimuli of different groups presented in different task-relevance conditions), a condition is understood as a combination of conditions (a task-relevant face constitutes a condition, and task-irrelevant faces another).

As events can be fast-paced, it might be impossible to mark them manually in real time. Therefore, we recommend following one of two options: One possibility is using external recording devices and software (cameras, microphones, or other recording software) to record the presented events, tagging them based on the recording’s playback. Importantly, the recording device needs to be external to the experimental environment (e.g., not a recording software running on the EC), as otherwise, it might interfere with the functioning of the experimental environment (as a screen-recording software is not expected to run in the data collection phase). Alternatively, the pace of the experiment could be slowed down for testing purposes in the ES such that the event content can be noted. The downside is that then two separate tests are required: one for testing the event content and a separate one for testing the reliability of the event timings (see next section).

The logged on-screen contents are then compared to the saved log files of the test run, expecting complete consistency between the two. Discrepancies point to malfunction of the ES, requiring correction prior to data collection.

The report on logging content inaccuracies should briefly describe the test method, detailing (i) the number of different conditions tested (considering unique combinations of nested conditions), (ii) the number of individual events tested within each condition, and (iii) the count of events that were incorrectly logged out of the total presented. Ideally, in a fully functioning experiment during final data collection, this count of inaccurately logged events should be zero.

Uncontrolled events

Uncontrolled events are those that depend on the participants’ behavior without any control from the experimenter (see Box 1) and are made on devices recorded by the EC (and not on peripherals). This test validates the assumption that the EC correctly registers these responses into the log files. This is done by comparing the actual responses (made by the experimenter during the test) with the logged ones (recorded in the log files).

To validate the fidelity of the logged responses, the experiment should be run from start to end, recording the actual and logged responses made under a systematic, preplanned response sequence. This response plan provides a “recipe” to evaluate whether responses are properly logged and correspond to the executed responses. The purpose of the plan is to test the correct assignment of response buttons, counterbalancing, handling multiple or erroneous responses per single event, and logging responses that occur at unexpected moments during the experiment. To test the correct assignment of response buttons, the response plan should include at least one response of each type the participant is expected to make. When response mapping is counterbalanced within an experiment (e.g., a key is mapped to “Yes” in one block and “No” in another), the response plan should also include responses of the same buttons before and after such changes to determine whether the mappings are reflected in the log files. Handling multiple or erroneous responses is crucial, as participants’ behavior might deviate from that expected by the researchers. Therefore, the response plan should include responses using unexpected keys, cases where a key is pressed more or fewer times than expected (e.g., multiple presses when a single press is expected), or when more than one button is pressed simultaneously.

The final step is to compare the response plan with those recorded on the log file. If executed responses followed the response plan, any incompatibility found points to errors in the ES which require correction prior to data collection.

The report on uncontrolled events’ content inaccuracies should include a description of the test procedure and response plan, along with three key metrics: (i) the number of different response types (various buttons pressed), (ii) the number of responses for each type, and (iii) the count of responses inaccurately logged out of the total responses. Ideally, in the final data collection phase, the number of inaccurately logged responses should be zero.

Testing the reliability of the log file event timing

Malfunctions in hardware or software can lead to inaccuracies or a lack of synchrony among three crucial timestamps: the time when a request to display an event is made, the actual occurrence of the event in the experimental hardware, and the time of the event as recorded in the log file. Discrepancies between these timings often reflect EC and ES limitations, rather than human error. Once a request to present an event occurs, the EC processes it along with other requests received at a given time leading to potential delays in the execution. In addition, the computer presents stimuli at a given refresh rate, limiting the display update to a certain number of times per second. As such, the stimulus presentation request has a narrow window to be processed, and when missed, the presentation only occurs in the next frame, unintentionally prolonging the previous event. Various factors (e.g., the EC’s graphics processing unit and CPU, parallel programs running in the background other than the ES) affect processing times and latencies caused by them. Furthermore, logging the timing of an event in the log files is inferred rather than logged in real time, as modern computers do not operate in real time. Thus, discrepancies can arise between the actual event timing and the inferred time recorded in the log file. As such, jitters are to be expected. Yet, large timing deviations require intervention in either the EC or ES. For example, the EC might have insufficient system resources available to run the experiment (solution: free up EC memory, stop unnecessary programs running in the background, e.g., antivirus), or the ES code might be written inefficiently (solution: improve the ES based on the specifics of the software being used). Our simulations showed that inaccuracies in event timing can affect results, which can be detrimental in studies requiring high timing precision (e.g., visual masking paradigms). The significance of precise timing in experiments has been acknowledged before (Plant, 2016 ), and various solutions to reduce inaccuracies have been proposed (Calcagnotto et al., 2021 ; Kothe et al., 2024 ). Experimental environments vary greatly, and so do the patterns of temporal discrepancies across setups (Bridges et al., 2020 ). Thus, characterizing and reporting the differences between environments is critical for comparing results across studies.

To evaluate event timing, a "ground truth" measurement, representing the actual physical timing of events, is essential. This requires an external device like a photodiode, which detects light changes, to record visual events on the screen. By attaching a photodiode to the screen and displaying luminance changes (like black versus white) in sync with the stimulus presentation, one can measure the start and end of each event and calculate its duration. Therefore, to accurately track event timings in an experiment, the experimental software (ES) should be modified to include extreme luminance changes alongside experimental events. The testing process involves the following steps:

Modify the ES to allow photodiode-based testing

While a photodiode sensor can be placed in any location where events appear on the screen, we advocate for a systematic approach. In this step, the researchers integrate into the ES the simultaneous presentations of the evaluated event and a square at one screen corner (or a location not overlapping with the stimulus, where the photodiode is easily attached). The square should “turn on” (e.g., white; RGB 255, 255, 255) at each event onset and offset, and should be “turned off” (e.g., black; RGB 0, 0, 0) otherwise (or vice versa; Fig. 7 ). This can be achieved by drawing both the test square and the visual event to a back buffer before querying the EC to display a new frame such that both stimulus and test square overlap in time.
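For concreteness, a minimal PsychoPy sketch of this adjustment is shown below (the stimulus, square position, size, and durations are placeholders to be adapted to the actual experiment and refresh rate):

```python
from psychopy import visual, core

win = visual.Window(size=(1920, 1080), units="pix", fullscr=False, color="grey")
stim = visual.GratingStim(win, tex="sin", size=200)              # placeholder stimulus
square = visual.Rect(win, width=60, height=60, pos=(920, -500),  # placeholder corner position
                     fillColor="white", lineColor=None)

# Stimulus onset: stimulus and white square are drawn to the back buffer and flipped together
stim.draw()
square.draw()
win.flip()
core.wait(0.05)      # keep the square white only briefly (placeholder duration)

# Square back to background while the stimulus stays on screen
stim.draw()
win.flip()
core.wait(0.45)      # remainder of a planned 500 ms presentation (placeholder)

# Stimulus offset: flash the square again so the photodiode also marks the offset
square.draw()
win.flip()
core.wait(0.05)
win.flip()           # blank screen, square off

win.close()
```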

Figure 7

Depiction of visual presentation adjustment to enable testing of timing with a photodiode. A square at the corner of the screen should be flashed to white simultaneously with the onset of each event of interest and return to black thereafter. In this example, the square flashes to white at the onset of each visual stimulus (a colored star) as well as at the offset of each visual stimulus. Critically, the square should not remain white for the entire duration of the stimulus but only for a brief duration at the onset to enable the detection of transitions between each event of interest

Attach the photodiode and run the experiment

Once the ES displays the test square, place the photodiode in the location displaying the square, and run the experiment in its entirety. This step will create two files to be compared in the following steps: the log file and the photodiode output file, where the luminance level was recorded.

Extract event onsets and offsets from the recorded signal

The next step is to parse the recorded photodiode signal, which is done by setting a threshold discriminating between the two photodiode states, i.e., “on” and “off” (see Fig. 8.1). The threshold binarizes the signal such that values of “1” indicate samples above threshold (“on”) and “0” samples below threshold (“off”). Then, the onset of each event can be retrieved by finding the transitions from off to on samples. This is achieved by computing the discrete difference (i.e., the difference between sample n+1 and sample n in the signal; see Fig. 8.2 and 8.3) and locating the time points where this difference is equal to 1 (see Fig. 8.4). Importantly, this step should yield the timestamp of the event in temporal units (seconds or milliseconds) by indexing the continuous time vector of the recording. Alternatively, sample units can be converted to seconds by multiplying the sample index by the inverse of the sampling frequency.
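A minimal NumPy sketch of this parsing step, run here on a synthetic photodiode trace as placeholder data:

```python
import numpy as np

def photodiode_onsets(signal, sfreq, threshold=None):
    """Extract event onset times (in seconds) from a continuous photodiode recording."""
    if threshold is None:
        threshold = (signal.min() + signal.max()) / 2    # midpoint between "on" and "off" levels
    binarized = (signal > threshold).astype(int)         # 1 = square on, 0 = square off
    transitions = np.diff(binarized)                     # +1 at off->on, -1 at on->off
    onset_samples = np.where(transitions == 1)[0] + 1    # first sample above threshold
    return onset_samples / sfreq                         # convert sample indices to seconds

# Synthetic example sampled at 1000 Hz: two 50 ms flashes at t = 1.0 s and t = 3.0 s
sfreq = 1000.0
sig = np.zeros(5000)
sig[1000:1050] = 1.0
sig[3000:3050] = 1.0
print(photodiode_onsets(sig, sfreq))    # -> [1. 3.]
```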

Figure 8

Pipeline to compute the log file timestamping delays using the photodiode. The recorded signal must first be parsed (left panel) to extract the photodiode timestamps. Then, the extracted timestamps can be compared to the log file timestamping by investigating the difference between intervals of successive events as recorded by the photodiode and log file (right panel)

Compare photodiode event timings to log files

The initial test ensures that the count of events detected by the photodiode aligns with those logged in the file. If the log file content has already been confirmed, any mismatch in event numbers could indicate issues with the photodiode signal quality. A reliable photodiode measurement is required to serve as the “ground truth” and is a prerequisite for testing the experiment’s event timing.

Assuming a reliable photodiode recording, the photodiode measurements are then compared to the log file timings. Two values need to be computed: the discrete difference between successive events’ timestamps (1) in the photodiode recording ( \({\Delta }_{photo,i}\) ) and (2) as logged by the EC ( \({\Delta }_{log,i}\) ):

\({\Delta }_{photo,i} = {t}_{photo,i+1} - {t}_{photo,i}, \qquad {\Delta }_{log,i} = {t}_{log,i+1} - {t}_{log,i}\)    (3)

where \({t}_{photo,i}\) and \({t}_{log,i}\) are the timestamps of a given event \(i\) in the photodiode recording and the experimental output, respectively (see Fig. 8.5a and b). The log file timestamping inaccuracy is then computed as the difference between \({\Delta }_{photo,i}\) and \({\Delta }_{log,i}\):

\({\Delta }_{i} = {\Delta }_{photo,i} - {\Delta }_{log,i}\)    (4)

\({\Delta }_{i}\) constitutes the log file timestamping inaccuracy for every single event (see Fig. 8.6), assuming a reliable photodiode.

For a well-calibrated setup, the expected difference between \({\Delta }_{photo,i}\) and \({\Delta }_{log,i}\) is on the order of milliseconds or lower (Bridges et al., 2020). The average of \({\Delta }_{i}\) should approximate zero, with a small standard deviation on the order of a few milliseconds. Large discrepancies indicate a problem, either with the timing at which events are displayed or with the timing at which they are logged. Inspecting the difference between both timestamp vectors might reveal the cause of those discrepancies (missing events, events that are systematically displaced in time, etc.).
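A short NumPy sketch of this comparison (timestamps are placeholders; because the photodiode and the log file may use different clock origins, intervals between successive events are compared rather than absolute times):

```python
import numpy as np

def timestamp_inaccuracy(photo_onsets, log_onsets):
    """Per-event log file timestamping inaccuracy: difference between successive-event
    intervals measured by the photodiode and those recorded in the log file."""
    photo_onsets = np.asarray(photo_onsets)
    log_onsets = np.asarray(log_onsets)
    assert photo_onsets.size == log_onsets.size, "Event counts must match before comparing timings"
    delta_photo = np.diff(photo_onsets)    # intervals between successive events (photodiode)
    delta_log = np.diff(log_onsets)        # intervals between successive events (log file)
    return delta_photo - delta_log         # one value per pair of successive events

# Placeholder timestamps (seconds); the log clock starts at a different origin
photo = [1.000, 2.016, 3.001, 4.000]
log = [10.000, 11.000, 12.000, 13.000]
delta_i = timestamp_inaccuracy(photo, log)
print(f"mean = {delta_i.mean() * 1000:.1f} ms, sd = {delta_i.std() * 1000:.1f} ms")
```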

The average and standard deviation values of \({\Delta }_{i}\) across events should be reported.

Uncontrolled event timing (e.g., reaction times) can also be affected by hardware and software limitations. Reaction time effects are often in the range of tens of milliseconds (Schlossmacher et al., 2020 ; van Gaal et al., 2010 ), making it necessary to determine the recording precision. Timestamping of uncontrolled events can show delays as well as jitters . It is therefore crucial to test both and report the results.

A method advocated by Psychtoolbox (its KeyboardLatencyTest function) is to concurrently record the sound associated with the actual press of a button. This is done by placing a microphone close to the response device used in the experiment, which requires a modification of the ES to log the microphone-recorded sound into a file. The button press onset can then be extracted from the audio file and compared to the timestamps of responses recorded in the log file. The steps are as follows:

Modify the ES to record sound

This can be done by adding a statement at the beginning of the code to continuously record the sound throughout the experiment.
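One possible way to implement this, sketched here with the sounddevice and soundfile Python libraries rather than with any particular ES (the file name, sample rate, and run_experiment placeholder are assumptions), is to open an input stream before the trial loop and write each incoming audio block to disk:

```python
import sounddevice as sd
import soundfile as sf

SAMPLE_RATE = 44100

def run_experiment():
    """Placeholder for the existing experimental code (trial loop, stimulus presentation, ...)."""
    sd.sleep(5000)    # stand-in: "run" for five seconds

# Open the microphone once at the start and stream every incoming block to a WAV file
with sf.SoundFile("keypress_audio.wav", mode="w", samplerate=SAMPLE_RATE,
                  channels=1, subtype="PCM_16") as wav:
    def callback(indata, frames, time, status):
        wav.write(indata)                 # append each audio block as it arrives

    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, callback=callback):
        run_experiment()                  # audio keeps being captured while the experiment runs
```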

Attach the microphone and run the experiment

A contact microphone should be attached close to the keys being pressed on the response device. Sharply pressing the keys during the test run ensures easy processing in the next step. External recording devices (photodiode, microphone) are for testing purposes only and can be removed for data collection.

After obtaining the microphone recording and the log files, the analysis steps to extract and compare the response timestamps are the same as those described in the sections “Extract event onsets and offsets from the recorded signal” and “Compare photodiode event timings to log files,” yielding \({\Delta }_{r}\) (Eq. 4, with r denoting responses).

Like the test for the precision of controlled event timing, the average and standard deviation of \({\Delta }_{r}\) should be documented. In an optimally calibrated experimental setting, the average \({\Delta }_{r}\) is expected to be near zero, with the standard deviation within the range of milliseconds.

Validating the experimental design parameters

After the content and timing of events has been validated, specific aspects of the experimental design can be evaluated (e.g., the duration and balancing of event groups, conditions, their sequential presentation). As there are countless choices of experimental designs, our aim is not to cover all potential designs, but instead to provide the community with a systematic method for testing and reporting experimental design aspects.

Below, we focus on two examples validating the experimental design with respect to content and timing.

Concerning adherence to the experimental design content, the aim is to confirm that the rules researchers wish to enforce on the controlled events (e.g., presented stimuli) are indeed implemented (e.g., order of presentation, constraints on sequential trials). To test the content requirements, the following steps are proposed:

Know the experiment content requirements

Document as explicitly as possible what is expected to be enforced, e.g., number of stimulus repetitions, randomization scheme, event order, stimulus locations, and balancing of event groups. Every requirement pertaining to event content in the experimental design should be specified and checked.

Ensure that the relevant information is recorded in the log file

Ensure the information required to test the previous step is stored in the log file.

Prepare checks to make sure that the log file meets each requirement

Prepare, ideally, a programmatic script that reads the log file and ensures that its content adheres to each requirement listed in the first step following the completion of the test run. In cases where researchers have pre-made sequences including all the information about the flow of the events within an experiment, tests can be conducted in those pre-made files.
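As an illustration, a hypothetical pandas script for such checks might look like the following (column names, the file name, and the requirements themselves are placeholders to be adapted to the actual log format and design):

```python
import pandas as pd

# Log file of the test run, assumed to contain one row per presented event
log = pd.read_csv("test_run_log.csv")

# Requirement 1: each stimulus category appears the planned number of times
expected_per_category = {"face": 40, "object": 40}
counts = log["category"].value_counts().to_dict()
assert counts == expected_per_category, f"Unbalanced categories: {counts}"

# Requirement 2: every category appears at each of the four planned locations
assert (log.groupby("category")["location"].nunique() == 4).all(), \
    "Missing category x location combinations"

# Requirement 3: the same stimulus identity is never presented twice in a row
assert (log["stimulus_id"] != log["stimulus_id"].shift()).all(), \
    "Same stimulus presented on consecutive trials"

print("All experimental design content requirements are met.")
```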

With respect to validating the timings of the experimental design, the goal is to assess how closely the actual durations of events align with their intended durations. Here, after comparing the photodiode output with the log file, researchers compare the actual duration of the observed events with their planned duration, as specified in the experimental design timing scheme.

The first step is to obtain the observed event durations from the photodiode recordings:

\[Observed\ duration_{i} = OEO_{offset,\,i} - OEO_{onset,\,i}\]

where \(OEO\) stands for the onset and offset of each event \(i\). Then, one can compute how much these durations deviate from the plan:

\[\Delta_{i} = Observed\ duration_{i} - Planned\ duration_{i}\]

where \(Planned\ duration_{i}\) is the planned duration of each event \(i\) according to the experimental design.
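A minimal sketch of this comparison is shown below, assuming that the onset and offset arrays produced by the photodiode extraction step, and the planned durations from the design file, have already been saved with one entry per event (in seconds); the file names are placeholders.

    import numpy as np

    photodiode_onsets = np.loadtxt("photodiode_onsets.txt")     # hypothetical inputs
    photodiode_offsets = np.loadtxt("photodiode_offsets.txt")
    planned_durations = np.loadtxt("planned_durations.txt")

    # Observed duration of each event, and its deviation from the planned duration.
    observed_durations = photodiode_offsets - photodiode_onsets
    delta_i = observed_durations - planned_durations

    print(f"mean deviation = {delta_i.mean() * 1000:.2f} ms, "
          f"sd = {delta_i.std() * 1000:.2f} ms")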

The report for experiments with varying experimental design content-based rules should include a comprehensive list of design choices. Evaluating how well the design meets these requirements involves counting the number of events, out of the total tested, that comply with each requirement. For nested designs, report the number of events for each combination. If relevant, include the count of events per condition in each block. Notably, there should be no deviations, as any deviation would indicate noncompliance with the study's plan. If content errors are discovered, the experimental software (ES) should be reassessed and retested once corrections have been made.

For event timing, the precision of the planned timings should be reported. We advocate reporting the mean and standard deviation of \({\Delta }_{i}\) . The mean is expected to approach zero, and the standard deviation to lie within the range of a few milliseconds. Larger values are suggestive of errors or hardware issues that might require attention.

Testing peripheral triggers

So far, we have described tests to benchmark the EC and ES. Yet, when the experimental environment contains peripherals (a typical case for neuroscience experiments), tests to assess the interaction of the peripherals with the EC and ES (i.e., triggers) are also necessary. Triggers serve a dual purpose here: (1) they provide temporal markers for events of interest on which to focus offline data analysis, and (2) they play a pivotal role in maintaining synchrony between the EC and the recording system (e.g., electroencephalography [EEG], eye-tracking device). This is useful for addressing issues related to clock drift, which can be detected and corrected when sending triggers marking events of interest to multiple devices (Niso et al., 2022 ).

The interpretation of the signal recorded by these devices depends on the integrity of the trigger transmissions (Boudewyn et al., 2023 ; Luck, 2014 ), which is the focus of the current test. Akin to previous tests, both the content and the timing of the triggers representing the controlled and uncontrolled events are evaluated. These tests should be performed for each peripheral device used in the experiment.

Peripheral trigger content

Assuming the log file event records are accurate, the congruence between each logged event and its corresponding trigger content is assessed. Any deviations point to problems with trigger logging or the peripheral device, necessitating review and correction.
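As a minimal illustration, assuming the peripheral recording has been parsed into a table with one trigger value per event and the log file contains a matching event code column (both file and column names are placeholders), the congruence check reduces to an element-wise comparison:

    import pandas as pd

    log = pd.read_csv("log_file.csv")                    # hypothetical file names
    triggers = pd.read_csv("peripheral_triggers.csv")

    # Compare logged event codes to the recorded trigger values, event by event.
    n = min(len(log), len(triggers))
    mismatch = (log["event_code"].to_numpy()[:n]
                != triggers["trigger_value"].to_numpy()[:n])

    print(f"{mismatch.sum()} mismatched events out of {n} "
          f"({100 * mismatch.mean():.1f}%)")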

Peripheral trigger timing

Compute the discrete difference between successive peripheral trigger timestamps, and compare it against the discrete differences between consecutive observed events, as recorded by the photodiode. The difference between these two arrays provides an estimate of the peripheral trigger temporal jitter ( \({\Delta }_{i}\) ):

\[\Delta_{i} = \left(t_{trigger,\,i+1} - t_{trigger,\,i}\right) - \left(t_{photodiode,\,i+1} - t_{photodiode,\,i}\right)\]

Note that this method evaluates temporal jitter but is not suited to evaluate delays between events and peripheral triggers. As described by Farzan et al. ( 2017 ), delays can be measured by recording the photodiode signal on the same computer clock as the peripheral of interest. The delay of the trigger onset (marked below as \({\Delta }_{i}\) too) can then be directly compared to the detected photodiode onset as:

\[\Delta_{i} = t_{trigger,\,i} - t_{photodiode,\,i}\]
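Both quantities can be computed in a few lines, as in the sketch below. It assumes that trigger timestamps and photodiode onsets are available as arrays with one entry per event, in seconds and in the same order, and that for the delay estimate both signals were recorded against the same clock; the file names are placeholders.

    import numpy as np

    trigger_times = np.loadtxt("trigger_times.txt")          # hypothetical inputs
    photodiode_onsets = np.loadtxt("photodiode_onsets.txt")

    # Temporal jitter: difference between the inter-event intervals of the two signals.
    jitter = np.diff(trigger_times) - np.diff(photodiode_onsets)
    print(f"jitter: mean = {jitter.mean() * 1000:.2f} ms, "
          f"sd = {jitter.std() * 1000:.2f} ms")

    # Systematic delay (Eq. 8): valid only when both signals share one clock.
    delay = trigger_times - photodiode_onsets
    print(f"delay: mean = {delay.mean() * 1000:.2f} ms, "
          f"sd = {delay.std() * 1000:.2f} ms")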

The count of events where the trigger content does not align with the log file content should be documented, and reported as the count of mismatched events over the total count of events.

The report should contain the average and standard deviation of the trigger timing jitter. For systematic delay tests, both the mean and standard deviation of the delay should be reported.

Standardized report

The framework is summarized in a checklist available on the protocols.io platform, alongside a standardized format for reporting the test results. Testing the experimental environment is crucial, and so is the accompanying detailed report of the test results; together they enhance transparency and foster reproducibility and replicability.

Discussion

We present a framework to systematically test and report the performance of experimental environments. We aimed to minimize the financial burden by relying on hardware that most labs either already have or can acquire or build inexpensively (e.g., photodiode and microphone). Our protocol enables all tests to be conducted in a single test run, making the framework efficient for researchers and preventing additional work caused by errors or limitations discovered during the data collection stage. Furthermore, the framework includes guidelines for detailed reporting of the parameters and results of each test, as a means to increase scientific transparency. Such transparency allows the scientific community to better evaluate the conclusions of a study, as these rely heavily on the proper functioning of the experiment. As indicated by the survey results, more than half of researchers did not share their test results because they did not know where or how to report them; the current framework will hopefully help overcome this hurdle.

The framework presented herein is designed to be applicable to most recording modalities used in human neurosciences. To test the content and timing of the experimental events, the only requirement is that the peripherals are capable of receiving triggers (which is a standard feature in experimental environments containing peripherals in event-related designs). To assess event timing accuracy, jitters are estimated by comparing the intervals between events recorded by the system with those obtained from physical measurements: jitter-free peripherals should show identical intervals. Although the clocks of different devices may drift apart over long periods, computing the intervals between events occurring in relatively short succession overcomes this problem. Notably, this method does not account for systematic delays between two systems (i.e., if a peripheral device receives the triggers with a systematic delay of 30 ms with respect to the physical event). To address this, systematic delays should be quantified by recording the physical signals on the same computer as the peripheral being used, as described in section " Testing peripheral triggers " (Eq. 8 ). As such, our framework offers a comprehensive set of tests for timing issues applicable to a wide range of technologies used in cognitive neuroscience.

Investing extra time and resources in testing and reporting the experimental environment is worthwhile, as simulations show that malfunctions in recording event timing and content can significantly impact results. Researchers should aim to reduce controllable errors and characterize noise in their setups to increase the likelihood of detecting real effects and reduce false negatives. The benefits of this framework extend beyond individual experiments, enhancing replication efforts and scientific reliability. Recent replication challenges in neuroscience and psychology (e.g., Kristal et al., 2020 ) highlight the need for quality assurance in experimental environments, especially when multiple labs collaborate (e.g., COGITATE: COGITATE Consortium et al., 2023 ; eegManyLabs: Pavlov et al., 2021 ; The International Brain Laboratory et al., 2021 ). Without strict quality controls, the potential benefits of multi-site data collection risk being overshadowed by inter-site variability, masking real effects observable in single-site datasets (e.g., de Vries et al., 2022 ; Farzan et al., 2017 ). The step-by-step process of conducting the framework on protocols.io enables researchers to thoroughly test their experiments in an effective and relatively non-time-consuming manner. A detailed demonstration of the application of our framework to an experiment is provided as a Jupyter notebook (see Supplemental Material 3 ).

We believe the short time invested is outweighed by the benefits in the long run. We acknowledge that for some, the proposed framework and reporting approach might seem excessive and may also be met with skepticism, as it may increase the burden on the researchers. Yet, as our survey shows, most researchers do encounter issues only after data collection begins. Addressing errors retrospectively is time-consuming, and in extreme cases, undetected issues can lead to retractions (e.g., Grave et al., 2021 ) or flawed results. Standardized testing and reporting can identify problems early, aiding replication and consistency across studies, including multi-lab projects. We hope this practice will gradually become integral to good scientific conduct, similar to the adoption of preregistration, which was initially met with skepticism (Paret et al., 2022 ), as it required more effort and resources—but over time proved to be highly beneficial (Gentili et al., 2021 ; Protzko et al., 2023 ).

Finally, we argue that our proposed framework may not be costlier than current testing practices in the field. Our survey indicates that researchers already invest time in testing their experimental setups and also recognize the need for testing before data collection. Yet, these efforts often go unreported. Without a standardized test protocol, each researcher and lab must devise their own methods, with many creating unique tests for each experiment. Our framework outlines four key aspects to test in event-based experiments: reliability of log file (1) event content and (2) event timing, (3) fulfillment of the experimental design, and (4) reliability of the peripheral device. These tests are broad enough to cover most visual presentation designs, needing only minor adjustments for specific cases. With careful planning, a single full experimental run, including all peripherals, can suffice for comprehensive testing. Most steps, except for logging event content, can be automated with minimal execution time. While script development might initially take time, these scripts are generally reusable across studies, offering long-term efficiency.

Thus, we believe that the research community can benefit from these resources. This framework can enhance the credibility of research findings, improve research efficiency and cost-effectiveness, and, by reporting test results, increase the transparency and reproducibility of research methods.

Code and data availability

All code and data used and generated for this paper are openly available at https://github.com/Cogitate-consortium/ExperimentTestingFramework

Baxter, M. G., & Burwell, R. D. (2017). Promoting transparency and reproducibility in Behavioral Neuroscience: Publishing replications, registered reports, and null results. Behavioral Neuroscience, 131 (4), 275–276. https://doi.org/10.1037/bne0000207


Boudewyn, M. A., Erickson, M. A., Winsler, K., Ragland, J. D., Yonelinas, A., Frank, M., Silverstein, S. M., Gold, J., MacDonald III, A. W., Carter, C. S., Barch, D. M., & Luck, S. J. (2023). Managing EEG studies: How to prepare and what to do once data collection has begun. Psychophysiology , n/a (n/a), e14365. https://doi.org/10.1111/psyp.14365

Brainard, D. H. (1997). The psychophysics toolbox. Spatial Vision, 10 (4), 433–436.


Bridges, D., Pitiot, A., MacAskill, M. R., & Peirce, J. W. (2020). The timing mega-study: Comparing a range of experiment generators, both lab-based and online. PeerJ, 8 , e9414. https://doi.org/10.7717/peerj.9414

Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò, M. R. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14 (5), 365–376.

Calcagnotto, L., Huskey, R., & Kosicki, G. M. (2021). The accuracy and precision of measurement: Tools for validating reaction time stimuli. Computational Communication Research, 3 (2), 1–20.


Carp, J. (2012). On the Plurality of (Methodological) Worlds: Estimating the Analytic Flexibility of fMRI Experiments. Frontiers in Neuroscience, 6 , 149. https://doi.org/10.3389/fnins.2012.00149

COGITATE Consortium, Ferrante, O., Gorska-Klimowska, U., Henin, S., Hirschhorn, R., Khalaf, A., Lepauvre, A., Liu, L., Richter, D., Vidal, Y., Bonacchi, N., Brown, T., Sripad, P., Armendariz, M., Bendtz, K., Ghafari, T., Hetenyi, D., Jeschke, J., Kozma, C., …, & Melloni, L. (2023). An adversarial collaboration to critically evaluate theories of consciousness (p. 2023.06.23.546249). bioRxiv. https://doi.org/10.1101/2023.06.23.546249

Cumming, G. (2014). The new statistics: why and how. Psychological Science, 25 (1), 7–29.

de Vries, S. E. J., Siegle, J. H., & Koch, C. (2022). Sharing Neurophysiology Data from the Allen Brain Observatory: Lessons Learned (arXiv:2212.08638). arXiv. https://doi.org/10.48550/arXiv.2212.08638

Fanelli, D. (2012). Negative results are disappearing from most disciplines and countries. Scientometrics, 90 (3), 891–904.

Farzan, F., Atluri, S., Frehlich, M., Dhami, P., Kleffner, K., Price, R., Lam, R. W., Frey, B. N., Milev, R., Ravindran, A., McAndrews, M. P., Wong, W., Blumberger, D., Daskalakis, Z. J., Vila-Rodriguez, F., Alonso, E., Brenner, C. A., Liotti, M., Dharsee, M., & Kennedy, S. H. (2017). Standardization of electroencephalography for multi-site, multi-platform and multi-investigator studies: Insights from the Canadian biomarker integration network in depression. Scientific Reports, 7 (1), 1. https://doi.org/10.1038/s41598-017-07613-x

Frank, M. C., Bergelson, E., Bergmann, C., Cristia, A., Floccia, C., Gervain, J., Hamlin, J. K., Hannon, E. E., Kline, M., Levelt, C., Lew-Williams, C., Nazzi, T., Panneton, R., Rabagliati, H., Soderstrom, M., Sullivan, J., Waxman, S., & Yurovsky, D. (2017). A Collaborative approach to infant research: Promoting reproducibility, best practices, and theory-building. Infancy, 22 (4), 421–435. https://doi.org/10.1111/infa.12182

Gentili, C., Cecchetti, L., Handjaras, G., Lettieri, G., & Cristea, I. A. (2021). The case for preregistering all region of interest (ROI) analyses in neuroimaging research. European Journal of Neuroscience, 53 (2), 357–361. https://doi.org/10.1111/ejn.14954

Grave, J., Soares, S. C., Morais, S., Rodrigues, P., & Madeira, N. (2021). Retraction notice to “The effects of perceptual load in processing emotional facial expression in psychotic disorders” [Psychiatry Research Volume 250C April 2017, pages 121—128]. Psychiatry Research, 303 , 114077. https://doi.org/10.1016/j.psychres.2021.114077

Hirschhorn, R., & Schonberg, T. (2024). Replication. In  Encyclopedia of the Human Brain (2nd ed.). Elsevier.

Ioannidis, J. P. (2005). Why most published research findings are false. PLoS Medicine, 2 (8), e124.

Kothe, C., Shirazi, S. Y., Stenner, T., Medine, D., Boulay, C., Crivich, M. I., ... & Makeig, S. (2024). The lab streaming layer for synchronized multimodal recording. bioRxiv , 2024-02. https://doi.org/10.1101/2024.02.13.580071

Kristal, A. S., Whillans, A. V., Bazerman, M. H., Gino, F., Shu, L. L., Mazar, N., & Ariely, D. (2020). Signing at the beginning versus at the end does not decrease dishonesty. Proceedings of the National Academy of Sciences, 117 (13), 7103–7107. https://doi.org/10.1073/pnas.1911695117

Logg, J. M., & Dorison, C. A. (2021). Pre-registration: Weighing costs and benefits for researchers. Organizational Behavior and Human Decision Processes, 167 , 18–27. https://doi.org/10.1016/j.obhdp.2021.05.006

Luck, S. J. (2014). An Introduction to the Event-Related Potential Technique (2nd ed.). MIT Press.


Melloni, L., Mudrik, L., Pitts, M., & Koch, C. (2021). Making the hard problem of consciousness easier. Science, 372 (6545), 911–912. https://doi.org/10.1126/science.abj3259

Mumford, J. A., & Nichols, T. E. (2008). Power calculation for group fMRI studies accounting for arbitrary design and temporal autocorrelation. NeuroImage, 39 (1), 261–268. https://doi.org/10.1016/j.neuroimage.2007.07.061

Munafò, M. R., Nosek, B. A., Bishop, D. V., Button, K. S., Chambers, C. D., Du Sert, N. P., & Ioannidis, J. P. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1 (1), 1–9. https://doi.org/10.1038/s41562-016-0021

Niso, G., Krol, L. R., Combrisson, E., Dubarry, A. S., Elliott, M. A., François, C., Héjja-Brichard, Y., Herbst, S. K., Jerbi, K., Kovic, V., Lehongre, K., Luck, S. J., Mercier, M., Mosher, J. C., Pavlov, Y. G., Puce, A., Schettino, A., Schön, D., Sinnott-Armstrong, W., …, Chaumon, M. (2022). Good scientific practice in EEG and MEG research: Progress and perspectives. NeuroImage , 257 , 119056. https://doi.org/10.1016/j.neuroimage.2022.119056

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349 (6251), aac4716.

Paret, C., Unverhau, N., Feingold, F., Poldrack, R. A., Stirner, M., Schmahl, C., & Sicorello, M. (2022). Survey on open science practices in functional neuroimaging. NeuroImage, 257 , 119306. https://doi.org/10.1016/j.neuroimage.2022.119306

Pavlov, Y. G., Adamian, N., Appelhoff, S., Arvaneh, M., Benwell, C. S. Y., Beste, C., Bland, A. R., Bradford, D. E., Bublatzky, F., Busch, N. A., Clayson, P. E., Cruse, D., Czeszumski, A., Dreber, A., Dumas, G., Ehinger, B., Ganis, G., He, X., Hinojosa, J. A., …, & Mushtaq, F. (2021). #EEGManyLabs: Investigating the replicability of influential EEG experiments. Cortex , 144 , 213–229. https://doi.org/10.1016/j.cortex.2021.03.013

Peirce, J. W. (2007). PsychoPy—Psychophysics software in Python. Journal of Neuroscience Methods, 162 (1), 8–13. https://doi.org/10.1016/j.jneumeth.2006.11.017

Pernet, C., Garrido, M., Gramfort, A., Maurits, N., Michel, C. M., Pang, E., Salmelin, R., Schoffelen, J. M., Valdes-Sosa, P. A., & Puce, A. (2018). Best practices in data analysis and sharing in neuroimaging using MEEG . https://doi.org/10.31219/osf.io/a8dhx

Plant, R. R. (2016). A reminder on millisecond timing accuracy and potential replication failure in computer-based psychology experiments: An open letter. Behavior Research Methods, 48 (1), 408–411.

Poldrack, R. A., Fletcher, P. C., Henson, R. N., Worsley, K. J., Brett, M., & Nichols, T. E. (2008). Guidelines for reporting an fMRI study. Neuroimage, 40 (2), 409–414.

Protzko, J., Krosnick, J., Nelson, L., Nosek, B. A., Axt, J., Berent, M., Buttrick, N., DeBell, M., Ebersole, C. R., Lundmark, S., MacInnis, B., O’Donnell, M., Perfecto, H., Pustejovsky, J. E., Roeder, S. S., Walleczek, J., & Schooler, J. W. (2023). High replicability of newly discovered social-behavioural findings is achievable. Nature Human Behaviour, 8 (2), 311–319. https://doi.org/10.1038/s41562-023-01749-9

Schlossmacher, I., Dellert, T., Pitts, M., Bruchmann, M., & Straube, T. (2020). Differential Effects of Awareness and Task Relevance on Early and Late ERPs in a No-Report Visual Oddball Paradigm. Journal of Neuroscience, 40 (14), 2906–2913. https://doi.org/10.1523/JNEUROSCI.2077-19.2020

Sejnowski, T. J., Churchland, P. S., & Movshon, J. A. (2014). Putting big data to good use in neuroscience. Nature Neuroscience, 17 (11), 11. https://doi.org/10.1038/nn.3839

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22 (11), 1359–1366.

The International Brain Laboratory, Aguillon-Rodriguez, V., Angelaki, D., Bayer, H., Bonacchi, N., Carandini, M., Cazettes, F., Chapuis, G., Churchland, A. K., Dan, Y., Dewitt, E., Faulkner, M., Forrest, H., Haetzel, L., Häusser, M., Hofer, S. B., Hu, F., Khanal, A., Krasniak, C., …, & Zador, A. M. (2021). Standardized and reproducible measurement of decision-making in mice. eLife , 10 , e63711. https://doi.org/10.7554/eLife.63711

van Gaal, S., Ridderinkhof, K. R., Scholte, H. S., & Lamme, V. A. F. (2010). Unconscious Activation of the Prefrontal No-Go Network. Journal of Neuroscience, 30 (11), 4143–4150. https://doi.org/10.1523/JNEUROSCI.2992-09.2010

Wicherts, J. M., Veldkamp, C. L., Augusteijn, H. E., Bakker, M., Van Aert, R., & Van Assen, M. A. (2016). Degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Frontiers in psychology, 7 , 1832. https://doi.org/10.3389/fpsyg.2016.01832


Acknowledgments

We would like to thank the members of the COGITATE consortium for the inspiration for and support of this manuscript. Furthermore, we want to thank Sarah Brendecke and Felix Bernoulli from the graphics department of the Max Planck Institute for Empirical Aesthetics for their support in generating the graphics of this paper, as well as Ryszard Auksztulewicz for his support with the ERP simulation. A special thanks to Tanya Brown for help with the formatting of the materials and to the members of the NCClab for their input on the testing framework.

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and affiliations.

Neural Circuits, Consciousness and Cognition Research Group, Max Planck Institute of Empirical Aesthetics, Frankfurt am Main, Germany

Alex Lepauvre & Lucia Melloni

Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, 6500 HB, the Netherlands

Alex Lepauvre

Sagol School of Neuroscience, Tel-Aviv University, Tel Aviv, Israel

Rony Hirschhorn & Liad Mudrik

Boston Children’s Hospital, Harvard Medical School, Boston, USA

Katarina Bendtz

School of Psychological Sciences, Tel-Aviv University, Tel Aviv, Israel

Liad Mudrik

Department of Neurology, NYU Grossman School of Medicine, New York, USA

Lucia Melloni

Canadian Institute for Advanced Research (CIFAR), Brain, Mind, and Consciousness Program, Toronto, ON, Canada

Liad Mudrik & Lucia Melloni


Contributions

The authors confirm their contribution to the paper as follows: Conceptualization : A. Lepauvre, R. Hirschhorn, K. Bendtz, L. Mudrik, L. Melloni; Data curation : A. Lepauvre, R. Hirschhorn; Formal Analysis : A. Lepauvre, R. Hirschhorn; Funding acquisition : L. Melloni, L. Mudrik; Investigation : A. Lepauvre, R. Hirschhorn, K. Bendtz; Methodology : A. Lepauvre, R. Hirschhorn, K. Bendtz, L. Melloni; Project administration : A. Lepauvre; Resources : L. Melloni, L. Mudrik; Software : A. Lepauvre, R. Hirschhorn; Supervision : L. Melloni, L. Mudrik; Visualization : A. Lepauvre, R. Hirschhorn; Writing—original draft : A. Lepauvre, R. Hirschhorn, K. Bendtz; Writing—review & editing : A. Lepauvre, R. Hirschhorn, L. Melloni, L. Mudrik

Corresponding author

Correspondence to Alex Lepauvre .

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 1588 KB)

Rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Lepauvre, A., Hirschhorn, R., Bendtz, K. et al. A standardized framework to test event-based experiments. Behav Res (2024). https://doi.org/10.3758/s13428-024-02508-y


Accepted : 15 August 2024

Published : 16 September 2024

DOI : https://doi.org/10.3758/s13428-024-02508-y


  • Replication
  • Experimental methods
  • Pre-acquisition tests

September 17, 2024

Book Review: How One Weird Rodent Ecologist Tried to Change the Fate of Humanity

A biography of the scientist whose work led to fears of a ‘population bomb’

By Ben Goldfarb

Illustration of four mice surrounding a small house. Credit: Frank Stockton

Dr. Calhoun's Mousery: The Strange Tale of a Celebrated Scientist, a Rodent Dystopia, and the Future of Humanity by Lee Alan Dugatkin University of Chicago Press, 2024 ($27.50)

In the 1960s and 1970s American society suffered a yearslong collective panic about the perceived threat of overpopulation. Biologist Paul Ehrlich appeared on The Tonight Show to tout The Population Bomb , his 1968 polemic about human numbers run amok. The 1973 film Soylent Green depicted a squalid hellscape in which surplus people would be processed into food. College students pledged to remain childless for the benefit of Earth.


This anxiety originated, in part, in the laboratory of John Bumpass Calhoun, an enigmatic ecologist who spent decades documenting the adverse effects of overcrowding on rodents in elaborate experimental “cities.” Calhoun is largely obscure today, but few scientists in his time wielded more influence. He hobnobbed with science-fiction writer Arthur C. Clarke and was featured in books by naturalist E. O. Wilson and journalist Tom Wolfe—in the process spreading overpopulation angst far and wide. “The most profound impact of Calhoun’s studies lies far from academic halls and ivory towers,” writes Lee Alan Dugatkin in Dr. Calhoun’s Mousery , a new biography nearly as quirky as its subject. Calhoun’s work permanently “seeped into the public consciousness.”

Calhoun made for an unlikely prophet. A nature lover from Tennessee, he took a job in the 1940s leading a long-term study in Baltimore with the primary goal of controlling urban rats. Calhoun found that each city block was home to around 150 rats, a number he found low given the “abundant sources of food in open garbage cans.” Rat populations, he suspected, were “self-regulating”: when new rats tried to move in, residents kicked them out. But the unpredictability of Baltimore’s streets—where humans were constantly killing rats or messing with the traps—frustrated Calhoun’s analyses. To truly understand rat society, he decided, he needed to control their environment.

In the late 1950s the National Institute of Mental Health gave Calhoun the opportunity to manipulate rats in a remodeled Maryland barn. Calhoun, an endlessly inventive designer of experiments, built an enclosure outfitted with rat apartments and partitioned the pen into connected “neighborhoods,” creating a murid arcadia that he could observe at his leisure.

This utopia soon turned nightmarish. As the rats multiplied, they fed and gathered in ever greater densities, leading to a social breakdown that Calhoun called a “behavioral sink.” Packs of libidinous males relentlessly hounded females, who in turn ignored their offspring; in some neighborhoods, pup mortality hit 96 percent. The rats, Calhoun declared, suffered from “pathological togetherness” that could lead to collapse. In the years that followed, he shifted to mice, but his fundamental conclusions remained the same: rodents succumbed to chaos as their populations exploded.

Calhoun wasn’t shy about extrapolating to our own species’ fate. “Perhaps if population growth continues to grow unchecked in humans, we might one day see the human equivalent” of socially catatonic rodents, he told the Washington Daily News in one characteristic interview. His fears both channeled the zeitgeist and directed it.

Dugatkin—an evolutionary biologist, science historian and prolific author who sifted through thousands of pages at the Calhoun archive in Bethesda—is an admirably thorough researcher. But his granular chronology of Calhoun’s activities sometimes slides too deep into a recitation of media coverage, conference talks and intricate experiments. Amid this blizzard of minutiae, Mousery occasionally loses sight of a question that should be central to any biography: Why does Calhoun matter today? Dugatkin acknowledges that the “lasting impact of [Calhoun’s] work is nowhere near” that of pioneering behaviorists such as Ivan Pavlov. But he misses an opportunity to probe the social debates that his subject’s work catalyzed. Did Calhoun’s darker prognostications do harm? The population bomb, after all, failed to detonate.

Calhoun belonged to a generation of scientists who had no compunctions about straying from their disciplinary lane. He wrote poetry and sci-fi and consulted on humane prison design. Dugatkin captures the grand ambition of a man who gazed at rodents and saw the universe, even if the significance of his research is murky today. As Dugatkin notes, the disturbing dynamics that Calhoun produced in his micromanaged “universes” have never been observed in the wild. Calhoun didn’t describe the world; he created his own.



Rechargeable cells: the lead–acid accumulator

In association with Nuffield Foundation


Use this practical to demonstrate the chemistry behind rechargeable batteries, using a lead–acid accumulator cell

Some electrochemical cells are rechargeable – the electrode reactions are reversible and the process can be repeated many times. Such cells can be used to store electricity. The most common type of heavy duty rechargeable cell is the familiar lead-acid accumulator (‘car battery’) found in most combustion-engined vehicles.

This experiment can be used as a class practical or demonstration. Students learn how to construct a simple lead–acid cell consisting of strips of lead and an electrolyte of dilute sulfuric acid. The cell should then be charged for different lengths of time, before being discharged through a light bulb. Students measure the time the bulb remains lit, plotting a graph of this time against the charging time to show the relationship between the electrical energy put into the cell and the energy released.

Without going into the detail of the electrode reactions, this experiment can be used as a demonstration or class exercise to investigate a reversible electrochemical cell in the context of alternative energy sources for vehicles, or energy storage. To date the lead-acid accumulator has proved to be the only widely used source of energy for electrically powered vehicles. Other types of electrochemical cell, especially fuel cells, are now being developed and tested on the road. Some of the criteria for a commercially viable cell can be discussed.

At advanced level the electrode processes could be outlined in more detail as examples of redox reactions that can be reversed many times in an electrochemical cell. Although car battery testing using the density of the electrolyte has become less common, its relationship to the overall cell reactions, on charging and discharging the lead accumulator, could also be pointed out.

Time required should be 20–30 minutes, depending on how many readings are taken.

  • Eye protection
  • Beakers, 100 or 150 cm 3 , x2
  • Low voltage DC supply, 2–4 V, or suitable battery
  • Torch bulb, 1.25 V, in holder
  • Crocodile clips, x2–4, as needed
  • Connecting leads, x2
  • Stopclock or watch
  • Dilute sulfuric acid, 0.5 M (IRRITANT), about 100 cm 3
  • Lead foil electrodes (TOXIC, DANGEROUS FOR THE ENVIRONMENT) (about 2 cm x 8 cm), x2 (see note 6 below)

Health, safety and technical notes

  • Read our standard health and safety guidance.
  • Wear eye protection throughout.
  • Lead foil, Pb(s), (TOXIC, DANGEROUS FOR THE ENVIRONMENT) – see CLEAPSS Hazcard HC056 .
  • Dilute sulfuric acid, H 2 SO 4 (aq), (IRRITANT) – see CLEAPSS Hazcard  HC098a  and CLEAPSS Recipe Book RB098.
  • Lead(IV) oxide, PbO 2 (s), (TOXIC, DANGEROUS FOR THE ENVIRONMENT) is produced on the (+) electrode during charging – see CLEAPSS Hazcard HC056 .
  • The lead electrodes should be cut to size so that they can be folded over the rim of the beaker and the crocodile clips attached, so as to grip the beaker rim and the lead foil together. Take care that the electrodes do not touch once the cell is assembled, and that the electrolyte level does not rise high enough to reach the crocodile clips.

A diagram showing the equipment set-up for testing a simple lead–acid accumulator cell

Source: Royal Society of Chemistry

How to assemble, charge and discharge the lead–acid accumulator cell

  • Assemble the cell as shown in the diagram and connect it to the DC source. Note which electrode is (+) and which is (–).
  • Pour sufficient dilute sulfuric acid electrolyte into the cell to fill it to within 1 cm of the crocodile clips.
  • Switch on the DC source and, if possible, adjust the voltage to 3–4 V. Allow the current to pass for three minutes.
  • Disconnect the power supply from the cell. At this point the lead electrodes may be removed for examination (demonstration mode only). One should be bright, the other covered with a dark brown deposit of lead(IV) oxide (TOXIC, DANGEROUS FOR THE ENVIRONMENT). Replace the electrodes in the electrolyte.
  • Connect the electrodes to the light bulb and start the stop clock. The bulb will light up then gradually fade. Note the time it takes to go out.
  • Replace the light bulb with the DC power source, making sure that the electrodes are connected to the same DC terminals as at the start. Pass the current for four minutes this time. Disconnect the power supply and time how long it takes to discharge the cell using the light bulb.
  • Repeat steps 3–6 a few more times, each time increasing the charging time by a minute and recording the time it takes for the cell to discharge.
  • Plot a graph of discharge time (y-axis) vs charging time (x-axis).

Teaching notes

Students should be able to identify which way electrons are flowing in the cell when it is charging and discharging from the electrode polarities. At advanced level this could be linked to the electrode reactions below, which assume an initial layer of insoluble lead(II) sulfate on the electrodes after immersing the lead in the acid.

During charging (electrode signs as in charging circuit):

(+) electrode: PbSO 4 (s) + 2H 2 O(l) → PbO 2 (s) + 4H + (aq) + SO 4 2– (aq) + 2e –

(–) electrode: PbSO 4 (s) + 2e –  → Pb(s) + SO 4 2– (aq)

Discharging (electrode signs as for cell):

(+) electrode: PbO 2 (s) + 4H + (aq) + SO 4 2– (aq) + 2e –  → PbSO 4 (s) + 2H 2 O(l)

(–) electrode: Pb(s) + SO 4 2– (aq) → PbSO 4 (s) + 2e –

The overall, reversible cell reaction is therefore:

PbO 2 (s) + 4H + (aq) + 2SO 4 2- (aq) + Pb(s) ⇌ 2PbSO 4 (s) + 2H 2 O(l)

Thus during charging the sulfuric acid concentration rises, and during discharge it falls. A side reaction which may result from over-charging is the liberation of hydrogen gas at the (–) electrode, resulting from the reduction of H + (aq) ions. This has caused explosions in the past when the electrolyte level in batteries has been investigated with the aid of a lighted match!

The advantages of this cell reaction for use in a commercial battery could be discussed, eg the formation of insoluble lead or lead compounds on the electrodes during charge and discharge, the only changes in the electrolyte being a change in concentration. Commercial cells need to be robust, cheap to construct and, for certain applications, able to sustain large currents. The lead-acid accumulator fulfils all these criteria, but has the disadvantage of being very heavy.

Additional information

This is a resource from the  Practical Chemistry project , developed by the Nuffield Foundation and the Royal Society of Chemistry.

Practical Chemistry activities accompany  Practical Physics  and  Practical Biology .

© Nuffield Foundation and the Royal Society of Chemistry

  • 11-14 years
  • 14-16 years
  • 16-18 years
  • Practical experiments
  • Electrochemistry
  • Energy storage
  • Physical chemistry

Specification

  • Electrochemical cells
  • C6.2p recall that a chemical cell produces a potential difference until the reactants are used up
  • A simple cell can be made by connecting two different metals in contact with an electrolyte.
  • Rechargeable cells and batteries can be recharged because the chemical reactions are reversed when an external electrical current is supplied.
  • 5.25C Recall that a chemical cell produces a voltage until one of the reactants is used up
  • C1.2.8 recall that a chemical cell produces a potential difference until the reactants are used up
  • 10. Investigating some electrochemical cells
  • Cells can be non-rechargeable (irreversible), rechargeable or fuel cells.
  • The benefits and risks to society associated with using these cells.
  • j) setting up of electrochemical cells and measuring voltages
  • g) the techniques and procedures used for the measurement of cell potentials of: metals or non-metals in contact with their ions in aqueous solution; ions of the same element in different oxidation states in contact with a Pt electrode



Large-scale experiment brings real world into lab to design better spaces

16 September 2024

The real world was brought into the laboratory on a scale never seen before, for an experiment where over 100 people were tracked walking through a custom-built network of moveable ‘walls’, in a UCL-led research project investigating how people move through spaces.

Study participants at UCL PEARL showcase and live experiment

The project attracted participation from professionals in architecture, hospitals, transport, AI, property, video game design, dance, and museums.

The research team, led by academics from neuroscience, architecture, and civil engineering, are seeking to gather comprehensive data about how people navigate and experience spaces, which could aid the design of better buildings to improve health, learning, and living.

At a launch event and live experiment, over 100 people wearing a range of sensors walked through a maze-like environment, set up as an art gallery at UCL’s PEARL (Person Environment Activity Research Laboratory) facility, a unique space in East London created to explore how people interact with their environment.

The research project is set up to bridge the gap between tightly controlled lab experiments and field-based experiments with uncontrolled variables. The academics are hoping their findings, over the course of numerous experiments, will yield valuable insights for designing spaces such as transport hubs, hospitals, or offices, including making them more inclusive, while also informing AI and simulation software.

Lead researcher Professor Hugo Spiers (UCL Experimental Psychology) said: “To study how people navigate their environments and how their brains support this, we can do that in a research lab – but that’s not very realistic – or we can do that in the real world – but that’s harder to control or modify. Here, we are bringing the real world into the lab, in a massive space that could be set up as anything from a train station to a hospital or school, to facilitate research.”

Co-lead researcher Dr Fiona Zisch (UCL Bartlett School of Architecture) said: “When designing buildings and other spaces, you need to understand how people will move around the space, which can be surprisingly difficult in practice. Many spaces leave visitors lost, confused, or stressed, or lack accessibility for people with different mobility levels, health issues, or neurodiversity.

“If people cannot easily navigate a space, this can affect care outcomes in hospitals, efficiency in transport or logistics, or safety, particularly in the case of emergency evacuation, so it’s vital that we do more research to understand diverse requirements to make design more equitable.”

The study space, measuring 15 metres squared, has eight-metre-high curtains, acting as moveable ‘walls’, and was designed by the lead researchers alongside Professors Stephen Gage and Sean Hanna (both UCL Bartlett School of Architecture), to see how changes to the space alter how people move within it.

In the initial setup, the ‘art gallery’ includes projects on display from UCL Design for Performance and Interaction MArch students, that study participants perused at their own pace while wearing a cap containing a tracking device and a barcode for camera tracking, with the latter developed by George Profenza and Jessica In (UCL Bartlett School of Architecture). Some participants wore additional monitoring devices, such as mobile electroencephalography (EEG) systems to measure brain activity, made possible by Professor Klaus Gramann (Technische Universität Berlin). During the experiment, participants were given instructions at different points to complete tasks such as finding specific displays, congregating in groups, or evacuating the space.

UCL PEARL showcase and live experiment maze setup

Counsellor for Science and Innovation at the Embassy of Sweden in the UK, Marika Amartey, spoke at the launch event about the importance of this research under a bilateral agreement between the UK and Sweden. The lead researchers, together with Carina Carlman, Director of Research and Business Development at the Research Institutes of Sweden (RISE), and neurodesigner and brain researcher Isabelle Sjövall (UCL Experimental Psychology and RISE), are establishing a new joint Centre for NeuroArchitecture and NeuroDesign.* This new centre will explore how the human brain interacts with built environments, and how understanding this can help design spaces that enhance people’s health and wellbeing.

The experiment at UCL PEARL forms an important step in the creation of this new Centre. Isabelle Sjövall said: “Our research at PEARL is important because it can generate highly rigorous new discoveries that change policies guiding the design for healthy cities.”

Carina Carlman added: “We are delighted at RISE to bring our experience with research and innovation together with UCL to tackle some of the most important challenges we face in society.”

The project received funding from the UK Government Higher Education Innovation Fund, alongside technical equipment supplied and run by Ubisense, Pupil Labs, Brain Products, and Artinis. Support for the launch event came from sustainable development consultancy Arup, who are exploring the potential of the research to help design more accessible and inclusive spaces for business, governments, and other clients.

Brett Little, Arup’s People Movement Leader, said: "There are countless problems different people might experience when trying to find their way around anything from a train station to a museum. At Arup we're excited that this research will help us make huge leaps forward in understanding how to design buildings, cities, and spaces that solve those problems.

"This is the start of a journey that will take our understanding of how people move in the real world to another level and enable us to help create spaces that are accessible and work for all - no matter their background."

Professor Nick Tyler (UCL Civil, Environmental & Geomatic Engineering), Director of UCL PEARL, commented: “We built UCL PEARL with a vision to create a better world with infrastructure that works for everyone, by facilitating research that fuses arts and sciences, and cuts across disciplines and sectors. By recreating large spaces like train stations, hospital wards, and town centres, but also trains, buses, streets, parks, supermarkets, concert halls, or theatres, and modifying them systematically, we can investigate people’s reactions to them in detail to transform research in design, engineering, and neuroscience, through a greater understanding of the brain in its ecological world.”

  • Professor Hugo Spiers’ academic profile
  • Dr Fiona Zisch’s academic profile
  • UCL Experimental Psychology
  • UCL Psychology & Language Sciences
  • UCL Bartlett School of Architecture
  • * More about UCL and RISE collaboration in NeuroDesign and NeuroArchitecture

Photos of the live experiment which took place on Wednesday 11 September. Credit: Sandra Ciampone.

Media contact

tel: +44 20 7679 9222  / +44 (0) 7717 728648

E: chris.lane [at] ucl.ac.uk


IMAGES

  1. Interesting Light Bulb Experiment Using Pencil Lead

    experiments with lead

  2. Graphite pencil lead resistance experiment

    experiments with lead

  3. Lead Experiment || Lead Reaction with Heat and Fire🔥|| What is the Lead???

    experiments with lead

  4. HowTo Make Bullets? Make it Rain Molten Lead « Science Experiments

    experiments with lead

  5. Electrolysis of molten lead(II) bromide

    experiments with lead

  6. Lead nitrate

    experiments with lead

VIDEO

  1. Did Forbidden Experiments Lead to Their Mysterious Deaths? #Shorts

  2. 7 Fun Copper and Neodymium Experiments

  3. inside Lead Acid Battery🔋| #shorts #science #experiment

  4. Homemade 12.6v 15Ah lithium ion battery #lithiumbatterypack #lithiumionbattery #shorts #viralvideo

  5. Свинец. Разбираем аккумуляторную батарею для ИБП

  6. Lead Thru Experiments

COMMENTS

  1. Golden Rain Experiment

    Lead Nitrate + Potassium Iodide. Lead nitrate reacts with potassium iodide to produce a beautiful precipitate, as we will show you. The reaction, known as the "Golden Rain" experiment, produces beautiful hexagonal crystals of lead iodide that resemble plates of gold, and makes a great chemistry demonstration.. The golden rain reaction takes advantage of the increased solubility of lead ...

  2. Precipitation reactions of lead nitrate

    Compare the colours of lead compounds formed by precipitation reactions to identify which would make good pigments in this microscale class practical. Many lead compounds are insoluble and some of them are brightly coloured. In this experiment, students observe the colour changes of lead nitrate solutions when different anions are added to ...

  3. Making solder as an alloy of tin and lead

    In this experiment, students make their own alloy, heating lead and tin together to produce solder. They then investigate three properties of the alloy and compare these with lead, including hardness, melting point and density. The most likely incident in this experiment is a student burning themselves, so warn them about the equipment being hot.

  4. Golden rain

    Safety. Wear eye protection. Wash hands thoroughly after the demonstration. After using lead salts, wipe up any spills and wipe over surfaces. Lead nitrate is harmful if swallowed and inhaled, may damage the unborn child, is suspected of damaging fertility, may cause damage to organs though prolonged or repeated exposure, and is very toxic to aquatic life with long-lasting effects.

  5. Explore the Effects of pH on Lead Testing.

    lead + sulfide → lead sulfide; Pb 2+ = Lead, in the form of a positive ion with a charge of 2+; S 2-= Sulfide, in the form of a negative ion with a charge of 2-; PbS = Lead sulfide, the reaction product. Lead sulfide is a black solid. (aq) = Aqueous, or dissolved in water (s) = Solid The kit you will use has a color key that shows the approximate concentration of lead in the sample, based on ...

  6. "Diffusion: mysterious movement" experiment

    Step-by-step in­struc­tions. Pour a cup of wa­ter into the glass dish. In­tro­duce the lead ni­trate and potas­si­um io­dide to the wa­ter on op­po­site sides of the dish from one an­oth­er. In a minute, the salts will dis­solve and a yel­low pre­cip­i­tate will form a stripe in the cen­ter of the dish.

  7. Heavy Metals and Aquatic Environments

    Lead is called a heavy metal, and there are other sources of heavy metals that can be toxic, too. Silver, copper, mercury, nickel, cadmium, arsenic, and chromium are all heavy metals that can be toxic in certain environments. In this experiment, find out if one common heavy metal, copper, can be toxic to an aquatic environment.

  8. Lemon Battery Experiment

    Use a lemon battery to power a small electrical device, like an LED. The lemon battery experiment is a classic science project that illustrates an electrical circuit, electrolytes, the electrochemical series of metals, and oxidation-reduction (redox) reactions.The battery produces enough electricity to power an LED or other small device, but not enough to cause harm, even if you touch both ...

  9. Experimental investigation on the interaction characteristics of lead

    However, such experiments can hardly reflect the real process of SGTR accident in lead-based reactors. The second category is the large-scale mechanistic experiments represented by the LIFUS5 carried out by ENEA [ 21 , 27 , 28 ], which can better simulate the process of SGTR accidents in lead-based reactors and have essential reference ...

  10. Controlled experiments (article)

    Controlled experiments (article) | Khan Academy. If you're seeing this message, it means we're having trouble loading external resources on our website. If you're behind a web filter, please make sure that the domains *.kastatic.org and *.kasandbox.org are unblocked. Khanmigo is now free for all US educators! Plan lessons, develop exit tickets ...

  11. How to test for lead and nitrates in water

    1. Since this is a science fair experiment, you will need only common chemicals to test the contaminants in water. You can use hydrogen sulfide, H2S H X S to test lead. (Warning: it has rotten egg smell.) It is a common reagent and can be found in laboratory. You can also use sodium sulfide instead but it too has rotten egg smell. X + + + X +.

  12. Investigating the solubility of lead halides

    Pb 2+ (aq) + 2I - (aq) →PbI 2 (s) and the full equation: Pb (NO 3) 2 (aq) + 2KI (aq) →PbI 2 (s) + 2KNO 3 (aq) The solubilities of the lead halides increase markedly with temperature, so that the three halides under investigation are all effectively soluble in boiling water. This means that on cooling these solutions, the lead halides will ...

  13. Rutherford scattering experiments

    The experiments were performed between 1906 and 1913 by Hans Geiger and Ernest Marsden under the direction of Ernest Rutherford at the Physical Laboratories of the University of Manchester. The physical phenomenon was explained by Ernest Rutherford in a classic 1911 paper that eventually lead to the widespread use of scattering in particle ...

  14. 16 Science Projects and Lessons About Visible Light

    The free STEM projects, experiments, lessons and activities below help educators teach K-12 students about the physics of light, specifically, visible light, with hands-on exploration and active learning. The resources below have been grouped by grade band to help educators select the experiments and lessons that best fit their needs.Note

  15. Probing the Skin of a Lead Nucleus

    Now, the Lead Radius Experiment (PREX) Collaboration at the Thomas Jefferson National Accelerator Facility in Virginia has determined the thickness of this neutron-rich skin in lead-208, a stable isotope with 44 more neutrons than protons . The measurement, which addresses questions relating to all four fundamental forces of nature, yields ...

  16. Water-cleanup experiment caused lead poisoning

    Lead concentrations spiked in many children living in the nation's capital after the local water authority altered the treatment used to disinfect ...

  17. John Dalton

    Born September 6, 1766, in Eaglesfield, United Kingdom. Best known for: chemist John Dalton is credited with pioneering modern atomic theory. He was ...

  18. An overview of experiments with lead-containing nanoparticles performed

    Abstract. Over the past few years, the Ekaterinburg (Russia) interdisciplinary nanotoxicological research team has carried out a series of investigations using different in vivo and in vitro experimental models in order to elucidate the cytotoxicity and organ-systemic and organism-level toxicity of lead-containing nanoparticles (NP) acting separately or in combinations with some other metallic ...

  19. Diffusion in liquids

    In this experiment, students place colourless crystals of lead nitrate and potassium iodide at opposite sides of a Petri dish of deionised water. As these substances dissolve and diffuse towards each other, students can observe clouds of yellow lead iodide forming, demonstrating that diffusion has taken place.
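
    As a rough, hedged sense of scale: taking a typical diffusion coefficient for small ions in water of about 1 x 10^-9 m^2/s (an assumed ballpark figure, not a measured value for these particular ions), the characteristic distance the ions travel grows only with the square root of time:

    import math

    D = 1.0e-9  # m^2/s, assumed typical value for small ions in water

    for label, seconds in [("10 minutes", 600), ("1 hour", 3600), ("1 day", 86400)]:
        # One-dimensional characteristic diffusion length: x ~ sqrt(2 * D * t)
        x_mm = math.sqrt(2 * D * seconds) * 1000
        print(f"{label:>10}: ~{x_mm:.1f} mm")

    That square-root scaling is why the yellow clouds take noticeably longer to appear when the crystals are placed further apart, and why the dish should be left undisturbed.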

  20. Improving rigor and reproducibility in western blot experiments with

    Here we describe best practices for the design and analysis of western blot experiments, with examples and demonstrations of how different analytical approaches can lead to widely varying outcomes ...

  21. This is what you need to know about lead and your health

    Children who survive severe lead poisoning may end up with lifelong intellectual impairments and behavioral disorders. Even low levels of exposure are known to reduce IQ and produce learning ...

  22. A standardized framework to test event-based experiments

    The replication crisis in experimental psychology and neuroscience has received much attention recently. This has led to wide acceptance of measures to improve scientific practices, such as preregistration and registered reports. Less effort has been devoted to performing and reporting the results of systematic tests of the functioning of the experimental setup itself. Yet, inaccuracies in the ...

  23. Silver and lead halides

    In this experiment, students add silver and lead salts to a variety of solutions containing halide ions, producing insoluble silver and lead halides as precipitates. The silver chloride, bromide and iodide can be distinguished by their colours and their solubility in ammonia solution, providing tests for these halide ions in solution. ...
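
    For reference, a hedged sketch of the general test reactions, with X standing for Cl, Br or I and written in the same style as the other equations in this article:

    Ag + + X - -> AgX (s)
    Pb +2 + 2X - -> PbX 2 (s)

    The usual colour sequence for the silver halides is white (chloride), cream (bromide) and pale yellow (iodide), with the chloride the most soluble in ammonia solution and the iodide essentially insoluble in it.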

  24. Book Review: How One Weird Rodent Ecologist Tried to Change the Fate of

    Calhoun, an endlessly inventive designer of experiments, built an enclosure outfitted with rat apartments and partitioned the pen into connected "neighborhoods," creating a murid arcadia that ...

  25. Rechargeable cells: the lead-acid accumulator

    The most common type of heavy duty rechargeable cell is the familiar lead-acid accumulator ('car battery') found in most combustion-engined vehicles. This experiment can be used as a class practical or demonstration. Students learn how to construct a simple lead-acid cell consisting of strips of lead and an electrolyte of dilute sulfuric ...
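
    For context, a hedged sketch of the textbook overall discharge reaction for a fully formed lead-acid cell (the simple strip cell built in the class practical only approximates this once it has been charged), written in the same plain style as the other equations here:

    Pb + PbO 2 + 2H 2 SO 4 -> 2PbSO 4 + 2H 2 O

    Each cell supplies roughly 2 volts, which is why a 12-volt car battery is built from six cells in series.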

  26. Large-scale experiment brings real world into lab to design ...

    The real world was brought into the laboratory on a scale never seen before, for an experiment where over 100 people were tracked walking through a custom-built network of moveable 'walls', ... Lead researcher Professor Hugo Spiers (UCL Experimental Psychology) said: "To study how people navigate their environments and how their brains ...