
Research Methods Knowledge Base

Hybrid Experimental Designs


Hybrid experimental designs are just what the name implies: new strains formed by combining features of more established designs. Many variations could be constructed from standard design features. Here, I'm going to introduce two hybrid designs. I'm featuring these because they illustrate especially well how a design can be constructed to address specific threats to internal validity.

The Solomon Four-Group Design

The Solomon Four-Group Design is designed to deal with a potential testing threat. Recall that a testing threat occurs when the act of taking a test affects how people score on a retest or posttest. The design notation is shown in the figure. It’s probably not a big surprise that this design has four groups. Note that two of the groups receive the treatment and two do not. Further, two of the groups receive a pretest and two do not. One way to view this is as a 2x2 (Treatment Group X Measurement Group) factorial design. Within each treatment condition we have a group that is pretested and one that is not. By explicitly including testing as a factor in the design, we are able to assess experimentally whether a testing threat is operating.
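
To make the 2x2 logic concrete, here is a minimal simulation sketch (group sizes, means, and effect sizes are all hypothetical) that generates posttest scores for the four Solomon groups and compares the marginal means for the treatment and pretesting factors:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100  # hypothetical participants per group

# Hypothetical effects: a +5-point program effect and, here, no testing effect.
treatment_effect, testing_effect = 5.0, 0.0
post = {
    ("treated", "pretested"): rng.normal(50 + treatment_effect + testing_effect, 10, n),
    ("treated", "post_only"): rng.normal(50 + treatment_effect, 10, n),
    ("control", "pretested"): rng.normal(50 + testing_effect, 10, n),
    ("control", "post_only"): rng.normal(50, 10, n),
}

# View the design as 2x2 (Treatment x Pretesting) and compare marginal means.
treated = np.concatenate([post["treated", "pretested"], post["treated", "post_only"]])
control = np.concatenate([post["control", "pretested"], post["control", "post_only"]])
pretested = np.concatenate([post["treated", "pretested"], post["control", "pretested"]])
post_only = np.concatenate([post["treated", "post_only"], post["control", "post_only"]])

print(f"treatment main effect: {treated.mean() - control.mean():+.2f}")
print(f"testing main effect:   {pretested.mean() - post_only.mean():+.2f}")
```

Setting `testing_effect` to a positive value reproduces the testing-threat pattern, with pretested groups outscoring their posttest-only counterparts.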

Possible Outcomes. Let’s look at a couple of possible outcomes from this design. The first outcome graph shows what the data might look like if there is a treatment or program effect and no testing threat. Be careful in interpreting this graph: there are six dots, one representing the average for each O in the design notation. To help you see the connection between the pretest and posttest averages for the same group, a line connects those dots; the two dots not connected by a line represent the two posttest-only groups. Look first at the two pretest means. They are close to each other because the groups were randomly assigned. Now look at the posttest values. There appears to be no difference between the two treatment groups, even though one got a pretest and the other did not. Similarly, the two control groups scored about the same on the posttest. Thus, the pretest did not appear to affect the outcome. But both treatment groups clearly outscored both controls: there is a main effect for the treatment.

Now, look at a result where there is evidence of a testing threat. In this outcome, the pretests are again equivalent (because the groups were randomly assigned). Each treatment group outscored its comparable control group: the pre-post treatment group outscored the pre-post control, and the posttest-only treatment group outscored the posttest-only control. These results indicate that there is a treatment effect. But here, both groups that had the pretest outscored their comparable non-pretested groups. That’s evidence of a testing threat.

Switching Replications Design

The Switching Replications design is one of the strongest of the experimental designs. When the circumstances are right for it, it addresses one of the major problems in experimental design: the need to deny the program to some participants through random assignment. The design notation indicates that this is a two-group design with three waves of measurement. You might think of it as two pre-post treatment-control designs grafted together: the implementation of the treatment is repeated or replicated, and in the repetition the two groups switch roles. The original control group becomes the treatment group in phase 2, while the original treatment group acts as the control. By the end of the study, all participants have received the treatment.
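
The two-group, three-wave structure can be sketched in the same spirit (all numbers hypothetical): Group 1 receives the program in phase 1, Group 2 in phase 2, and the group means diverge and then reconverge:

```python
import numpy as np

rng = np.random.default_rng(1)
n, gain = 200, 8.0  # hypothetical group size and one-shot program gain

base1 = rng.normal(50, 10, n)        # Group 1: treated in phase 1
base2 = rng.normal(50, 10, n)        # Group 2: control first, treated in phase 2
noise = lambda: rng.normal(0, 2, n)  # measurement noise at each wave

# Three waves of measurement: pretest, after phase 1, after phase 2.
g1 = [base1 + noise(), base1 + gain + noise(), base1 + gain + noise()]
g2 = [base2 + noise(), base2 + noise(),        base2 + gain + noise()]

for wave in range(3):
    print(f"wave {wave}: group 1 = {g1[wave].mean():5.1f}, group 2 = {g2[wave].mean():5.1f}")
```

The printed means show the pattern described below: roughly equal at pretest, Group 1 ahead after phase 1, and both groups roughly equal again after phase 2.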

The switching replications design is most feasible in organizational contexts where programs are repeated at regular intervals. For instance, it works especially well in schools that are on a semester system. All students are pretested at the beginning of the school year. During the first semester, Group 1 receives the treatment and during the second semester Group 2 gets it. The design also enhances organizational efficiency in resource allocation. Schools only need to allocate enough resources to give the program to half of the students at a time.

Possible Outcomes. Let’s look at two possible outcomes. In the first example, we see that when the program is given to the first group, the recipients do better than the controls. In the second phase, when the program is given to the original controls, they “catch up” to the original program group. Thus, we have a converge, diverge, reconverge outcome pattern. We might expect a result like this when the program covers specific content that the students master in the short term and where we don’t expect that they will continue getting better as a result.

Now, look at the other example result. During the first phase we see the same result as before: the program group improves while the control does not. And, as before, during the second phase the original control group, now the program group, improves as much as the first program group did. But during phase two, the original program group continues to improve even though the program is no longer being given to them. Why would this happen? It could happen in circumstances where the program has continuing, longer-term effects. For instance, if the program focused on learning skills, students might continue to improve even after the formal program period because they keep applying and strengthening those skills.

I said at the outset that both the Solomon Four-Group and the Switching Replications designs address specific threats to internal validity. It’s obvious that the Solomon design addresses a testing threat. But what does the switching replications design address? Remember that in randomized experiments, especially when the groups are aware of each other, there is the potential for social threats: compensatory rivalry, compensatory equalization, and resentful demoralization are all likely to be present in educational contexts where programs are given to some students and not to others. The switching replications design helps mitigate these threats because it assures that everyone will eventually get the program, and it allocates who gets the program first in the fairest possible manner: through the lottery of random assignment.


Hybrid methods for combined experimental and computational determination of protein structure

Affiliation.

  • 1 Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, USA.
  • PMID: 33380110
  • PMCID: PMC7773420
  • DOI: 10.1063/5.0026025

Knowledge of protein structure is paramount to understanding biological function, developing new therapeutics, and making detailed mechanistic hypotheses. Therefore, methods to accurately elucidate three-dimensional structures of proteins are in high demand. A few experimental techniques, such as x-ray crystallography, nuclear magnetic resonance (NMR), and cryo-EM, can routinely provide high-resolution structures, but each has shortcomings and cannot be used in all cases. In addition, a large number of experimental techniques have been developed that provide some structural information, but not enough to assign atomic positions with high certainty. These methods offer sparse experimental data, which can also be noisy and inaccurate in some instances. In cases where it is not possible to determine the structure of a protein experimentally, computational structure prediction methods can be used as an alternative. Although computational methods can be run without any experimental data, in a large number of studies the inclusion of sparse experimental data into these prediction methods has yielded significant improvement. In this Perspective, we cover many of the successes of integrative modeling (computational modeling with experimental data), specifically for protein folding, protein-protein docking, and molecular dynamics simulations. We describe methods that incorporate sparse data from cryo-EM, NMR, mass spectrometry, electron paramagnetic resonance, small-angle x-ray scattering, Förster resonance energy transfer, and genetic sequence covariation. Finally, we highlight some of the major challenges in the field as well as possible future directions.

Figures (captions truncated in the source):

  • Representations of each featured experimental method used for computational modeling…
  • Comparison of the utility of different types of information (green: contacts; orange: interface;…)
  • Improvement in fit to the density map using MDFF for acetyl-CoA synthase…
  • NMR restraints improved native-like sampling in BCL. Each point signifies one protein…
  • MELDxMD was the highest ranked group in NMR data-assisted CASP13 (2018)…
  • The inclusion of HRF data improved structure prediction for myoglobin…
  • Comparison of predicted subcomplexes with (left, blue) and without (right, red) the inclusion…
  • Comparison of sampling for Bax and ExoU de novo folding using DEER data…
  • Docked structures generated using ATTRACT-SAXS for an easy [(a) 2GTP], medium [(b) 1B6C],…
  • Benchmark of FRET-based modeling in the integrative modeling platform…
  • Performance of AlphaFold (A7D) in CASP13 (2018)…
  • (a) Comparison of models for Afp7. The Foldit structure is rendered in green,…


GutCheck

Hybrid Research: Combining Qualitative and Quantitative Methods and More

Apr 3, 2019


Research has long since evolved beyond the tried-and-true sequence of an exploratory qualitative phase followed by a quantitative phase. Digital research methods have enabled new and creative approaches to market research. That said, hybrid research (sometimes referred to as mixed-method, bricolage, or triangulation research) is not new, but some researchers still fail to take full advantage of what it has to offer.

As we know, many methods of research are designed to answer specific questions. This is normally necessary in order to get results that are specific and actionable. However, sometimes combining methods and/or conducting research iteratively is a better option. And forgoing the opportunity completely could mean missing out on an easy approach to all-encompassing, collaborative insights.

What Is Hybrid Research?

You can likely guess at the definition of hybrid research. When the term first emerged, most research experts defined it as a combination of qualitative and quantitative research methods. The definition has since broadened: hybrid research can be a combination of any two or more research methodologies, regardless of whether it pairs qualitative with quantitative. Further, it can be conducted in series (iteratively) or in parallel (at the same time).

The motivation for using hybrid research is to establish a better understanding of results. For example, quantitative research can define the "what" while qualitative research provides the "why." Hybrid research can also extend beyond qualitative-plus-quantitative combinations, such as in-person plus digital methods, or customer reviews plus quantitative research. Some of the benefits of hybrid research include:

  • Relatable insights that can be tied from one question or phase of research to the next, creating more meaningful connections that can also inform future research and product/messaging optimizations
  • Timely and cost-effective results—especially if the combination of methods is being conducted concurrently
  • A more engaging and impactful story, as more data sources are being used to supplement answers and add in additional layers of consumer experiences and understanding

The Differences Between Quantitative and Qualitative Research

Even though hybrid research can, as mentioned above, encompass several different research methods and isn’t necessarily always a combination of quantitative and qualitative research, combining these two methodologies is very common when it comes to painting a fuller picture of a target audience or segment. Let’s talk a bit about the basic differences between quantitative and qualitative research, which can help you make decisions about which approach to take based on your business and research objectives.

Quantitative Research

Quantitative research uses larger data sets to confidently answer any combination of who, what, when, or how.

When it comes to analysis, the first step is realizing that analysis comes in two phases: before and after the research is conducted. Prior to conducting the research, you need to think through what key questions you want to answer and what form the data needs to be in to correctly answer those questions.

You’ll also want to make sure you’ve identified the right metrics, ones that will actually answer your research objectives and give you the data you need to make confident decisions. For example, three fundamental metrics we find most effective in quantitatively testing products are purchase intent, uniqueness, and believability. After running correlation analyses on the various metrics we use, we found these three to be the least related to one another; in other words, each describes a different aspect of a product or claim.
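
As an illustration of that kind of correlation check (using purely synthetic ratings, not actual survey data), one might compute the pairwise correlation matrix for the three metrics:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500  # hypothetical respondents

# Synthetic 5-point ratings for three concept-test metrics.
purchase_intent = rng.integers(1, 6, n)
uniqueness = rng.integers(1, 6, n)
believability = rng.integers(1, 6, n)

# Pairwise correlations: low off-diagonal values suggest each metric
# captures a distinct aspect of the concept being tested.
corr = np.corrcoef(np.vstack([purchase_intent, uniqueness, believability]))
print(np.round(corr, 2))
```

In real data, a high correlation between two metrics would suggest they are measuring much the same thing and one could be dropped.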

Qualitative Research

When you ask the right questions, qualitative research explains the why behind the numbers, giving context for and bringing to life otherwise one-dimensional data.

Once you are engaged in actual consumer stories, understanding how the business decisions you make will impact your consumers becomes easier and clearer. The story aspect of qualitative research allows the data to come to life and transcend the immediate results. For instance, consider hearing a data point that females aged 25–40 who make less than $20,000 a year, with kids under the age of 10, are more likely to purchase Product A over Product B. What does that mean for your company? How should you change your marketing to reflect that data point? Does it make sense to change your marketing at all? The real question, and the one qualitative research answers, is "Why do they prefer Product A over Product B?"

With qualitative research, you must be careful not to take one quote from a respondent to be the ultimate truth. Quotes are great to add color, explain further, and provide specific examples. But they are not meant to be the be-all and end-all of the research. They should be looked at in their entirety with the research, not individually.

Using Quantitative & Qualitative Research Together

At the highest level, combining quantitative and qualitative research methods allows you to narrow your focus, make smarter decisions using multiple data types and points, and concentrate your development or optimization efforts where you’ll have the most impact. The hardest part of conducting quantitative research is determining the underlying cause of why numbers turn out a particular way; pairing it with qualitative research provides the lens to create impactful key findings. It can also be helpful to use qualitative research prior to quantitative, for example to inform answer choices for a quantitative attitudes and usage study. Doing so ensures you’ve got an exhaustive list of answers, which allows respondents to choose answers that are more relevant. Ultimately, the relationship between data points and consumer stories can culminate in the challenges and/or opportunities that should be addressed with action.

Examples of Quantitative and Qualitative Research by Objectives

Quantitative: attitudes and usage, prioritizing concepts or claims, prioritizing features for a product, evaluating prices, line optimizations, competitive analysis

Qualitative: exploratory research, refining concepts or claims, refining messaging/creative, mobile shop-alongs.

The Research Design

Before you start designing a hybrid research study, you first have to define all objectives related to the business need. A use case could start with the need to prioritize and develop a concept for new product development. From there, you’d assess the objectives by research method, such as prioritizing concepts through a quantitative phase and refining them further in a qualitative phase. Lastly, determine the order and type of execution: should both phases be executed together, or one before the other? And do you want to refine all concepts before prioritization, or prioritize first and refine only the winning concept?

Before fully designing and implementing a hybrid research design, keep in mind these best practices:

  • Objectives should start at a high level before determining what answers you should get from each phase of research
  • Avoid conducting both methods of research with different vendors, as methodologies and quality of insights can vary and impact results
  • Keep audience types consistent so insights can be translated across both phases
  • Remember, hybrid research doesn’t have to be conducted at the same time or immediately following a previous phase; sometimes it’s just about the ability to apply it in an iterative approach

We incorporate several different options for hybrid research, from combining our Exploratory Research Group™ with an Agile A&U™ to a concept or creative test followed by a refiner. Some next steps for hybrid research also include combining big data with survey data. To see an example of how a product innovation team at Nestle uses hybrid research in an iterative way to achieve product innovation success, take a look at the case study below.



How the Experimental Method Works in Psychology



The experimental method is a type of research procedure that involves manipulating variables to determine if there is a cause-and-effect relationship. The results obtained through the experimental method are useful but do not prove with 100% certainty that a singular cause always creates a specific effect. Instead, they show the probability that a cause will or will not lead to a particular effect.

At a Glance

While there are many different research techniques available, the experimental method allows researchers to look at cause-and-effect relationships. Using the experimental method, researchers randomly assign participants to a control or experimental group and manipulate levels of an independent variable. If changes in the independent variable lead to changes in the dependent variable, it indicates there is likely a causal relationship between them.

What Is the Experimental Method in Psychology?

The experimental method involves manipulating one variable to determine if this causes changes in another variable. This method relies on controlled research methods and random assignment of study subjects to test a hypothesis.

For example, researchers may want to learn how different visual patterns impact our perception, or whether certain actions can improve memory. Experiments are conducted on many behavioral topics.

The scientific method forms the basis of the experimental method. This is a process used to determine the relationship between two variables—in this case, to explain human behavior .

Positivism is also important in the experimental method. It refers to factual knowledge that is obtained through observation, which is considered to be trustworthy.

When using the experimental method, researchers first identify and define key variables. Then they formulate a hypothesis, manipulate the variables, and collect data on the results. Unrelated or irrelevant variables are carefully controlled to minimize the potential impact on the experiment outcome.

History of the Experimental Method

The idea of using experiments to better understand human psychology began toward the end of the nineteenth century. Wilhelm Wundt established the first formal laboratory in 1879.

Wundt is often called the father of experimental psychology. He believed that experiments could help explain how psychology works, and used this approach to study consciousness .

Wundt coined the term "physiological psychology." This is a hybrid of physiology and psychology, or how the body affects the brain.

Other early contributors to the development and evolution of experimental psychology as we know it today include:

  • Gustav Fechner (1801-1887), who helped develop procedures for measuring sensations according to the size of the stimulus
  • Hermann von Helmholtz (1821-1894), who analyzed philosophical assumptions through research in an attempt to arrive at scientific conclusions
  • Franz Brentano (1838-1917), who called for a combination of first-person and third-person research methods when studying psychology
  • Georg Elias Müller (1850-1934), whose early experiment on attitude, involving the sensory discrimination of weights, revealed how anticipation can affect this discrimination

Key Terms to Know

To understand how the experimental method works, it is important to know some key terms.

Dependent Variable

The dependent variable is the effect that the experimenter is measuring. If a researcher was investigating how sleep influences test scores, for example, the test scores would be the dependent variable.

Independent Variable

The independent variable is the variable that the experimenter manipulates. In the previous example, the amount of sleep an individual gets would be the independent variable.

Hypothesis

A hypothesis is a tentative statement or a guess about the possible relationship between two or more variables. In looking at how sleep influences test scores, the researcher might hypothesize that people who get more sleep will perform better on a math test the following day. The purpose of the experiment, then, is to either support or reject this hypothesis.

Operational Definitions

Operational definitions are necessary when performing an experiment. When we say that something is an independent or dependent variable, we must have a very clear and specific definition of the meaning and scope of that variable.

Extraneous Variables

Extraneous variables are other variables that may also affect the outcome of an experiment. Types of extraneous variables include participant variables, situational variables, demand characteristics, and experimenter effects. In some cases, researchers can take steps to control for extraneous variables.

Demand Characteristics

Demand characteristics are subtle hints that indicate what an experimenter is hoping to find in a psychology experiment. This can sometimes cause participants to alter their behavior, which can affect the results of the experiment.

Intervening Variables

Intervening variables are factors that can affect the relationship between two other variables. 

Confounding Variables

Confounding variables are outside variables that influence the dependent variable but that experimenters cannot control for. When a confounding variable is present, it is difficult to determine whether an effect was caused by changes in the independent variable or by the confounding variable.

Psychologists, like other scientists, use the scientific method when conducting an experiment. The scientific method is a set of procedures and principles that guide how scientists develop research questions, collect data, and come to conclusions.

The five basic steps of the experimental process are:

  • Identifying a problem to study
  • Devising the research protocol
  • Conducting the experiment
  • Analyzing the data collected
  • Sharing the findings (usually in writing or via presentation)

Most psychology students are expected to use the experimental method at some point in their academic careers. Learning how to conduct an experiment is important to understanding how psychologists test, support, and refute theories in this field.

There are a few different types of experiments that researchers might use when studying psychology. Each has pros and cons depending on the participants being studied, the hypothesis, and the resources available to conduct the research.

Lab Experiments

Lab experiments are common in psychology because they allow experimenters more control over the variables. These experiments can also be easier for other researchers to replicate. The drawback of this research type is that what takes place in a lab is not always what takes place in the real world.

Field Experiments

Sometimes researchers opt to conduct their experiments in the field. For example, a social psychologist interested in researching prosocial behavior might have a person pretend to faint and observe how long it takes onlookers to respond.

This type of experiment can be a great way to see behavioral responses in realistic settings. But it is more difficult for researchers to control the many variables existing in these settings that could potentially influence the experiment's results.

Quasi-Experiments

While lab experiments are known as true experiments, researchers can also use quasi-experiments. Quasi-experiments are often called natural experiments because the researcher does not have true control over the independent variable.

A researcher looking at personality differences and birth order, for example, cannot manipulate the independent variable in the situation (birth order). Participants also cannot be randomly assigned because they naturally fall into pre-existing groups based on their birth order.

So why would a researcher use a quasi-experiment? This is a good choice in situations where scientists are interested in studying phenomena in natural, real-world settings. It's also beneficial if there are limits on research funds or time.

Field experiments can be either quasi-experiments or true experiments.

Examples of the Experimental Method in Use

The experimental method can provide insight into human thoughts and behaviors. Researchers use experiments to study many aspects of psychology.

A 2019 study investigated whether splitting attention between electronic devices and classroom lectures had an effect on college students' learning abilities. It found that dividing attention between these two mediums did not affect lecture comprehension. However, it did impact long-term retention of the lecture information, which affected students' exam performance.

An experiment used participants' eye movements and electroencephalogram (EEG) data to better understand cognitive processing differences between experts and novices. It found that experts had higher power in their theta brain waves than novices, suggesting that they also had a higher cognitive load.

A study looked at whether chatting online with a computer via a chatbot changed the positive effects of emotional disclosure often received when talking with an actual human. It found that the effects were the same in both cases.

One experimental study evaluated whether exercise timing impacts information recall. It found that engaging in exercise prior to performing a memory task helped improve participants' short-term memory abilities.

Sometimes researchers use the experimental method to get a bigger-picture view of psychological behaviors and impacts. For example, one 2018 study examined several lab experiments to learn more about the impact of various environmental factors on building occupant perceptions.

A 2020 study set out to determine the role that sensation-seeking plays in political violence. This research found that sensation-seeking individuals have a higher propensity for engaging in political violence. It also found that providing access to a more peaceful, yet still exciting political group helps reduce this effect.

While the experimental method can be a valuable tool for learning more about psychology and its impacts, it also comes with a few pitfalls.

Experiments may produce artificial results that are difficult to apply to real-world situations. Similarly, researcher bias can affect the data collected. Results may also fail to replicate, meaning the findings have low reliability.

Since humans are unpredictable and their behavior can be subjective, it can be hard to measure responses in an experiment. In addition, political pressure may alter the results. The subjects may not be a good representation of the population, or groups used may not be comparable.

And finally, since researchers are human too, results may be degraded due to human error.

What This Means For You

Every psychological research method has its pros and cons. The experimental method can help establish cause and effect, and it's also beneficial when research funds are limited or time is of the essence.

At the same time, it's essential to be aware of this method's pitfalls, such as how biases can affect the results or the potential for low reliability. Keeping these in mind can help you review and assess research studies more accurately, giving you a better idea of whether the results can be trusted or have limitations.

Colorado State University. Experimental and quasi-experimental research .

American Psychological Association. Experimental psychology studies human and animals .

Mayrhofer R, Kuhbandner C, Lindner C. The practice of experimental psychology: An inevitably postmodern endeavor . Front Psychol . 2021;11:612805. doi:10.3389/fpsyg.2020.612805

Mandler G. A History of Modern Experimental Psychology .

Stanford University. Wilhelm Maximilian Wundt . Stanford Encyclopedia of Philosophy.

Britannica. Gustav Fechner .

Britannica. Hermann von Helmholtz .

Meyer A, Hackert B, Weger U. Franz Brentano and the beginning of experimental psychology: implications for the study of psychological phenomena today . Psychol Res . 2018;82:245-254. doi:10.1007/s00426-016-0825-7

Britannica. Georg Elias Müller .

McCambridge J, de Bruin M, Witton J.  The effects of demand characteristics on research participant behaviours in non-laboratory settings: A systematic review .  PLoS ONE . 2012;7(6):e39116. doi:10.1371/journal.pone.0039116

Laboratory experiments . In: The Sage Encyclopedia of Communication Research Methods. Allen M, ed. SAGE Publications, Inc. doi:10.4135/9781483381411.n287

Schweizer M, Braun B, Milstone A. Research methods in healthcare epidemiology and antimicrobial stewardship — quasi-experimental designs . Infect Control Hosp Epidemiol . 2016;37(10):1135-1140. doi:10.1017/ice.2016.117

Glass A, Kang M. Dividing attention in the classroom reduces exam performance . Educ Psychol . 2019;39(3):395-408. doi:10.1080/01443410.2018.1489046

Keskin M, Ooms K, Dogru AO, De Maeyer P. Exploring the cognitive load of expert and novice map users using EEG and eye tracking. ISPRS Int J Geo-Inf. 2020;9(7):429. doi:10.3390/ijgi9070429

Ho A, Hancock J, Miner A. Psychological, relational, and emotional effects of self-disclosure after conversations with a chatbot . J Commun . 2018;68(4):712-733. doi:10.1093/joc/jqy026

Haynes IV J, Frith E, Sng E, Loprinzi P. Experimental effects of acute exercise on episodic memory function: Considerations for the timing of exercise . Psychol Rep . 2018;122(5):1744-1754. doi:10.1177/0033294118786688

Torresin S, Pernigotto G, Cappelletti F, Gasparella A. Combined effects of environmental factors on human perception and objective performance: A review of experimental laboratory works . Indoor Air . 2018;28(4):525-538. doi:10.1111/ina.12457

Schumpe BM, Belanger JJ, Moyano M, Nisa CF. The role of sensation seeking in political violence: An extension of the significance quest theory . J Personal Social Psychol . 2020;118(4):743-761. doi:10.1037/pspp0000223

By Kendra Cherry, MSEd. Kendra Cherry is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."


Int J Mol Sci

Yeast Two-Hybrid, a Powerful Tool for Systems Biology

Anna Brückner, Cécile Polge, Nicolas Lentze, Daniel Auerbach, and Uwe Schlattner

1 INSERM U884, Université Joseph Fourier, Laboratoire de Bioénergétique Fondamentale et Appliquée, 2280 Rue de la Piscine, BP 53, Grenoble Cedex 9, France

2 Dualsystems Biotech AG, Grabenstrasse 11a, 8952 Schlieren, Switzerland

A key property of complex biological systems is the presence of interaction networks formed by their different components, primarily proteins. These are crucial for all levels of cellular function, including architecture, metabolism and signalling, as well as the availability of cellular energy. Very stable, but also rather transient and dynamic protein-protein interactions generate new system properties at the level of multiprotein complexes, cellular compartments or the entire cell. Thus, interactomics is expected to contribute substantially to emerging fields like systems biology or systems bioenergetics. The more recent technological development of high-throughput methods for interactomics research will dramatically increase our knowledge of protein interaction networks. The two most frequently used methods are yeast two-hybrid (Y2H) screening, a well established genetic in vivo approach, and affinity purification of complexes followed by mass spectrometry analysis, an emerging biochemical in vitro technique. So far, a majority of published interactions have been detected using a Y2H screen. However, with the massive application of this method, some limitations have also become apparent. This review provides an overview of available yeast two-hybrid methods, in particular focusing on more recent approaches. These allow detection of protein interactions in their native environment, e.g. in the cytosol or bound to a membrane, by using cytosolic signalling cascades or split protein constructs. Strengths and weaknesses of these genetic methods are discussed, and some guidelines for verification of detected protein-protein interactions are provided.

1. Interactomics Takes Center Stage in Systems Biology

1.1. A central role for protein interactions

The field of systems biology has achieved tremendous momentum during recent years. This development has been driven by: (i) a huge amount of genomic and proteomic data already available, (ii) the need to understand complex cellular systems or multifactorial diseases such as cancer or the metabolic syndrome, and (iii) emerging technologies which allow high-throughput screening of complex mixtures of biomolecules or non-invasive studies of live cells or entire organisms. In addition, evolution in this field would have been impossible without the parallel development of bioinformatics tools to analyze the large amounts of data generated.

Multiprotein complexes, not individual proteins, are increasingly recognized as the molecular basis of cellular fluxes of molecules, signals and energy. Thus, technologies which enable us to decipher cellular interactions between biomolecules (interactomics) together with those measuring metabolite fluxes (metabolomics, fluxomics) and signalling cascades (phosphoproteomics and others dealing with secondary protein modifications) have taken center stage in systems biology [ 1 ].

Interactomics can be applied in a global, unbiased cell systems approach, or in a more targeted approach to study a specific set of proteins [ 2 ]. While the former may identify so-called “nodes” or “hubs” in cell signalling but is often prone to errors (see discussion below on false negatives and positives), the latter is able to reliably describe sub-networks in more detail, including biophysical constants of the interaction and their spatiotemporal organization [ 3 ].

To date, the cellular interactome has mainly been explored for interactions involving proteins in the fields of cell signalling and cell architecture to understand the wiring of cellular data processing. However, it is also becoming increasingly important in many other fields.

1.2. Systems bioenergetics

Bioenergetics has seen several decades of intense research, starting with the discovery of the main biochemical pathways and of energy conservation in a chemiosmotic gradient in the 1960s and 1970s. After being a quiescent field for more than a decade, several developments during the last 15 years have put bioenergetics and mitochondria back at the forefront of scientific development [ 4 ] (for an excellent book see [ 5 ]): the description of the protein machines involved at an atomic level (like the respiratory complexes in mitochondria), the discovery of a close link between mitochondria and cell signalling (calcium, apoptosis), and the emerging relationship between dysfunction of cellular energetics and a plethora of complex pathologies, including (neuro-)muscular and age-related diseases, metabolic and cardiovascular diseases, and cancer. Currently, the field of bioenergetics is about to enter the era of systems biology [ 6 ]. In fact, ATP generation requires a precise interplay between proteins of glycolysis, the TCA cycle, mitochondrial electron transport and energy transfer systems like creatine kinase, which often includes specific (micro)compartmentation of proteins or multiprotein complexes maintained by specific protein-protein interactions. These topologies allow for more precise regulation or confer further thermodynamic advantages like substrate channelling between active sites. Systems bioenergetics holds the promise of integrating the multiple aspects of cellular energetics in a holistic approach which: (i) extends our knowledge of protein complexes involved in metabolic control and cell signalling [ 7 ], (ii) considers cellular compartmentation, particularly important in this field [ 8 ], and (iii) aims to understand the complex regulatory network which governs homeostasis in cell energetics and which apparently fails in so many pathologies [ 9 ].
Conversely, manipulating energy metabolism holds promise for therapeutic strategies. For example, it was surprising to find that inhibition of mitochondrial complex I is part of the molecular mechanism of the most successful antidiabetic drug, metformin [ 10 ]. Thus, the emerging field of systems bioenergetics does not only involve basic research, but is of prime importance for applied and clinical scientists.

For bioenergetics, interactomics goes far beyond cell signalling or cell structure, since it may uncover a new layer of regulation. The components of the mitochondrial redox chain or the ATPases are among the most complex protein assemblies, and understanding their regulation as well as the flux of protons and electrons will need intense work. Spatiotemporal organization of the long known pathways in primary metabolism is still incompletely understood [ 11 ], and the same applies to the systems of “energy-rich” intermediates like nucleoside triphosphates or phosphocreatine and mechanisms like metabolite channeling between different components in a complex [ 6 , 12 , 13 ].

1.3. Interactomics tools

This review gives an overview of several methods for global or targeted interactomics, with a particular emphasis on classical and emerging yeast two-hybrid (Y2H) systems. These Y2H tools now give access to almost the entire cellular proteome for interaction screening, including membrane proteins, transcriptionally active proteins and proteins localized in different subcellular compartments. Massive application of such tools can be expected, since they are comparatively inexpensive, do not need specialized large equipment and can be performed in any molecular biology laboratory with reasonable throughput.

2. Screening Technologies for Protein-Protein Interactions

Protein-protein interactions are involved in all cellular processes. Mapping these interaction networks to elucidate the organization of the proteome into functional units is thus of prime importance for systems biology. A large number of methods have been developed for screening protein interactions. The more classical biochemical approaches, such as copurification, affinity purification or coimmunoprecipitation of protein complexes, require in vitro handling of protein extracts. Further limitations of these techniques include restricted sensitivity and a bias towards high-affinity interactions. Once a partner has been detected, identification by mass spectrometry (MS) is generally straightforward, although rather costly. Cloning of corresponding cDNAs may be time-consuming, but clone repositories such as RIKEN or IMACE can be a convenient alternative. More recently, surface plasmon resonance (SPR), a biophysical technology, has been adopted for screening protein-protein interactions. Purified cellular extracts are injected onto a sensor chip covered with an immobilized binding partner. The instrument setup combines capture of the binding partner with a quantitative readout of the binding event, such that putative partners can be eluted and identified by MS [ 14 , 15 ]. Another approach to interaction screening is the use of “cDNA-expression” libraries (for a review see [ 16 ]), such as phage display or Y2H methods, the latter detecting protein interactions in vivo. For studies on a genomic scale, highly parallel and automated processes are needed. However, only a few detection methods for protein-protein interactions can be easily adapted for a high-throughput strategy. These include in particular yeast two-hybrid (Y2H) and affinity purification coupled to MS (AP/MS).

2.1. Yeast two-hybrid

The Y2H technique allows detection of interacting proteins in living yeast cells [ 17 ]. As described in full detail in chapter 3, interaction between two proteins, called bait and prey, activates reporter genes that enable growth on specific media or a color reaction. Y2H can be easily automated for high-throughput studies of protein interactions on a genome-wide scale, as shown for bacteriophage T7 [ 18 ], Saccharomyces cerevisiae [ 19 , 20 ], Drosophila melanogaster [ 21 ], Caenorhabditis elegans [ 22 ] and humans [ 23 , 24 ]. Experimental Y2H data have been crucial in establishing large synthetic human interactomes [ 25 , 26 ] and in dissecting mechanisms of human disease [ 27 ]. Two screening approaches can be distinguished: the matrix (or array) approach and the library approach.

In the matrix approach, all possible combinations between full-length open reading frames (ORFs) are systematically examined by direct mating of a set of baits against a set of preys expressed in different yeast mating types (e.g. mating type a for baits and mating type α for preys). This approach is easily automatable and has been used in yeast and human genome-scale two-hybrid screens. In yeast, 6,000 ORFs were cloned and over 5,600 interactions were identified, involving 70% of the yeast proteome [ 19 , 20 , 28 ]. The defined position of each bait in a matrix allows rapid identification of interacting preys without sequencing, but screens are usually restricted to a limited set of full-length ORFs and will thus fail to detect certain interactors (called false negatives).
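The bookkeeping behind such an array screen can be pictured with a short sketch: every bait is crossed against every prey, and a positive well is identified purely by its position in the matrix, with no sequencing needed. The protein names and the set of positives below are placeholders, not results from the cited screens.

```python
from itertools import product

baits = ["SNF1", "GAL4", "CDC28"]          # mating type a strains
preys = ["SNF4", "GAL80", "CLN2", "FUS3"]  # mating type alpha strains

# Pretend reporter readout: pairs that activated the reporter genes.
positives = {("SNF1", "SNF4"), ("CDC28", "CLN2")}

# Build the full bait x prey matrix of mating reactions.
matrix = {}
for bait, prey in product(baits, preys):
    matrix[(bait, prey)] = (bait, prey) in positives

# Hits are read off directly from their (row, column) position.
hits = [pair for pair, active in matrix.items() if active]
print(hits)
```

Because the matrix covers only the full-length ORFs placed on the array, any interactor absent from `baits` or `preys` is invisible to the screen, which is the false-negative limitation described above.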

The classical cDNA-library screen searches for pairwise interactions between defined proteins of interest (bait) and their interaction partners (preys) present in cDNA libraries or sub-pools of libraries. An exhaustive screen of libraries with selected baits can be an alternative to a matrix approach. Here, preys are not separated on an array but pooled (reviewed in [ 29 ]), and libraries may contain cDNA fragments in addition to full length ORFs, thus largely covering a transcriptome and reducing the rate of false negatives. However, inherent to this type of library screening, the rate of wrongly identified proteins (called false positives) is increased. In addition, interaction partners have to be identified by colony PCR analysis and sequencing, making such screens more expensive and time consuming.

2.2. Affinity purification/mass spectrometry

The value of MS for high-throughput screening of protein interactions has been recognized only more recently. This analytical technique is based on the determination of the mass-to-charge ratio of ionized molecules. Although already introduced in 1948, MS has seen its sensitivity and implementation range largely extended by technological advances. These include Nobel prize crowned ionization methods like electrospray ionization [ 30 ], generating ions from macromolecules in liquid medium without their fragmentation, soft laser desorption (SLD) [ 31 ] and matrix-assisted laser desorption/ionization (MALDI) [ 32 ], using a laser beam for ionization of macromolecules without breaking chemical bonds. MS is now routinely applied to identify proteolytic fragments of proteins or even entire proteins and protein complexes [ 33 ]. Coupled to classic biochemical methods like affinity purification or chemical cross-linking, MS has also become a powerful tool for large-scale interactome research, mainly in the form of affinity purification-MS (AP/MS). In this approach, a protein, usually fused to an epitope tag, is either immunoprecipitated by a specific antibody (e.g. against the tag) or purified on affinity columns recognizing the tag. Affinity purification can make use of an individual tag (e.g. a Flag-tag) for single-step purification. However, it is more efficient when using two subsequent purification steps with proteins that are doubly tagged (e.g. 6xHis- and Strep-tag) or that carry, either C- or N-terminally, a fusion of two affinity tags separated by a protease cleavage site (e.g. protein A and calmodulin binding protein), where the first tag is cleaved off after the first AP step (tandem affinity purification, TAP). This results in an enrichment of native multiprotein complexes containing the tagged protein. Subsequent MS analysis then identifies the different constituents of the complexes [ 34 ]. Ho et al.
[ 35 ] expressed 10% of yeast ORFs with a C-terminal Flag-tag under the control of an inducible promoter in yeast. They were able to connect 25% of the yeast proteome in a multiprotein complex interaction network. With the TAP-tag approach, Gavin et al. [ 36 , 37 ] and Krogan et al. [ 38 ] purified 1,993 and 2,357 TAP-fusion proteins covering 60% and 72% of the yeast proteome, respectively. Compared to the single Flag-tag approach, the combination of two different purification steps in TAP results in improved sensitivity and specificity (TAP is reviewed in more detail in [ 39 , 40 ]). Recent technical progress in automating complex purification and MS analysis, together with dedicated computational methods that increase the accuracy of data analysis, has made this approach a powerful tool in interactome research.

2.3. Comparison of Y2H- and MS-based methods

MS is less accessible than Y2H due to the expensive large equipment needed. Thus, a large amount of the data generated so far from protein interaction studies has come from Y2H screening. For example, more than 5,600 protein interactions have been reported for yeast [ 19 , 20 , 28 , 41 ] and about 6,000 for humans [ 23 , 24 ], establishing extensive protein interaction networks. Approximately half of the interaction data available in databases such as IntAct [ 42 ] and MINT [ 43 ] come from Y2H assays. Genome-scale Y2H screens have highlighted considerable cross-talk in the cell, even between proteins that were not thought to be functionally connected. However, Y2H and AP/MS are complementary in the kinds of interactors they detect. AP/MS may determine all the components of a larger complex, which do not necessarily all interact directly with each other, while Y2H identifies defined binary interactions within such complexes. In addition, some types of protein-protein interactions can be missed in Y2H due to inherent limitations, like interactions involving membrane proteins, self-activating proteins, or proteins requiring post-translational modifications, but this may also occur with AP/MS-based approaches. Given the strengths of both methods, considerable effort is invested to overcome the remaining drawbacks. Different Y2H systems have been developed to extend the coverage of the proteome of interest, as will be described in detail further below. Recently, the sensitivity and robustness of AP/MS were also improved by the development of an integrated workflow coupling rapid generation of bait-expressing cell lines with improved protein complex purification using a novel double-affinity strategy [ 44 ]. Only a combination of different approaches, necessarily including bioinformatics tools, will eventually lead to a fairly complete characterization of physiologically relevant protein-protein interactions in a given cell or organism.
This will be a fundamental requirement to use interactome data in a systems biology approach at the cellular or higher complexity level.
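The complementarity of the two methods can be illustrated with a toy expansion of AP/MS complex data into binary pairs, using the commonly described "spoke" model (bait paired with each co-purified protein) and "matrix" model (all pairwise members). The member names below are placeholders, not data from any cited screen.

```python
from itertools import combinations

def spoke_pairs(bait, complex_members):
    """Spoke model: the bait paired with every co-purified protein."""
    return {frozenset((bait, m)) for m in complex_members if m != bait}

def matrix_pairs(complex_members):
    """Matrix model: all pairwise combinations of complex members."""
    return {frozenset(p) for p in combinations(complex_members, 2)}

# A hypothetical four-member complex pulled down via a tagged bait.
complex_members = ["TAP-BaitA", "B", "C", "D"]

# The spoke model yields 3 candidate pairs, the matrix model 6;
# only direct binary contacts (the kind Y2H reports) are guaranteed
# to be physical interactions.
print(len(spoke_pairs("TAP-BaitA", complex_members)))
print(len(matrix_pairs(complex_members)))
```

Neither expansion tells us which pairs touch directly; that is exactly the binary information a Y2H screen contributes, while AP/MS contributes complex membership.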

3. Aiming at in Vivo Interactions: The Yeast Two-Hybrid Approach

3.1. Historical perspective: the principles of the approach

In 1989, Fields and Song revolutionized protein interaction analysis by describing a genetic system to detect direct protein-protein interactions in the yeast Saccharomyces cerevisiae [ 17 ]. Until then, interactions between two proteins had mostly been studied using biochemical techniques. The development of this completely new analytic tool was triggered by the molecular analysis of eukaryotic transcription factors. Only a few years before, the Ptashne laboratory had discovered the modular structure of Gal4, a transcriptional activator in yeast. They showed that Gal4 binds a specific DNA sequence (the upstream activation sequence, UAS) and thus activates transcription in the presence of galactose. If separated into two fragments, the N-terminal fragment still bound to DNA but did not activate transcription in the presence of galactose, while this latter function was mediated by the C-terminal fragment [ 45 ]. However, both fragments could interact and non-covalently reconstitute a fully functional Gal4. Thus, two different functional domains of Gal4 were identified: an N-terminal DNA binding domain (DBD) and a C-terminal (transcriptional) activation domain (AD), with each domain maintaining its function independent of the presence of the other.

Inspired by these findings, Fields and Song exploited the modular properties of the transcription factor Gal4 to monitor protein-protein interactions. The basic idea was to fuse the two proteins of interest X and Y to DBD and AD of Gal4, respectively, such that interaction between X and Y reconstitutes a functional transcription factor that could then drive reporter gene expression ( Figure 1 ). In the first construct called bait, protein X (e.g. the glucose-sensor SNF1) was fused to the N-terminal part of GAL4 containing the DBD (GAL4DBD). In the second construct, the prey, protein Y (e.g. the regulatory protein SNF4) was fused to the C-terminal part of Gal4 that contains the AD (GAL4AD). Expression of both fusion proteins in yeast and interaction between bait and prey indeed reconstituted a functional Gal4 transcription factor from the two separate polypeptides. Gal4 then recruited RNA polymerase II, leading to transcription of a GAL1-lacZ fusion gene. This reporter gene encodes the enzyme beta-galactosidase which labels the yeast cell when using a colorimetric substrate [ 17 ].

Figure 1.

The classical yeast two-hybrid system. (A) The protein of interest X, is fused to the DNA binding domain (DBD), a construct called bait. The potential interacting protein Y is fused to the activation domain (AD) and is called prey. (B) The bait, i.e. the DBD-X fusion protein, binds the upstream activator sequence (UAS) of the promoter. The interaction of bait with prey, i.e. the AD-Y fusion protein, recruits the AD and thus reconstitutes a functional transcription factor, leading to further recruitment of RNA polymerase II and subsequent transcription of a reporter gene.

For a genome-wide screen for interactors of given baits, a cDNA library is used to construct an entire library of preys. From a methodological point of view, any such Y2H screen implies the transformation of yeast cells with bait and prey cDNA on different vectors under the control of yeast promoters. Expression levels will depend on the promoter used and may affect sensitivity and specificity of the screen. Once expressed in the cytosol, bait and prey must be able to enter the nucleus to activate transcription, a limitation of the original Y2H approach further discussed below.

This classical Y2H system has been extended to exploit various other DNA-binding proteins (e.g. the DBD of the E. coli repressor protein LexA), transcriptional activators (e.g. the AD of Herpes simplex virus VP16) and various reporter genes. A suitable reporter gene must encode a protein whose function provides a simple readout. Thus, besides the colorimetric reaction with the lacZ gene, the most commonly used are auxotrophic markers (e.g. LEU2, HIS3, ADE2, URA3, LYS2 ) that allow growth on minimal media. In the current state of the art, more than one reporter gene is assayed in parallel to increase the stringency of Y2H screens [ 46 ]. In fact, one of the common problems of Y2H is the generation of false positives due to non-specific interactions (as described in detail further below). Selection for two active reporter genes requires more solid transcriptional activation and thus increases the stringency of the assay, but concomitantly penalizes detection of weak and transient interactions. Another possibility to adjust the stringency of the assay is partial inhibition of the enzymatic activity encoded by the reporter gene. For example, the product of the HIS3 reporter, imidazole glycerol phosphate dehydratase, is competitively inhibited by increasing concentrations of 3-aminotriazole.
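The 3-aminotriazole titration just described follows standard competitive-inhibition kinetics, v = Vmax·[S] / (Km·(1 + [I]/Ki) + [S]): raising the inhibitor concentration lowers reporter enzyme activity at a fixed substrate level, which is how stringency is tuned. The constants below are illustrative placeholders, not measured values for the His3 enzyme.

```python
def rate(s, i, vmax=1.0, km=0.5, ki=1.0):
    """Michaelis-Menten rate in the presence of a competitive inhibitor.

    s: substrate concentration; i: inhibitor (e.g. 3-AT) concentration.
    A competitive inhibitor effectively raises Km by (1 + i/ki).
    """
    return vmax * s / (km * (1 + i / ki) + s)

# More inhibitor -> lower reporter activity -> higher screen stringency.
for i in (0.0, 1.0, 5.0):
    print(f"[3-AT] = {i}: relative activity = {rate(1.0, i):.2f}")
```

In a real screen the 3-AT concentration is chosen empirically, just high enough to suppress background HIS3 expression while still permitting growth driven by genuine bait-prey interactions.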

Compared to earlier interaction screens, the Y2H system was able to detect interactions in vivo in a true cellular environment. Since it is also relatively easy to implement and inexpensive, Y2H rapidly became the system of choice for detecting protein-protein interactions. Its principles were rapidly adopted for screens involving the interaction of more than two partners. To analyse ligand-receptor interactions, a synthetic heterodimer of two different small organic ligands is used as a third hybrid molecule together with two receptors fused to DBD and AD. In this case, binding of the hybrid organic ligand to both receptors will force them together to reconstitute the DBD-AD complex [ 47 ]. This three-hybrid system can also be used to identify inhibitors of protein-protein interactions [ 48 ]. Another extension of the classical Y2H system is the use of more than one bait, in particular to compare interaction specificities [ 49 ]. In the so-called Dual Bait system, protein X 1 is fused to the LexA DBD, and protein X 2 is fused to the DBD of the cI repressor from bacteriophage λ. Thus, each bait is directed to a different reporter gene. Positive interactions with X 1 are registered through lexA operator activation of LEU2 and LacZ , and positive interactions with X 2 through cI operator activation of LYS2 and GusA. GusA encodes beta-glucuronidase, which converts a colorimetric substrate to report interactions. This system has been successfully used to identify proteins interacting with specific regions of larger proteins [ 50 ]. More recent expansions of Y2H to high-throughput applications, the so-called matrix or array approach, have already been discussed in the previous chapter.

In their original publication, Fields and Song already mentioned some of the limits of their Y2H method: “The system requires that the interaction can occur within the yeast nucleus, that the Gal4-activating region is accessible to the transcription machinery and that the Gal4(1-147)-protein X hybrid is itself not a potent activator”. These limitations would exclude almost half of all proteins, explaining the great interest in developing alternative Y2H variants.

3.2. Choosing the right strategy: Available Y2H systems and their advantages

More recent Y2H-based techniques access almost the entire cellular proteome (see Table 1 ). Almost all of them rely on a similar principle, namely the modular structure of the protein reporting the interaction. Similar to DBD and AD reconstituting a transcription factor in the original Y2H system, they employ proteins containing two structural domains which can fold correctly independently of each other and which reconstitute the functional reporter system when brought together via bait-prey interaction. An exception to this principle is the recruitment-based Y2H, where the reporter cascade is activated by forced membrane localization of the bait-prey complex. The following chapter presents in more detail the currently available Y2H systems ( Table 1 , Figure 2 ).

Figure 2. Yeast two-hybrid systems, their subcellular location within a yeast cell, and their operating mode (represented at the moment of bait-prey interaction).

Protein X (dark blue puzzle piece, part of bait construct) and protein Y (light blue puzzle piece, part of prey construct) directly interact (fitting puzzle pieces), thus inducing reconstitution of split-proteins (puzzle pieces of different colors in A, D, E), membrane recruitment (B, C) or protein dimerization (F). Protein fusions in bait or prey constructs are shown as solid black lines between puzzle pieces. Bait-prey interaction activates further downstream events (arrows) that directly (A) or indirectly (B, C, D, F) lead to transcriptional activation, or are independent of transcriptional activation (D, E), finally yielding screenable readouts like growth on specific media or color reactions. (A) Nuclear Y2H systems all require protein recruitment and bait-prey interaction at nuclear DNA. The classic Y2H and RTA Y2H both engage RNA polymerase II (RNA Pol II) transcription, either by its activation or its inhibition. By contrast, the Pol III Y2H involves RNA polymerase III (RNA Pol III) transcription. (B) Ras signalling based Y2H at the plasma membrane. The SRS Y2H, RRS Y2H, and rRRS Y2H are all based on protein recruitment to the plasma membrane via bait-prey interaction and subsequent activation of MAPK downstream signalling. While in the SRS and RRS Y2H the prey constructs harboring protein Y are anchored at the membrane via myristoylation to analyze interactions with cytosolic bait constructs harboring protein X, the rRRS is used to analyze interactions between soluble preys containing protein Y and partner X being a membrane protein. (C) G-protein signalling-based Y2H at the plasma membrane. In the G-protein fusion Y2H, bait X is a membrane or membrane-associated protein whose interaction with the prey construct disrupts protein G downstream signalling. (D) Split-ubiquitin based Y2H systems involve reconstitution of ubiquitin from two domains upon bait-prey interaction. Their subcellular localization depends on the nature of interacting proteins X or Y, and on the reporter proteins used. The Split ubiquitin Y2H uses non-transcriptional reporting of protein interactions in the cytosol, but can also be used for membrane proteins (not shown). The MbY2H is used for interaction analysis with membrane baits and thus occurs at the membrane location of protein X, e.g. the plasma membrane. The CytoY2H is used for membrane-anchored cytosolic baits and occurs close to the ER membrane. (E) Split-protein sensor Y2H. The Split-Trp Y2H is used to assay cytosolic bait-prey interactions based on reconstitution of an enzyme in tryptophan synthesis, allowing for non-transcriptional reporting. (F) ER Y2H system. The SCINEX-P Y2H allows bait-prey interaction analysis in the oxidizing environment of the ER, based on protein dimerization in unfolded protein signalling. ER, endoplasmic reticulum; for further abbreviations and details see chapter 3.2.

Table 1. Overview of different Y2H systems and their specificities.

Year | System | Suitable proteins | Readout | Site of interaction | Library screening
1989 | Classic Y2H system [ ] | Non-transactivating proteins capable of entering nucleus | Transcriptional activation | Nucleus | Yes [ ]
1994 | SOS recruitment system (SRS) [ ] | Transactivating, cytosolic proteins | Ras signalling | Membrane periphery | Yes [ ]
1994 | Split-ubiquitin system [ ] | Nuclear, membrane and cytosolic proteins | Uracil auxotrophy and 5-FOA resistance | Cytosol | Yes [ ]
1998 | Membrane split-ubiquitin system (MbY2H) [ ] | Membrane proteins | Transcriptional activation | Membrane periphery | Yes [ ]
1998 | Ras recruitment system (RRS) [ ] | Transactivating, cytosolic proteins | Ras signalling | Membrane periphery | Yes [ ]
1999 | Dual bait system [ ] | Two non-transactivating proteins capable of entering nucleus | Transcriptional activation | Nucleus | Yes [ ]
2000 | G-protein fusion system [ ] | Membrane proteins | Inhibition of protein G signalling | Membrane periphery | No
2001 | RNA polymerase III based two-hybrid (Pol III) [ ] | Transactivating proteins (in the RNA polymerase II pathway) | Transcriptional activation | Nucleus | Yes [ ]
2001 | Repressed transactivator system (RTA) [ ] | Transactivating proteins capable of entering nucleus | Inhibition of transcriptional activation | Nucleus | Yes [ – ]
2001 | Reverse Ras recruitment system (rRRS) [ ] | Membrane proteins | Ras signalling | Membrane periphery | Yes [ ]
2003 | SCINEX-P system [ ] | Extracellular and transmembrane proteins | Downstream signalling & transcriptional activation | Endoplasmic reticulum (ER) | No
2004 | Split-Trp system [ ] | Cytosolic, membrane proteins | Trp1p activity | Cytosol | Yes (Lentze & Auerbach, unpubl.)
2007 | Cytosolic split-ubiquitin system (cytoY2H) [ ] | Transactivating, cytosolic proteins | Transcriptional activation | ER membrane periphery | Yes [ ]
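To make the "choosing the right strategy" logic of Table 1 concrete, the systems can be encoded as structured data and queried for a given class of bait. The Python sketch below is a deliberately coarse summary of the table (property labels and groupings are our own simplification, not an exhaustive decision tool):

```python
# Simplified encoding of Table 1: which Y2H variants suit which bait class?
# The bait-location sets and the transactivating flag are our own coarse
# reading of the table, for illustration only.
Y2H_SYSTEMS = [
    {"name": "Classic Y2H", "bait": {"nuclear"}, "transactivating_ok": False},
    {"name": "RTA", "bait": {"nuclear"}, "transactivating_ok": True},
    {"name": "SRS/RRS", "bait": {"cytosolic"}, "transactivating_ok": True},
    {"name": "MbY2H", "bait": {"membrane"}, "transactivating_ok": True},
    {"name": "Split-ubiquitin", "bait": {"nuclear", "cytosolic", "membrane"},
     "transactivating_ok": True},
    {"name": "SCINEX-P", "bait": {"extracellular", "membrane"},
     "transactivating_ok": True},
]

def suitable_systems(location, transactivating):
    """Return the systems compatible with a bait's location and its
    (in)ability to self-activate transcription."""
    return [s["name"] for s in Y2H_SYSTEMS
            if location in s["bait"]
            and (s["transactivating_ok"] or not transactivating)]

# A nuclear, transactivating bait excludes the classic Y2H:
print(suitable_systems("nuclear", transactivating=True))  # ['RTA', 'Split-ubiquitin']
```

The point of the exercise is the filtering logic, not the table entries themselves; a real choice would weigh further criteria such as readout amplification and false-positive rates discussed below.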

3.2.1. Y2H with transactivating proteins in the nucleus

The classic Y2H system is based on reconstitution of a transcription factor and is thus not suited to interaction analysis with proteins that can directly activate transcription. Such transactivating baits would trigger transcription in the absence of any interaction with a prey. Two alternative Y2H systems have been developed to analyze the interaction network of such proteins. One is based on repression of transactivation, while the other uses the alternative polymerase III transcription pathway. The methods described in section 3.2.2 (e.g. the split-ubiquitin systems) are also suitable for screening transactivating baits.

In the repressed transactivator (RTA) system ( Figure 2A ), inversely to the classic Y2H, the bait-prey interaction represses transcriptional activation of reporter genes [ 60 ]. The protein of interest X, fused to the DBD of Gal4, is transactivating, e.g. a transcription factor. If it interacts with another protein Y fused to the repression domain (RD) of a transcription repressor (e.g. Tup1p), transcription of the reporter gene is repressed [ 60 ]. The RTA system has been used to demonstrate interactions between the mammalian basic helix-loop-helix proteins MyoD and E12, and between the protooncogenic transcription factor c-Myc and the putative tumor suppressor protein Bin1 [ 60 ]. It has also been applied to screen for novel interactions with a variety of transcriptional activators, including the herpes simplex virus 1 (HSV-1) regulatory protein VP16 [ 60 ], c-Myc [ 62 ], and the androgen receptor [ 61 ].

Recently, this system has been extended to screen for molecules which inhibit a protein-protein interaction, for example between the immunophilin FKBP12 and the transforming growth factor β receptor (TGFβ-R) C terminus [ 67 ]. FKBP12 itself is not transactivating, but was fused to the VP16-AD in addition to the Gal4-DBD. In the absence of interaction with an RD-fusion protein, e.g. due to the presence of an inhibitor, transcription of the reporter gene HIS3 is activated. The strength of inhibition is translated into the expression level of HIS3, which can be probed with increasing amounts of 3-aminotriazole, a competitive inhibitor of the HIS3 gene product. Compared to the classic Y2H, this assay has the advantage that inhibition of the interaction results not in a loss but in a gain of reporter gene transcription, and thus in a positive signal that facilitates screening. Thus, the RTA Y2H can not only be used to identify interaction partners of transcription factors, but also as a reverse Y2H to screen small molecule libraries, e.g. for potentially novel therapeutic compounds acting as inhibitors of a given protein-protein interaction.

The RNA polymerase III based two-hybrid (Pol III) system ( Figure 2A ) is another alternative for screening interaction partners of transcription factors that activate RNA polymerase II-based transcription. As in the classic Y2H, a protein X is fused to the Gal4-DBD (bait), and this bait is able to bind DNA due to a Gal4p binding sequence artificially introduced into the reporter gene SNR6 . However, the prey construct is different, since the second protein Y is fused to τ138p. This protein is a subunit of the multimeric protein complex TFIIIC, one of the two transcription factors involved in RNA polymerase III (Pol III)-mediated transcription. If the bait now interacts with the prey containing τ138p, the TFIIIC complex is bound to DNA and recruits the second transcription factor (TFIIIB) and Pol III. This activates transcription of the SNR6 reporter gene to produce U6 snRNA [ 68 ]. In a yeast strain harboring a temperature-sensitive U6 snRNA mutant [ 59 ], this reporter gene transcription rescues the temperature-sensitive phenotype and allows yeast growth at 37°C. The system has been used to screen a mouse embryonic cDNA library using τ138p-mBRCA1 as a bait [ 59 ], but apparently has not been further adopted for screening assays.

3.2.2. Y2H with cytosolic and membrane proteins

The classic Y2H and the two alternative systems presented above require the translocation of the interacting proteins into the nucleus and are thus not suitable for membrane-associated proteins, integral membrane proteins and many other soluble cytosolic proteins or proteins localized in other subcellular compartments. To circumvent these limitations, truncated versions of such proteins have been used for Y2H screens [ 69 – 71 ]. However, the use of truncated proteins can lead to misfolding, and the problem remains that the nucleus is not the natural environment for most of these proteins. Such problems, which probably led to a high rate of false negatives in the past, can be circumvented by screening procedures in which the interacting proteins remain in their natural cellular compartment. Outside the nucleus, away from the transcription machinery, the use of transactivating baits also no longer constitutes a problem.

The SOS and Ras recruitment systems (SRS and RRS) ( Figure 2B ) bypass the transcriptional readout by using the Ras signalling pathway, which is homologous between yeast and mammals. Ras has to be localized at the plasma membrane to undergo GDP-GTP exchange by guanyl exchange factors, Cdc25 in yeast or Son of sevenless (SOS) in mammals. This activated Ras then triggers downstream signalling. For the Y2H systems described here, a temperature-sensitive Cdc25-2 yeast strain is used, which is unable to grow at the restrictive temperature (36°C) because Cdc25-2 becomes inactive and fails to activate Ras signalling. The temperature-sensitive phenotype can then be rescued by alternative activation of Ras in the Y2H setup.

In the SOS recruitment system ( Figure 2B : SRS Y2H), a soluble protein X is fused to mammalian SOS. If the SOS-X fusion interacts with a prey localized in the membrane (e.g. via myristoylation), SOS stimulates guanyl exchange on yeast Ras (yRas) and promotes downstream signalling [ 51 ].

In the Ras recruitment system ( Figure 2B : RRS Y2H), the soluble protein X is directly fused to constitutively active mammalian Ras (mRas). Already active, this Ras only requires membrane location, bypassing the activity of Ras guanyl exchange factors (Cdc25 or SOS). The mRas-X fusion is recruited to the membrane by interaction with a membrane associated prey [ 57 ].

Both SRS and RRS allow the analysis of interactions between soluble baits and soluble or membrane preys. Specifically for the use of membrane-localized baits, the reverse Ras recruitment system ( Figure 2B : rRRS Y2H) has been developed. Conversely to the RRS, the prey is the Ras fusion protein, and the bait is membrane-anchored or itself a membrane protein [ 63 ]. Although the rRRS has been used for screening procedures [ 63 ], it has an important disadvantage. Preys containing membrane proteins are self-activating, since they localize Ras to the membrane even without bait-prey interaction. These false positive membrane proteins have to be eliminated by additional selection steps, rendering the method more laborious. The exclusion of membrane and membrane-associated proteins also represents a serious limitation compared to other, more recent Y2H techniques.

The G-protein fusion system ( Figure 2C ) allows, similar to the rRRS, the study of interactions between an integral membrane bait and a soluble prey. The latter is a fusion protein with the γ-subunit of a heterotrimeric G-protein. If the prey interacts with the membrane-located bait, it sequesters G-protein β-subunits, thus disrupting formation of the heterotrimeric G-protein complex and subsequent downstream signalling [ 58 ]. The method has been used to identify neuronal Sec1 mutants unable to bind syntaxin1, a member of the SNARE complex [ 58 ]. As with the RTA Y2H system (see above, Figure 2A ), the authors suggest that the G-protein Y2H may identify drugs disrupting protein-protein interactions. Both systems report a disrupted interaction by a gain of signal, which is easier to detect in a library screen than a loss of signal.

The Split-ubiquitin system ( Figure 2D ) was designed by Johnsson and Varshavsky in 1994 [ 53 ] to allow detection of protein-protein interactions occurring between cytosolic proteins; it was later extended to membrane proteins. Ubiquitin is a small protein important for the turnover of cellular proteins. Proteins are labelled for proteasomal degradation by covalently attaching a poly-ubiquitin chain. This chain is then cleaved off prior to protein degradation by ubiquitin specific proteases (USP). The split ubiquitin Y2H technique is based on separation of ubiquitin into two independent fragments. It has been shown that ubiquitin can be split into an N-terminal (Nub) and a C-terminal half (Cub) and that these two parts retain a basic affinity for each other, thus allowing spontaneous reassembly of quasi-native ubiquitin. This spontaneous reassociation of Nub and Cub is abolished by point mutations (I13G or I13A) in Nub (NubG, NubA) [ 53 ]. In these mutants, efficient association is only observed if the two moieties are brought into close proximity by interaction of two proteins fused to NubG/A and Cub respectively. Reconstituted split-ubiquitin is recognized by USPs, which then cleave off any reporter protein fused to the C-terminal end of Cub. The original system used dihydrofolate reductase as reporter protein, whose release was detected by SDS-PAGE [ 53 ]. However, this readout was not convenient, since it needed immunoprecipitation and electrophoretic separation.

Looking for a more direct readout, the Ura3p protein has been used as reporter ( Figure 2D : Split ubiquitin Y2H) [ 72 ]. Ura3p is an orotidine 5-phosphate decarboxylase (ODCase), an enzyme involved in the synthesis of pyrimidine ribonucleotides. ODCase activity enables growth without uracil but confers sensitivity to 5-fluoroorotic acid (5-FOA), because the latter is converted into the toxic compound 5-fluorouracil, causing cell death. As Y2H reporter, a variant of Ura3p is used, rUra3p, which is N-terminally modified for rapid degradation according to the N-end rule [ 73 ]. Interaction between bait and prey leads to ubiquitin reconstitution and subsequent cleavage of rUra3p, resulting in rapid degradation of rUra3p, inability to grow on minimal medium without uracil, and resistance to 5-FOA. This system is not based on a transcriptional readout and can therefore be applied to nuclear, cytoplasmic and membrane proteins [ 74 – 76 ].

In the membrane transactivator split-ubiquitin (MbY2H) system, an artificial transcription factor (LexA-VP16) is used as a cleavable reporter protein to analyse interactions between membrane proteins of the endoplasmic reticulum (ER) ( Figure 2D : MbY2H) [ 55 ]. Once ubiquitin is reassembled, LexA-VP16 is released to the nucleus, where it activates reporter gene transcription (i.e. HIS3 , LacZ ). Such a transcriptional readout amplifies the response to a protein interaction and thus offers higher sensitivity, which is convenient for transient interactions. This system was successfully used to detect interactions involving different kinds of membrane proteins [ 56 ]. Split-ubiquitin based systems have become quite popular and have been successfully applied in cDNA library screens [ 77 – 81 ] and large-scale matrix approaches [ 82 ].

Recently, an adaptation of the MbY2H system for screening cytosolic proteins has been published ( Figure 2D : CytoY2H) [ 66 ]. Here, the bait construct contains both Cub and the transcription factor and is anchored to the ER membrane via fusion to the ER membrane protein Ost4p. This allows screening for interaction partners of a soluble protein among membrane and/or soluble proteins, as well as for proteins that are transcriptional activators or otherwise self-activating in nuclear Y2H.

Other split-protein sensors ( Figure 2E ) have been developed, inspired by the split-ubiquitin system. While the cytosolic Y2H methods presented above are based on an indirect readout that requires activation of signalling pathways or transcription, split-protein sensors can in principle also directly report their reconstitution. In 2004, Tafelmeyer et al. presented a combinatorial approach to generate split-protein sensors [ 65 ]. They used an enzyme of yeast tryptophan biosynthesis, N-(5-phosphoribosyl)-anthranilate isomerase (Trp1p), to perform activity selections on different combinations of fragment pairs. They identified C-terminal (CTrp) and N-terminal (NTrp) fragments which reconstitute a quasi-native Trp1p only when fused to two interacting proteins that bring the CTrp and NTrp domains into close proximity. Thus, interacting fragments lead to Trp1p reconstitution and allow trp1 -deficient yeast strains to grow on medium lacking tryptophan ( Figure 2E : SplitTrpY2H). This system has several advantages. The readout is direct and permutation-independent, i.e. independent of whether CTrp or NTrp is used for the bait construct. It is universally applicable to all types of proteins, because the interaction readout is entirely independent of cellular localization.

Recently, split enhanced green fluorescent protein has been used to monitor protein-protein interactions in yeast by confocal microscopy [ 53 ]. A variety of other split-protein sensors have been applied in eukaryotic cells (e.g. dihydrofolate reductase [ 54 ], β-galactosidase [ 55 ], β-lactamase [ 56 ]), but have not yet been used in Y2H screening.

3.2.3. Yeast two-hybrid with extracellular and transmembrane proteins

All Y2H systems presented so far detect interactions in the reducing intracellular environment, which is not necessarily ideal for extracellular proteins. However, interactions in the extracellular space, such as those between receptors and ligands or between antibodies and antigens, participate in a multitude of physiological processes, and their study is of particular interest for a better understanding of numerous pathologies.

The SCINEX-P (screening for interactions between extracellular proteins) system ( Figure 2F ), published by Urech et al. in 2003, allows the analysis of protein-protein interactions in the oxidizing environment of the ER [ 64 ]. This system exploits the signalling of the yeast unfolded protein response (UPR). Accumulation of incorrectly folded proteins in the ER induces dimerization of the yeast ER type I transmembrane protein Ire1p, which induces production of the transcriptional activator Hac1p, which in turn activates transcription of chaperones. In the SCINEX-P system, the proteins of interest are fused to mutated Ire1p proteins lacking the luminal, N-terminal oligomerization domain (ΔIre1p). The interaction between the two hybrid proteins then reconstitutes Ire1p dimerization and thus activates UPR downstream signalling. To monitor protein interactions, the Hac1p-responsive UPR element is introduced into the promoter of reporter genes. This Y2H system was successfully used to analyze the interaction between the protein disulfide isomerase ERp57 and calnexin, both involved in protein folding in the ER [ 83 ], as well as known interactions between antigens and antibodies [ 64 ].

3.3. Dealing with doubt: Limitations of Y2H systems and methods for validation

Its relative methodological simplicity, its diversity, and its high-throughput capacity make the Y2H system the most popular analytical and screening method in interactomics. Nevertheless, all Y2H methods face the problem of false negatives and false positives.

False negatives in Y2H are protein-protein interactions which cannot be detected due to limitations of the screening method. In the classic Y2H, for example, protein interactions involving membrane proteins are mostly undetectable. Thus, the Y2H strategy has to be chosen carefully, depending on the cellular sub-proteome of interest. Further, the interaction between the two proteins assayed in Y2H is often not permutation-independent, meaning the result depends on whether a given protein is used for fusion in the bait or the prey construct. The fused yeast reporter proteins or anchors may cause steric hindrance that impedes the interaction, thus causing false negatives. Another reason for false negatives can be different or lacking post-translational protein modifications in the yeast system when analyzing interactions between proteins of higher eukaryotes. In this case, the modifying enzyme may be coexpressed in yeast together with bait and prey. This possibility has been used with success to identify tyrosine-phosphorylation dependent interactions [ 84 ]. Very transient interactions may also escape detection, as in the case of substrate interactions of protein tyrosine phosphatases. Here, substrate-trap mutants lacking phosphatase activity but retaining affinity for their substrates have been used to identify protein substrates of the phosphatase [ 85 ]. The expression of baits fused to their cognate modifying enzyme has also been successfully used to identify acetylation-dependent interactions with histones and interactions dependent on phosphorylation of the carboxy-terminal domain of RNA polymerase II [ 86 ]. The lack of more complex modifications, like complex glycosylation, appears to be more difficult to overcome. A humanized strain of the yeast Pichia pastoris has already been used to produce glycosylated human proteins [ 87 ], but it has so far not been used in Y2H.

False negatives mainly cause problems for the reproducibility of Y2H screens. Two independent large-scale Y2H screens using the same Y2H method showed less than 30% overlap in the identified interactions, and only 12.5% of known interactions were found in both [ 19 ]. These discrepancies may arise from differences in selection stringency or in the cDNA library used. Thus, false negatives represent a real limitation of the Y2H system in representing an entire protein interaction network. However, every screening system has to deal with false negatives. For example, MS of purified protein complexes reveals only few interactions involving transmembrane proteins due to their difficult purification [ 88 ]. AP/MS was also shown to be biased towards highly abundant proteins, whereas protein abundance appears not to influence Y2H [ 88 ]. While purification of protein complexes has to deal with mixtures of proteins of very different abundance, depending on the cell type used, such differences are avoided in Y2H by overexpression of the interacting proteins at similar levels. However, protein overexpression can provoke other artefacts, such as false positives.
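Overlap figures of the kind quoted above reduce to simple set arithmetic once interactions are treated as unordered protein pairs. The following Python sketch uses hypothetical interaction pairs (not real screen data) to show how the overlap between two screens of the same proteome may be quantified:

```python
# Toy comparison of two Y2H screens; protein names are placeholders.
def normalize(pairs):
    """Treat interactions as unordered pairs, so A-B equals B-A."""
    return {frozenset(p) for p in pairs}

screen1 = normalize([("A", "B"), ("A", "C"), ("B", "D"), ("C", "E")])
screen2 = normalize([("A", "B"), ("B", "D"), ("D", "E"), ("C", "F")])

shared = screen1 & screen2                       # found by both screens
overlap_vs_union = len(shared) / len(screen1 | screen2)
print(f"shared: {len(shared)}, overlap: {overlap_vs_union:.0%}")
```

Note that a low overlap conflates false negatives with genuine methodological differences (stringency, library composition), which is exactly the ambiguity discussed above.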

False positives in Y2H are physical interactions detected in the yeast screen which are not reproducible in an independent system. They are of diverse origin and often depend on the Y2H system used. Possible reasons for false positive interactions in yeast include high expression levels of bait and prey and their localization in a compartment which does not correspond to their natural cellular environment. Another source of false positives is interaction of the prey with the reporter proteins (e.g. LexA in the classic Y2H) or the membrane anchors (e.g. Ost4p in the cytoY2H) fused to the bait. Proteins which allow yeast to overcome nutritional selection when overexpressed are also often scored as false positives. Finally, proteins that are known to be “sticky” or that are not correctly folded can show unspecific interactions. In general, for each Y2H system, a list of recurrent false positives can be established. A list created by Golemis and co-workers for the classic Y2H can be found at http://www.fccc.edu/research/labs/golemis/InteractionTrapInWork.html .

Despite these limitations, the Y2H system remains a powerful tool for large-scale screening in interactomics. The comparative assessment of high-throughput screening methods by von Mering et al. [ 88 ] revealed that Y2H has a lower coverage of the protein interaction network than the purification of protein complexes coupled to MS. But these authors only considered the classic Y2H, while the diversity of Y2H systems presented above may increase coverage considerably.

To evaluate the quality of a generated interaction data set, coverage and accuracy need to be considered together. In fact, a large interaction network cannot be a solid base for systems biology if confidence in the data is low. In a quantitative comparison of interaction data sets, von Mering et al. estimated the accuracy of a classic high-throughput Y2H screen to be less than 10%. Thus, the question remains how to increase the accuracy of Y2H interaction data sets.

As mentioned before, there are two different screening approaches: the targeted library screening approach and the global matrix screening approach. To increase the accuracy of a library screen, a bait-dependency test can be performed [ 66 , 94 ]. In this case, the previously identified preys are tested for interaction with unrelated baits. Preys interacting with baits other than the screening bait are classified as false positives. This test helps to eliminate false positives resulting from non-specific interactions with the bait or other “sticky” interactions overcoming nutritional selection, but it cannot eliminate physical interactions that occur artificially in the Y2H system without physiological meaning. For this reason, binary interactions detected in Y2H are nowadays published only if they are validated by other methods [ 80 , 89 – 91 , 93 ]. Different validation methods are listed in Table 2 .
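The logic of a bait-dependency test amounts to set filtering: any prey that also scores against unrelated control baits is flagged as promiscuous. A minimal Python sketch, with all protein names as hypothetical placeholders:

```python
# Hedged sketch of a bait-dependency test: preys that also interact with
# unrelated control baits are treated as likely false positives.
def bait_dependency_filter(hits, control_hits):
    """Keep only preys that interact with the screening bait but with
    none of the unrelated control baits."""
    promiscuous = set().union(*control_hits.values()) if control_hits else set()
    return {prey for prey in hits if prey not in promiscuous}

screen_hits = {"PreyA", "PreyB", "PreyC"}
controls = {"ControlBait1": {"PreyB"}, "ControlBait2": {"PreyB", "PreyD"}}
specific = bait_dependency_filter(screen_hits, controls)
print(sorted(specific))  # ['PreyA', 'PreyC'] -- PreyB drops out as promiscuous
```

As noted above, this filter removes sticky binders but cannot flag interactions that are physically real in yeast yet physiologically meaningless.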

Table 2. Overview of different validation methods.

Method | Principle
Pull-down assay [ – ] | Tagged bait (mostly expressed in ) is immobilized on a resin and subsequently “pulls down” the target protein (prey) from lysates (of eukaryotic cells or of expressing proteins of interest). After washing steps, prey is detected by SDS-PAGE/immunoblot or MS.
Coimmunoprecipitation [ , , , ] | A specific antibody is used to precipitate the bait from cell lysates (see above). After washing steps, coimmunoprecipitated prey is detected as above.
Surface plasmon resonance (Biacore) [ ] | Bait immobilized on the surface of a sensor chip is probed by injection of prey onto the surface. Protein interaction is detected online via a biophysical principle (the change in refractive index at the sensor surface upon protein interaction). Protein is eluted and analyzed by MS.
In situ hybridization [ ] | Hybridization of a labelled complementary DNA or RNA strand (i.e. probe) to a specific DNA or RNA sequence in a tissue section. Visualizes expression of specific genes to evaluate potential coexpression of proteins of interest in the same cell of a given tissue.
Immunohistochemistry, immunocytochemistry [ , , ] | Proteins in fixed cells or tissue sections are detected by immune-labelling with fluorescently tagged antibodies, e.g. using confocal microscopy. Visualizes coexpression of proteins of interest in the same cell and potential subcellular colocalization.
Fluorescent detection in live cells [ ] | Proteins in living cells are detected with fluorescently tagged antibodies as above (using permeabilized cells) or after expression of fluorescently tagged protein variants. Visualizes colocalization of proteins of interest.
Fluorescence resonance energy transfer (FRET) [ ] | Bait and prey are fused to two different fluorescent tags with overlapping emission/excitation spectra. If both proteins are in close proximity, excitation of the first fluorophore (donor) leads to energy transfer to the second fluorophore (acceptor). Acceptor fluorescence can be observed in vitro (fluorimeter) or in living cells (confocal microscopy).
Bioluminescence resonance energy transfer (BRET) [ ] | Similar to FRET (see above), but with bait fused to bioluminescent luciferase, thus avoiding the external excitation step that can generate background. Detection as with FRET.

It is advisable to use more than one method to validate an identified protein-protein interaction, preferably coupling biochemical methods (pull-down assay, immunoprecipitation, Biacore surface plasmon resonance) with in vivo/in situ methods (colocalization, immunohistochemistry, in situ hybridization). The former allow the study of physical protein interactions, but pull-down assays require a certain stability of the protein complex and Biacore even needs purified interaction partners. It may also be difficult to validate transient protein interactions or interactions with transmembrane proteins in these assays. The in vivo/in situ methods allow insight into possible coexpression and colocalization of the two proteins involved, but generally do not provide conclusive evidence for a direct interaction. However, an advantage of in situ hybridization is its adaptability to high throughput. The FRET method has been developed to go beyond protein colocalization in vivo and study the spatio-temporal occurrence of the interaction and its physiological significance. FRET can only occur when the distance separating the two different fluorophores is in the low nm range, a condition that is met when the fluorophores are coupled to two directly interacting proteins [ 95 ]. However, many of these methods are relatively labor-intensive and can only be applied to a small number of the interactions detected in a larger screen.
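The steep distance dependence that makes FRET a proximity (and hence interaction) readout can be made explicit with the standard Förster equation, E = 1 / (1 + (r/R0)^6). In the sketch below the Förster radius R0 = 5 nm is an assumed, fluorophore-pair-specific value chosen for illustration:

```python
# Förster equation: FRET efficiency falls off with the sixth power of the
# donor-acceptor distance r, so appreciable transfer requires r in the
# low-nm range typical of direct protein-protein contact.
def fret_efficiency(r_nm, r0_nm=5.0):
    """Transfer efficiency at distance r_nm, given Förster radius r0_nm
    (the pair-specific distance at which efficiency is 50%)."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)

for r in (2.5, 5.0, 10.0):
    print(f"r = {r:4.1f} nm -> E = {fret_efficiency(r):.3f}")
```

Doubling the distance from 5 to 10 nm collapses the efficiency from 50% to under 2%, which is why colocalization alone (resolved only at hundreds of nm) never implies FRET.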

Validation of results from high-throughput matrix studies is much more difficult to achieve. Applying the validation methods mentioned above would be experimentally very demanding, and since both interaction partners are randomly selected, the sheer amount of generated interaction data already renders a bait-dependency test impossible. To handle the problem of false positives in such large-scale approaches, help comes from computational biology: confidence scores can evaluate the biological significance and probability of a given interaction. One possibility is to relate screening results to known data such as RNA expression levels (expression profile reliability (EPR) index) or interaction networks of protein paralogues (paralogous verification method (PVM)) [ 96 ]. Another score was calculated by combining data on sequence homology, known interacting Pfam domains, and Gene Ontology annotations [ 97 ]. Although these methods allow the assignment of higher confidence scores, they are limited by the amount of existing data from other screens and experiments. Another possibility is thus a statistical model based only on screen data and topological criteria [ 98 ]. These scores will not replace experimental validation of detected interactions, but they may provide a tool to select proteins for further experiments.
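As an illustration of how such confidence scores combine heterogeneous evidence, here is a minimal naive-Bayes-style sketch. The evidence sources and likelihood ratios are hypothetical placeholders, not the actual scoring schemes of refs. [96–98]:

```python
import math

# Hypothetical likelihood ratios for each evidence source: how much more
# likely the observation is for a true interaction than for a false positive.
# These numbers are illustrative only.
LIKELIHOOD_RATIOS = {
    "coexpressed_mrna": 4.0,
    "shared_go_term": 3.0,
    "known_pfam_domain_pair": 8.0,
}

def interaction_confidence(evidence, prior=0.1):
    """Posterior probability that a screened pair truly interacts, combining
    independent evidence sources in log-odds (naive Bayes) fashion."""
    log_odds = math.log(prior / (1.0 - prior))
    for source in evidence:
        log_odds += math.log(LIKELIHOOD_RATIOS[source])
    return 1.0 / (1.0 + math.exp(-log_odds))
```

With a 10% prior, two supporting evidence types at these ratios raise the posterior to roughly 0.78, while a pair with no supporting evidence stays at the prior — the kind of ranking used to select candidates for experimental follow-up.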

4. Further Confirmation: Protein-Protein Interactions within a Biological System

Once a protein-protein interaction has been identified and validated, its physiological function remains to be established in a biological system. The main questions in this respect are: (i) Where and when in the system does the interaction occur? (ii) Which parameters influence the interaction? (iii) What is the effect of the interaction? To answer these questions, the main strategy relies on varying different system parameters that mainly affect the proteins of interest. A combination of complementary methods is generally able to unveil the physiological significance of an interaction identified in a targeted approach.

Colocalization experiments in cell culture under different conditions can give information about the spatiotemporal dynamics of the protein-protein interactions. For example, choosing different time points during the cell-cycle may reveal transient colocalizations. In the case of the reported interaction between brain type creatine kinase (BCK) and the cis-Golgi matrix protein (GM130), a transient colocalization during early prophase was observed [ 91 ]. The authors suggest that BCK would facilitate GM130 phosphorylation by ATP-requiring protein kinases and thus play a role in initial fragmentation of the Golgi apparatus prior to cell division. Many other endogenous or external parameters influencing protein-protein interaction can be varied, including activation of signalling cascades or changes in the cellular environment. To analyse the impact of given protein-protein interactions on the cellular phenotype, the interaction may be either disturbed, e.g. by RNA silencing of one interaction partner, or favoured by addition or overexpression of one protein partner. More specifically, the interaction domains of both interaction partners can be mapped to inhibit the interaction in vivo by expressing interaction-deficient mutant proteins or using inhibitory peptides.

These experiments can be carried out for defined interactions of a small number of proteins, but again it would be quite difficult to transfer them to the large interaction network generated by global screens. So far, interactome approaches concentrate on a characterization of the nodes in the interaction network, which may be the major determinants of a phenotype.

5. Conclusions

Since systems biology aims at a complete representation of cellular complexity, thus avoiding any reductionism, the applied experimental strategies have to provide unbiased, complete data sets. In this context, the yeast two-hybrid technologies presented here are a starting point rather than a complete solution to the elucidation of interaction networks. However, Y2H has demonstrated its power, through its methodological diversity and technical simplicity, to rapidly generate a large amount of reliable protein-protein interaction data. More recent Y2H technologies, in particular those based on split proteins, allow protein-protein interactions to be probed in their native cellular compartment and give access to almost the entire cellular proteome. Y2H is largely complementary to emerging AP/MS techniques, since it identifies direct interactions and also detects lower-affinity interactions that are rather transient.

Developing high throughput approaches at the cellular level and further progress in bioinformatics will be necessary to make interactomics a fully integral part of a systems biology approach. Major efforts will be necessary for the challenge of modelling the large and dynamic interaction network of a cell. Only a combination of different approaches (e.g. Y2H, MS, bioinformatics) will eventually lead to an accurate description of large interaction networks.

Acknowledgments

Work from the authors was supported by EU FP6 contract LSHM-CT-2004-005272 (EXGENESIS, to U.S.) and LSH-2005-1.2.5-3-037365 (TARGET SCREEN, to D.A.), as well as by the French Agence Nationale de Recherche (“chaire d’excellence”, given to U.S.), the Institut National de la Santé et de la Recherche Médicale (INSERM), and a graduate training fellowship of the French Ministry of Research (given to A.B.).


A Hybrid Experimental-Numerical Method to Support the Design of Multistage Pumps


1. Introduction

[Table 1. Survey of published CFD studies of multistage pumps (2002–2022): reference and year; number of stages and vanes modelled; CFD domain type (structured/unstructured), grid size, cell size, and near-wall resolution (y+/prism layers); turbulence model (κ-ε, κ-ε RNG, κ-ω SST) and rotor motion model (MFR, TADS); and overall accuracy (mean/max error) of head and efficiency predictions, ranging from about ±1% to ±25% for head and up to ±40% for efficiency.]

2. Background Literature

3. Hybrid Experimental-Numerical CFD Method

4. Materials and Methods

4.1. Benchmark Pump

4.2. Sealing Elements' Characterisation

4.2.1. Leakage Flow Test Rig

4.2.2. Testing Protocol

4.2.3. Sealing Performance Data

4.3. Axial Thrust Measurement

4.3.1. Experimental Apparatus

  • A first shaft (1), directly coupled to the pump shaft at one end and to a fixed transversal beam (2) by means of ball bearings (3) at the other end. The bearings are pushed against the shaft shoulder by the upper flange, which preloads them to eliminate both the internal play and the axial motion of the shaft relative to the transversal beam.
  • Strain gauge dynamometers (5), interposed between the transversal beam (2) and the fixed frame. The dynamometers are fixed to the beam (2) by means of spherical joints (plugs and swivel eyes) to minimise the parasitic loads due to the bending moment induced on the beam by any axial load exerted on the shaft.
  • A second shaft (4), interposed between the electric motor shaft, to which it is directly flanged, and the first shaft, to which it transfers only the mechanical torque (required by pump operation) through a sliding joint.
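With this arrangement, the axial load on the pump shaft is reacted by the transversal beam and measured by the strain-gauge dynamometers. A minimal sketch of reducing such readings to a net axial thrust; the number of channels, calibration constants, and zero offsets are hypothetical, since the text does not specify them:

```python
# Sketch of reducing strain-gauge dynamometer readings to a net axial thrust.
# The rig reacts the shaft's axial load through a beam resting on dynamometers
# via spherical joints, so the net thrust is the sum of the channel forces.
# Calibration constants (N/V) and zero offsets below are hypothetical.

def axial_thrust(voltages_v, sensitivities_n_per_v, zero_offsets_v):
    """Net axial thrust [N]: sum over channels of zero-corrected voltage
    times that channel's calibration constant."""
    return sum(
        (v - v0) * k
        for v, k, v0 in zip(voltages_v, sensitivities_n_per_v, zero_offsets_v)
    )

if __name__ == "__main__":
    # Two dynamometers, 500 N/V each, with small zero offsets.
    print(axial_thrust([1.20, 1.18], [500.0, 500.0], [0.02, 0.01]))
```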

4.3.2. Testing Protocol

4.3.3. Axial Thrust Data

6. Conclusions

  • Further improves the hydraulic performance predictions achievable with pure CFD calculations based on a simplified approach, while retaining an accuracy comparable with that of the more computationally demanding approaches suggested in the literature.
  • Predicts a head curve that fits the experimental data very well, and an absorbed power curve with an accuracy no worse than 1.5% over the entire operating range of the considered pump stage.
  • Allows a reliable estimate of the mechanical loads on the structural components of the pump, as demonstrated by the difference between the axial thrust predictions and the corresponding data measured on an original test rig, which never exceeded 14% in the considered operating range of the pump. This accuracy cannot be achieved by other CFD approaches requiring comparable computational effort.

Author Contributions

Acknowledgments

Conflicts of Interest

  • Sonawat, A.; Kim, S.; Ma, S.B.; Kim, S.J.; Lee, J.B.; Yu, M.S.; Kim, J.H. Investigation of unsteady pressure fluctuations and methods for its suppression for a double suction centrifugal pump. Energy 2022 , 252 , 124020. [ Google Scholar ] [ CrossRef ]
  • Capurso, T.; Bergamini, L.; Torresi, M. Performance analysis of double suction centrifugal pumps with a novel impeller configuration. Energy Convers. Manag. 2022 , 14 , 100227. [ Google Scholar ] [ CrossRef ]
  • Michaelides, K.V.; Tourlidakis, A.; Elder, R.L. Use of CFD for the three-dimensional hydrodynamic design of vertical diffuser pumps. In Advances of CFD in Fluid Machinery Design ; Elder, R.L., Tourlidakis, A., Yates, M., Eds.; Professional Engineering Publishing-Wiley and Sons: Bury St Edmunds, UK, 2002; pp. 129–148. [ Google Scholar ]
  • Kaupert, K.A. An Evaluation of Impeller Blade Torque During an Impeller–Diffuser Interaction. J. Fluids Eng. 2004 , 126 , 960–965. [ Google Scholar ] [ CrossRef ]
  • Roclawski, H.; Hellmann, D.H. Rotor-Stator-Interaction of a Radial Centrifugal Pump Stage with Minimum Stage Diameter. In Proceedings of the 4th WSEAS International Conference on Fluid Mechanics and Aerodynamics, Elounda, Greece, 21–23 August 2006; pp. 301–308. [ Google Scholar ]
  • Yang, C.; Cheng, X. Numerical simulation of the three-dimensional flow in a multistage centrifugal pump based on integral modeling. In Proceedings of the Power and Energy Engineering Conference, Wuhan, China, 27–31 March 2009; pp. 1–5. [ Google Scholar ] [ CrossRef ]
  • Zhou, L.; Shi, W.; Lu, W.; Hu, B.; Wu, S. Numerical Investigations and Performance Experiments of a Deep-Well Centrifugal Pump with Different Diffusers. J. Fluids Eng. 2012 , 134 , 071102. [ Google Scholar ] [ CrossRef ]
  • Shi, W.; Zhou, L.; Lu, W.; Pei, B.; Lang, T. Numerical Prediction and Performance Experiment in a Deep-well Centrifugal Pump with Different Impeller Outlet Width. Chin. J. Mech. Eng. 2013 , 26 , 46–52. [ Google Scholar ] [ CrossRef ]
  • Huang, S.; Islam, M.F.; Liu, P. CFD Analysis and Experimental Verification of Multi-stage Centrifugal Pump with Multi-outlet options. Appl. Mech. Mater. 2013 , 331 , 94–97. [ Google Scholar ] [ CrossRef ]
  • Wang, W.J.; Li, G.D.; Wang, Y.; Cui, Y.R.; Yin, G.; Peng, S. Numerical simulation and performance prediction in multi-stage submersible centrifugal pump. IOP Conf. Ser. Mater. Sci. Eng. 2013 , 52 , 032001. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Rakibuzzaman, R.; Suh, S.H.; Kim, K.W.; Kim, H.H.; Cho, M.T.; Yoon, I.S. A Study on Multistage Centrifugal Pump Performance Characteristics for Variable Speed Drive System. Procedia Eng. 2015 , 105 , 270–275. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Lee, J.; Mosfeghi, M.; Hur, N.; Yoon, I.S. Flow analysis in a return channel of a multi-stage centrifugal pump. J. Mech. Sci. Technol. 2016 , 30 , 3993–4000. [ Google Scholar ] [ CrossRef ]
  • Li, W.; Jang, X.; Pang, Q.; Zhou, L.; Wang, W. Numerical simulation and performance analysis of a four-stage centrifugal pump. SAGE Adv. Mech. Eng. 2016 , 8 , 1–8. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Fontana, F.; Masi, M. CFD modelling to aid the design of steel sheet multistage pumps. In Proceedings of the 29th International Conference on Efficiency, Cost, Optimization, Simulation and Environmental Impact of Energy Systems, Portoroz, Slovenia, 19–23 June 2016. [ Google Scholar ]
  • Zhu, J.; Banjar, H.; Xia, Z.; Zhang, H.Q. CFD simulation and experimental study of oil viscosity effect on multi-stage electrical submersible pump (ESP) performance. J. Petrol. Sci. Eng. 2016 , 146 , 735–745. [ Google Scholar ] [ CrossRef ]
  • Wang, C.; Shi, W.; Wanga, X.; Jiang, X.; Yang, Y.; Li, W.; Zhou, L. Optimal design of multistage centrifugal pump based on the combined energy loss model and computational fluid dynamics. Appl. Energy 2017 , 187 , 10–26. [ Google Scholar ] [ CrossRef ]
  • Zhu, J.; Zhou, H.; Zhang, J.; Zhang, H.Q. A numerical study on flow patterns inside an electrical submersible pump (ESP) and comparison with visualization experiments. J. Petrol. Sci. Eng. 2019 , 173 , 339–350. [ Google Scholar ] [ CrossRef ]
  • Valdés, J.P.; Becerra, D.; Rozo, D.; Cediel, A.; Torres, F.; Asuaje, M.; Ratkovich, N. Comparative analysis of an electrical submersible pump’s performance handling viscous Newtonian and non-Newtonian fluids through experimental and CFD approaches. J. Petrol. Sci. Eng. 2020 , 187 , 106749. [ Google Scholar ] [ CrossRef ]
  • Yan, S.; Luo, X.; Sun, S.; Zhang, L.; Chen, S.; Feng, J. Influence of inlet gas volume fraction on energy conversion characteristics of multistage electric submersible pump. J. Petrol. Sci. Eng. 2021 , 207 , 109164. [ Google Scholar ] [ CrossRef ]
  • Kang, Y.; Su, Q.; Liu, S. On the axial thrust and hydraulic performance of a multistage lifting pump for deep-sea mining. Ocean. Eng. 2022 , 265 , 112534. [ Google Scholar ] [ CrossRef ]
  • Bai, L.; Yang, Y.; Zhou, L.; Li, Y.; Xiao, Y.; Shi, W. Optimal design and performance improvement of an electric submersible pump impeller based on Taguchi approach. Energy 2022 , 252 , 124032. [ Google Scholar ] [ CrossRef ]
  • Ha, T.W.; Lee, Y.B.; Kim, C.H. Leakage and rotordynamic analysis of a high pressure floating ring seal in the turbo pump unit of a liquid rocket engine. Tribol. Int. 2002 , 35 , 153–161. [ Google Scholar ] [ CrossRef ]
  • Adami, P.; Della Gatta, S.; Martelli, F.; Bertolazzi, L.; Maestri, D.; Marenco, G.; Piva, A. Multistage centrifugal-pumps: Assessment of a mixing plane method for CFD analysis. In Proceedings of the 60° Congresso Nazionale ATI, Roma, Italy, 13–15 September 2005. [ Google Scholar ]
  • Salvadori, S.; Marini, A.; Martelli, F. Methodology for the Residual Axial Thrust Evaluation in Multistage Centrifugal Pumps. Eng. Appl. Comput. Fluid Mech. 2012 , 6 , 271–284. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Gülich, J.F. Centrifugal Pumps , 2nd ed.; Springer: Berlin, Germany, 2010. [ Google Scholar ] [ CrossRef ]
  • Fontana, F. Estimation of the head loss in the annular chamber of multistage centrifugal pumps featuring a compact design. In Proceedings of the 1st Global Power and Propulsion Forum GPPF 2017, Zurich, Switzerland, 16–18 January 2017. [ Google Scholar ]
  • Fontana, F. Design and Analysis of Compact-Design Rotodynamic Multistage Pumps. Ph.D. Thesis, University of Padova, Padua, Italy, 15 January 2018. (In Italian). [ Google Scholar ]
  • Zhou, L.; Shi, W.; Li, W.; Agarwal, R. Numerical and Experimental Study of Axial Force and Hydraulic Performance in a Deep-Well Centrifugal Pump with Different Impeller Rear Shroud Radius. J. Fluids Eng. 2013 , 135 , 104501. [ Google Scholar ] [ CrossRef ]
[Table: boundary conditions applied to the surfaces of each computational region.]

| Surfaces | Impeller | Annular Chamber | Return Channel |
|---|---|---|---|
| Entrance | Mass flow inlet (I) | Sliding interface | Internal interface |
| Exit | Sliding interface | Internal interface | Pressure outlet (I) |
| Leakage 1 | - | Pressure outlet | - |
| Blade | No slip wall | - | No slip wall |
| Leakage 2 | Mass flow inlet (II) | - | - |
| Side walls | No slip wall | No slip wall | No slip wall |
| Azimuthal surfaces | Periodic | Periodic | Periodic |
[Table: leakage/axial-thrust test matrix — nine test configurations (I–IX) combining cage design (open, semi, or closed), ring size d/D, ring flexibility E [GPa], ring–impeller shroud disk spacing sp/h (0.40 nominal, 0.15, 0.330), and shaft speed N = 2900 rpm.]

Share and Cite

Fontana, F.; Masi, M. A Hybrid Experimental-Numerical Method to Support the Design of Multistage Pumps. Energies 2023 , 16 , 4637. https://doi.org/10.3390/en16124637



Numerical and experimental validation of a hybrid finite element-statistical energy analysis method

Author to whom correspondence should be addressed. Electronic mail: [email protected]


Vincent Cotoni , Phil Shorter , Robin Langley; Numerical and experimental validation of a hybrid finite element-statistical energy analysis method. J. Acoust. Soc. Am. 1 July 2007; 122 (1): 259–270. https://doi.org/10.1121/1.2739420


The finite element (FE) and statistical energy analysis (SEA) methods have, respectively, high- and low-frequency limitations, and there is therefore a broad class of "mid-frequency" vibro-acoustic problems that are suited to neither FE nor SEA. A hybrid method combining FE and SEA was recently presented for predicting the steady-state response of vibro-acoustic systems with uncertain properties. The subsystems with long-wavelength behavior are modeled deterministically with FE, while the subsystems with short-wavelength behavior are modeled statistically with SEA. The method yields the ensemble average response of the system, where the uncertainty is confined to the SEA subsystems. This paper briefly summarizes the theory behind the method and presents a number of detailed numerical and experimental validation examples for structure-borne noise transmission.
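On the statistical side of such a hybrid model, the SEA subsystems obey a linear power balance between input, dissipated, and coupled power. A minimal sketch of that solve; the loss factors and input powers below are illustrative values, not taken from the paper:

```python
import numpy as np

# Minimal sketch of the statistical-energy-analysis (SEA) power balance used
# for the short-wavelength subsystems:
#   P_i = omega * ( eta_i * E_i + sum_j ( eta_ij * E_i - eta_ji * E_j ) )
# where eta_i are damping loss factors and eta_ij coupling loss factors.

def sea_energies(omega, eta_damp, eta_coup, p_in):
    """Solve the SEA power balance for the subsystem energies E."""
    n = len(eta_damp)
    L = np.zeros((n, n))
    for i in range(n):
        # Diagonal: damping plus all outgoing coupling from subsystem i.
        L[i, i] = eta_damp[i] + sum(eta_coup[i][j] for j in range(n) if j != i)
        for j in range(n):
            if j != i:
                # Off-diagonal: power flowing back from subsystem j.
                L[i, j] = -eta_coup[j][i]
    return np.linalg.solve(omega * L, p_in)

if __name__ == "__main__":
    # Two subsystems at omega = 100 rad/s; unit power injected into the first.
    E = sea_energies(100.0, [0.01, 0.02],
                     [[0.0, 0.005], [0.003, 0.0]], [1.0, 0.0])
    print(E)
```

A useful sanity check on the solution is power conservation: the total dissipated power ω·Ση_i·E_i must equal the total input power, since the coupling terms cancel when summed over subsystems.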



Open access · Published: 08 August 2024

Experimental insights into the stability of graphene oxide nanosheet and polymer hybrid coupled by ANOVA statistical analysis

  • M. Iravani 1 ,
  • M. Simjoo 1 ,
  • M. Chahardowli 1 &
  • A. Rezvani Moghaddam 2  

Scientific Reports, volume 14, Article number: 18448 (2024)


  • Nanoparticles
  • Synthesis of graphene

The synergistic potential of using graphene oxide (GO) nanosheets and hydrolyzed polyacrylamide (HPAM) as a GO-enhanced polymer hybrid (GOeP) for enhanced oil recovery (EOR) has drawn attention. However, the hybridization method and the stability of GOeP have not been comprehensively studied. To cover this gap, the current study evaluates the stability of GOeP under different conditions, including temperature (60 and 80 °C), high and low salinity, and the presence of Mg 2+ ions (6430 and 643 ppm). GO nanosheets were synthesized and characterized through XRD, Raman, FTIR, and DLS techniques. The performance of five preparation methods was assessed to determine their ability to produce stable hybrids. Zeta potential and sedimentation methods, coupled with the ANOVA statistical technique, were used to measure and interpret stability over 21 days. Results revealed that the stability of GOeP in the presence of brine is influenced by the hydrolyzation duration, the composition of the water used for polymer hydrolyzation, the form of the additives (powdery or in aqueous solution), and the dispersion quality, including whether the GO solution was prediluted. They further showed that the positive impact of higher temperature on the long-term stability of GOeP is approximately seven times less significant than the reduction in stability caused by salinity. Under elevated salinity, a higher Mg 2+ concentration led to an 80% decrease in long-term stability, whereas the temperature impact was negligible. These findings highlight the potential of GOeP for EOR applications, offering insights into optimizing stability under challenging reservoir conditions.


Introduction

To date, fossil fuels have been the most important source of energy in the world, and oil production plays a vital role in providing a sustainable energy supply 1 , 2 , 3 . With oil production, the energy of the oil reservoir decreases, and gradually, the potential of natural production decreases 4 . Therefore, EOR methods are used to increase oil production 5 , 6 .

Polymer flooding is one of the most common EOR methods, generally resulting in higher oil recovery than water flooding 5 , 7 . However, the method has long faced challenges such as viscosity loss and polymer degradation at high temperature and salinity, known as harsh conditions, because the polymers used for EOR in the oil industry (especially HPAM, the most common polymer used to improve oil recovery) are sensitive to high salinity, divalent ions, and temperature, and may not retain the necessary efficiency 8 , 9 , 10 .

According to previous studies, in solutions with high salinity the thickness of the electric double layer (EDL) is reduced by the interaction and bonding between the negatively charged functional groups of polymer molecules and the cations present in the solution, leading to viscosity reduction 11 , 12 . In addition to salinity, the viscosity of a polymer solution is also influenced by the hardness of the solution, and the presence of divalent ions has a negative effect on polymer flooding performance 13 , 14 . Divalent cations such as calcium and magnesium form stronger bonds with the carboxyl functional groups of HPAM; this bonding leads to deformation and coiling of the polymer molecules and, accordingly, a significant reduction in viscosity, and may even result in polymer precipitation 15 . Besides salinity and hardness, temperature is a crucial factor affecting the performance of polymer flooding 16 . An increase in temperature accelerates polymer hydrolysis and, consequently, reduces viscosity and can cause sedimentation 17 . Studies have shown that adding nanoparticles to the polymer can enhance its performance, improving properties such as viscosity, rheological behavior, chemical resistance, and thermal stability 18 , 19 , 20 , 21 .
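The EDL compression described above can be quantified by the Debye screening length, which shrinks as the inverse square root of ionic strength. A small sketch using standard physical constants (water at about 25 °C is assumed):

```python
import math

# Debye screening length of the electric double layer (EDL) versus ionic
# strength, illustrating why high salinity collapses the EDL around charged
# HPAM chains and reduces solution viscosity. Constants are standard values.

EPS0 = 8.8541878128e-12    # vacuum permittivity [F/m]
EPS_R = 78.5               # relative permittivity of water near 25 degC
KB = 1.380649e-23          # Boltzmann constant [J/K]
E_CHARGE = 1.602176634e-19 # elementary charge [C]
NA = 6.02214076e23         # Avogadro constant [1/mol]

def debye_length_nm(ionic_strength_mol_per_l, temp_k=298.15):
    """Debye length [nm] = sqrt(eps0*epsr*kB*T / (2*NA*e^2*I)), I in mol/m^3."""
    i_si = ionic_strength_mol_per_l * 1000.0  # mol/L -> mol/m^3
    lam = math.sqrt(EPS0 * EPS_R * KB * temp_k
                    / (2.0 * NA * E_CHARGE**2 * i_si))
    return lam * 1e9
```

At 1 mM ionic strength the screening length is close to 10 nm, while at seawater-like salinity (~0.6 M) it collapses below 0.4 nm — consistent with the viscosity loss of HPAM in high-salinity brine described above.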

Therefore, to improve polymer flooding performance in such harsh conditions, the effect of different nanoparticles has been studied 9 , 22 , 23 , 24 , 25 , 26 . The most studied nanoparticles in the oil industry are carbon nanotubes 27 , 28 , carbon nanofibers 29 , nano oxides, and clay nanoparticles 30 , 31 , 32 , 33 , 34 . In the current study, the hybridization of nanoparticles and polymer is referred to as Nanoparticle-Enhanced Polymer hybrid (NEP).

In the last two decades, carbon nanoparticles, particularly graphene oxide (GO) nanosheets, have drawn significant attention 35 . GO is produced by oxidizing graphite followed by exfoliation. Figure  1 provides a schematic molecular view of GO and graphite. GO and its derivatives have received much attention in fields such as biomedicine 36 , electronics 37 , and materials science 38 , but less in the oil industry, although researchers believe that these nanoparticles can perform well in upstream applications such as drilling 39 , 40 , 41 , enhanced oil recovery 42 , 43 , 44 , 45 , 46 , 47 and precision tools 48 , 49 . Therefore, the hybridization of polymer and GO would be the first step for comprehensive investigations.

figure 1

Schematic molecular view of GO and graphite.

Researchers have proposed different methods for preparing NEP hybrids based on the type of nanoparticles and the properties of the polymer. In general, nanoparticle stabilization methods are categorized as physical or chemical 50 . Chemical methods use a second chemical component, known as a chemical stabilizer (e.g., an alkaline or acidic substance), to reach stabilization. It can therefore be expected that the properties of the nanoparticles, or of the polymer in an NEP hybrid, will change, since bases and acids can considerably alter the molecular shape and properties of the polymer 51 . In contrast, physical methods achieve stabilization solely through physical energy transfer, using devices such as homogenizers, sonicators, and stirrers. It is therefore critical to find the best method to prepare an NEP hybrid for specific nanoparticles and polymers.

In most of the studies in which SiO 2 (the most widely used nanoparticle in the oil industry) has been used for NEP hybrid preparation, in the presence and absence of salinity, sonication alone was sufficient to stabilize the solution, and almost all additives (especially the polymer) were powdery. Hu et al. used the combination of HPAM and SiO 2 to prepare an NEP hybrid: they first dispersed the nanoparticles in deionized water (DIW) for 20 min using probe sonication, then added the polymer powder, and after stirring for 24 h added salt 52 . Elhaei et al. took a different approach: they first prepared a brine solution, dispersed the surface-modified SiO 2 nanoparticles in it for 70 min using probe sonication, and then added the HPAM powder using a magnetic stirrer 53 . Santamaria et al. also prepared an NEP hybrid without using brine; they first sonicated the hydrophilic fumed SiO 2 nanoparticles for 30 min, then added the polymer powder and rested the solution for 48 h until it was fully hydrated and the nanoparticle-enhanced polymer solution became stable 54 . Aliabadian et al. employed a strategy comparable to Santamaria et al.; however, they used a shorter rest time for the NEP hybrid (24 h) 18 .

Developing enhanced polymer solutions with clay nanoparticles presents unique challenges compared to other types of nanoparticles. Unlike nanoparticles such as silica, clay nanoparticles are composed of charged plates that increase particle-particle interactions, making the preparation method more complicated. These increased interactions can lead to aggregation and loss of stability, which must be carefully addressed to ensure the effectiveness of the resulting polymer solutions. It is therefore crucial to consider these factors when working with clay nanoparticles to develop advanced materials with enhanced properties. Rezaei et al. used surface-modified clay nanoparticles (SMCN) and HPAM to prepare an NEP hybrid: the nanoparticles were dispersed in distilled water using sonication for one hour, the polymer powder was then added and stirred with a magnetic stirrer for 72 h, and the final solution was kept away from light and heat for a week 55 . Kumar et al. used the combination of CuO nanoparticles, nano-clay, and HPAM to prepare an NEP hybrid: the nanoparticles were first dispersed in DIW using sonication for 8 h at 60 °C; the polymer was separately added to a previously prepared brine solution and stirred for 16 h using a magnetic stirrer; finally, the polymer-in-brine solution was added to the aqueous nanoparticle solution 56 .

Recently, efforts have been made to prepare and use GO-enhanced polymer hybrids (GOeP) for EOR purposes. Hybridizing GO and HPAM results in interactions between the GO functional groups and HPAM. Haruna et al. used HPAM and GO nanosheets to prepare an NEP hybrid. They first prepared a polymer solution in distilled water and added a graphene oxide nanosheet suspension dropwise while stirring the solution with a magnetic stirrer at low speed for 24 h. A previously prepared brine solution was then added to the GOeP hybrid and stirred for a further 24 h 57 . Lasheri et al. prepared a GOeP hybrid using a method similar to Haruna et al.’s, except that they rested the polymer solution for 24 h so that it would hydrolyze completely in the aqueous environment 58 . Kumar et al. suggested preparing the GOeP hybrid by adding the polymer to DIW, then adding a brine solution and a GO solution that had been prepared in DIW using probe sonication. The final saline GO-enhanced polymer (sGOeP) hybrid was obtained by mixing DIW, the brine solution, and the GOeP hybrid 59 . Vasconselos et al. presented a different method for preparing an HPAM polymer solution enhanced with GO nanosheets. They suggested preparing a concentrated polymer solution in DIW, diluting part of it, and adding the GO nanosheets to the diluted portion, dispersed using probe sonication. The GO-enhanced diluted solution was then mixed with the concentrated polymer solution, and finally, mixing DIW, the brine solution, and the GOeP hybrid yielded the final sGOeP hybrid 60 .

Hybridization provides an opportunity to merge the distinct properties of both materials, leading to novel materials with enhanced properties. By exploring the behavior and attributes of GOeP hybrids, researchers can advance their understanding of the potential applications of these materials in the oil industry, particularly in critical areas such as drilling operations and EOR processes.

Hybridizing polymer and GO is a crucial first step in exploring their potential for EOR purposes, and the stability of sGOeP is a vital parameter for designing and applying EOR methods. To our knowledge, few studies have addressed the effect of salinity, temperature, and the presence of divalent ions on the stability of sGOeP. Moreover, the methodology for making sGOeP hybrids has not been comprehensively explored in previous studies. This study seeks to fill this gap by investigating five methods for preparing sGOeP hybrids and examining the effect of the different preparation methods on hybrid stability, aided by FTIR and zeta potential analysis. To achieve this, GO nanosheets were synthesized and characterized, followed by an investigation of the different methods for preparing sGOeP hybrids. The stability of the hybrid solution was also investigated with respect to salinity, the presence of divalent ions, and temperature. All results were analyzed using ANOVA (Analysis of Variance) to distinguish significant from insignificant parameters, compare their effects, and statistically quantify the effect of each parameter.

Graphene oxide

To synthesize GO, chemicals including sulfuric acid (95% purity), sodium hydroxide, potassium permanganate, and hydrogen peroxide, supplied by Merck, along with graphite from Asbury Company, were used. To prepare the GO solution (GOS), the proper amount of GO was mixed with DIW to give a final concentration of 100 ppm.

The polymer used was HPAM, supplied as “Flopaam 3630S” powder by SNF, dissolved in DIW to a final concentration of 1500 ppm.

The brines were prepared by mixing the proper amounts of different salts with DIW. All salts were obtained from Sigma-Aldrich with a purity of approximately 99.0%. To evaluate the effect of salinity and the presence of divalent ions, different brine compositions were used. The brine compositions were formulated to maintain a constant ionic strength (IS) at high salinities. For example, the IS of a high-salinity brine composed of NaCl was equal to that of a brine containing NaCl and MgCl 2 . This approach ensured that any observed differences in stability could be attributed specifically to the presence of Mg 2+ ions rather than to variations in the overall IS.

Maintaining constant IS is crucial because it influences the electrostatic interactions between ions in solution, which can significantly affect the stability of colloidal systems. Electrostatic interactions are a critical factor in determining the behavior of colloids, as they govern the repulsive forces that prevent particle aggregation. When the IS of a solution increases, the repulsion between particles decreases, potentially leading to aggregation 61 , 62 .

By keeping the IS constant while varying the concentration of Mg 2+ ions, the specific effects of divalent ions on the stability of the hybrid solution are isolated. This method is grounded in established scientific practices, which emphasize the importance of controlling IS to study colloidal stability 63 , 64 . For instance, Verwey and Overbeek’s DLVO theory highlights how IS modulates van der Waals attraction and electrostatic repulsion, critical factors for colloid stability. Table 1 lists the compositions of different brine solutions used in the current study, detailing the concentrations of salts to maintain the desired IS.
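The constant-IS constraint described above can be checked numerically. The sketch below computes I = ½ Σ cᵢzᵢ² for a pure NaCl brine and a mixed NaCl/MgCl 2 brine; the concentrations are hypothetical illustrations (the study's actual compositions are those in Table 1).

```python
def ionic_strength(species):
    """Ionic strength I = 0.5 * sum(c_i * z_i^2),
    with molar concentrations c_i and ion charges z_i."""
    return 0.5 * sum(c * z ** 2 for c, z in species)

# Pure NaCl brine at 1.0 mol/L fully dissociates into Na+ and Cl-.
is_nacl = ionic_strength([(1.0, +1), (1.0, -1)])

# A mixed brine with hypothetical concentrations chosen so that its
# ionic strength matches the pure NaCl brine: 0.85 M NaCl + 0.05 M MgCl2
# gives Na+ 0.85 M, Mg2+ 0.05 M, Cl- 0.95 M.
is_mixed = ionic_strength([(0.85, +1), (0.05, +2), (0.95, -1)])

print(is_nacl, is_mixed)  # both 1.0 mol/L
```

Because Mg 2+ carries a charge of +2, a small amount of MgCl 2 replaces a larger amount of NaCl at equal IS, which is why the Mg 2+ concentrations in the mixed brines are kept low.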

Graphene oxide synthesis

The method for synthesizing graphene oxide involves several steps, as described in Ref. [ 65 ]. First, one gram of intercalated graphite was heated in an oven at 900 °C for less than a minute to expand the graphite plates and increase the distance between its sheets. Next, 20 ml of sulfuric acid and two grams of potassium permanganate were mixed in an ice bath. The temperature was then raised to 40 °C, and the expanded graphite was added to the solution and stirred for 2 h with a magnetic stirrer to oxidize the graphite. The resulting paste-like liquid was mixed with 100 ml of cold water and 5 ml of hydrogen peroxide to quench the reaction and adjust the pH. The product was washed 4–6 times using sodium hydroxide and a high-speed centrifuge to remove impurities. The washed graphite oxide was exfoliated using a high-shear homogenizer. Finally, probe and bath sonication were employed to disperse the GO nanosheets and further stabilize them. A schematic view of the synthesis process is shown in Fig.  2 . The final product of this step is a GO aqueous dispersion.

figure 2

Schematic view of GO synthesis process.

Characterization methods

The synthesized GO was characterized using various analyses, including X-ray diffraction, Fourier transform infrared spectroscopy, Raman spectrometry, dynamic light scattering, and zeta potential measurement. In this section, each analysis procedure is described.

X-ray diffraction analysis

X-ray diffraction (XRD) analysis is widely used for identifying compounds and phases in materials and is a subset of spectroscopic analyses. As the XRD test uses powder samples, the GO solution was first centrifuged at 12,000 rpm for 2 h, resulting in GO sludge. The sludge was then dried at 50 °C for 24 h. Then, GO powder was tested with a PHILIPS PW1730 X-ray diffractometer.

Fourier transform infrared spectrometer analysis

Fourier transform infrared spectroscopy (FTIR) is one of the most widely used methods for identifying compounds and bonds in organic and inorganic substances and is a subset of spectroscopic analyses. Chemical bonds, molecular interactions, and especially the functional groups of a material can be identified from its infrared spectrum. This test was conducted using a Thermo Avatar spectrometer.

Raman spectrometry analysis

Raman spectrometry is widely used for identifying molecules, evaluating their structure, and determining chemical bonds. It complements FTIR analysis and is a quick, inexpensive way to investigate carbon structures such as nanotubes, fullerenes, and graphene. A UniDRON—UniNanoTech confocal microscopy and Raman spectroscopy instrument was used.

Dynamic light scattering and zeta potential analysis

Dynamic light scattering (DLS) is a physical method used to determine particle size and size distribution in solutions and suspensions, with applications in fields such as medicine, biology, and cosmetics. It measures particle sizes from the nanometer range up to about 10 µm by calculating the hydrodynamic diameter over time. Zeta potential (ZP) analysis is essential for assessing the electric charge of particles: higher charges, whether positive or negative, contribute to the stability of particles in liquids, so ZP analysis holds considerable importance in nanoparticle research. ZP and DLS analyses were conducted with a MALVERN Nano ZS ZEN 3600 instrument; each sample was diluted 50 times and tested three times.

Preparation of polymer solution

To create the polymer solution, a proper amount of HPAM powder was carefully added to the makeup water under magnetic stirring. The stirring speed was adjusted to create a vortex encompassing approximately 75% of the solution height, and the HPAM powder was added at two-thirds of the vortex height to ensure optimal mixing. While the makeup water was stirring, the polymer was added gradually over 60 s to facilitate thorough incorporation. To minimize agitation and ensure complete hydration and a homogeneous solution, the stirring rate was kept low at 60–80 rpm (corresponding to low shear rates). Any detection of fish eyes (lumps of partially hydrated polymer) indicated a deviation from the intended homogeneity, requiring immediate termination of the procedure and restarting of the polymer solution preparation. The mixing and resting times differed depending on the basis and purpose of each GOeP preparation process, highlighting the significance of method-specific factors in obtaining the desired results.

Preparation of graphene oxide enhanced polymer hybrid

To explore methodologies for preparing sGOeP hybrids, four distinct methods previously used for nanoparticles with properties similar to GO were selected, and a fifth method was introduced. The performance of these methods in stabilizing the resultant hybrid solution, comprising polymer, GO, and HS brine, herein referred to as the HS-GOeP hybrid, was evaluated. In each method, the polymer solution is prepared following the procedure detailed in “Preparation of polymer solution” section, with the corresponding mixing and, where applicable, resting times as indicated.

In Method 1, the polymer solution is prepared with a mixing time of 48 h. The GO dispersion is then added dropwise to the polymer solution, followed by the salt, with 2 h of stirring after each addition. Method 2 involves creating a brine solution from DIW and salt, which serves as the makeup water for the polymer solution. This solution is mixed for 48 h before the GO dispersion is gradually added dropwise. Method 3 involves preparing the GO dispersion in DIW and adding the HPAM powder, following the guidelines in “ Preparation of polymer solution ” section, to create the GOeP hybrid; as in the previous methods, the mixing period lasts 48 h. The brine solution is then prepared separately and introduced into the GOeP hybrid, followed by 2 h of stirring. The first three procedures are referred to as “classic methods” in the subsequent sections because they have been widely used for various nanoparticles, whereas Method 4 has been suggested for preparing GOeP hybrids only in the last few years. In Method 4, the polymer solution is prepared and allowed to rest for a day, protected from heat and light, before the GO dispersion and then the brine solution are gradually added dropwise; the solution is stirred continuously for 24 h at each step. Method 5 is similar to Method 4 but involves adding a diluted GO dispersion to the polymer solution dropwise and stirring for 24 h. A previously prepared brine solution is then added to one-third of the GOeP hybrid and stirred for 2 h, resulting in a semi-hybrid solution. Finally, the semi-hybrid solution is added to the remaining GOeP hybrid and stirred for two more hours. Figure  3 shows a schematic representation of the five methods used to prepare the hybrid solution.

figure 3

Schematic view of 5 different methods used to prepare sGOeP hybrid.

Stability analysis of graphene oxide enhanced polymer hybrid

Two methods were used to check the stability of the sGOeP hybrids: sedimentation and measurement of ZP. The sedimentation method is the most commonly used method for analyzing the stability of nanoparticle solutions. In this method, the distance or color difference between the sedimentation and non-sedimentation zones is observed with the naked eye over a period of time while the sample is kept static in a container 66 , 67 , 68 . If a distance or color difference is observed, the sample is labeled “unstable.” However, this method cannot provide a numerical value for further comparisons. Therefore, if no distance or color difference is observed, the solution is checked using ZP analysis, an accepted method for the stability analysis of nanoparticle suspensions. Larger absolute values of ZP indicate a more stable nanoparticle suspension 69 .

The area under the curve (AUC) of ZP vs. time was proposed as a quantitative metric of long-term stability: the greater the absolute value of the AUC, the greater the solution’s long-term stability. The AUC provides an overall assessment of how the ZP evolves; in simpler terms, it represents the accumulated change in ZP throughout the entire duration of the experiment, capturing both the magnitude and direction of ZP shifts over time. The AUC therefore makes it possible to compare the long-term stability of the solutions. The AUC of each solution was calculated using the well-established Simpson’s integration rule.
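As a minimal sketch of this AUC computation, the composite Simpson's rule for evenly spaced ZP measurements can be implemented directly; the ZP time series below is hypothetical and for illustration only.

```python
def simpson_auc(x, y):
    """Composite Simpson's rule over evenly spaced samples.
    Requires an even number of intervals (odd number of points)."""
    n = len(x) - 1
    if n < 2 or n % 2 != 0:
        raise ValueError("need an even number of evenly spaced intervals")
    h = (x[-1] - x[0]) / n
    s = y[0] + y[-1]
    s += 4.0 * sum(y[i] for i in range(1, n, 2))  # odd-index points
    s += 2.0 * sum(y[i] for i in range(2, n, 2))  # interior even-index points
    return s * h / 3.0

# Hypothetical ZP readings (mV) taken weekly; a larger |AUC| indicates
# better long-term stability of the suspension.
days = [0, 7, 14, 21, 28]
zp = [-30.0, -33.0, -28.0, -25.0, -22.0]
auc = simpson_auc(days, zp)  # negative, since the ZP stays negative
```

Because the ZP of these suspensions is negative throughout, the AUC is negative and its absolute value is the quantity compared between solutions.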

ANOVA statistical analysis

ANOVA is an essential statistical approach for ascertaining whether there are statistically significant differences between group means. It has been widely adopted in many fields, including engineering, chemistry, psychology, business, education, and medicine. ANOVA’s importance lies in its ability to identify patterns and relationships within data, enabling researchers and decision-makers to draw meaningful conclusions and make informed choices. It is essential in hypothesis testing, especially when comparing the results of various treatments or interventions: by evaluating variance within and across groups, ANOVA helps determine whether observed differences are genuinely driven by the independent variable or result from chance. This information is valuable for assessing how well certain factors work. The adaptability of ANOVA is not limited to hypothesis testing; it can also be applied in exploratory data analysis and descriptive statistics. By partitioning the total variance into its constituent components, ANOVA provides a detailed breakdown of the variation within and between groups, offering insights into the underlying structure of the data. With this information, outliers may be identified, potential relationships can be investigated, and future study directions can be determined. The ANOVA in the current study was conducted using the “statsmodels” module in the Python programming language after obtaining the AUC.

The F-statistic, P -value, and effect size are essential measures in ANOVA. The F-statistic is the ratio of between-group variability to within-group variability, and the P -value gives the probability of observing differences at least this large by chance; a threshold of 0.05 is typically used. A lower P -value suggests a more substantial difference between the group means, strengthening the robustness of the analysis. The effect size quantifies the magnitude of the difference between group means, shedding light on the practical importance of the reported effects. Together, these measures contribute to a comprehensive understanding of the relationships between variables in an ANOVA.

Results and discussion

Note that the experiments exploring the performance of the stabilizing methods were conducted at room temperature, while 60 and 80 °C were chosen for the further investigations.

Graphene oxide characterization

The XRD pattern of GO, measured in the range of 10°–80°, is shown in Fig.  4 (left). The GO structure has a prominent peak around a 2θ value of 11°, which confirms the formation of GO. Two other peaks are seen in the XRD pattern at 2θ values of 24.45° and 34.4°. Using these values, the interlayer distance (d) can be evaluated using Bragg’s equation; d for the 24.45° and 34.4° peaks is 3.63 Å and 2.60 Å, respectively 66 , 70 . These observations are consistent with the results of Haruna et al. The broadening and extension of the 24.45° peak can be attributed to the disruption of the crystalline structure of graphite and graphite oxide, as well as the restacking of graphene sheets 71 , 72 . This broadening suggests a loss of crystallinity and an increase in layer disorder, further substantiated by the lack of significant peak shifts, broadening, or splitting in the Raman spectra.
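The interlayer spacings follow from Bragg's law, d = λ/(2 sin θ). The sketch below assumes Cu Kα radiation (λ = 1.5406 Å), which the text does not state but is typical for this diffractometer; the computed spacings come out on the ångström scale.

```python
import math

CU_K_ALPHA = 1.5406  # X-ray wavelength in angstroms (assumed Cu K-alpha)

def bragg_spacing(two_theta_deg, wavelength=CU_ALPHA if False else CU_K_ALPHA):
    """First-order (n = 1) Bragg spacing d = lambda / (2 sin(theta)),
    where two_theta_deg is the diffraction angle 2-theta in degrees."""
    theta = math.radians(two_theta_deg / 2.0)
    return wavelength / (2.0 * math.sin(theta))

# The three 2-theta peaks reported for the synthesized GO:
for two_theta in (11.0, 24.45, 34.4):
    print(f"2theta = {two_theta:5.2f} deg -> d = {bragg_spacing(two_theta):.2f} A")
```

The ~8 Å spacing at 2θ ≈ 11° is characteristic of oxidized, water-intercalated GO layers, while the smaller spacings correspond to the residual graphitic stacking.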

figure 4

XRD pattern (left) and Raman spectrum (right) of synthesized GO.

The Raman spectrum of GO is presented in Fig.  4 (right). Two main peaks at 1342 cm −1 and 1568 cm −1 correspond to the D and G bands, respectively. The G peak is characteristic of all sp 2 hybridized carbon bonds (C=C), arising from the E2g mode; in other words, the G peak is a primary in-plane vibration. In comparison, the D peak is linked with structural defects mainly formed by oxygen-containing functional groups 18 . The absence of significant alterations in these Raman peaks suggests uniformity in the synthesized GO, as non-uniform GO would exhibit noticeable changes in these peaks 73 , 74 . Previous studies also mentioned that the ratio of the D and G bands (I D /I G ) can be used as a parameter for the functionalization of GO 75 . For the synthesized GO, I D /I G is 1.02, which shows that the graphene is functionalized (oxidized). Furthermore, using the ratio of the G and D bands (I G /I D ) and the laser line wavelength (λ laser , in nm), the in-plane size (La) can be estimated using the following equation 76 :

La (nm) = 2.4 × 10 −10  × λ laser 4  × (I G /I D )

Considering λ laser  = 532 nm and I G /I D  = 0.98, La can be evaluated as 18.84 nm.
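The quoted crystallite size can be reproduced from the Cançado-type relation La(nm) = 2.4 × 10⁻¹⁰ · λ⁴ · (I_G/I_D), with λ in nm, which is the form implied by the numbers given; a quick check:

```python
def in_plane_size_nm(laser_wavelength_nm, ig_over_id):
    """In-plane crystallite size La (nm) from the Raman G/D intensity
    ratio: La = 2.4e-10 * lambda^4 * (I_G / I_D), lambda in nm."""
    return 2.4e-10 * laser_wavelength_nm ** 4 * ig_over_id

la = in_plane_size_nm(532, 0.98)
print(f"La = {la:.2f} nm")  # prints "La = 18.84 nm"
```

Note that I_G/I_D = 0.98 is simply the reciprocal of the reported I_D/I_G = 1.02.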

The FTIR spectrum (Fig.  5 ) accorded well with other studies and confirmed the formation of GO 70 , 77 , 78 , 79 . The spectrum shows various chemical bond configurations: C=C in-plane vibration (sp 2 hybridized) at 1500–1600 cm −1 , carboxyl (COOH) at 1650–1750 cm −1 , ketonic species (C=O) at 1600–1650 cm −1 and 1750–3800 cm −1 , hydroxyl (C–OH) at 3050–3800 and 1070 cm −1 (with a distinct C–OH vibration at 3435 cm −1 ), and C–O at 1072 cm −1 .

figure 5

FTIR spectrum of synthesized GO.

Also, the hydrodynamic diameter of GO nanosheets was measured using DLS analysis, revealing a mean diameter of 286.9 nm. Furthermore, GO aqueous dispersion was prepared using probe sonication. The stability was investigated by measuring ZP, which was found to be − 20.55 mV, showing that GO is stable in DIW (Fig.  6 ).

figure 6

ZP result of GO at room temperature.

Stability analysis

Selecting the hybridization method

To evaluate the efficiency of the stabilizing methods, the GOeP solution made up with the HS brine composition was chosen as the base case (HS-GOeP), and all methods were first assessed at room temperature. In contrast to previous studies, methods 1–3 failed to produce a stable HS-GOeP, with instability evident after 12 h, 10 min, and 30 min, respectively. Figure  7 shows a ×2 magnification of the NEP hybrid prepared by methods 1–3.

figure 7

×2 magnification of the NEP hybrid prepared by methods 1–3.

HS-GOeP prepared by method 1 took longer to become unstable than with the other two methods, likely because of a longer polymer hydrolyzation duration: the polymer had sufficient time to hydrolyze completely before reacting with the GO nanosheets. In method 2, by contrast, because the polymer is added to the brine solution, the polymer molecules react with the salt ions first. When GO was added to the solution, fewer polymer molecules were available to react, so the GO reacted with free ions, aggregated/agglomerated, and quickly precipitated. This highlights the importance of the makeup-water composition used for polymer preparation to the stability of the hybrid. The HS-GOeP prepared using method 3 behaved differently again: the polymer molecules did not have enough time to hydrolyze before reacting with the GO nanosheets.

Method 4 was also unsuccessful in stabilizing the HS-GOeP hybrid. In the initial stages after preparation, no instability was observed by the sedimentation method, but after a day, a few small agglomerates appeared. The hybrid suspension was evaluated using the ZP instrument, revealing a value of − 2.34 mV; as ZP values near 0 are considered highly unstable, the suspension was deemed unstable. It is believed that adding GO to the polymer without prior dilution of the GO could not produce a homogeneous GO-in-polymer dispersion, and this heterogeneous dispersion caused apparent early-stage stability (seen by the sedimentation method) but led to eventual instability (measured by ZP). Therefore, the dispersion quality (pre-dilution) can affect the stability of the hybrid. Additionally, comparing methods 1 and 4 reveals that the form of the additives (powder versus aqueous dispersion) also affects the hybrid’s stability: in method 1, the salt was added in powder form, which resulted in instability.

Method 5 was designed by reviewing the classic methods and method 4 and eliminating the parameters that negatively affect hybrid stability. In this method, the polymer has enough time to hydrolyze; the polymer is prepared in pure DIW to avoid mis-hydrolysis and side reactions with other components; the polymer solution is kept away from light and heat; the other additives are mixed in the aqueous phase; and the order of mixing the components is selected to avoid unfavorable reactions (such as the polymer reacting with ions before reacting with the nanoparticles). Method 5 was therefore used to prepare HS-GOeP, and its stability was investigated. The sedimentation method showed no color difference or precipitation, so the prepared hybrid was evaluated using the ZP and DLS methods. The results of the ZP and DLS analysis of HS-GOeP, GOeP, pure GO aqueous suspension (GOS), and the polymer solution are given in Table 2 .

The negative ZP results show that the hybrid solutions are stable. The ZP values also indicated stronger interactions between the polymer and GO in the GOeP hybrid, suggesting improved dispersion stabilization 66 . Furthermore, for GOeP, the higher absolute ZP value implies that the functional groups on the surfaces of GO and the HPAM polymer interact more strongly with each other than in their individual states. Because of increased electrostatic attraction or chemical bonding between the components, the interaction between GO and HPAM may result in a more stable nanocomposite structure.

It was also found that adding brine to the hybrid system decreases the absolute ZP value, yet this reduction did not lead to instability. When the particle sizes in the different solutions were considered, it was found that in the hybrid solutions, particle size increased as the zeta potential decreased, consistent with particle agglomeration and reduced stability.

In addition to ZP analysis, FTIR analysis was used to ensure the interaction between polymer and GO. Figure  8 shows the FTIR analysis result of GOeP alongside GO and HPAM. According to the main peaks of the HPAM FTIR spectrum, there is C–O–C stretching vibration of the ether group (–O–) at 1000–1200 cm −1 , C=O stretching vibration of the carboxyl group (–COOH) at 1600–1800 cm −1 , –CH2 stretching vibration of the methylene group (–CH2–) at 2800–3000 cm −1 and N–H stretching vibration of the amide group (–CONH2) at 3200–3500 cm −1 . Additionally, the FTIR spectra of GOeP indicate hydrogen bonding between the carboxyl functional group of GO and the amide functional group of HPAM due to the C=O stretching vibration peak at 1634.71 cm −1 and the N–H stretching vibration peak at 3447.05 cm −1 .

figure 8

FTIR spectrum of HPAM, GO and GOeP.

Long-term stability analysis

In this section, the effect of different parameters, including salinity, temperature, and the presence of divalent ions (Mg 2+ ), is investigated. To this end, ZP analysis of the different hybrid solutions was performed over time at two temperatures, 60 and 80 °C.

It was observed that the stability and long-term stability of GOS are not strongly affected by temperature: the destabilization behavior of GOS is similar at 60 and 80 °C. GOeP showed higher long-term stability than GOS, owing to the bonding between the polymer and GO molecules. When GO is hybridized with HPAM, the two materials can form various bonds, including covalent bonds, hydrogen bonds, and van der Waals interactions 57 , 80 . In the GOeP hybrid, covalent bonds can form between functional groups on the surface of GO and the HPAM chains. Hydrogen bonds can also form between oxygen-containing functional groups on GO and the amide groups on HPAM and water molecules 81 . Van der Waals forces can contribute to the interaction between the hydrophobic regions of GO and the hydrophobic portions of HPAM. A schematic illustration of the interaction between GO and polymer molecules is shown in Fig.  9 .

figure 9

Schematic illustration of the interaction between GO and polymer molecules.

Table 3 presents the ANOVA results investigating the effects of polymer presence or absence, as well as designated high (80 °C) and low (60 °C) temperatures, on the stability of GOS. As seen, temperature, polymer presence, and their interaction have a significant effect ( P  < 0.05) on the long-term stability of the GOS solution. Furthermore, based on the percentage of the effects, it can be concluded that the presence of polymer has the most significant effect (89.804%), followed by temperature (6.692%), with their interaction being relatively small but still significant (3.332%). Additionally, statistically, the effect of polymer on the long-term stability of the solution is approximately 12.5 times greater than the effect of temperature. Figure  10 displays the main effects plot of AUC for polymer and temperature. As can be seen, the presence of polymer significantly enhances solution stability, while temperature has a comparatively minor effect. Additionally, the results suggest that the presence of polymer and increasing temperature enhance the long-term stability of the hybrid solution.

figure 10

Main effects plot (fitted means) of AUC for polymer and temperature.

Figure  11 shows the effect of temperature and of polymer addition to GOS. Generally, increasing the temperature can increase the rate of chemical reactions and the mobility of molecules, promoting bonding between the two materials. In the case of GOeP bonding, higher temperatures can increase the rate of hydrogen bonding between oxygen-containing functional groups on GO and amide groups on HPAM, because hydrogen bonding is a temperature-dependent process and higher temperatures increase the kinetic energy of molecules. This enhances stability, as indicated by the increase in absolute ZP value with temperature (from room temperature to 80 °C) and the FTIR results of GOeP (Fig.  8 ). One can also observe that at early times (until day 7) the stability of GOeP increases and then decreases. This could be attributed to the formation of strong hydrogen bonds between GO and HPAM molecules at early times, followed by a decrease in GOeP stability due to the aggregation of GO sheets. The GO sheets may aggregate because of van der Waals forces between them 58 , which reduces the surface area available for interaction with the polymer chains and causes a subsequent reduction in stability.

figure 11

ZP result for GOS and GOeP at 60 and 80 °C.

Table 4 provides a comprehensive overview of the ANOVA results on the roles of temperature and salinity in the stability of the GOeP hybrid solutions. The statistical analysis underscores the significant impact of both temperature and salinity, along with their interaction, on the stability of the GOeP solution ( P  < 0.05). In terms of effect percentages, salinity emerges as the primary driver, contributing a substantial 80.998% of the variability in stability, while temperature accounts for 11.373% of the observed variance; the interaction effect of temperature and salinity is comparatively minor at 4.599%. Notably, the effect of salinity is approximately 7.12 times more pronounced than the effect of temperature on the stability of the GOeP solution. These findings underscore the critical interplay of salinity and temperature, with salinity emerging as the predominant factor governing the stability dynamics of the GOeP solution in this experimental context.

The main effects plot of AUC for salinity and temperature is illustrated in Fig.  12 . The results indicate that salinity has a drastically negative impact on long-term stability, while higher temperatures contribute to improved stability.

figure 12

Main effects plot (fitted means) of AUC for salinity and temperature.

Figure  13 shows the behavior of ZP variation during the time for HS-GOeP and LS-GOeP prepared with high (HS) and low (LS) salinity brines (NaCl) at 60 and 80 °C. While HS-GOeP exhibited initial stability (lasting 18–19 days at both temperatures), its long-term stability suffered due to the salting-out effect.

figure 13

ZP results for HS-GOeP and LS-GOeP at 60 and 80 °C.

Salinity acts as a double-edged sword for GOeP stability. At high salinities, stability decreases due to the salting-out effect: at high NaCl concentrations, the water and salt molecules interact strongly, reducing the interaction of GO with the polymer chains (mainly the hydrogen bonds between them), promoting aggregation of the GO sheets, and destabilizing the hybrid. In other words, water and salt molecules compete for interaction with the polymer, weakening the crucial hydrogen bonds with the GO sheets. Consequently, the electrostatic repulsion between individual GO sheets, which helps maintain their separation, is diminished. This disruption of the GO dispersion leads to aggregation and ultimately reduces the stability of the hybrid. Furthermore, high NaCl concentrations can lead to polymer chain instability and aggregation, and increasing the salt concentration can alter both the IS and the dielectric constant of the solution, affecting the conformation and stability of the polymer chains. The destabilization of the HS-GOeP hybrid at both 60 °C and 80 °C after 18 days suggests that the added NaCl brine has a significant impact on the stability of the hybrid and that the salting-out effect dominates over any other effect. LS-GOeP showed better stability than HS-GOeP, which can be attributed to a weaker salting-out effect. As in HS-GOeP, the salting-out effect is still active in LS-GOeP, but there is less salt in the hybrid; hence, there is more available surface area (less competition) for GO interaction with the polymer chains. This allows more effective hydrogen bonding and stronger interfacial interactions between GO and the polymer, stabilizing the hybrid.

Furthermore, comparing the stability of LS-GOeP at 60 and 80 °C reveals an interesting result: LS-GOeP remained stable after 21 days at 80 °C but not at 60 °C, which could be due to the counteracting effect of the higher temperature against the salting-out effect. Increasing temperature can lead to better dispersion of GO sheets in polymer solutions because the increased kinetic energy overcomes the intermolecular forces holding the sheets together, allowing them to break apart and disperse more readily. Additionally, at higher temperatures, molecular motion and collisions increase, further promoting the dispersion of GO nanosheets in the polymer solution and the formation of strong interactions between GO and the polymer chains.

The effect of divalent ions at high salinity was studied using HS-MgTI and HS-MgEn brines as makeup water for GOeP hybrid preparation. As mentioned, HS-MgTI is an NaCl brine with a low concentration of MgCl₂, whose IS is equal to that of the HS brine. The ANOVA results presented in Table 5 provide insight into the factors influencing the stability of the GOeP at high salinity. The statistical analysis emphasizes the substantial impact of Mg²⁺ concentration on HS-GOeP stability, with a significant contribution of 99.238% to the observed variability (P = 8.032e−13). In contrast, temperature has a non-significant effect, contributing only 0.060% of the variance (P = 0.092). Although temperature alone is not significant, its interaction with magnesium concentration is noteworthy, with a statistically significant contribution of 0.572% (P = 3.484e−04). It can therefore be concluded that at high salinities (in the context of these experiments), the presence of Mg²⁺ divalent ions at low concentrations contributes to the long-term stability of the hybrid solution, and that in the presence of Mg²⁺, the effect of temperature (within the range studied) is negligible.
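The percent contributions quoted from Table 5 come from an ANOVA sum-of-squares decomposition. The sketch below uses hypothetical sum-of-squares (SS) values, chosen only to reproduce the reported pattern (Mg²⁺ dominant, temperature negligible, a small but significant interaction), to show the arithmetic:

```python
# Percent contribution of each ANOVA factor = SS_factor / SS_total * 100.
# The SS values are hypothetical illustrations, not from the paper's data.
ss = {
    "Mg_concentration": 992.38,  # hypothetical SS for Mg2+ level
    "temperature":        0.60,  # hypothetical SS for temperature
    "interaction":        5.72,  # hypothetical SS for Mg2+ x temperature
    "error":              1.30,  # hypothetical residual SS
}
ss_total = sum(ss.values())
contribution = {k: 100.0 * v / ss_total for k, v in ss.items()}
for factor, pct in contribution.items():
    print(f"{factor:>16}: {pct:6.3f} %")
```

By construction the contributions sum to 100%, and the dominant factor's share matches the 99.238% figure reported for Mg²⁺ concentration.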

Figure 14 depicts the main effects plots, revealing the impact of Mg²⁺ concentration and temperature on AUC. In high salinity conditions, where temperature shows no discernible effects, the presence of Mg²⁺ at low concentrations is crucial for significantly enhancing long-term stability. As can be seen, a high concentration of Mg²⁺ resulted in almost 80% lower long-term stability.

Figure 14. Main effects plot (fitted means) of AUC for Mg²⁺ concentration and temperature.
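The AUC metric behind the main effects plot can be sketched as follows: the area under the |ZP|-versus-time curve, computed here with the trapezoidal rule on hypothetical zeta-potential data (not the paper's measurements). A hybrid that keeps a high |ZP| over the whole test window accumulates a larger AUC, i.e., better long-term stability.

```python
import numpy as np

days = np.array([0.0, 7.0, 14.0, 21.0, 28.0])          # sampling times, days
zp_stable = np.array([42.0, 41.0, 40.0, 39.0, 38.0])   # hypothetical |ZP|, mV
zp_unstable = np.array([42.0, 35.0, 22.0, 12.0, 6.0])  # hypothetical |ZP|, mV

def auc(t, y):
    # trapezoidal rule: sum of per-segment trapezoid areas
    return float(((y[1:] + y[:-1]) / 2.0 * np.diff(t)).sum())

print(auc(days, zp_stable), auc(days, zp_unstable))
```

The stable series yields the larger area, so ranking hybrids by AUC ranks them by sustained (long-term) stability rather than by initial ZP alone.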

The long-term stability behavior of HS-MgTI-GOeP and HS-MgEn-GOeP is illustrated in Fig. 15. Adding Mg²⁺ at low concentrations to the makeup water (HS-MgTI-GOeP) results in higher initial and long-term stability than HS-MgEn-GOeP at both 60 and 80 °C. Mg²⁺ at low concentrations has a salting-in effect, owing to the higher charge density of Mg²⁺ relative to Na⁺ ions. This phenomenon leads to better dispersion of GO sheets in the polymer solution, due to stronger electrostatic interactions between Mg²⁺ ions and the negatively charged functional groups on the GO sheets and polymer chains. At high Mg²⁺ concentrations, on the other hand, in addition to increased interactions between Mg²⁺ ions and the functional groups of GO and HPAM, the EDL thickness of the GOeP hybrid is reduced.

Figure 15. ZP results for HS-MgEn-GOeP and HS-MgTI-GOeP at 60 and 80 °C.

Consequently, aggregation/agglomeration is promoted. The EDL is the region surrounding a charged particle, such as a GO sheet, where ions of opposite charge accumulate; this layer of counterions governs the electrostatic repulsion that keeps like-charged particles apart. At high Mg²⁺ concentrations, the abundance of these cations leads to excessive charge screening, weakening the electrostatic repulsion between individual GO sheets and polymer chains within the GOeP hybrid. As a result, the particles are more prone to aggregation and agglomeration, reducing stability.

Therefore, the concentration of Mg²⁺ plays a critical role. Low concentrations promote dispersion through salting-in, while high concentrations disrupt stability due to excessive EDL screening. The ideal range likely depends on the specific functional groups present in the polymer and the surface chemistry of the GO sheets.
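The EDL-compression argument can be made quantitative via the Debye screening length, which shrinks as ionic strength grows. The sketch below is a standard DLVO-style estimate (not taken from the paper) of the Debye length in water at a given ionic strength:

```python
import math

def debye_length_nm(ionic_strength_molar, temp_k=298.15, eps_r=78.4):
    """Debye screening length (nm) for an aqueous electrolyte."""
    e = 1.602176634e-19      # elementary charge, C
    kB = 1.380649e-23        # Boltzmann constant, J/K
    NA = 6.02214076e23       # Avogadro constant, 1/mol
    eps0 = 8.8541878128e-12  # vacuum permittivity, F/m
    n = ionic_strength_molar * 1000.0 * NA  # mol/L -> charge-weighted per m^3
    kappa_sq = 2.0 * e**2 * n / (eps0 * eps_r * kB * temp_k)
    return 1e9 / math.sqrt(kappa_sq)

# EDL thickness collapses roughly as 1/sqrt(I):
print(debye_length_nm(0.001), debye_length_nm(0.1), debye_length_nm(1.0))
```

At 0.001 M the screening length is near 10 nm, while at 1 M it drops to about 0.3 nm, which is consistent with the qualitative picture above: higher ionic strength compresses the EDL and weakens the repulsion that keeps GO sheets dispersed.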

Furthermore, as can be seen, temperature has no significant effect on the long-term stability of the HS-MgTI-GOeP hybrid: at low temperatures, the salting-in effect of Mg²⁺ overcomes the salting-out effect of Na⁺, whereas at high temperatures, both the salting-in effect of Mg²⁺ and the increased kinetic energy overcome the salting-out effect of Na⁺.

Conclusions

In the current study, GO nanosheets were successfully synthesized and characterized, and different methods were explored for preparing stable sGOeP hybrids. Factors such as hydrolyzation time, hydrolyzation domain, and dispersion quality were found to influence the stability of the hybrids. It was also observed that salinity, temperature, and the presence of divalent ions affect hybrid stability in different ways, and that the hybrid solutions were stable at early times but diverged in their stability trends as time passed. The AUC factor was introduced to assess the long-term stability of the hybrids, and ANOVA was employed to distinguish effective from ineffective parameters and to quantify the impact of each. Overall, the findings provide valuable insights into optimizing the stability of GO-enhanced polymer hybrids for potential applications in oil recovery. The main conclusions drawn from this investigation are as follows:

GO nanosheets were synthesized and characterized using various standard methods. The results of Raman and XRD tests confirmed that the GO was synthesized successfully, and the size of the GO nanosheets was measured using DLS analysis, which showed a size of 286.9 nm. At room temperature, the pure GO was stable in the aqueous phase, as indicated by the ZP analysis.

The classic methods used to prepare sGOeP hybrids were unsuitable in this case, and different methods suggested in the literature were explored. The stability of the resulting sGOeP hybrids was analyzed using common stability methods, revealing that the hydrolyzation time of the polymer, the hydrolyzation domain, and the additive phase were the main factors affecting hybrid stability.

It was discovered that the dispersion quality had a significant impact on the stability of the final hybrid, with method five, in which the GO solution was previously diluted and dispersed in DIW, achieving the desired stability. These findings had significant implications for preparing GOeP, providing valuable insights into the factors that affect their stability. By optimizing these factors, sGOeP hybrids could potentially be used to enhance oil recovery, leading to more efficient and effective EOR methods.

It was observed that adding polymer to the GO solution and making GOeP hybrid notably increased the long-term stability of the solution.

The results showed that temperature (in the temperature range of the study) had a positive and significant effect on the initial and long-term stability of the GOeP solution, as the higher temperature (80 °C) promoted the rate of interaction and bonding between polymer and GO molecules.

Overall, salinity reduced stability. At high salinities, the salting-out effect was dominant, and the otherwise positive effect of temperature had no meaningful impact on stability. At low salinities, hybrid stability improved, mainly because of a weaker salting-out effect; moreover, at higher temperatures, long-term stability improved further as the positive effect of temperature overcame the salting-out effect.

It was observed that at high salinity, the parameter controlling the long-term stability of the hybrid system was the concentration of Mg²⁺ as a divalent ion.

Results illustrated that at high salinity, Mg²⁺ at high concentrations strongly affected the stability of the hybrid solution and significantly reduced long-term stability.

The observations also showed that at high salinity, Mg²⁺ (as a divalent ion) at low concentrations substantially promoted the stability of the hybrid as a result of the salting-in effect.

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations

Analysis of variance

Area under the curve

Deionized water

Dynamic light scattering

Electric double layer

Enhanced oil recovery

Fourier transform infrared spectroscopy

Graphene oxide enhanced polymer hybrid

Graphene oxide solution

Hydrolyzed polyacrylamide

High salinity brine

The ratio of G and D bands in Raman spectroscopy

The ratio of D and G bands in Raman spectroscopy

Ionic strength

In-plane size of graphene in Raman Spectroscopy

Low salinity brine

Magnesium chloride-enriched brine

Magnesium traced impurity-containing

Nanoparticle-enhanced polymer

Sodium montmorillonite clay nanoparticles

Saline graphene oxide-enhanced polymer


Author information

Authors and affiliations.

Faculty of Petroleum and Natural Gas Engineering, Sahand University of Technology, Tabriz, Iran

M. Iravani, M. Simjoo & M. Chahardowli

Faculty of Polymer Engineering, Sahand University of Technology, Tabriz, Iran

A. Rezvani Moghaddam


Contributions

M. Iravani: conceptualization, analysis, testing, writing. M. Simjoo: supervision, reviewing, conceptualization. M. Chahardowli: supervision, reviewing. A. Rezvani Moghaddam: supervision, reviewing, writing.

Corresponding author

Correspondence to M. Simjoo .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

Reprints and permissions

About this article

Cite this article.

Iravani, M., Simjoo, M., Chahardowli, M. et al. Experimental insights into the stability of graphene oxide nanosheet and polymer hybrid coupled by ANOVA statistical analysis. Sci Rep 14, 18448 (2024). https://doi.org/10.1038/s41598-024-68218-9


Received : 04 February 2024

Accepted : 22 July 2024

Published : 08 August 2024

DOI : https://doi.org/10.1038/s41598-024-68218-9


  • Graphene oxide nanosheets
  • Zeta potential
  • Nanoparticle-enhanced polymer hybrid




Experimental Methods of Validation for Numerical Simulation Results on Steel Flow through Tundish

Jacek Pieprzyca

2016, Archives of Metallurgy and Materials

The article presents experimental results on how a tundish flow regulator influences the course of liquid steel flow. The research was conducted using hybrid modelling methods, understood as the complementary use of Computational Fluid Dynamics (CFD) and physical modelling. The dynamic development of numerical simulation techniques, together with access to highly advanced and specialized software, has made these techniques common tools for solving liquid flow problems that are difficult to treat by purely analytical methods. Physical modelling, in turn, is an important cognitive tool for the empirical identification of these phenomena, allowing the researched problems to be verified and specified. By exploiting these relationships, the obtained results were compared in the form of residence time distribution (RTD) curves and visualizations of the different liquid steel flow distribution zones in the investigated tundish.
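The RTD analysis the abstract refers to reduces a measured tracer curve to characteristic numbers. The sketch below (hypothetical tracer data, not from the study) computes the most basic of these, the mean residence time, as the first moment of the concentration-time curve:

```python
import numpy as np

# Hypothetical tracer pulse measured at the tundish outlet of a water model
t = np.array([0.0, 30.0, 60.0, 90.0, 120.0, 150.0])  # time, s (uniform step)
c = np.array([0.0, 0.8, 1.0, 0.6, 0.3, 0.1])         # tracer conc., arb. units

# First moment of the RTD; the uniform time step cancels in the ratio
mean_residence = float((t * c).sum() / c.sum())
print(mean_residence)  # -> 67.5 s
```

Comparing this measured mean with the theoretical residence time (tundish volume divided by flow rate) is one common way such curves are used to quantify dead volume and mixing zones when validating CFD results against a physical model.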

  • Open access
  • Published: 08 August 2024

Maptcha: an efficient parallel workflow for hybrid genome scaffolding

  • Oieswarya Bhowmik,
  • Tazin Rahman &
  • Ananth Kalyanaraman

BMC Bioinformatics volume 25, Article number: 263 (2024)


Genome assembly, which involves reconstructing a target genome, relies on scaffolding methods to organize and link partially assembled fragments. The rapid evolution of long read sequencing technologies toward more accurate long reads, coupled with the continued use of short read technologies, has created a unique need for hybrid assembly workflows. The construction of accurate genomic scaffolds in hybrid workflows is complicated due to scale, sequencing technology diversity (e.g., short vs. long reads, contigs or partial assemblies), and repetitive regions within a target genome.

In this paper, we present a new parallel workflow for hybrid genome scaffolding that combines pre-constructed partial assemblies with newly sequenced long reads toward an improved assembly. More specifically, the workflow, called Maptcha, is aimed at generating long scaffolds of a target genome from two sets of input sequences: an already constructed partial assembly of contigs, and a set of newly sequenced long reads. Our scaffolding approach internally uses an alignment-free mapping step to build a ⟨contig, contig⟩ graph using long reads as linking information. Subsequently, this graph is used to generate scaffolds. We present and evaluate a graph-theoretic “wiring” heuristic to perform this scaffolding step. To enable efficient workload management in a parallel setting, we use a batching technique that partitions the scaffolding tasks so that the more expensive alignment-based assembly step at the end can be efficiently parallelized. This step also allows the use of any standalone assembler for generating the final scaffolds.
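The ⟨contig, contig⟩ graph described above can be sketched as follows. This is our illustration, not the authors' implementation: each long read is assumed to map to an ordered list of contigs, consecutive contigs on the same read contribute a link, and link multiplicity serves as the edge weight that a wiring heuristic would then consume.

```python
from collections import Counter

# Hypothetical read-to-contig mappings (contigs listed in read order),
# standing in for the output of the alignment-free mapping step.
read_to_contigs = {
    "read1": ["c1", "c2", "c3"],
    "read2": ["c2", "c3"],
    "read3": ["c3", "c4"],
}

edges = Counter()
for contigs in read_to_contigs.values():
    for a, b in zip(contigs, contigs[1:]):   # consecutive contigs on a read
        edges[tuple(sorted((a, b)))] += 1    # undirected, weighted link

print(dict(edges))
```

Here the (c2, c3) link is supported by two reads, so a weight-aware wiring step would prefer it when stitching contigs into scaffold paths.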

Conclusions

Our experiments with Maptcha on a variety of input genomes, and comparison against two state-of-the-art hybrid scaffolders, demonstrate that Maptcha is able to generate longer and more accurate scaffolds substantially faster. In almost all cases, the scaffolds produced by Maptcha are at least an order of magnitude longer (in some cases two orders) than the scaffolds produced by state-of-the-art tools. Maptcha also runs significantly faster, reducing time-to-solution from hours to minutes for most input cases. We also performed a coverage experiment by varying the sequencing coverage depth for long reads, which demonstrated the potential of Maptcha to generate significantly longer scaffolds in low coverage settings (1×–10×).

Advancements in sequencing technologies, and in particular, the ongoing evolution from high throughput short read to long read technologies, have revolutionized biological sequence analysis. The first generation of long read technologies, such as the PacBio SMRT [ 40 ] and Oxford Nanopore Technologies (ONT) [ 16 ] sequencing platforms, were able to break the 10 Kbp barrier for read lengths. However, these technologies carry a higher cost per base than short read (e.g., Illumina) platforms, and they also have a much higher per-base error rate (5–15%) [ 19 , 29 , 32 , 33 , 60 ]. Recent long read technologies such as PacBio HiFi (High Fidelity) [ 24 , 57 , 60 ] have significantly improved accuracy (99.9%).

Genome assembly, irrespective of the sequencing approach employed, strives to accomplish three fundamental objectives. Firstly, it aims to reconstruct an entire target genome in as few pieces or “contigs” (i.e., contiguous sequences) as possible. Secondly, the goal is to ensure the highest accuracy at the base level. Lastly, the process seeks to minimize the utilization of computational resources. Short read assemblers effectively address the second and third objectives [ 10 , 28 , 59 ], while long read assemblers excel in achieving the first goal [ 12 , 31 ].

An important aspect of genome assembly is to maintain correctness in genome reconstruction [ 3 , 53 ], including composition, continuity, and contiguity. Compositional correctness refers to the correctness of the sequence captured in the output contigs, and is typically measured by the number of misassemblies. Continuity is primarily assessed using metrics such as the N50 value and related measures, which show the extent to which long stretches of the genome are correctly captured in the contigs, or alternatively how fragmented an output assembly is. In addition to continuity, contiguity across contigs (i.e., the order and orientation of contigs along the unknown genome) is also an important factor, particularly for scaffolding methods.
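As a concrete illustration of the continuity metric mentioned above, the standard N50 computation can be sketched as follows (a minimal Python sketch for exposition; it is not part of any assembler's code):

```python
def n50(lengths):
    """N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly

# Example: total = 100 bp; the cumulative sum reaches 50 at the 25 bp contig.
print(n50([40, 25, 15, 10, 10]))  # -> 25
```

NGA50, used later in the Results, follows the same principle but is computed over alignment blocks against a reference genome rather than raw contig lengths.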

In the realm of contemporary genome assembly, long read assemblers have adopted the Overlapping-Layout-Consensus (OLC) paradigm [ 12 , 30 , 31 , 48 , 52 , 56 ] and de Bruijn graph approaches [ 39 , 54 , 61 ]. These assemblers utilize advanced algorithms that greatly accelerate the comparison of all-versus-all reads. Many long read assembly tools also perform error correction by representing long reads through condensed and specialized k-mers, such as minimizers [ 47 ] and minhashes [ 52 ]. This refined representation expedites the identification of overlaps exceeding 2 kb. The most recent long read assemblers are now progressing toward reducing computational resources [ 8 , 9 , 41 ]. However, the assembly of uncorrected long reads introduces challenges, necessitating additional efforts in the form of consensus polishing [ 11 , 36 , 58 ]. Genome assembly polishing is a process aimed at enhancing the base accuracy of assembled contig sequences. Typically, long read assemblers undergo a singular round of long read polishing, followed by multiple rounds of polishing involving both long and short reads using third-party tools [ 31 , 35 , 58 ].

The rapid progress in sequencing technologies is providing extensive quantities of raw genomic data. However, the reconstruction of a complete and accurate genome from these fragmented sequences remains a challenge due to inherent complexities, repetitive regions, and limitations of individual sequencing techniques. Genome assembly heavily relies on scaffolding methods to arrange and link these fragments. In other words, as the conventional assembly step focuses on generating contigs that represent contiguous stretches of the target genome, scaffolding focuses on ordering and orienting those contigs, as well as filling the gaps between adjacent contigs using any information that is contained in the raw reads. Relying on a single sequencing technology for scaffolding could still result in incomplete or fragmented assemblies [ 2 ].

This limitation necessitates hybrid scaffolding approaches that are capable of integrating sequences from multiple sources—sequencing technologies and/or prior constructed draft assemblies.

Hybrid scaffolding: The integration of contigs and long read information for scaffolding purposes can be a promising approach to improve existing genome assemblies [ 38 ]. Assemblies generated from short reads are known for their high accuracy, but are often limited by shorter contig lengths, as measured by N50 or NG50. On the other hand, long read sequencing technologies can span larger sections of the genome but are often hindered by higher costs, which limit their sequencing coverage depths (typically under 20 \(\times \) vs. \(100\times \) for short reads), and by higher error rates than short read sequencing, complicating de novo assembly. Hybrid scaffolding workflows can overcome these limitations by integrating the fragmented assemblies of contigs obtained from short reads and utilizing the long reads to order and orient contigs into longer scaffolds.

In this paper, we address the hybrid scaffolding problem. Given an input set of contigs ( \(\mathcal {C}\) ) generated from short reads, and a set of long reads ( \(\mathcal {L}\) ), hybrid scaffolders aim to order and orient the contigs in \(\mathcal {C}\) using linking information inferred from the long reads \(\mathcal {L}\) . Such an approach has the advantage of reusing and building on existing assemblies to create improved versions incrementally, as more long read sequencing data sets become available for a target genome. This workflow can also be easily adapted to scenarios where short reads are available (in place of contigs): in such cases, the short reads can be assembled into contigs prior to the application of our hybrid scaffolder.

Related work: While the treatment of the hybrid scaffolding problem is more recent, there are several tools that incorporate long read information for extending contigs into scaffolds. The concept of genome scaffolding initially emerged in the realm of classical de novo genome assembly, as introduced by Huson et al. [ 27 ]. This pioneering work aimed to arrange and align contigs utilizing paired-end read information alongside inferred distance constraints. Of the two steps in scaffolding, the alignment step is not only computationally expensive, but it can also lead to loss in recall using traditional mapping techniques. On the other hand, the second step of detecting the true linking information between contig pairs can be prone to false merges, impacting precision—particularly for repetitive genomes.

Over subsequent years, a suite of tools emerged within this classical framework, each striving to refine scaffolding methodologies [ 2 , 15 , 18 , 20 , 37 , 43 , 49 , 50 ]. For an exhaustive exploration of these methods, refer to the comprehensive review by Luo et al. [ 38 ]. Most of these tools utilize alignments of long reads to the contigs of a draft assembly to infer joins between the contig sequences. The alignment information is subsequently used to link pairs of contigs that form successive regions of a scaffold. SSPACE-LongRead produces final scaffolds in a single iteration and has been shown to be faster than some of the other scaffolders for small eukaryotic genomes, but its runtimes become very long on larger genomes. For instance, SSPACE-LongRead takes more than 475 h to assemble Z. mays , and more than a month for human chr 1. OPERA-LG [ 21 ] provides an exact algorithm for large and repeat-rich genomes. It requires significant mate-pair information to constrain the scaffold graph and yield an optimised result. OPERA-LG is not directly designed for PacBio and ONT data: to construct scaffold edges and link contigs into scaffolds, it needs to simulate and group mate-pair relationship information from long reads.

LRScaf [ 44 ] is one of the most recent long read scaffolding tools which utilizes alignment tools like BLASR [ 6 ] or Minimap2 [ 34 ] to align the long reads against the contigs, and generates alignment information. These alignments form the basis for establishing links between contigs. Subsequently, a scaffold graph is constructed, wherein vertices represent contig ends, and edges signify connections between these ends and associated long reads. This graph also encapsulates information regarding contig orientation and long read identifiers. To mitigate errors and complexities arising from repeated regions and high error rates, LRScaf meticulously refines the scaffold graph. This refinement process involves retaining edges associated with a minimal number of long reads and ensuring the exclusion of edges connecting nodes already present within the graph. The subsequent stage involves navigating linear stretches of the scaffold graph. LRScaf traverses the graph, systematically identifying linear paths until encountering a divergent node, signifying a branching point. At this juncture, the traversal direction is reversed, ensuring exhaustive exploration of unvisited and distinct nodes within the graph. This iterative process continues until all unique nodes are visited, resulting in a complete set of scaffolds from the linear paths within the graph. As can be expected, this rigorous process can be time-consuming, taking hours of compute time even on medium sized genomes (as shown later in the Results).

Another recent tool, ntLink [ 14 ] utilizes mapping information from draft assemblies (i.e., contigs) and long reads for scaffolding. This tool employs a minimizer-based approach to first identify the mapped pairs of long reads and contigs, and then uses the mapping information to bridge contigs. However, in their minimizer selection method, non-unique minimizers are discarded. This is done so that repetitive portions within the contigs do not cause false merges in scaffolds. This scheme however limits the lengths of the scaffolds that could be generated by this method (as will be shown in our comparisons).

Contributions

We present a new scalable algorithmic workflow, Maptcha , for hybrid scaffolding on parallel machines using contigs ( \(\mathcal {C}\) ) and high-fidelity long reads ( \(\mathcal {L}\) ). Figure  1 illustrates the major phases of the Maptcha workflow. Our graph-theoretic approach constructs a contig graph from the mapping information between long reads and contigs, then uses this graph to generate scaffolds. The key ideas of the approach include: a) a sketching-based, alignment-free mapping step to build and refine the graph; b) a vertex-centric heuristic called wiring to generate ordered walks of contigs as partial scaffolds; and c) a final linking step to bridge the partial scaffolds and create the final set of scaffolds.

To enhance scalability, we implemented a parallel batching technique for scaffold generation, enabling any standalone assembler to run in a distributed parallel manner while generating high-quality scaffolds. We use Hifiasm [ 8 ] as the standalone assembler and JEM-mapper [ 45 , 46 ] for the mapping step.

Our experiments show that Maptcha generates longer and more accurate scaffolds than the state-of-the-art hybrid scaffolders LRScaf and ntLink , while substantially reducing time to solution. For example, on the test input Human chr 7 , the NGA50 of Maptcha is around \(18\times \) and \(330\times \) larger than that of LRScaf and ntLink , respectively. Maptcha is also significantly faster, reducing time-to-solution from hours to minutes in most cases. Furthermore, comparing Maptcha with a standalone long read assembler highlights the benefits of integrating contigs with long reads, resulting in longer scaffolds, fewer misassemblies, and faster runtimes. Coverage experiments (done by varying the sequencing coverage depth for long reads) demonstrated the potential of Maptcha to generate considerably longer scaffolds even in low coverage settings (1 \(\times \) to 10 \(\times \) ).

Fig. 1: A schematic illustration of the major phases of the proposed Maptcha approach

The Maptcha software is available as open source for download and testing at https://github.com/Oieswarya/Maptcha.git .

In this section, we describe in detail all the steps of our Maptcha algorithmic framework for hybrid scaffolding. Let \(\mathcal {C}=\{c_1,c_2,\ldots c_n\}\) denote a set of n input contigs (from prior assemblies). Let \(\mathcal {L}=\{r_1,r_2,\ldots r_m\}\) denote a set of m input long reads. Let | s | denote length of any string s . We use \(N=\Sigma _{i=1}^n |c_i|\) and \(M=\Sigma _{i=1}^m |r_i|\) . Furthermore, for contig c , let \({\bar{c}}\) denote its reverse complement.

Problem statement: Given \(\mathcal {C}\) and \(\mathcal {L}\) , the goal of our hybrid scaffolding problem is to generate a set of scaffolds \(\mathcal {S}\) such that a) each scaffold \(S\in \mathcal {S}\) represents a subset of \(\mathcal {C}\) such that no two subsets intersect (i.e., \(S_i\cap S_j=\emptyset \) for \(i\ne j\) ); and b) each scaffold \(S\in \mathcal {S}\) is an ordered sequence of contigs \([c_1,c_2,\ldots ]\) , with each contig participating in either its direct form c or its reverse complemented form \({\bar{c}}\) . Here, each successive pair of contigs in a scaffold is expected to be linked by one or more long reads \(r\in \mathcal {L}\) . Intuitively, there are two objectives: i) maximize recall—i.e., to generate as few scaffolds as possible, and ii) maximize precision—i.e., the relative ordering and orientation of the contigs within each scaffold matches the true (but unknown) ordering and orientation of those contigs along the target genome.
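Condition (a) of this problem statement can be made concrete with a small check (an illustrative sketch only; the scaffold representation as (contig_id, orientation) pairs is an assumption made for this example, with '+' for c and '-' for \({\bar{c}}\)):

```python
def scaffolds_are_disjoint(scaffolds):
    """Condition (a): no contig participates in more than one scaffold
    (and none is repeated within a scaffold). Each scaffold is a list
    of (contig_id, orientation) pairs; orientation is '+' or '-'."""
    seen = set()
    for scaffold in scaffolds:
        for contig_id, _orient in scaffold:
            if contig_id in seen:   # contig already used elsewhere
                return False
            seen.add(contig_id)
    return True

# Two disjoint scaffolds: valid.
print(scaffolds_are_disjoint([[("c1", "+"), ("c2", "-")], [("c3", "+")]]))
```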

Algorithm: The design of the Maptcha scaffolding algorithmic framework is broken down into three major phases.

contig expansion : In the first phase, using the contigs as seeds, we aim to extend them on either end using long reads that align with those contigs. This extension step is also designed to detect and connect successive pairs of contigs with direct long read links. This yields the first generation of our partial scaffolds.

longread island construction : Note that not all long reads may have contributed to these partial scaffolds, in particular those long reads which fall in the gap regions of the target genome between successive scaffolds. Therefore, in the next phase, we detect the long reads that do not map to any of the first generation partial scaffolds, and use them to build partial scaffolds corresponding to these long read island regions. This new set of partial scaffolds corresponds to the second generation of partial scaffolds.

link scaffolds with bridges : Finally, in the last phase, we aim to link the first and second generation scaffolds using long reads that serve as bridges between them. This step outputs the final set of scaffolds.

This three-phase approach has the following advantages . First, it provides a systematic way to progressively combine the sequence information available from the input contigs (which typically tend to be more accurate albeit fragmented, if generated from short reads) with the input long reads (which may be significantly larger in number), in an incremental fashion. Next, this incremental approach could also reduce the main computational workload within each phase that is required for mapping long reads. More specifically, we choose to align long reads either to the contigs or to the generated partial scaffolds wherever possible, and in the process restrict the more time-consuming long read to long read alignments only to the gap regions not covered by any of the contigs or partial scaffolds. In this paper, we use the JEM-mapper , which is a recently developed fast (parallel) and accurate sketch-based alignment-free long read mapping tool suited for hybrid settings [ 43 , 45 ]. Finally, by decoupling the contig ordering and orientation step (which is a graph-theoretic problem) from the scaffold generation step (which is an assembly problem), we are able to efficiently parallelize the scaffold generation step. This is achieved through a batching step that splits the input sequences into separate batches to allow the use of any existing standalone long read assembler to generate the final sequence scaffolds. Our framework is capable of leveraging any off-the-shelf standalone long read assembler for this step. In this paper, we use Hifiasm [ 8 ], one of the most widely used state-of-the-art long read assemblers, as our standalone assembler.

In what follows, we describe the details of our algorithm for each of the three major phases of our approach. Figure  2 provides an illustration of all the main steps within each of these three phases.

Fig. 2: A detailed illustration of the Maptcha pipeline showing the different phases and their steps

Phase: contig expansion The goal of this phase is to extend contigs by incorporating the long reads that align to them. Connecting multiple contigs into a scaffold using the long reads aligned to them increases the overall length of the assembled sequences. This is achieved by first mapping the long reads to the contigs, and then using that information to link contigs and extend them into our first generation of partial scaffolds (panel I in Fig.  2 ).

We use the following definition of a partial scaffold in our algorithm: A partial scaffold corresponds to an ordered and oriented sequence of an arbitrary number of contigs \([c_i,c_j,c_k,\ldots ]\) such that every consecutive pair of contigs along the sequence are linked by one or more long reads.

Step: Mapping long reads to contigs: For mapping, we use the alignment-free, distributed memory parallel mapping tool JEM-mapper because it is both fast and accurate [ 45 , 46 ]. JEM-mapper employs a sketch-based alignment-free approach that computes a minimizer-based Jaccard estimator ( JEM ) sketch between a subject sequence and a query sequence. More specifically, in a preprocessing step, the algorithm generates a list of minimizing k -mers [ 47 , 51 ] from each subject (i.e., each contig) and then from that list computes minhash sketches [ 4 ] over T random trials (we use \(T=30\) for our experiments). Subsequently, JEM sketches are generated from query long reads. Based on these sketches, for each query the tool reports the subject to which it is most similar. For further details on the methodology, refer to the original paper by Rahman et al. [ 45 ].

One challenge of using a mapping tool is that the subject (contigs) and query (long reads) sequences may be of variable lengths, possibly resulting in vastly different sized ground sets of minimizers from which to draw the sketches. However, it is the minimizers from the aligning region between the subject and query that should ideally be considered for mapping purposes. To circumvent this challenge, in our implementation we generate sketches only from the two ends of a long read. In other words, our mapping step maps each long read to at most two contigs, one corresponding to each end of that long read. Note that this implies a contig may potentially appear in the mapped set for multiple long reads (depending on the sequencing coverage depth). In our implementation, we used a length of \(\ell \) base pairs ( \(\ell = 2\) Kbp in our experiments) from either end of a long read for this purpose. The intuitive rationale is that since we are interested in a scaffolding application, involving the ends of long reads (and their respective alignments with contigs) provides a way to link two distantly located contigs (along the genome) through long read bridges.
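The end-extraction idea can be sketched as follows. This is an illustrative simplification only: the lexicographic minimizer scheme below stands in for JEM-mapper's minimizer/minhash sketches, and the function names and window parameters are assumptions made for this example.

```python
def read_ends(read, ell=2000):
    """Return the two ell-bp end segments of a long read, from which
    sketches are drawn. Reads shorter than 2*ell yield the whole read."""
    if len(read) <= 2 * ell:
        return [read]
    return [read[:ell], read[-ell:]]

def minimizers(seq, k=15, w=10):
    """Lexicographic minimizers: the smallest k-mer in each window of
    w consecutive k-mers (deduplicated, first-seen order kept)."""
    mins, seen = [], set()
    for i in range(len(seq) - k - w + 2):
        window = [seq[j:j + k] for j in range(i, i + w)]
        m = min(window)
        if m not in seen:
            seen.add(m)
            mins.append(m)
    return mins
```

In this simplified view, a sketch would be built from `minimizers(end)` for each segment returned by `read_ends`, keeping the ground set of minimizers comparable in size between subject and query.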

Using this approach in our preliminary experiments, we compared JEM-mapper with Minimap2 and found that JEM-mapper yielded better quality results for our test inputs (results summarized in the supplementary section Figure S2 ).

Step: Graph construction: Let \(\mathcal {M}\) denote the mapping output, which can be expressed as the set of 2-tuples of the form \(\langle c,r\rangle \) —where long read r maps to a contig c —output by the mapper. We use \(L_c\subseteq \mathcal {L}\) to denote the set of all long reads that map to contig c , i.e., \(L_c=\{r \;|\; \langle c,r\rangle \in \mathcal {M}\}\) . Informally, we refer to \(L_c\) as the long read set corresponding to contig c .

Using the information in \(\mathcal {M}\) , and in \(L_c\) for all \(c\in \mathcal {C}\) , we construct an undirected graph G ( V ,  E ), where:

V is the vertex set such that there is one vertex for every contig \(c\in \mathcal {C}\) ; and

E is the set of all edges of the form \((c_i,c_j)\) , such that there exists at least one long read r that maps to both contigs \(c_i\) and \(c_j\) (i.e., \(L_{c_i}\cap L_{c_j}\ne \emptyset \) ).

Intuitively, each edge is the result of two contigs sharing one or more long reads in their mapping sets. In our implementation, we store the set of long read IDs corresponding to each edge. More specifically, along an edge \((c_i,c_j)\in E\) , we also store its long read set \(L_{i,j}\) given by the set \(L_{c_i}\cap L_{c_j}\) . The cardinality of set \(L_{i,j}\) is referred to as the “support value” for the edge between these two contigs. Since the vertices of G correspond to contigs, we refer to G as a contig graph .
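The construction of G from the mapping tuples \(\langle c,r\rangle \) can be sketched as follows (illustrative Python, not the Maptcha implementation; identifier names are assumptions):

```python
from collections import defaultdict
from itertools import combinations

def build_contig_graph(mappings):
    """Build the contig graph from mapper output.

    mappings: iterable of (contig_id, read_id) pairs <c, r>.
    Returns {(ci, cj): set_of_shared_read_ids} with ci < cj;
    the cardinality of each set is the edge's support value."""
    contigs_per_read = defaultdict(set)     # contigs each read maps to
    for c, r in mappings:
        contigs_per_read[r].add(c)
    edges = defaultdict(set)
    for r, contigs in contigs_per_read.items():
        # r links every pair of contigs it maps to (at most one pair,
        # since each read maps to at most two contigs, one per end)
        for ci, cj in combinations(sorted(contigs), 2):
            edges[(ci, cj)].add(r)          # r in L_ij = L_ci ∩ L_cj
    return dict(edges)

g = build_contig_graph([("c1", "r1"), ("c2", "r1"), ("c2", "r2"),
                        ("c3", "r2"), ("c1", "r3"), ("c2", "r3")])
print(g)  # edge (c1, c2) has support 2, edge (c2, c3) has support 1
```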

Next, the graph G along with all of its auxiliary edge information as described above, are used to generate partial scaffolds. We perform this in two steps: a) first enumerate paths in the contig graph that are likely to correspond to different partial scaffolds (this is achieved by our wiring algorithm that is described next); and b) subsequently, generate contiguous assembled sequences for each partial scaffold by traversing the paths from the previous step (this is achieved by using a batch assembly step described subsequently).

Step: Wiring heuristic: Recall that our goal is to enumerate partial scaffolds, where each partial scaffold is a maximal sequence of contiguously placed (non-overlapping) contigs along the target genome. In order to enumerate this set of partial scaffolds, we make the following observation about paths generated from the contig graph G ( V ,  E ). A partial scaffold \([c_i, c_{i+1}, \ldots , c_{j}]\) can be expected to be represented in the form of a path in G ( V ,  E ). However, it is important to note that not all graph paths may correspond to a partial scaffold. For instance, consider a branching scenario where a path has to go through a branching node (contig) with more than one viable path out of that node. If a wrong decision is taken to form paths out of branching nodes, the resulting paths could end up having chimeric merges (where contigs from unrelated parts of the genome are collapsed into one scaffold). While there is no way to check for such correctness during assembly, we present a technique we call wiring , as described below, to compute partial scaffolds that reduce the chance of false merges.

The wiring algorithm’s objective is one of enumerating maximal acyclic paths in G —i.e., maximality to ensure longest possible extension of the output scaffolds, and acyclic to reduce the chance of generating chimeric errors due to repetitive regions in the genome (as illustrated in Fig.  5 ). This problem is trivial if each vertex in V has at most two neighbors in G , as it becomes akin to a linked list of contigs, each with one predecessor contig and one successor contig. However, in practice, we expect several branching vertices that have a degree of more than two (indicating potential presence of repeats). Therefore, finding a successor and/or a predecessor vertex becomes a non-trivial path enumeration problem that must carefully resolve branching nodes.

Algorithm: Our wiring algorithm is a linear time algorithm that first computes a “wiring” internal to each vertex, between edges incident on each vertex, and then uses that wired information to generate paths. First, we describe the wiring heuristic.

Step 1: Wiring of vertices: For each vertex \(c\in V\) that has at least degree two, the algorithm selects a subset of two edges incident on that vertex to be “wired”, i.e., to be connected to form a path through that vertex, as shown in Fig.  3 . The two edges so wired determine the vertices adjacent on either side of the current vertex c .

To determine which pair of edges to connect, we use the following heuristic. Let \(L_i\) denote the set of long read IDs associated with edge \(e_i\) . We then (hard) wire two distinct edges \(e_i\) and \(e_j\) incident on a vertex c if \(L_i\cap L_j\ne \emptyset \) and \(|L_i\cap L_j|\) is maximized over all possible pairs of edges incident on c , i.e., \(\arg \max _{e_i,e_j\in \mathcal {E}(c)} |L_i\cap L_j|\) , where \(\mathcal {E}(c)\) denotes all edges incident on c .

The simple intuition is to look for a pair of edges that allows maximum long read-based connectivity in the path flowing through that vertex (contig). This path has the largest support from the long read set and is therefore most likely to stay true to the connectivity between contigs along the target genome. All other possible paths through that vertex are ignored. The resulting wired pair of edges \(\langle e_i,e_j\rangle \) generated from each vertex c is added as a wired-edge 3-tuple \(\langle c_i, c_j, c \rangle \) . We denote the resulting set as \(\mathcal {W}\) .

There are two special cases to consider here. First, if no pair of edges incident on a vertex c have long reads in common (i.e., \(L_i\cap L_j=\emptyset \) for all pairs of incident edges), then there is no evidence of a link between any pair of edges on that contig. Therefore, our algorithm does not wire any pair of edges for that contig. In other words, if a walk (step 2) reaches this vertex (contig), the walk terminates at this contig.

As another special case, if a vertex c has degree one, then the wiring task is trivial as there exists only one choice to extend a path out of that contig, \(c_e\) , along the edge e attached to that vertex. We treat this as a special case of wiring by introducing a dummy contig \(c_{d}\) to each such vertex with degree one, and adding the tuple \(\langle c_d, c_e, c \rangle \) to \(\mathcal {W}\) .

Note that by this procedure, each vertex c has at most one entry in \(\mathcal {W}\) . To implement this wiring algorithm, note that all we need is to store the set of long read IDs along each edge. A further advantage of this approach is that this is an independent decision made at each vertex, and therefore this step easily parallelizes into a distributed algorithm that works with a partitioning of the input graph.
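Step 1, including both special cases, can be sketched as follows. This is an illustrative sketch: representing \(\mathcal {W}\) as a map from each contig c to its chosen neighbor pair, with None as the dummy neighbor of degree-one vertices, is an assumption made for this example (an entry w[c] == (ci, cj) corresponds to the 3-tuple \(\langle c_i, c_j, c\rangle \)).

```python
from collections import defaultdict
from itertools import combinations

def wire_vertices(edges):
    """Wiring heuristic: for each contig c, wire the pair of incident
    edges whose long read sets share the most reads.

    edges: {(ci, cj): set_of_read_ids} (undirected contig graph).
    Returns W as {c: (neighbor_a, neighbor_b)}; degree-one vertices
    get a dummy neighbor None; vertices with no overlapping pair
    get no entry (walks terminate there)."""
    incident = defaultdict(dict)            # incident[c][neighbor] = L
    for (ci, cj), reads in edges.items():
        incident[ci][cj] = reads
        incident[cj][ci] = reads
    wiring = {}
    for c, nbrs in incident.items():
        if len(nbrs) == 1:                  # special case: degree one
            (nbr,) = nbrs
            wiring[c] = (None, nbr)         # dummy contig on one side
            continue
        best, best_overlap = None, 0
        for (na, la), (nb, lb) in combinations(nbrs.items(), 2):
            overlap = len(la & lb)
            if overlap > best_overlap:      # require shared long reads
                best, best_overlap = (na, nb), overlap
        if best is not None:                # else: no wiring at c
            wiring[c] = best
    return wiring
```

Because each vertex's decision depends only on its own incident edges, this step parallelizes over a partitioning of the graph, as noted above.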

Fig. 3: Illustration of the wiring heuristic, shown centered at a contig vertex \(c_i\) . On either side of \(c_i\) are shown other contigs ( \(c_1\) through \(c_k\) ) that each have at least one long read in common with \(c_i\) . The long read sets shared between any contig (say j ) and \(c_i\) are denoted by \(L_{i,j}\) (same as \(L_{j,i}\) ). Out of all possible pairwise connections between the incident edges, the wiring heuristic will select only one edge pair

Step 2: Path enumeration: In the next step, we enumerate edge-disjoint acyclic paths using all the wired information from \(\mathcal {W}\) . The rationale behind the edge-disjoint property is to reduce the chance of genomic duplication in the output scaffolds. The rationale for avoiding cycles in paths is two-fold—both to reduce genomic duplication due to repetitive contigs, as well as to reduce the chance of creating chimeric scaffolds.

The path enumeration algorithm (illustrated through an example in Fig.  4 ) works as follows.

Initialize a visit flag at all vertices and set them to unvisited .

Initialize a work queue Q of all vertices with degree one (e.g., \(c_a\) , \(c_e\) , \(c_f\) , \(c_g\) and \(c_h\) in Fig.  4 ).

For each vertex \(c\in Q\) , if c is still unvisited, dequeue c , start a new path at c (denoted by \(P_c\) ), and grow the path as follows. The edge e incident on c connects c to another vertex, say \(c_1\) . Then \(c_1\) is said to be the successor of c in the path and is appended to \(P_c\) . We now mark the vertex c as visited. Subsequently, the algorithm iteratively extends the path by simply following the wired pairing of edges at each vertex visited along the way—marking each such vertex as visited and stitching together the path—until we arrive at one of the following termination conditions:

Arrive at a vertex which has chosen a different predecessor vertex: See for example path \(P_1\) truncated at \(c_b\) because the wiring at \(c_b\) has chosen a different pair of neighbors other than \(c_a\) based on long read support, i.e., \(\mathcal {W}\) contains \(\langle c_g, c_c, c_b\rangle \) . In this case, we add the vertex \(c_b\) at the end of the current path \(P_1\) and terminate that path.

Arrive at a vertex that is already visited: This again implies that no extension beyond this vertex is possible without causing duplication between paths, and so the case is handled the same way as Case (a): the visited vertex is added as the last vertex in the path and the path is terminated.

Arrive at a degree one vertex: This implies that the path has reached its end at the corresponding degree one contig and the path is terminated at this contig.

More examples of paths are shown in Fig.  4 .
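The walk above, with its three termination cases, can be sketched as follows. This is an illustrative sketch assuming the wiring set \(\mathcal {W}\) is represented as a map {c: (left, right)} with None as the dummy neighbor of degree-one vertices (an assumed representation, not the Maptcha data structure); a Case (a) endpoint is deliberately left unvisited so that a later path may still pass through it, consistent with the order-agnostic property proved in the Provable properties section.

```python
def enumerate_paths(wiring, degree_one):
    """Enumerate edge-disjoint acyclic paths from the wired pairs,
    starting at degree-one vertices and stopping on:
    (a) a vertex whose wiring chose a different predecessor,
    (b) an already-visited vertex, or (c) a degree-one end."""
    visited, paths = set(), []
    for start in degree_one:
        if start in visited:
            continue
        visited.add(start)
        path = [start]
        prev, cur = start, wiring[start][1]   # single way out of start
        while True:
            path.append(cur)
            if cur in visited:                # case (b): end at cur
                break
            pair = wiring.get(cur)
            if pair is None or prev not in pair:
                break                         # case (a): cur wired elsewhere
            visited.add(cur)
            left, right = pair
            nxt = right if prev == left else left
            if nxt is None:                   # case (c): degree-one end
                break
            prev, cur = cur, nxt
        paths.append(path)
    return paths

# Mirrors the Fig. 4 situation: c_b is wired to (c_g, c_c), so the path
# from c_a truncates at c_b while the path from c_g passes through it.
W = {"a": (None, "b"), "g": (None, "b"), "e": (None, "c"),
     "b": ("g", "c"), "c": ("b", "e")}
print(sorted(enumerate_paths(W, ["a", "g", "e"])))
```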

Fig. 4: Edge-disjoint acyclic paths generated from walking the contig-contig graph. Also shown below are the likely alignments of the individual paths to the (unknown) target genome \(\mathcal {G}\) . Here, since the contig \(c_b\) appears in two paths, it is likely to be contained in a repetitive region ( X , \(X^\prime \) ) as highlighted

Provable properties of the algorithm

The above wiring and path enumeration algorithms have several key properties.

Edge disjoint paths: No two paths enumerated by the wiring algorithm can intersect in edges.

This is guaranteed by the wiring algorithm (step 1), where each vertex chooses only two of its incident edges to be wired to build a path. More formally, assume for contradiction that there exists an edge e covered by two distinct paths \(P_1\) and \(P_2\) . This would imply that both paths pass through at least one branching vertex c such that there exist \(\langle e_1, e, c\rangle \in \mathcal {W}\) and \(\langle e_2, e, c\rangle \in \mathcal {W}\) (for some distinct edges \(e_1, e_2 \ne e\) , all incident on c ). However, by construction of the wiring algorithm (step 1), this is not possible, since each vertex contributes at most one wired edge pair. \(\square \)

Acyclic paths: There can be no cycles in any of the paths enumerated.

This is guaranteed by the path enumeration algorithm described above (step 2). More specifically, the termination conditions represented by Cases (a) and (b) clip any path before it forms a cycle. By not allowing cycles, our algorithm prevents including the same contig more than once along a scaffold. This is done to prevent chimeric misassemblies of a repetitive contig (for example, the repetitive regions X and \(X^\prime \) illustrated in Fig.  4 ). \(\square \)

Deterministic routing: The path enumeration algorithm is deterministic and generates the same output set of paths for a given \(\mathcal {W}\) regardless of the order in which paths are generated.

This result follows from the fact that the wiring heuristic at each vertex is itself deterministic, as well as from the conditions represented by Cases (a) and (b) that terminate a path in the path enumeration algorithm. More specifically, note that each vertex contributes at most one hard-wired edge pair to \(\mathcal {W}\) , and none of the other edge pair combinations incident on that vertex can lead to paths. Given this, consider the example shown in Fig.  4 of two paths \(P_1\) and \(P_2\) converging onto vertex \(c_b\) . Note that in this example, \(\langle c_g, c_c, c_b\rangle \in \mathcal {W}\) . The question is whether it matters if we start enumerating \(P_1\) first or \(P_2\) first. The answer is no. In particular, if \(P_1\) is the first to get enumerated, then termination condition Case (a) applies and the path ends at \(c_b\) . Therefore, when \(P_2\) starts, it will still be able to go through \(c_b\) . On the other hand, if \(P_2\) is the first path to get enumerated, then \(c_b\) will be visited and termination condition Case (b) will again terminate \(P_1\) at \(c_b\) . Either way, the output paths are the same. A more detailed example of this order-agnostic behavior is shown in S3. This order-agnostic property allows us to parallelize the path enumeration process without having to synchronize among paths. \(\square \)

As a corollary to Prop1 (on edge-disjoint paths) and Prop2 (on acyclic paths), we now show an important property about contigs from repetitive regions of the genome, and how the wiring algorithm handles such contigs carefully to reduce the chances of generating chimeric scaffolds.

Corollary 1

Let \(c_x\) be a contig that is completely contained within a repetitive region. Then this contig can appear as a non-terminal vertex (Footnote 1) in at most one path output by the wiring algorithm.

Consider the illustrative example in Fig.  5 , which shows a contig \(c_x\) that maps to a repeat X and its copy \(X^\prime \) . In particular, if there is a trail of long reads linking the two repeat copies (i.e., \([c_x, c_2, \ldots , c_{k}, c_x]\) ), then it could generate a cycle in the graph G . However, by Prop2, the cycle is broken by the path enumeration algorithm, and therefore \(c_x\) is allowed to appear as a non-terminal vertex in at most one of the paths that goes through it. Even if there is no trail of long reads connecting the two repeat regions, the same result holds because of the edge-disjoint property of Prop1. \(\square \)

An important implication of this corollary is that our algorithm is careful in its use of contigs that fall inside repetitive regions. In other words, if a contig appears as a non-terminal vertex along a path, then its predecessor and successor contigs are those to which it exhibits maximum support in terms of long read based links. While full correctness cannot be guaranteed, the wiring algorithm uses long read information to reduce the chances of repetitive regions causing chimeric scaffolds.

Fig. 5: A case of repeats ( \(X,X^\prime \) ) causing cycles branching around contigs

Algorithm: Wiring heuristic

Step: Parallelized contig batch assembly:

Following wiring and path enumeration, we use the enumerated paths to build the output (partial) scaffold sequences of this phase. To implement this step in a scalable manner, we observe that the enumerated paths represent disjoint partial scaffolds. Therefore, we partition the set of paths into fixed-size batches (each containing s contigs), so that these independent batches can be fed, in parallel, into a standalone assembler that uses both the contigs and the long reads of a batch to build the sequences corresponding to the partial scaffolds. We refer to this parallel distributed approach as contig batch assembly.

The assembly of each batch is performed in parallel using any standalone assembler of choice; we used Hifiasm [ 8 ] for all our experiments. Processing contig-long read batches in parallel yields one or more scaffolds per batch, enhancing the scalability of the assembly process. Furthermore, the selective use of only the long reads mapped to a batch's contigs significantly reduces memory overhead. It also mitigates the risk of misassemblies that could arise from combining unrelated sequences, as would happen if the entire long read set were used; this is evident in the results.
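The batching step can be sketched as follows. This is a simplified serial sketch with illustrative function and parameter names; the actual implementation distributes the batches across processes. Paths are kept whole so that each partial scaffold stays within a single batch.

```python
def make_batches(paths, read_ids_for, batch_size):
    """Partition enumerated paths into batches of ~batch_size contigs.

    paths: list of contig paths (each a list of contig IDs)
    read_ids_for: dict contig ID -> set of long-read IDs mapped to it
    Yields (contigs, long_read_ids) work units, each of which can be
    handed independently to a standalone assembler (Hifiasm in the paper).
    """
    contigs, reads = [], set()
    for path in paths:
        contigs.extend(path)
        for c in path:
            reads |= read_ids_for.get(c, set())
        if len(contigs) >= batch_size:
            yield contigs, reads       # one independent assembly job
            contigs, reads = [], set()
    if contigs:
        yield contigs, reads           # final partial batch
```

Because each batch carries only the long reads mapped to its own contigs, the per-job memory footprint stays bounded regardless of total input size.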

Phase: longread island construction

The first phase, contig expansion, focuses only on expanding contigs using long reads that map on either side. This can be thought of as a seed-and-extend strategy, where contigs are the seeds and extensions happen through the long reads. However, there could be regions of the genome that are not covered by this contig expansion step. Therefore, in this phase, we focus on constructing “longread islands” to cover these gap regions. See Fig.  1 for an illustration of these long read islands. This is achieved in two steps:

First, we detect all long reads that do not map to any of the first generation partial scaffolds (generated by the contig expansion step). More specifically, we give as input to JEM-mapper the set of all unused long reads (i.e., unused in the partial scaffolds) and the set of partial scaffolds output by the contig expansion phase. Any long read that maps to a previous partial scaffold is not considered for this phase. Only those that remain unmapped correspond to long reads that fall in the gap regions between the partial scaffolds.

Next, we use the resulting set of unmapped long reads to build partial scaffolds. This is achieved by inputting the unmapped long reads to Hifiasm . The output of this phase represents the second generation of partial scaffolds, each corresponding to a long read island.
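The first of these two steps amounts to a set difference over read IDs, sketched below with illustrative names; in Maptcha the mapping itself is performed by JEM-mapper and the surviving reads are then handed to Hifiasm.

```python
def unmapped_reads(long_reads, mapped_ids):
    """Keep only long reads that mapped to no first-generation partial
    scaffold; these fall in gap regions and seed the long read islands.

    long_reads: dict read ID -> sequence
    mapped_ids: set of read IDs reported as mapped (by JEM-mapper here)
    """
    return {rid: seq for rid, seq in long_reads.items()
            if rid not in mapped_ids}
```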

Phase: link scaffolds with bridges

In the last phase, we link the first and second generations of partial scaffolds using any long reads that have been left unused so far. The objective is to bridge these two generations into longer scaffolds if there is sufficient information in the long reads to link them. Note that from an implementation standpoint this is the same as contig expansion, where the union of the first- and second-generation partial scaffolds serves as the “contigs” and the rest of the unused long reads serve as the long read set.
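This reuse of the contig expansion machinery can be sketched in a few lines (hypothetical names; `contig_expansion` is a stand-in for the phase described earlier, not an actual Maptcha API):

```python
def bridge_scaffolds(gen1, gen2, unused_reads, contig_expansion):
    """Final phase: treat the union of first- and second-generation
    partial scaffolds as the 'contigs' and rerun the contig expansion
    machinery with the remaining unused long reads.
    """
    return contig_expansion(contigs=gen1 + gen2, long_reads=unused_reads)
```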

Complexity analysis

Recall that m denotes the number of input long reads in \(\mathcal {L}\) , and n is the number of input contigs in \(\mathcal {C}\) . Let p denote the number of processes used by our parallel program.

Of the three major phases of Maptcha , the contig expansion phase is the one that works on the entire input sets ( \(\mathcal {L}\) and \(\mathcal {C}\) ). The other two phases work on a reduced subset of long reads (those left unused by the partial scaffolds of the prior phase) and on the set of partial scaffolds (which is smaller than \(\mathcal {C}\) ). For this reason, we focus our complexity analysis on the contig expansion phase.

In the contig expansion phase we have the following steps:

Mapping long reads to contigs : JEM-mapper [ 45 ] is an alignment-free distributed memory parallel implementation and hence processes load the long reads and contigs in a distributed manner. The dominant step is sketching the input sequences (long reads or contigs). Given that the number of long reads is expected to be more than the number of contigs (due to sequencing depth), the complexity can be expressed as \(O( \frac{{m \ell _l T}}{{p}} )\) , where \(\ell _l\) is average long read length and \( T \) denotes the number of random trials used within its minhash sketch computation.
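For intuition on the \(T\) factor, a generic minhash sketch computes, for each of \(T\) hash trials, the minimum hash over a sequence's k-mers, giving the \(O(\ell _l T)\) per-read cost. This is an illustrative sketch only, not JEM-mapper's actual sketching code; the salted hash stands in for \(T\) independent hash functions.

```python
import hashlib

def minhash_sketch(seq, k=16, trials=8):
    """Generic minhash sketch of a sequence.

    For each of `trials` (the T random trials), keep the minimum hash
    over all k-mers of `seq`; cost is O(len(seq) * trials) per sequence.
    """
    sketch = []
    for t in range(trials):
        best = None
        for i in range(len(seq) - k + 1):
            # Salted BLAKE2b simulates an independent hash function per trial
            h = int.from_bytes(
                hashlib.blake2b(seq[i:i + k].encode(), digest_size=8,
                                salt=t.to_bytes(8, 'big')).digest(), 'big')
            if best is None or h < best:
                best = h
        sketch.append(best)
    return sketch
```

Sequences sharing many k-mers tend to share minhash values, which is what makes sketch comparison a cheap proxy for alignment.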

Graph construction : Let the list of mapped tuples \(\langle c, r \rangle \) from the previous step contain \(\mathcal {T}\) tuples. These \(\mathcal {T}\) tuples are used to generate the contig graph by first sorting all the tuples by their long read IDs, aggregating all contigs that map to the same ID. This can be achieved with an integer sort that scans the list of tuples linearly and inserts into a lookup table keyed by long read ID, giving a runtime of \(O(m+\mathcal {T})\) . Next, this lookup table is scanned one long read ID at a time, and all contigs in its list are paired with one another to create the edges corresponding to that long read. The runtime of this step is proportional to the output graph size ( G ( V ,  E )), which contains n vertices (one for each contig), with | E | being the number of edges corresponding to all contig pairs detected. Our implementation performs this graph construction in multithreaded mode.
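The two passes of the graph construction (bucket by read ID, then pair contigs sharing a read) can be sketched as follows; this is a simplified serial sketch of the multithreaded step, with illustrative names.

```python
from collections import defaultdict
from itertools import combinations

def build_contig_graph(tuples):
    """Build the contig graph from mapped (contig, read) tuples.

    Pass 1: bucket contigs by long-read ID (the linear integer-sort step).
    Pass 2: pair up contigs that share a read to create edges; each edge
    records the set of supporting long-read IDs (used later by wiring).
    """
    by_read = defaultdict(set)         # read ID -> contigs mapping to it
    for contig, read in tuples:
        by_read[read].add(contig)

    edges = defaultdict(set)           # (c1, c2) -> supporting read IDs
    for read, contigs in by_read.items():
        for c1, c2 in combinations(sorted(contigs), 2):
            edges[(c1, c2)].add(read)
    return edges
```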

Wiring heuristic : For the wiring step, each vertex detects the pair of incident edges with the maximum intersection of supporting long read IDs. This can be achieved in \(O(d^2)\) time, where d is the average degree of a vertex. The subsequent path enumeration step traverses each edge at most once. Since both steps are parallelized, the wiring heuristic completes in \(O(\frac{nd^2+|E|}{p})\) time.
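The per-vertex choice can be sketched as an \(O(d^2)\) scan over all pairs of incident edges (illustrative names; in Maptcha this runs in parallel over all vertices and the winning pair is hard-wired into \(\mathcal {W}\) ):

```python
from itertools import combinations

def hard_wire(incident):
    """Pick the hard-wired edge pair at a vertex.

    incident: dict neighbor -> set of long-read IDs supporting the edge
    to that neighbor. Examines all O(d^2) pairs of incident edges and
    returns the neighbor pair whose supporting read-ID sets intersect
    most, or None if the vertex has fewer than two incident edges.
    """
    best, best_support = None, -1
    for (u, reads_u), (w, reads_w) in combinations(incident.items(), 2):
        support = len(reads_u & reads_w)   # shared long-read evidence
        if support > best_support:
            best, best_support = (u, w), support
    return best
```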

Contig batch assembly : The last step is the contig batch assembly, where the set of enumerated paths is partitioned into b batches and each batch is individually assembled (using Hifiasm ). As this step is trivially parallelizable, it takes \( O\left( \frac{b \times a}{p}\right) \) time, where \( a \) is the average time taken to assemble a batch.

In our results, we show that the contig expansion phase dominates the overall runtime of execution (shown later in Fig.  6 ).

The space complexity of Maptcha is dominated by the size to store the input sequences and the size of the contig graph—i.e., \(O(N+M+n+|E|)\) .

Experimental setup

Test inputs: For all our experiments, we used a set of input genomes (from various families) downloaded from NCBI GenBank [ 1 ]. Key statistics of these genome data sets are summarized in Table  1 . Using the reference for each genome, we generated a set of contigs and a set of long reads as follows. The set of test input contigs ( \(\mathcal {C}\) ) was produced by first simulating a set of Illumina short reads using the ART Illumina simulator [ 26 ], with 100 \(\times \) coverage and 100 bp read length, and then assembling them. The reads generated for our experiments do not have paired-end information. For short read assembly, we used the Minia [ 10 ] assembler. As for the set of test long reads ( \(\mathcal {L}\) ), we used the Sim-it PacBio HiFi simulator [ 17 ], with 10 \(\times \) coverage and a long read median length of 10 Kbp. Note also the divergence in sequence lengths between \(\mathcal {C}\) and \(\mathcal {L}\) .

As a real-world dataset, we used a draft assembly of contigs and a set of real-world long reads available for Hesperophylax magnus ( H. magnus ), a caddisfly genome [ 42 ]. The corresponding data was downloaded from NCBI GenBank, as reported in Olsen et al. [ 42 ]. All GenBank accession IDs are shown in supplementary Table S2 . Since the original reads used in this assembly were not available, we simulated short reads from the assembly and assembled them into contigs using Minia. For long reads, we used the real HiFi long reads provided by Hotaling et al. [ 25 ]. This HiFi dataset has a median read length of 11.4 Kbp and 22.8 \(\times \) coverage. The long reads were generated using the PacBio Sequel II system with SMRT Cells.

Qualitative evaluation: To evaluate the quality of the scaffold outputs produced by Maptcha , we used Quast [ 23 ], which internally maps the scaffolds against the target reference genome and reports key qualitative metrics consistent with the literature, such as NG50 and NGA50 lengths, largest alignment length, number of misassemblies, and genome fraction (the percentage of the genome recovered by the scaffolded assembly) (Table 3 ). For a comparative evaluation against state-of-the-art hybrid scaffolders, we compared the quality as well as the runtime performance of Maptcha against those of LRScaf [ 44 ] and ntLink [ 13 , 14 ].

Qualitative evaluation

Scaffold quality

First, we report on the qualitative evaluation of Maptcha , for its hybrid assembly quality. Table  2 shows the quality by the various assembly metrics, alongside the quality values for LRScaf and ntLink , for all the inputs tested. The same inputs were provided to all tools. Note that the assembly quality shown for Maptcha is for the final output set of scaffolds produced by the framework (i.e., after its link scaffolds with bridges phase).

We observe from Table  2 that Maptcha is able to produce a high quality scaffolded assembly, reaching nearly 99% genome coverage with high NG50, NGA50, and largest alignment lengths, a low misassembly rate, and a near-perfect (1.0) duplication ratio, for all the test inputs. These results are substantially better than the output quality produced by the two state-of-the-art tools LRScaf and ntLink . For smaller genomes such as E. coli and P. aeruginosa , both LRScaf and ntLink yield results competitive with Maptcha . However, as genome sizes increase, the assemblies produced by ntLink and LRScaf become more fragmented. For instance, on T. crassiceps , the NGA50 value for Maptcha is about \(21\times \) and \(5.6\times \) larger than that of LRScaf and ntLink , respectively. For Human chr 7 , the NGA50 of Maptcha is around \(18\times \) and \(330\times \) larger than that of LRScaf and ntLink , respectively.

In terms of misassemblies, all three tools produce misassemblies, though to varying degrees, with Maptcha in general producing the fewest over nearly all the inputs. Misassembly rates are influenced by multiple factors, including genomic repeat complexity, baseline contiguity, genome fraction, and duplication ratio. In particular, repetitive sequences can significantly impact assembly accuracy and increase misassemblies [ 5 , 7 , 22 , 55 ]. While the numbers of misassemblies produced by ntLink and Maptcha are comparable for inputs such as P. aeruginosa and T. crassiceps , as genome size and complexity increase, there is a notable rise in the number of misassemblies with ntLink . As for duplication ratio, Maptcha produces scaffolds with almost no duplication (i.e., a ratio close to 1) on nearly all inputs, while the other tools show varying degrees of duplication. Maptcha also shows the best genome fraction, capturing almost 99% or more for all the inputs. In general, these results clearly show that Maptcha outperforms both LRScaf and ntLink in all the quality metrics reported.

We further examined the growth of contigs and incremental improvement in assembly quality through the different scaffolding phases of Maptcha . Table  4 shows these results, using NG50 lengths output from these different phases as the basis for this improvement assessment. Supplementary Figure S4 shows the increase across all three phases in log-scale.

As can be seen from Table  4 , the initial set of Minia-assembled contigs for larger genomes have NG50 measurements ranging from 1 to 3 Kbp. After the contig expansion phase of Maptcha , a substantial increase in NG50 is observed, often exceeding 200-fold. For instance, inputs such as C. septempunctata , M. florea , and H. aestivaria show a notable increase in NG50 values from around 2 Kbp for the initial contigs to over 400 Kbp post- contig expansion phase. This substantial increase is attributed to the long reads acting as connectors between the shorter contigs, resulting in longer partial scaffolds.

In the subsequent longread island construction phase, there is a modest increase in NG50. However, the primary contribution of this phase is to provide more comprehensive genome coverage in regions not covered by contigs. This phase ensures that gaps left by contigs are filled, thereby enhancing the overall assembly.

The final phase of linking partial scaffolds with remaining long reads in Maptcha results in a noteworthy surge in NG50, up to 1,000 \(\times \) for larger genomes. This phase, similar to the contig expansion phase, shows the greatest increase in NG50 among all phases. The average length of these partial scaffolds is considerably longer, which contributes to this dramatic improvement.

Performance evaluation

Next, we report on the runtime and memory performance of Maptcha and compare it with LRScaf and ntLink . Table  3 shows these comparative results for all inputs tested. All Maptcha runs were obtained on the distributed memory cluster using \(p=64\) processes, more specifically on 4 compute nodes, each running 16 processes. For both LRScaf and ntLink , we ran them in multithreaded mode on 64 threads on a single node of the cluster. Note that in parallel computing, distributed memory systems support larger aggregate memory, but at the expense of incurring communication (network) overheads, which do not appear in multithreaded systems running on a single node. However, to enable a fair comparison on an equivalent number of resources, we tested all tools on the same number ( p ) of processes, with Maptcha running in distributed memory mode while LRScaf and ntLink ran in shared memory. For all runs reported in the performance evaluation, we ran Maptcha with a batch size of 8,192 in the batch assembly step.

The results in Table  3 demonstrate that Maptcha outperforms both LRScaf and ntLink in terms of run-time performance. For instance, on medium-sized inputs such as C. elegans , Maptcha completes nearly \(70\times \) faster than LRScaf , reducing the time to solution from over 2 h ( LRScaf ) to 1.8 min ( Maptcha ), whereas ntLink takes 2.42 min. For larger genomes like N. polychloros and M. florea , Maptcha is still the fastest. Even though ntLink runs in comparable times for some of the inputs, the quality of the scaffolds generated by Maptcha is considerably better than that of ntLink (as shown in Table  2 ). For the five largest inputs (out of the 11 simulated test inputs), we could not obtain performance results for LRScaf as those runs did not complete within the allotted 6-hour limit of the cluster.

Table  3 also shows the memory used by the three tools for all the inputs. For Maptcha , recall that the memory is primarily dictated by the memory needed to produce the batch assemblies (which are partitioned into batches). Due to batching, even as the input genome size increases, the number of contigs that anchor a batch is kept about the same, providing a way to control the memory needed to run large assemblies in a scalable fashion. This is why, despite growing input sizes, the peak memory used by Maptcha stays approximately steady (under 20 GB).

For the real-world long read dataset used in the case of the caddisfly genome input, H. magnus , the quality of the scaffolds generated by Maptcha surpasses both state-of-the-art tools, as shown in Table  5 . LRScaf was unable to complete its run within 6 h, and thus its results are not included. Maptcha outperforms ntLink by producing scaffolds that are 119 times larger in N50 and 29 times longer in the largest contig metric. Additionally, Maptcha generated scaffolds with no gaps, whereas ntLink had more than 56k gaps per 100 kbp. Although ntLink finished faster, with a runtime of approximately 52 min compared to Maptcha ’s 61 min, the difference in runtime is marginal when considering the substantial improvement in scaffold quality.

We also compared our scaffolding results with the scaffolds reported in Olsen et al. [ 42 ]. We note that the underlying raw reads used in the two studies were different, as the raw reads used in [ 42 ] were not publicly available as of this writing. In their original work, they report an N50 of 768 Kbp for a hybrid assembly using their Illumina (49 \(\times \) ) and Nanopore (26 \(\times \) ) data. In comparison, our Maptcha scaffolder produces a scaffold set with an N50 of 10 Mbp. This represents a significant improvement in scaffolding length, showing promise that when applied to real-world data our Maptcha scaffolder is likely to yield longer scaffolds. However, further study is needed to validate and compare assembly quality, perhaps also experimenting with different choices of HiFi long read assemblers.

We also studied the runtime breakdown of Maptcha across its different phases. This breakdown is shown normalized for each input in Fig.  6 a (left), all running on \(p=64\) processes. It can be observed that the contig expansion phase is generally the most time consuming phase, occupying anywhere between 40% to 60% of the runtime, with the other two phases roughly evenly sharing the remainder of the runtime. Figure  6 b (right) further shows how the run-time is distributed within the contig expansion phase. As can be noted, more than 80% of the time is spent in the batch assembly step, while the remainder of the run-time is spent mostly on mapping.

figure 6

( a ) Normalized runtime breakdown for the different rounds of Maptcha pipeline for \(p = 64\) . (b) Normalized runtime breakdown for different steps in the contig expansion round for input H. aestivaria .

figure 7

Effect of batch size on NG50 and average time taken for input H. aestivaria

Effect of batch size on NG50 and run-time: Fig.  7 shows the impact of varying batch sizes on NG50 and processing time, using the H. aestivaria genome as an example. Recall that the batch size is the number of contigs used to anchor each batch, along with the respective long reads that map to those contigs. Each batch is then provided to a standalone assembler ( Hifiasm ) to produce the assemblies for the final scaffolds. We experimented with a wide range of batch sizes, from 32 up to 16K. As anticipated, smaller batch sizes exhibit reduced processing times due to the smaller assembly workload per batch. However, if a batch is too small, then the resulting assembly is highly fragmented (resulting in small NG50 values), as can be observed. Conversely, larger batch sizes necessitate longer processing times (e.g., batch size 32 requires approximately 280 s, while a batch size of 8K requires 1,841 s). But the NG50 metric substantially improves, e.g., from 93 Kbp at a batch size of 32 to 1.8 Mbp at 8K.

We found that increasing the batch size from 8K to 16K resulted in a slight increase in NG50 (1.86Mbp to 1.89Mbp), but also a substantial increase in processing time (1,841 s to 2,329 s). Since the increase in NG50 was not significant enough to justify the longer processing time, we decided to use the batch size of 8K for all our tests.

Coverage experiment with Maptcha (hybrid) and Hifiasm (only-LR)

One of the main features of a hybrid scaffolding workflow is its potential to build incrementally on previously constructed assemblies using newly sequenced long reads. This raises two questions: (a) how does the quality of a hybrid workflow compare to a standalone long read-only workflow? (b) can the information in contigs (or prior constructed assemblies) be used to offset lower sequencing coverage depth in long reads?

To answer these two questions, we compared the Maptcha scaffolds to an assembly produced by running a standalone long read assembler on just the long reads. For the latter, we used Hifiasm and denote the corresponding runs with the label Hifiasm (only-LR) (to distinguish them from the hybrid configuration of Maptcha ). The analysis was performed at different coverages (1x, 2x, 3x, 4x, 8x, and 10x) of the long read data set, on the H. aestivaria input, focusing on the performance metrics of NG50, execution time, and peak memory utilization.

The results of this experiment, shown in Table  6 , reveal that at lower coverages (1x and 2x), Hifiasm (only-LR) and Maptcha demonstrate relatively comparable performance. However, as the long read coverage increases, Maptcha exhibits better NG50 than Hifiasm (only-LR) , demonstrating the value of adding the contigs in growing the scaffold length. For instance, at 4x coverage, Maptcha yields a considerably longer NG50 (a ten-fold increase). The assembly quality becomes comparable at higher coverage settings. These results demonstrate that adding prior constructed assemblies can increase the scaffold length compared to long read-only assemblies. However, this value tends to diminish at higher coverage settings, showing that the addition of contigs is most useful for offsetting reduced coverage.

Table  6 also shows a run-time and memory advantage of Maptcha over Hifiasm (only-LR) . For instance, Maptcha was generally between two and four times faster than Hifiasm (only-LR) (e.g., on the 10x input, Maptcha takes 30 min compared to 81 min for Hifiasm (only-LR) ). Note that internally, Maptcha also uses the standalone version of Hifiasm to compute its final assembly product. These results show that the Maptcha approach of enumerating paths to generate partial scaffolds and distributing those into batches reduces the overall assembly workload for the final assembly step, without compromising quality.

Genome assembly remains a challenging and time-intensive task, particularly in resolving repetitive regions. In this study, we presented Maptcha , a novel hybrid scaffolding pipeline designed to combine previously constructed assemblies with newly sequenced high-fidelity long reads. As demonstrated, the Maptcha framework is able to increase scaffold lengths substantially, with NG50 lengths growing by more than four orders of magnitude relative to the initial input contigs. This represents a substantial improvement in genomic reconstruction that comes without any compromise in accuracy. Furthermore, our method highlights the value added by prior constructed genome assemblies toward potentially reducing the required coverage depth for downstream long read sequencing. In terms of performance, the Maptcha software is a parallel implementation that takes advantage of distributed memory machines to reduce the time-to-solution of scaffolding. The software is available as open source for testing and application at https://github.com/Oieswarya/Maptcha.git .

Availability of data and materials

All inputs were downloaded from NCBI GenBank [ 1 ]. All accession numbers are provided in supplementary Table S2. Our software is available as open source for testing and application at https://github.com/Oieswarya/Maptcha.git .

Footnote 1: A vertex is said to be non-terminal along a path if it appears neither at the start nor the end of that path.

Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2012;41(D1):D36–42.

Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W. Scaffolding pre-assembled contigs using sspace. Bioinformatics. 2011;27(4):578–9.

Bradnam KR, Fass JN, Alexandrov A, Baranay P, Bechner M, Birol I, Boisvert S, Chapman JA, Chapuis G, Chikhi R, et al. Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. Gigascience. 2013;2(1):2047-217X.

Broder AZ. On the resemblance and containment of documents. In: Proceedings, Compression and Complexity of Sequences 1997 (Cat. No. 97TB100171). IEEE; 1997. p. 21–9.

Cechova M. Probably correct: rescuing repeats with short and long reads. Genes. 2020;12(1):48.

Chaisson MJ, Tesler G. Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinform. 2012;13(1):1–18.

Chakravarty S, Logsdon G, Lonardi S. Rambler: de novo genome assembly of complex repetitive regions. bioRxiv; 2023.

Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 2021;18(2):170–5.

Cheng H, Jarvis ED, Fedrigo O, Koepfli K-P, Urban L, Gemmell NJ, Li H. Haplotype-resolved assembly of diploid genomes without parental data. Nat Biotechnol. 2022;40(9):1332–5.

Chikhi R, Rizk G. Space-efficient and exact de bruijn graph representation based on a bloom filter. Algorithms Mol Biol. 2013;8(1):1–9.

Chin C-S, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, Clum A, Copeland A, Huddleston J, Eichler EE, et al. Nonhybrid, finished microbial genome assemblies from long-read smrt sequencing data. Nat Methods. 2013;10(6):563–9.

Chin C-S, Peluso P, Sedlazeck FJ, Nattestad M, Concepcion GT, Clum A, Dunn C, O’Malley R, Figueroa-Balderas R, Morales-Cruz A, et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat Methods. 2016;13(12):1050–4.

Coombe L, Li JX, Lo T, Wong J, Nikolic V, Warren RL, Birol I. Longstitch: high-quality genome assembly correction and scaffolding using long reads. BMC Bioinform. 2021;22:1–13.

Coombe L, Warren RL, Wong J, Nikolic V, Birol I. ntlink: a toolkit for de novo genome assembly scaffolding and mapping using long reads. Curr Protocols. 2023;3(4): e733.

Dayarian A, Michael TP, Sengupta AM. Sopra: Scaffolding algorithm for paired reads via statistical optimization. BMC Bioinform. 2010;11(1):1–21.

Deamer D, Akeson M, Branton D. Three decades of nanopore sequencing. Nat Biotechnol. 2016;34(5):518–24.

Dierckxsens N, Li T, Vermeesch JR, Xie Z. A benchmark of structural variation detection by long reads through a realistic simulated model. Genome Biol. 2021;22(1):1–16.

Donmez N, Brudno M. Scarpa: scaffolding reads with practical algorithms. Bioinformatics. 2013;29(4):428–34.

Fu S, Wang A, Au KF. A comparative evaluation of hybrid error correction methods for error-prone long reads. Genome Biol. 2019;20:1–17.

Gao S, Sung W-K, Nagarajan N. Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences. J Comput Biol. 2011;18(11):1681–91.

Gao S, Bertrand D, Chia BK, Nagarajan N. Opera-lg: efficient and exact scaffolding of large, repeat-rich eukaryotic genomes with performance guarantees. Genome Biol. 2016;17(1):1–16.

Guo R, Li Y-R, He S, Ou-Yang L, Sun Y, Zhu Z. Replong: de novo repeat identification using long read sequencing data. Bioinformatics. 2018;34(7):1099–107.

Gurevich A, Saveliev V, Vyahhi N, Tesler G. Quast: quality assessment tool for genome assemblies. Bioinformatics. 2013;29(8):1072–5.

Hon T, Mars K, Young G, Tsai Y-C, Karalius JW, Landolin JM, Maurer N, Kudrna D, Hardigan MA, Steiner CC, et al. Highly accurate long-read hifi sequencing data for five complex genomes. Scientific data. 2020;7(1):1–11.

Hotaling S, Wilcox ER, Heckenhauer J, Stewart RJ, Frandsen PB. Highly accurate long reads are crucial for realizing the potential of biodiversity genomics. BMC Genomics. 2023;24(1):117.

Huang W, Li L, Myers JR, Marth GT. Art: a next-generation sequencing read simulator. Bioinformatics. 2012;28(4):593–4.

Huson DH, Reinert K, Myers EW. The greedy path-merging algorithm for contig scaffolding. J ACM. 2002;49(5):603–15.

Jackman SD, Vandervalk BP, Mohamadi H, Chu J, Yeo S, Hammond SA, Jahesh G, Khan H, Coombe L, Warren RL, et al. ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter. Genome Res. 2017;27(5):768–77.

Jain M, Koren S, Miga KH, Quick J, Rand AC, Sasani TA, Tyson JR, Beggs AD, Dilthey AT, Fiddes IT, et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat Biotechnol. 2018;36(4):338–45.

Kolmogorov M, Yuan J, Lin Y, Pevzner PA. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol. 2019;37(5):540–6.

Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27(5):722–36.

Korlach J, Pacific Biosciences. Understanding accuracy in SMRT sequencing. Pacific Biosciences; 2013. p. 1–9.

Laver T, Harrison J, Oneill P, Moore K, Farbos A, Paszkiewicz K, Studholme DJ. Assessing the performance of the oxford nanopore technologies minion. Biomol Detect Quantif. 2015;3:1–8.

Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094–100.

Lin Y, Yuan J, Kolmogorov M, Shen MW, Chaisson M, Pevzner PA. Assembly of long error-prone reads using de bruijn graphs. Proc Natl Acad Sci. 2016;113(52):E8396–405.

Loman NJ, Quick J, Simpson JT. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat Methods. 2015;12(8):733–5.

Luo J, Wang J, Zhang Z, Li M, Wu F-X. Boss: a novel scaffolding algorithm based on an optimized scaffold graph. Bioinformatics. 2017;33(2):169–76.

Luo J, Wei Y, Lyu M, Wu Z, Liu X, Luo H, Yan C. A comprehensive review of scaffolding methods in genome assembly. Brief Bioinform. 2021;22(5):033.

Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, He G, Chen Y, Pan Q, Liu Y, et al. Soapdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience. 2012;1(1):2047-217X.

Mason CE, Elemento O. Faster sequencers, larger datasets, new challenges. 2012.

Nurk S, Walenz BP, Rhie A, Vollger MR, Logsdon GA, Grothe R, Miga KH, Eichler EE, Phillippy AM, Koren S. Hicanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res. 2020;30(9):1291–305.

Olsen LK, Heckenhauer J, Sproul JS, Dikow RB, Gonzalez VL, Kweskin MP, Taylor AM, Wilson SB, Stewart RJ, Zhou X, et al. Draft genome assemblies and annotations of agrypnia vestita walker, and hesperophylax magnus banks reveal substantial repetitive element expansion in tube case-making caddisflies (insecta: Trichoptera). Genome Biol Evol. 2021;13(3):evab013.

Pop M, Kosack DS, Salzberg SL. Hierarchical scaffolding with bambus. Genome Res. 2004;14(1):149–59.

Qin M, Wu S, Li A, Zhao F, Feng H, Ding L, Ruan J. LRScaf: Improving draft genomes using long noisy reads. BMC Genom. 2019;20(1):1–12.

Rahman T, Bhowmik O, Kalyanaraman A. An efficient parallel sketch-based algorithm for mapping long reads to contigs. In 2023 IEEE International parallel and distributed processing symposium workshops (IPDPSW), pages 157–166. IEEE, 2023a.

Rahman T, Bhowmik O, Kalyanaraman A. An efficient parallel sketch-based algorithmic workflow for mapping long reads. bioRxiv, pages 2023–11, 2023b.

Roberts M, Hayes W, Hunt BR, Mount SM, Yorke JA. Reducing storage requirements for biological sequence comparison. Bioinformatics. 2004;20(18):3363–9.

Ruan J, Li H. Fast and accurate long-read assembly with wtdbg2. Nat Methods. 2020;17(2):155–8.

Sahlin K, Vezzi F, Nystedt B, Lundeberg J, Arvestad L. Besst-efficient scaffolding of large fragmented assemblies. BMC Bioinformatics. 2014;15(1):1–11.

Salmela L, Mäkinen V, Välimäki N, Ylinen J, Ukkonen E. Fast scaffolding with small independent mixed integer programs. Bioinformatics. 2011;27(23):3259–65.

Schleimer S, Wilkerson DS, Aiken A. Winnowing: local algorithms for document fingerprinting. In Proceedings of the 2003 ACM SIGMOD international conference on Management of data, pages 76–85, 2003.

Shafin K, Pesout T, Lorig-Roach R, Haukness M, Olsen HE, Bosworth C, Armstrong J, Tigyi K, Maurer N, Koren S, et al. Nanopore sequencing and the shasta toolkit enable efficient de novo assembly of eleven human genomes. Nat Biotechnol. 2020;38(9):1044–53.

Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. Busco: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–2.

Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I. Abyss: a parallel assembler for short read sequence data. Genome Res. 2009;19(6):1117–23.

Tørresen OK, Star B, Jentoft S, Reinar WB, Grove H, Miller JR, Walenz BP, Knight J, Ekholm JM, Peluso P, et al. An improved genome assembly uncovers prolific tandem repeats in atlantic cod. BMC Genomics. 2017;18:1–23.

Vaser R, Sović I, Nagarajan N, Šikić M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 2017;27(5):737–46.

Vollger MR, Logsdon GA, Audano PA, Sulovari A, Porubsky D, Peluso P, Wenger AM, Concepcion GT, Kronenberg ZN, Munson KM, et al. Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads. Ann Hum Genet. 2020;84(2):125–40.

Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE. 2014;9(11): e112963.

Weisenfeld NI, Yin S, Sharpe T, Lau B, Hegarty R, Holmes L, Sogoloff B, Tabbaa D, Williams L, Russ C, et al. Comprehensive variation discovery in single human genomes. Nat Genet. 2014;46(12):1350–5.

Wenger AM, Peluso P, Rowell WJ, Chang P-C, Hall RJ, Concepcion GT, Ebler J, Fungtammasan A, Kolesnikov A, Olson ND, et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol. 2019;37(10):1155–62.

Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de bruijn graphs. Genome Res. 2008;18(5):821–9.

Download references

Acknowledgements

This research was supported in parts by NSF grants OAC 1910213, CCF 1919122, and CCF 2316160. We thank Dr. Priyanka Ghosh for several discussions during the early stages of the project.


Author information

Authors and Affiliations

School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, 99164, USA

Oieswarya Bhowmik, Tazin Rahman & Ananth Kalyanaraman


Oieswarya Bhowmik was the lead student author who developed the methods, implemented the tool, and conducted the experimental evaluation. Tazin Rahman assisted with the experimental setup and contributed to the incorporation of the mapping step of the approach. Ananth Kalyanaraman conceptualized the project and contributed to the algorithm design. All authors contributed to the writing and/or proofreading of the manuscript.

Corresponding author

Correspondence to Oieswarya Bhowmik .

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Material 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Bhowmik, O., Rahman, T. & Kalyanaraman, A. Maptcha: an efficient parallel workflow for hybrid genome scaffolding. BMC Bioinformatics 25 , 263 (2024). https://doi.org/10.1186/s12859-024-05878-4


Received : 01 March 2024

Accepted : 22 July 2024

Published : 08 August 2024

DOI : https://doi.org/10.1186/s12859-024-05878-4


  • Genome assembly
  • Hybrid scaffolding
  • Long read mapping

BMC Bioinformatics

ISSN: 1471-2105


Hybrid model based on energy and experimental methods for parallel hexapod-robotic light abrasive grinding operations

  • ORIGINAL ARTICLE
  • Published: 29 July 2017
  • Volume 93 , pages 3873–3887, ( 2017 )

Cite this article


  • Masoud Latifinavid &
  • Erhan Ilhan Konukseven


Automatic grinding using robot manipulators requires simultaneous control of the robot endpoint and force interaction between the robot and the constraint surface. In robotic grinding, surface quality can be increased by accurate estimation of grinding forces where significant tool and workpiece deflection occurs. The small diameter of the tool causes different behavior in the grinding process in comparison with the tools that are used by universal grinding machines. In this study, we develop a robotic surface grinding force model to predict the normal and tangential grinding forces. A physical model is used based on chip formation energy and sliding energy. To improve the model for robotic grinding operations, a refining term is added. The stiffness of the tool and setup is inherently included using penetration test results and estimating the refining term of the model. The model coefficients are calculated using a linear regression technique. The proposed model is validated by comparing model outputs with experimentally obtained data. Evaluation of the test results demonstrates the effectiveness of the proposed model in predicting surface grinding forces.
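The abstract describes a force model built from physical energy terms (chip formation and sliding) plus an added refining term, with coefficients calculated by linear regression. The following is a minimal illustrative sketch of that fitting step only, not the authors' actual model: the feature terms, coefficient values, and data are all invented for the demo.

```python
import numpy as np

# Hypothetical sketch: fit a grinding-force model of the form
#   F = k1 * chip_term + k2 * sliding_term + k3 * refining_term
# by ordinary least squares, as the abstract's "linear regression technique" suggests.
rng = np.random.default_rng(0)

n = 200
chip_term = rng.uniform(0.1, 1.0, n)      # stand-in for the chip-formation energy term
sliding_term = rng.uniform(0.1, 1.0, n)   # stand-in for the sliding energy term
refining_term = rng.uniform(0.0, 0.5, n)  # stand-in for the added refining term

true_k = np.array([12.0, 4.5, 7.0])       # invented "true" coefficients
X = np.column_stack([chip_term, sliding_term, refining_term])
force = X @ true_k + rng.normal(0.0, 0.01, n)  # synthetic "measured" force + noise

# Recover the coefficients from the (synthetic) measurements
k_hat, *_ = np.linalg.lstsq(X, force, rcond=None)
print(np.round(k_hat, 2))  # close to true_k
```

In the paper itself the regressors come from penetration tests and grinding experiments rather than random draws; the point here is only the shape of the coefficient-estimation step.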




Acknowledgements

We would like to thank the Scientific and Technological Research Council of Turkey for their financial support of this research under Grant TUBITAK -114E274.

Author information

Authors and Affiliations

Mechanical Engineering Department, Middle East Technical University, Dumlupinar Bulvari No. 1, Cankaya, 06800, Ankara, Turkey

Masoud Latifinavid & Erhan Ilhan Konukseven

Mechatronic Engineering Department, University of Turkish Aeronautical Association, Bahcekapi Quarter Okul Street No:11, 06790, Etimesgut, Ankara, Turkey

Masoud Latifinavid


Corresponding author

Correspondence to Erhan Ilhan Konukseven.

Rights and permissions


About this article

Latifinavid, M., Konukseven, E. Hybrid model based on energy and experimental methods for parallel hexapod-robotic light abrasive grinding operations. Int J Adv Manuf Technol 93 , 3873–3887 (2017). https://doi.org/10.1007/s00170-017-0798-8

Download citation

Received : 24 November 2016

Accepted : 11 July 2017

Published : 29 July 2017

Issue Date : December 2017

DOI : https://doi.org/10.1007/s00170-017-0798-8


  • Grinding force model
  • Robotic grinding
  • Data mining


Computer Science > Computer Vision and Pattern Recognition

Title: JambaTalk: Speech-Driven 3D Talking Head Generation Based on Hybrid Transformer-Mamba Model

Abstract: In recent years, talking head generation has become a focal point for researchers. Considerable effort is being made to refine lip-sync motion, capture expressive facial expressions, generate natural head poses, and achieve high video quality. However, no single model has yet achieved equivalence across all these metrics. This paper aims to animate a 3D face using Jamba, a hybrid Transformer-Mamba model. Mamba, a pioneering Structured State Space Model (SSM) architecture, was designed to address the constraints of the conventional Transformer architecture. Nevertheless, it has several drawbacks. Jamba merges the advantages of both Transformer and Mamba approaches, providing a holistic solution. Based on the foundational Jamba block, we present JambaTalk to enhance motion variety and speed through multimodal integration. Extensive experiments reveal that our method achieves performance comparable or superior to state-of-the-art models.
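The hybrid Transformer-Mamba idea the abstract relies on can be caricatured as a layer stack that interleaves occasional attention layers with cheap SSM-style recurrence layers. The toy NumPy sketch below is not Jamba or JambaTalk code: the layer pattern, shapes, and the simplified diagonal recurrence are all invented to illustrate the interleaving only.

```python
import numpy as np

def attention(x):
    # Single-head scaled dot-product self-attention over the sequence (toy version)
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

def ssm_scan(x, decay=0.9):
    # Minimal state-space-style recurrence: h_t = decay * h_{t-1} + x_t
    h = np.zeros_like(x[0])
    out = np.empty_like(x)
    for t, xt in enumerate(x):
        h = decay * h + xt
        out[t] = h
    return out

def hybrid_block_stack(x, pattern=("ssm", "ssm", "attn", "ssm")):
    # Jamba-like stacks mix many SSM layers with occasional attention layers;
    # this particular pattern is illustrative. Residual connections throughout.
    for kind in pattern:
        layer = attention if kind == "attn" else ssm_scan
        x = x + layer(x)
    return x

seq = np.ones((4, 8))  # toy sequence: 4 tokens, 8 features
out = hybrid_block_stack(seq)
print(out.shape)  # (4, 8)
```

The design point the sketch makes is that attention mixes information across all positions at once, while the SSM layer propagates it sequentially at linear cost, so a mostly-SSM stack with sparse attention layers trades a little global mixing for speed.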
Comments: 12 pages with 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)


Analytical Methods

Improving the Analysis of Phase-Separated Bio-Fuel Samples with Slice-Selective Total Correlation NMR Spectroscopy

Phase-separated samples are a particular challenge for NMR experiments. The boundary between layers is severely detrimental to high-resolution spectra, and normal NMR experiments simply add the spectra of the two layers together. Pyrolysis bio-oils represent an increasingly important alternative fuel resource, yet they readily separate, whether due to naturally high water content or to blending, a common practice for producing a more viable fuel. Slice-selective NMR, in which the spectrum of only a thin slice of the total sample is acquired, is extended and improved here, with slice-selective two-dimensional correlation experiments used to resolve the distinct chemical spectra of the various components of phase-separated blended fuel mixtures. Analysis of how the components of a blended biofuel sample partition between the two layers is an important step towards understanding the separation process and may provide insight into mitigating the problem.

Supplementary files

  • Supplementary information PDF (429K)

Article information


J. Singh Khangura, B. Tang, K. Chong and R. Evans, Anal. Methods , 2024, Accepted Manuscript , DOI: 10.1039/D4AY01006J

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence . You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.

Read more about how to correctly acknowledge RSC content .



IMAGES

  1. Flow-chart of hybrid numerical-experimental method.

  2. Flowchart of the hybrid computational-experimental approach.

  3. Hybrid Experimental Designs

  4. Hybrid method structure.

  5. The hybrid methodology workflow implemented in this work. Experimental

  6. Flow chart of advanced photoelastic experimental hybrid method

COMMENTS

  1. Hybrid Experimental Designs for Intervention Development: What, Why

    Here, we introduce the hybrid experimental design (HED)—a new experimental approach that can be used to answer scientific questions about building psychological interventions in which human-delivered and digital components are integrated and adapted at multiple timescales. ... Additional research is needed to develop methods for analyzing ...

  2. Hybrid Experimental Designs

    Hybrid experimental designs are just what the name implies ⁠— new strains that are formed by combining features of more established designs. There are lots of variations that could be constructed from standard design features. Here, I'm going to introduce two hybrid designs.

  3. A review and perspective on hybrid modeling methodologies

    2. Fundamentals of hybrid modeling. In this section, we describe the hybrid modeling structures (Section 2.1), the main advantages of hybrid modeling (Section 2.2), and the training of hybrid models (Section 2.3). 2.1. Hybrid modeling structures. There are three basic configurations of parametric and nonparametric model that determine whether a hybrid model is understood to have a parallel or ...

  4. Hybrid Methods

    The central role of modern experimental analysis is to help complete, through measurement and testing, the construction of an analytical model for the given problem. This chapter recapitulates recent developments in hybrid methods for achieving this and demonstrates through examples the progress being made.

  5. Integrative/Hybrid Modeling Approaches for Studying Biomolecules

    Integrative/hybrid modeling combines input from multiple biophysical experiments. •. Computation plays a crucial role in implementation of hybrid methods. •. Advancements in cryo-EM and XL-MS have led to growth in hybrid modeling. •. New methods such as XFEL hold promise for further development of hybrid methods.

  6. Hybrid methods for combined experimental and computational

    Therefore, methods to accurately elucidate three-dimensional structures of proteins are in high demand. While there are a few experimental techniques that can routinely provide high-resolution structures, such as x-ray crystallography, nuclear magnetic resonance (NMR), and cryo-EM, which have been developed to determine the structures of ...

  7. Hybrid methods for combined experimental and computational

    Hybrid methods for combined experimental and computational determination of protein structure Justin T. Seffernick; ... There are some experimental methods that can be used to determine the structures of proteins at resolutions where the positions of heavy atoms can be elucidated (<3 Å), namely, x-ray crystallography, nuclear magnetic ...

  8. Hybrid methods for combined experimental and computational

    Representations of each featured experimental method used for computational modeling. In this Perspective, we discuss how each method has been used for computational modeling in the form of de novo folding from the sequence (tertiary structure prediction), protein-protein docking (quaternary structure prediction), and molecular dynamics (physics-based protein dynamics simulation), as shown ...

  9. Elastomeric seal stress analysis using photoelastic experimental hybrid

    In the photoelastic experimental hybrid method, minimizing the errors, D(ε) in Eq. is crucial in order to increase the accuracy of the method. The starting point is the selection of the region from ...

  10. Hybrid computational methods combining experimental information with

    Hybrid modeling is an umbrella term for many possible combinations of computational methods and sources of experimental data. Clear protocols and standards exist for determining structures using data-rich regime techniques, such as X-ray crystallography, cryo-EM, or solution NMR.

  11. Hybrid Methods in Experimental Mechanics

    Summary. Hybrid procedures can be broadly defined as the synthesis of a variety of available methodologies into a composite techinque, which, taken as a whole, is more useful than any of the individal methods. In experimental mechanics, the most common form of hybrid approach is one utilizing both numerical and experimental data to quantify the ...

  12. A hybrid computational-experimental approach for automated ...

    Our FPASS method takes as input easily obtained experimental data, that is, stoichiometry, diffraction pattern and candidate space group(s). In particular, the space group is known for over 90% of ...

  13. Hybrid Experiment

    The concept of a real-time hybrid model experiment was first proposed in civil engineering [ 54 ]. Originally, this method was used to investigate the dynamic responses of large buildings under the influence of seismic forces. In 1992, Nakashima et al. [ 55] proposed a real-time hybrid model experimental method.

  14. PDF Hybrid Simulation for Multi-hazard Engineering A Research Agenda

    exploit creative testing methods. Hybrid simulation is an experimental method developed within the field of structural engineering. In hybrid simulation, the less understood portions of a structural system may be isolated in an experimental substructure, while the predictable portions of the system are included

  15. Hybrid Research: Combining Qualitative and Quantitative Methods and More

    When the term first emerged, most research experts defined it as a combination of qualitative and quantitative research methods. Now it is more evolved than that. Hybrid research can be a combination of two or more research methodologies—regardless of whether it's qualitative plus quantitative. Further, it can be conducted in a series ...

  16. How the Experimental Method Works in Psychology

    The experimental method involves manipulating one variable to determine if this causes changes in another variable. This method relies on controlled research methods and random assignment of study subjects to test a hypothesis. For example, researchers may want to learn how different visual patterns may impact our perception.

  17. PDF Hybrid Analytical-numerical Methodology for Computationally Efficient

    efficiency by introducing a hybrid methodology that combines the analytical and finite-element approaches. We then validate the methodology using available experimental data and show that a good agreement with the experiments is observed. 1 INTRODUCTION In recent years, an efficient semi-analytical methodology has been developed for modelling

  18. Yeast Two-Hybrid, a Powerful Tool for Systems Biology

    The two most frequently used methods are yeast two-hybrid (Y2H) screening, a well established genetic in vivo approach, ... Experimental Y2H data have been a crucial part in establishing large synthetic human interactomes [25,26] or to dissect mechanisms in human disease . Two screening approaches can be distinguished: the matrix (or array) and ...

  19. A coupled analytical-FE hybrid approach for elastostatics

    Beginning with augmentation of experimental boundary-data with numerical methods in early hybrid methods (HMs), HMs have evolved in various states of hybridization amalgamating combinations of theoretical, numerical and experimental analysis techniques. In this work, an HM coupling coarse-mesh FE boundary-data with a theoretical solution in the 2D elastostatic framework is proposed and ...

  20. A Hybrid Experimental-Numerical Method to Support the Design of ...

    The paper uses a hydraulic performance analysis method to support the design of stock production multistage pumps. The method relies on a hybrid numerical-experimental approach conceived as a trade-off between accuracy and cost. It is based on CFD analyses incorporating experimental data of leakage flows across the sealing elements to obtain accurate predictions without the need of inclusion ...

  21. Full article: Experimental and numerical validation of a hybrid method

    The experimental conditions are listed in Table 2. The rotor speed and blade pitch angle of the model-scale turbines were determined using the model test method proposed by Hao et al. ... To validate the hybrid method of CFD(ALM)-IDWM, two in-line wind turbines were simulated, and their results were compared with those obtained by full CFD(ALM

  22. Numerical and experimental validation of a hybrid finite element

    A hybrid method combining FE and SEA was recently presented for predicting the steady-state response of vibro-acoustic systems with uncertain properties. The subsystems with long wavelength behavior are modeled deterministically with FE, while the subsystems with short wavelength behavior are modeled statistically with SEA.

  23. Experimental insights into the stability of graphene oxide ...

    The performance of these methods in stabilizing the resultant hybrid solution, comprising polymer, GO, and HS brine, herein referred to as HS-GOeP hybrid, was evaluated. In each method, the ...

  24. (PDF) Experimental Methods of Validation for Numerical Simulation

    The article presents experimental results on the impact of tundish flow regulator influencing the liquid steel flow course. The research was conducted based on the hybrid modelling methods understood as a complementary use of Computational Fluid

  25. Maptcha: an efficient parallel workflow for hybrid genome scaffolding

    Background Genome assembly, which involves reconstructing a target genome, relies on scaffolding methods to organize and link partially assembled fragments. The rapid evolution of long read sequencing technologies toward more accurate long reads, coupled with the continued use of short read technologies, has created a unique need for hybrid assembly workflows. The construction of accurate ...

  26. Experimental study and energy saving potential analysis of a hybrid air

    Experimental study and energy saving potential analysis of a hybrid air treatment cooling system in tropical climates. Cui X, Islam MR, Chua KJ. Energy (ISSN 0360-5442), 2019, vol. 172, pp. 1016–1026. Is Peer Reviewed? Yes ...

  27. Simplifying FFT-based methods for mechanics with automatic differentiation

    Fast-Fourier Transform (FFT) methods have been widely used in solid mechanics to address complex homogenization problems. However, current FFT-based methods face challenges that limit their applicability to intricate material models or complex mechanical problems. These challenges include the manual implementation of constitutive laws and the use of computationally expensive and complex ...

  28. Hybrid model based on energy and experimental methods for ...

    Automatic grinding using robot manipulators requires simultaneous control of the robot endpoint and force interaction between the robot and the constraint surface. In robotic grinding, surface quality can be increased by accurate estimation of grinding forces where significant tool and workpiece deflection occurs. The small diameter of the tool causes different behavior in the grinding process ...

  29. JambaTalk: Speech-Driven 3D Talking Head Generation Based on Hybrid

    In recent years, talking head generation has become a focal point for researchers. Considerable effort is being made to refine lip-sync motion, capture expressive facial expressions, generate natural head poses, and achieve high video quality. However, no single model has yet achieved equivalence across all these metrics. This paper aims to animate a 3D face using Jamba, a hybrid Transformers ...

  30. Improving the Analysis of Phase-Separated Bio-Fuel Samples with Slice

    Separated samples are a particular challenge for NMR experiments. The boundary is severely detrimental to high-resolution spectra and normal NMR experiments simply add the two spectra of the two layers together. Pyrolysis bio-oils represent an increasingly important alternative fuel resource yet readily sepa ... Methods, 2024, Accepted ...