The first type of QED highlighted in this review is perhaps the most straightforward intervention design: the pre-post comparison study with a non-equivalent control group. In this design, the intervention is introduced at a single point in time to one or more sites, each with a pre-test and post-test evaluation period, and the pre-post differences between the intervention and control sites are then compared. In practice, interventions using this design are often delivered at a higher level, such as to entire communities or organizations ( 1 ) [ Figure 1 here]. In this design the investigators identify additional site(s) that are similar to the intervention site(s) to serve as a comparison/control group. However, these control sites differ in some way from the intervention site(s), and thus the term “non-equivalent” is important, clarifying that there are inherent differences between the treatment and control groups ( 15 ).
Illustration of the Pre-Post Non-Equivalent Control Group Design
The strengths of pre-post designs lie mainly in their simplicity: data collection usually occurs at only a few time points (although sometimes more). However, pre-post designs can be affected by several of the threats to internal validity of QEDs presented here. The largest challenges are 1) ‘history bias’, in which events unrelated to the intervention (also referred to as secular trends) occur before or during the intervention period and affect the outcome, either positively or negatively, independent of the intervention ( 39 ); and 2) selection bias arising from differences between the intervention and control sites, because the non-equivalent control groups are likely to differ from the intervention sites in a number of meaningful ways that impact the outcome of interest and can bias results.
At this design stage, the first step toward improving internal validity is the selection of non-equivalent control group(s) for which some balance in the distribution of known risk factors is established. This can be challenging, as there may not be adequate information available to determine how ‘equivalent’ the comparison group is regarding relevant covariates.
It can be useful to obtain pre-test data or baseline characteristics to improve the comparability of the two groups. In the most controlled situations within this design, the investigators might include elements of randomization or matching for individuals in the intervention or comparison site, to attempt to balance the covariate distribution. Implicit in this approach is the assumption that the greater the similarity between groups, the smaller the likelihood that confounding will threaten inferences of causality of effect for the intervention ( 33 , 47 ). Thus, it is important to select this group or multiple groups with as much specificity as possible.
To enhance causal inference for pre-post designs with non-equivalent control groups, the best strategies improve the comparability of the control group with regard to potential covariates that are related to the outcome of interest but not under investigation. One strategy involves creating a cohort, and then using targeted sampling to inform matching of individuals within the cohort. Matching can be based on demographic and other important factors (e.g., measures of health care access or time period). This design in essence creates a matched, nested case-control design.
Collection of additional data once sites are selected cannot in itself reduce bias, but it can inform the examination of the association of interest and provide data supporting an interpretation consistent with a reduced likelihood of bias. These data collection strategies include: 1) extra data collection at additional pre- or post-intervention time points (to approximate an interrupted time series design and examine potential threats of maturation and history bias), and 2) collection of data on other dependent variables with a priori assessment of how they will ‘react’ with time-dependent variables. A detailed analysis can then provide information on the potential effects on the outcome of interest (to understand potential underlying threats due to history bias).
Additionally, there are analytic strategies that can improve the interpretation of this design, such as: 1) analysis of multiple non-equivalent control groups, to determine if the intervention effects are robust across different conditions or settings (e.g., using sensitivity analysis); 2) examination within a smaller critical window of the study in which the intervention would plausibly be expected to make the most impact; and 3) identification of subgroups of individuals within the intervention community who are known to have received high vs. low exposure to the intervention, to investigate a potential “dose-response” effect. Table 2 provides examples of studies using the pre-post non-equivalent control group design that have employed one or more of these approaches to improve the study’s internal validity.
Improving Quasi-Experimental Designs-Internal and External Validity Considerations
Study/General Design | Intervention | Design Strategy to Improve Internal Validity | Design Strategy to Improve External Validity |
Pre-Post Designs with Non-Equivalent Control Group | |||
Cousins et al 2016 | Campus Watch program targeting problem drinking and violence at 1 university campus with 5 control campuses in New Zealand | • Standardization of independent repeat sampling, survey and follow-up methods across all sites (5 control and 1 intervention site) • 5 control sites analyzed both in aggregate and individually • Consumption and harms data from national surveys to compare data trends over time | Over-sampling of indigenous groups to extend interpretation of findings |
Chronic disease management program with pharmacist-based patient coaching within a health care insurance plan in Cincinnati, US | • Matching of participants with non-participants on demographic and health care access measures (using propensity score matching) | ||
Distribution of bed nets to prevent malaria and reduce malaria mortality in Gambia; 41 sites receiving intervention compared to external villages (which differed by size and ethnic distribution) | • Examination of data trends during the highest infection times of the year (i.e., rainy season vs dry season) to see if rates were higher then • Detailed study of those using bed nets within intervention villages (i.e., guaranteed exposure “dose”) to examine dose-response in the intervention arm | ||
Interrupted Time Series | |||
Study/General Design | Intervention | Design Strategy to Improve Internal Validity | Design Strategy to Improve External Validity |
Pellegrin 2016 Interrupted time series with comparison group | Formal transfer of high-risk patients being discharged from hospital to a community-based pharmacist follow-up program for up to 1 year post-hospitalization (6 intervention and 5 control sites) | • Long baseline period (12 pre-intervention data points) • Intervention roll-out staggered based on staff availability (site 1 had eight post-intervention data points while site 8 had two) | Detailed implementation-related process measures monitored (and provided to individual community-based pharmacists regarding their performance) over entire study period |
Robinson 2015 Interrupted time series without control group | New hospital discharge program to support high-risk patients with nurse telephone follow-up and referral to specific services (such as pharmacists for medication reconciliation and review) | • Additionally examined regression discontinuity during the intervention period to determine if the risk score used to determine eligibility for the program influenced the outcome | Measured implementation outcomes of whether the intervention was delivered with high fidelity to the protocols |
Interrupted time series with comparison group | Removal of direct payment at point of health care services for children under 5, very low income individuals and pregnant women re: consultations, medications and hospitalizations | Built into a pilot to collect control data, and then extend this work to include additional districts, one intervention and one non-intervention district, along with 6 additional years of observation. | Examined sustainability over 72 months of follow-up, and associations with clinic characteristics, such as density of workforce. |
Stepped Wedge Design | |||
Study/General Design | Intervention | Design Strategy to Improve Internal Validity | Design Strategy to Improve External Validity |
Non-randomized stepped wedge cluster trial | Site-level roll-out of an integrated antiretroviral treatment (ART) intervention in 8 public sector clinics in Zambia, to achieve more rapid treatment initiation among women with HIV than the existing referral method used for initiation of treatment. | • The 8 sites were matched into four pairs based on the number of HIV-infected pregnant women expected in each site. • The intervention roll-out was done for one member of the least busy pair, one member of the second busiest pair, one member of the third busiest pair, and one member of the busiest pair. Roll-out to the remaining pairs proceeded in reverse order. • A transition cohort was established that was later excluded from the analysis. It included women who were identified as eligible in the control period close to the time the intervention was starting. | |
See also: Randomized stepped wedge cluster trial | Multi-faceted quality improvement intervention with a passive and an active phase among 6 regional emergency medical services systems and 32 academic and community hospitals in Ontario, Canada. The intervention compared approaches to improving the implementation of targeted temperature management following out-of-hospital cardiac arrest: passive quality improvement interventions (education, generic protocol, order set, local champions) versus additional active quality improvement interventions (nurse specialist providing site-specific interventions, monthly audit-feedback, network educational events, internet blog) versus no intervention (baseline standard of care). | • Randomization at the level of the hospital, rather than the patient, to minimize contamination, since the intervention targeted groups of clinicians. • Hospitals were stratified by number of Intensive Care Unit beds (< 10 beds vs ≥ 10 beds) as a proxy for hospital size. Randomization was done within strata. • Formalized a transition cohort for which a more passive intervention strategy was tested. This also allowed more time for sites to adopt all elements of the complex intervention before crossing over to the active intervention group. | Characterization of system and organizational factors that might affect adoption: Collection of longitudinal data relevant to implementation processes that could impact interpretation of findings, such as academic vs community affiliation and urban vs rural (bed size) |
Randomized stepped wedge cluster trial | Seasonal malaria prophylaxis for children up to age 10 in central Senegal, given to households monthly through health-system staff-led home visits during the malaria season. The first two phases of implementation focused on children under age 5 years and the last phase included children up to age 10 years, with a control-only group of sites maintained during this period. | • Constrained randomization of program roll-out across 54 health post catchment areas and center-covered regions. • More sites received the intervention in later stages (n=18) than at the beginning (n=9). • Randomization was constrained to achieve balance within settings on potential confounders (since data on malaria incidence were unavailable), such as distance from the river, distance from the health center, population size, number of villages, and assessed ability to implement. • Included nine clinics as control sites throughout the study period. | Characterization of factors that might affect usage and adherence made with longitudinal data: Independent evaluations of malaria prophylaxis usage, adherence, and acceptance were included prospectively, using routine health cards at the family level and with external assessments from community surveys. In-depth interviews conducted across community levels to understand acceptability and other responses to the intervention. Included an embedded study broadening inclusion criteria, to focus on a wider age group of at-risk children |
Wait-list randomized stepped wedge design | Enrollment of 1,655 male mine employees with HIV infection randomized over a short period of time into an intervention to prevent TB infection (use of isoniazid preventive therapy), among individuals with HIV. Treatment was self-administered for 6 months or for 12 months and results were based on cohort analyses. | • Employees were invited in random sequence to attend a workplace HIV clinic. | Enumeration of at risk cohort and estimation of spill-over effect beyond those enrolled: Since they used an enrollment list, they were able to estimate the effect of the intervention (the provision of clinic services) among the entire eligible population, not just those enrolled in the intervention over the study period. |
Ratanawongsa et al; Handley et al 2011 Wait-list randomized stepped wedge design | Enrollment of 362 patients with diabetes into a health-IT enabled self-management support telephone coaching program, using a wait-list generated from a regional health plan, delivered in 3 languages. | • Patients were identified from an actively maintained diabetes registry covering 4 safety net health clinics in the United States, and randomized to receive the coaching intervention immediately or after 6 months. • Patients were randomized to balance enrolment across English, Cantonese, and Spanish over the study period. | External validity-related measures for acceptability among patients, as well as fidelity measures for the health IT-enabled health coaching intervention, were assessed using a fidelity framework. |
Bailet et al 2011 | Literacy intervention for pre-kindergarten children at risk for reading failure in a southern US city, administered in child care and pre-school sites, delivered twice a week for 9 weeks. For large sites, randomization was not at the site level; the schools were split so all children could be taught in the intervention period, either fall or spring. At-risk children in these “split” schools received intervention at only one of the two time points (as did their “non-split school” peers); however, the randomization to treatment group occurred at the child level. | • Random assignment of clusters (schools). • Matched pairs of child care centers by zip code and percentage of children receiving a state-sponsored financial subsidy. Within these groups, random assignment to receive either immediate or deferred enrolment into the intervention. | External validity was enhanced in years 2–3 with a focus on teacher training to ensure fidelity of measures and completion of each week of the curriculum, to enhance assessment of a potential dose-response. Refined intervention applied in years 2–3, based on initial data. |
The Mexican Government randomly chose 320 early intervention and 186 late (approximately one year later) intervention communities in seven states for Oportunidades, which provided cash transfers to families conditional on children attending school and family members obtaining preventive medical care and attending education talks on health-related topics. | • More communities randomized to an early intervention period |
Cousins et al utilized a non-equivalent control selection strategy that leveraged a recent cross-sectional survey among six universities in New Zealand regarding drinking among college-age students ( 16 ). Of the six sites in the original survey, five were selected to provide non-equivalent control group data for the one intervention campus. The campus intervention targeted young adult drinking-related problems and other outcomes, such as aggressive behavior, using an environmental intervention with a community liaison and a campus security program (also known as a Campus Watch program). The original cross-sectional survey was administered nationally to students using a web-based format, and was repeated in the years soon after the Campus Watch intervention was implemented in one site. Benefits of the design include a consistent sampling frame at each control site, such that sites could be combined as well as evaluated separately, and the collection of additional data on alcohol sales and consumption over the study period to support inference. In a study by Wertz et al ( 48 ), a non-equivalent control group was created by matching those who were eligible for a health coaching program and opted out of the program with those who opted in, among insured patients with diabetes and/or hypertension. Matching was based on propensity scores using demographic and socioeconomic factors and medical center location, and a longitudinal cohort was created prior to the intervention (see Basu et al 2017 for more on this approach).
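As a simplified sketch of the matching step described above, the function below performs greedy 1:1 nearest-neighbor matching on a scalar score. In practice the scores would be propensity scores fitted, for example, by logistic regression of program participation on covariates; here they are supplied directly, and all function, variable, and participant names are illustrative rather than taken from the Wertz et al study.

```python
def greedy_match(treated, controls):
    """Greedy 1:1 nearest-neighbor matching on a scalar (e.g., propensity)
    score. `treated` and `controls` map participant id -> score; each
    control is used at most once. Returns {treated_id: control_id}."""
    available = dict(controls)
    pairs = {}
    # Match treated units in a fixed order (by id) for reproducibility.
    for t_id in sorted(treated):
        if not available:
            break  # no controls left to match
        # Nearest remaining control on the score scale.
        c_id = min(available, key=lambda c: abs(available[c] - treated[t_id]))
        pairs[t_id] = c_id
        del available[c_id]  # matching without replacement
    return pairs

# Toy scores standing in for fitted propensity scores (illustrative only).
treated = {"T1": 0.30, "T2": 0.70}
controls = {"C1": 0.10, "C2": 0.32, "C3": 0.68}
print(greedy_match(treated, controls))  # {'T1': 'C2', 'T2': 'C3'}
```

A caliper (a maximum allowed score distance) is often added so that treated units with no close control remain unmatched rather than matched badly.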
In the pre-post malaria-prevention intervention example from Gambia, the investigators studied the effect of introducing insecticide-treated bed nets on malaria rates, and collected additional data to evaluate the internal validity assumptions within their design ( 1 ). In this study, the investigators introduced bed nets at the village level, using communities not receiving the bed nets as control sites. To strengthen internal validity they collected additional data that enabled them to: 1) determine whether the reduction in malaria rates was most pronounced during the rainy season within the intervention communities, as this was a biologically plausible exposure period in which they could expect the largest effect size difference between intervention and control sites, and 2) examine use patterns for the bed nets, based on how much insecticide was present in the bed nets over time (after regular washing occurred), which aided in calculating a “dose-response” effect of exposure to the bed net among a subsample of individuals in the intervention community.
An interrupted time series (ITS) design involves collection of outcome data at multiple time points before and after an intervention is introduced at a given point in time at one or more sites ( 6 , 13 ). The pre-intervention outcome data are used to establish an underlying trend that is assumed to continue unchanged in the absence of the intervention under study (i.e., the counterfactual scenario). Any change in outcome level or trend from the counterfactual scenario in the post-intervention period is then attributed to the impact of the intervention. The most basic ITS design utilizes a regression model that includes only three time-based covariates to estimate the pre-intervention slope (outcome trend before the intervention), a “step” or change in level (difference between observed and predicted outcome level at the first post-intervention time point), and a change in slope (difference between post- and pre-intervention outcome trend) ( 13 , 32 ) [ Figure 2 here].
Interrupted Time Series Design
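As an illustration of the three-covariate model just described, a minimal segmented-regression sketch is shown below. The data are simulated and the variable names are our own; a real ITS analysis would also need to address autocorrelation and seasonality (e.g., via generalized least squares or ARIMA models), which this sketch omits.

```python
import numpy as np

# Simulated monthly outcome: 12 pre- and 12 post-intervention time points.
n_pre, n_post = 12, 12
t = np.arange(n_pre + n_post, dtype=float)          # elapsed time
post = (t >= n_pre).astype(float)                   # indicator: post-period
t_post = np.where(t >= n_pre, t - n_pre + 1, 0.0)   # time since intervention

# Noise-free data generated from known parameters:
# baseline level 50, pre-intervention slope 0.5,
# level change ("step") -5, slope change -0.3.
y = 50 + 0.5 * t - 5 * post - 0.3 * t_post

# Design matrix: intercept, pre-intervention trend, step, slope change.
X = np.column_stack([np.ones_like(t), t, post, t_post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

level, pre_slope, step, slope_change = beta
print(level, pre_slope, step, slope_change)
```

With noise-free data the fit recovers the generating parameters exactly, which makes the interpretation of each coefficient concrete: `step` is the immediate level change at the first post-intervention point and `slope_change` is the difference between the post- and pre-intervention trends.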
Whether used for evaluating a natural experiment or, as is the focus here, for prospective evaluation of an intervention, the appropriateness of an ITS design depends on the nature of the intervention and outcome, and the type of data available. An ITS design requires the pre- and post-intervention periods to be clearly differentiated. When used prospectively, the investigator therefore needs to have control over the timing of the intervention. ITS analyses typically involve outcomes that are expected to change soon after an intervention is introduced or after a well-defined lag period. For example, for outcomes such as cancer or incident tuberculosis that develop long after an intervention is introduced and at a variable rate, it is difficult to clearly separate the pre- and post-intervention periods. Last, an ITS analysis requires at least three time points in each of the pre- and post-intervention periods to assess trends. In general, a larger number of time points is recommended, particularly when the expected effect size is smaller, when data at adjacent time points are more similar (i.e., auto-correlation), or when confounding effects (e.g., seasonality) are present. It is also important for investigators to consider any changes to data collection or recording over time, particularly if such changes are associated with introduction of the intervention.
In comparison to simple pre-post designs in which the average outcome level is compared between the pre- and post-intervention periods, the key advantage of ITS designs is that they evaluate for intervention effect while accounting for pre-intervention trends. Such trends are common due to factors such as changes in the quality of care, data collection and recording, and population characteristics over time. In addition, ITS designs can increase power by making full use of longitudinal data instead of collapsing all data to single pre- and post-intervention time points. The use of longitudinal data can also be helpful for assessing whether intervention effects are short-lived or sustained over time.
While the basic ITS design has important strengths, the key threat to internal validity is the possibility that factors other than the intervention are affecting the observed changes in outcome level or trend. Changes over time in factors such as the quality of care, data collection and recording, and population characteristics may not be fully accounted for by the pre-intervention trend. Similarly, the pre-intervention time period, particularly when short, may not capture seasonal changes in an outcome.
Detailed reviews have been published of variations on the basic ITS design that can be used to enhance causal inference. In particular, the addition of a control group can be especially useful for assessing the presence of seasonal trends and other potential time-varying confounders ( 52 ). Zombre et al ( 52 ) maintained a large number of control sites during the extended study period and were able to examine variations in seasonal trends, sustainability, and clinic-level characteristics such as workforce density. In addition to including a control group, several analysis-phase strategies can be employed to strengthen causal inference, including adjustment for time-varying confounders and accounting for autocorrelation.
Stepped wedge designs (SWDs) involve a sequential roll-out of an intervention to participants (individuals or clusters) over several distinct time periods ( 5 , 7 , 22 , 24 , 29 , 30 , 38 ). SWDs can include cohort designs (with the same individuals in each cluster in the pre- and post-intervention steps) and repeated cross-sectional designs (with different individuals in each cluster in the pre- and post-intervention steps) ( 7 ). In the SWD, there is a unidirectional, sequential roll-out of an intervention to clusters (or individuals) that occurs over different time periods. Initially all clusters (or individuals) are unexposed to the intervention, and then at regular intervals, selected clusters cross over (or ‘step’) into a time period where they receive the intervention [ Figure 3 here]. All clusters receive the intervention by the last time interval (although not all individuals within clusters necessarily receive the intervention). Data are collected on all clusters such that they each contribute data during both control and intervention time periods. The order in which clusters receive the intervention can be assigned randomly or using some other approach when randomization is not possible. For example, in settings with geographically remote or difficult-to-access populations, a non-random order can maximize efficiency with respect to logistical considerations.
Illustration of the stepped wedge study design-Intervention Roll-Out Over Time*
* Adapted from Turner et al 2017
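The staggered roll-out described above can be pictured as a cluster-by-period exposure matrix. The following sketch builds such a matrix with a randomized crossover order; the function and parameter names are illustrative and not taken from any cited study.

```python
import random

def stepped_wedge_schedule(n_clusters, n_steps, seed=0):
    """Return a cluster-by-period 0/1 exposure matrix for a stepped wedge
    design: the first period is all-control, one cluster crosses over per
    step (in random order), and all clusters are exposed by the end."""
    rng = random.Random(seed)
    order = list(range(n_clusters))
    rng.shuffle(order)                      # randomized crossover order
    n_periods = n_steps + 1                 # baseline period + one per step
    schedule = [[0] * n_periods for _ in range(n_clusters)]
    for step, cluster in enumerate(order):
        for period in range(step + 1, n_periods):
            schedule[cluster][period] = 1   # exposed from crossover onward
    return schedule

for row in stepped_wedge_schedule(n_clusters=4, n_steps=4):
    print(row)
```

Rows are clusters and columns are time periods; a 1 marks exposed cluster-periods, so each row switches from 0 to 1 exactly once and never switches back, which is the “unidirectional” property of the design.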
The practical and social benefits of the stepped wedge design have been summarized in recent reviews ( 5 , 22 , 24 , 27 , 29 , 36 , 38 , 41 , 42 , 45 , 46 , 51 ). In addition to addressing general concerns with RCTs discussed earlier, advantages of SWDs include the logistical convenience of staggered roll-out of the intervention, which enables a smaller staff to be distributed across different implementation start times and allows for multi-level interventions to be integrated into practice or ‘real world’ settings (referred to as the feasibility benefit). This benefit also applies to studies of de-implementation, prior to a new approach being introduced. For example, with a staggered roll-out it is possible to build in a transition cohort, such that sites can adjust to the integration of the new intervention, and also allow for sites to switch over to de-implementing a prior practice. For a specified time period there may be ‘mixed’ or incomplete data, which can be excluded from the data analysis. However, a longer roll-out duration for practical reasons such as this switching carries associated costs in the form of threats to internal validity, discussed below.
There are several limitations to the SWD. These generally involve trade-offs: exercising design control over the intervention roll-out, often for logistical reasons, creates ‘down the road’ threats to internal validity. These roll-out related threats include potential lagged intervention effects for non-acute outcomes; possible fatigue, with associated higher drop-out rates, while waiting for the cross-over among clusters assigned to receive the intervention later; fidelity losses for key intervention components over time; and potential contamination of later clusters ( 22 ). Another drawback of the SWD is that it involves data assessment at each point when a new cluster receives the intervention, substantially increasing the burden of data collection and costs unless data collection can be automated or uses existing data sources. Because the SWD often has more clusters receiving the intervention towards the end of the intervention period than in earlier time periods, there is a potential concern of temporal confounding at this stage. The SWD is also not as well suited for evaluating intervention effects on delayed health outcomes (such as chronic disease incidence), and is most appropriate when outcomes occur relatively soon after each cluster starts receiving the intervention. Finally, as logistical necessity often dictates selecting a design with smaller numbers of clusters, there are related challenges in the statistical analysis. To use standard software, the common recommendation is to have at least 20 to 30 clusters ( 35 ).
Stepped wedge designs can embed improvements that enhance internal validity, mimicking the strengths of RCTs. These generally focus on efforts to reduce bias or achieve balance in covariates across sites and over time, and/or to compensate as much as possible for practical decisions made at the implementation stage that affect the distribution of the intervention over time and by site. The most widely used approaches are discussed in order of benefit to internal validity: 1) partial randomization; 2) stratification and matching; 3) embedding data collection at critical points in time, such as with a phasing-in of intervention components; and 4) creating a transition cohort or wash-out period. The most important of these SWD elements is random assignment of clusters as to when they will cross over into the intervention period. As well, utilizing data regarding time-varying covariates/confounders, either to stratify clusters and then randomize within strata (partial randomization) or to match clusters on known covariates in the absence of randomization, is a technique often employed to minimize bias and reduce confounding. Finally, maintaining control over the number and timing of data collection points over the study period can be beneficial in several ways. First, it can allow for data analysis strategies that incorporate cyclical temporal trends (e.g., seasonality-mediated risk for outcomes such as flu or malaria) or other underlying temporal trends. Second, it can enable phased interventions to be studied for the contribution of different components included in the phases (e.g., passive then active intervention components), or can enable ‘pausing’ time, as when a structured wash-out or transition cohort is created for practical reasons (e.g., one intervention or practice is stopped/de-implemented, and a new one is introduced) (see Figure 4 ).
Illustration of the stepped wedge study design- Summary of Exposed and Unexposed Cluster Time*
* Adapted from Hemming 2015
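A minimal sketch of stratification followed by randomization within strata (in the spirit of stratifying hospitals by ICU bed count, as described above) is shown below. The function, stratum, and cluster names are hypothetical: it shuffles cluster order within each stratum and interleaves the strata so that no stratum is concentrated at the start or end of the roll-out.

```python
import random

def stratified_crossover_order(strata, seed=0):
    """Randomize stepped-wedge crossover order within strata.

    `strata` maps stratum name -> list of cluster ids. Cluster order is
    shuffled inside each stratum, then strata are interleaved (one cluster
    per stratum per round) so each stratum is spread across early and late
    crossover steps."""
    rng = random.Random(seed)
    shuffled = {name: rng.sample(members, len(members))
                for name, members in strata.items()}
    order = []
    while any(shuffled.values()):
        # One cluster from each non-empty stratum per round.
        for name in sorted(shuffled):
            if shuffled[name]:
                order.append(shuffled[name].pop())
    return order

strata = {"small_icu": ["H1", "H2", "H3"], "large_icu": ["H4", "H5", "H6"]}
print(stratified_crossover_order(strata))
```

The interleaving step is what guards against temporal confounding within strata: if all small-ICU hospitals crossed over first, any secular trend would be entangled with hospital size.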
Table 2 provides examples of studies using SWDs that have used one or more of the design approaches described above to improve the internal validity of the study. In the study by Killam et al 2010 ( 31 ), a non-randomized SWD was used to evaluate a complex clinic-based intervention for integrating antiretroviral treatment (ART) into routine antenatal care in Zambia for post-partum women. The design involved matching clinics by size and an inverse roll-out, to balance the sizes across the four groups. The inverse roll-out involved four strata of clinics, grouped by size with two clinics in each stratum. The roll-out was sequenced across these eight clinics such that one smaller clinic began earlier, with three clinics of increasing size getting the intervention afterwards. This was then followed by a descending order of clinics by size for the remaining roll-out, ending with the smallest clinic. This inverse roll-out enabled the investigators to start with a smaller clinic, to work out the logistical considerations, but then shape the roll-out so as to avoid clustering of smaller or larger clinics in any one step of the intervention.
A second design feature of this study involved the use of a transition cohort or wash-out period (see Figure 4 ) (also used in the Morrison et al 2015 study) ( 19 , 37 ). This approach can be used when an existing practice is being replaced with the new intervention, but there is ambiguity as to which group an individual would be assigned to while integration efforts are underway. In the Killam study, the concern was regarding women who might be identified as ART-eligible in the control period but actually enroll into and initiate ART at an antenatal clinic during the intervention period. To account for the ambiguity of this transition period, patients with an initial antenatal visit more than 60 days prior to the date of implementing ART in the intervention sites were excluded. For analysis of the primary outcome, patients were categorized into three mutually exclusive categories: a referral to ART cohort, an integrated ART in the antenatal clinics cohort, and a transition cohort. It is important to note that the time period for a transition cohort can add considerable time to an intervention roll-out, especially when there is to be a de-implementation of an existing practice that involves a wide range of staff or activities. As well, the exclusion of the data during this phase can reduce the study’s power if not built into the sample size considerations at the design phase.
Morrison et al 2015 ( 37 ) used a randomized cluster design, with additional stratification and randomization within relevant sub-groups, to examine a two-part quality improvement intervention focusing on clinician uptake of patient cooling procedures for post-cardiac arrest care in hospital settings (referred to as Targeted Temperature Management). In this study, 32 hospitals were stratified into two groups based on intensive care unit size (< 10 beds vs ≥ 10 beds), and then randomly assigned into four different time periods to receive the intervention. The phased intervention implementation included both passive (generic didactic training regarding the intervention) and active (tailored support addressing site-specific barriers identified in the passive phase) components. This study exemplifies some of the best uses of SWDs in the context of QI interventions that have multiple components or for which there may be a passive and an active phase, as is often the case with interventions that are layered onto systems change requirements (e.g., electronic records improvements/customization) or relate to sequenced guidelines implementation (as in this example).
Studies using a wait-list partial randomization design are also included in Table 2 ( 24 , 27 , 42 ). These types of studies are well-suited to settings where there is routine enumeration of a cohort based on specific eligibility criteria, such as enrolment in a health plan or employment group, or from a disease-based registry, such as for diabetes ( 27 , 42 ). It has also been reported that this design can increase efficiency and statistical power relative to cluster-based trials, a crucial consideration when the number of participating individuals or groups is small ( 22 ).
The study by Grant et al uses a variant of the SWD in which individuals within a setting are enumerated and then randomized to receive the intervention. In this example, employees who had previously screened positive for HIV at the company clinic as part of mandatory testing were invited, in random sequence, to attend a workplace HIV clinic at a large mining facility in South Africa to initiate a preventive treatment for TB, during the years before ARTs were widely available. Individuals contributed follow-up time to the “pre-clinic” phase from the baseline date established for the cohort until the actual date of their first clinic visit, and to the “post-clinic” phase thereafter. Clinic visits every 6 months were used to identify incident TB events. Because the investigators were examining the reduction in TB incidence among all workers at the mine, and not just those in the study, the effect of the intervention (the provision of clinic services) was estimated for the entire study population (as an incidence rate ratio), irrespective of whether individuals actually received isoniazid.
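To make the person-time logic concrete, an incidence rate ratio of this kind can be sketched as follows. The event counts and person-years below are hypothetical illustrations, not the Grant et al figures:

```python
import math

# Illustrative sketch with hypothetical counts: estimating the incidence rate
# ratio (IRR) of TB for the post-clinic phase vs the pre-clinic phase, where
# each worker contributes person-time to both phases.
events_pre, persontime_pre = 120, 4000.0    # TB cases and person-years before first clinic visit
events_post, persontime_post = 70, 3500.0   # TB cases and person-years after first clinic visit

rate_pre = events_pre / persontime_pre      # incidence rate in the pre-clinic phase
rate_post = events_post / persontime_post   # incidence rate in the post-clinic phase
irr = rate_post / rate_pre                  # IRR < 1 suggests reduced incidence

# Approximate 95% CI on the log scale (standard large-sample formula).
se_log = math.sqrt(1 / events_pre + 1 / events_post)
ci_low = math.exp(math.log(irr) - 1.96 * se_log)
ci_high = math.exp(math.log(irr) + 1.96 * se_log)

print(f"IRR = {irr:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

Because every individual appears in both phases, each person serves as part of their own comparison, which is the appeal of this SWD variant.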
We present a decision ‘map’ (Figure 5 ) to assist in selecting among QEDs and to highlight the design features that deserve particular attention [ Figure 5 here].
Quasi-Experimental Design Decision-Making Map
First, at the top of the flow diagram ( 1 ), consider whether you can collect data at multiple time points in the pre- and post-intervention periods. Ideally, you will be able to select more than two time points. If you cannot, then multiple sites would allow for a non-equivalent control group pre-post design. If you can have more than two time points for the study assessments, you next need to determine whether you can include multiple sites ( 2 ). If not, then you can consider a single-site ITS. If you can have multiple sites, you can choose between a SWD and a multiple-site ITS based on whether you observe the roll-out over multiple time points (SWD) or have only one intervention time point (controlled multiple-site ITS).
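The branching logic above can be sketched as a small function. This is a simplification for illustration only; the function name and its three yes/no questions are our own shorthand, not elements of Figure 5:

```python
def choose_qed(multiple_timepoints, multiple_sites, staggered_rollout):
    """Sketch of the decision map, reduced to its three branching questions."""
    if not multiple_timepoints:
        # Only pre/post assessment: rely on comparison site(s).
        return "pre-post non-equivalent control group design"
    if not multiple_sites:
        return "single-site interrupted time series (ITS)"
    if staggered_rollout:
        # Roll-out observed over multiple time points across sites.
        return "stepped wedge design (SWD)"
    # One intervention time point across multiple sites.
    return "controlled multiple-site ITS"

print(choose_qed(multiple_timepoints=True, multiple_sites=True, staggered_rollout=True))
```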
In a recent article in this journal ( 26 ), it was observed that there is an unavoidable trade-off between these two forms of validity: the greater control exerted in a study strengthens the evidence for internal validity, but that control may jeopardize some of its external validity. Nonetheless, there are design strategies for non-experimental studies that can improve internal validity without eliminating considerations of external validity. These are described below across all three study designs.
One of the strengths of QEDs is that they are often employed to examine intervention effects in real world settings, often among more diverse populations and settings. Consequently, if there is adequate examination of participant characteristics and setting-related factors, it can be possible to interpret findings among critical groups for which there may be no existing evidence of an intervention effect. For example, in the Campus Watch intervention ( 16 ), the investigators over-sampled the Maori indigenous population in order to be able to stratify the results and investigate whether the program was effective for this under-studied group. In the study by Zombré et al ( 52 ) on health care access in Burkina Faso, the authors examined clinic density characteristics to determine their impact on sustainability.
Some of the most important outcomes for examination in these QED studies include whether the intervention was delivered as intended (i.e., fidelity), maintained over the entire study period (i.e., sustainability), and whether the outcomes could be specifically examined by this level of fidelity within or across sites. As well, when a complex intervention is related to a policy or guideline shift and implementation requires logistical adjustments (such as phased roll-outs to embed the intervention or to train staff), QEDs more truly mimic real world constraints. As a result, capturing processes of implementation is critical, as they can describe important variation in uptake, informing interpretation of the findings for external validity. As described by Prost et al ( 41 ), for example, it is essential to capture what occurs during such phased intervention roll-outs, consistent with established guidelines for the development of complex interventions, including efforts to define and protocolize activities before their implementation ( 17 , 18 , 28 ). However, QEDs are often conducted by teams with strong interests in adapting the intervention or ‘learning by doing’, which can limit interpretation of findings if not planned into the design. In the study by Bailet et al ( 3 ), the investigators refined the intervention based on year 1 data and then applied it in years 2–3, collecting additional data on training and measurement fidelity at this later stage. This phasing aspect of implementation generates a tension between protocolizing interventions and adapting them along the way. When this is the case, additional designs for the intervention roll-out, such as adaptive or hybrid designs, can also be considered.
External validity can be improved when the intervention is applied to entire communities, as with some of the community-randomized studies described in Table 2 ( 12 , 21 ). In these cases, the results are closer to the conditions that would apply if the interventions were conducted ‘at scale’, with a large proportion of a population receiving the intervention. In some cases QEDs also afford greater access for intervention research in remote or difficult-to-reach communities, where the cost and logistical requirements of an RCT may become prohibitive or may require altering the intervention or staffing support to levels that would never be feasible in real world application.
Frameworks can help enhance the interpretability of many kinds of studies, including QEDs, and can help ensure that information on essential implementation strategies is included in the results ( 44 ). Although several of the case studies summarized in this article included measures that can improve external validity (such as sub-group analyses of which participants were most impacted, and process and contextual measures that can affect variation in uptake), none formally employed an implementation framework. Green and Glasgow (2006) ( 25 ) have outlined several useful criteria for gauging the extent to which an evaluation study also provides measures that enhance interpretation of external validity; those employing QEDs could identify relevant components and frameworks to include in reported findings.
It has been observed that it is more difficult to conduct a good quasi-experiment than to conduct a good randomized trial ( 43 ). Although QEDs are increasingly used, it is important to note that randomized designs are still preferred over quasi-experiments except where randomization is not possible. In this paper we present three important QEDs and variants nested within them that can increase internal validity while also improving external validity considerations, and present case studies employing these techniques.
1 It is important to note that if such randomization were possible at the site level based on similar sites, a cluster randomized controlled trial would be an option.
EDUR 7130 Educational Research On-Line
Quantitative Research Types
Quantitative Research Methods
Quantitative research is generally defined as four types: true experimental, quasi-experimental, ex post facto, and correlational. A brief overview of the differences and similarities of each type is presented below. A more detailed description of various components of experimental research is presented in Experimental Research: Control, Designs, Internal and External Validity.
True and Quasi-Experimental Research
True Experimental
True experimental research can be identified by three characteristics: randomly formed groups, manipulation of the treatment (the IV), and comparisons among groups. These will be discussed in the context of the following example. We wish to know whether cooperative learning produces better achievement among 10th grade students in mathematics than a traditional lecture approach. A group of students, n = 50, will be randomly assigned to a classroom using cooperative learning or to a classroom using lecture, with 25 randomly assigned to each classroom. At the end of a semester, a final achievement test on mathematics will be administered to determine which group scores, on average, higher in mathematics.
In true experimental research, the groups studied will be randomly formed. Recall from the section on sampling that random means a systematic approach is used to assign people to groups, but the approach has no predictable pattern. A table of random numbers gives this result; a flip of a coin also accomplishes this. For example, if we are assigning people to one of two groups, deciding each person's group membership by flipping a coin is random, since one cannot accurately predict whether a head or a tail will show.
It is easy to confuse randomly formed groups, or random assignment, with random sampling. The two are certainly not the same thing. Random sampling is one method for selecting--picking--people to participate in a study. Random assignment is a method for assigning people to groups--it is not a method for selecting study participants. Also note that random sampling is not required for a true experiment. Randomly formed groups are necessary for a true experiment, but one could use convenience sampling to select study participants and still have a true experiment. For the example study, students may have been selected based on who was available--based on convenience, then they were randomly assigned to one of two groups.
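The distinction can be illustrated in a short sketch. The student and population identifiers below are hypothetical placeholders:

```python
import random

random.seed(7)  # reproducible illustration

# Random ASSIGNMENT: the 50 convenience-sampled students are shuffled and
# split 25/25, so group membership follows no predictable pattern.
students = [f"student_{i:02d}" for i in range(50)]
random.shuffle(students)
cooperative, lecture = students[:25], students[25:]

# Random SAMPLING, by contrast, *selects* participants from a larger
# population; it does not by itself assign anyone to a group.
population = [f"pupil_{i:04d}" for i in range(1000)]
sample = random.sample(population, 50)

print(len(cooperative), len(lecture), len(sample))
```

Note that the shuffled assignment would still make this a true experiment even though the 50 students were selected by convenience rather than by random sampling.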
The second requirement, that the treatment be manipulated, means that the researcher has control of who receives which treatment. Manipulation in this sense is similar to the definition of politics--who gets what. If the researcher decides who gets what, then manipulation occurred. In the example, the researcher randomly assigned students to one of two groups, so the researcher manipulated who would receive which treatment, cooperative learning or lecture.
The third requirement, well, more of a characteristic than a requirement, is that groups are compared. In most experiments, there will be at least two groups, perhaps more, which will be compared on some outcome of interest, some dependent variable. In the example, the two groups are cooperative learning and lecture, and they will be compared on performance on the final achievement test.
Quasi-Experimental
Quasi-experimental research is just like true experimental research, with the only difference being the lack of randomly formed groups. Of the two types of experimental research, quasi-experimental is most commonly used in education. It is difficult to find schools that will allow a researcher to select students from classes and assign them randomly to other classes. So, in most educational research situations, intact classes are used for the experiment. When intact classes or groups are used, but manipulation is present--the researcher determines which group receives which treatment--then quasi-experimentation results. For example, a researcher uses his two classes for an experiment. He randomly assigns cooperative learning to class B, and randomly assigns lecture to class A. Following the treatment, an instrument is administered to all participants to learn whether the treatments resulted in differences between the two classes. Note in this example the groups were not randomly formed, but the treatment was manipulated and groups were compared, so quasi-experimentation resulted.
Non-experimental Quantitative Research: Ex Post Facto and Correlational
Both true and quasi-experimental research are distinguished by one common characteristic: manipulation. No other type of research has manipulation of the independent variable. Two other forms of quantitative research, which are not experimental due to lack of manipulation, are ex post facto (sometimes called causal-comparative) and correlational. Often both of these types are grouped into what researchers call non-experimental research or simply correlational research. Thus, correlational research can be understood to include both of the two types I discuss below: ex post facto and correlational. For our purposes, we will make a distinction between these two types.
Ex Post Facto (Causal-Comparative)
Ex post facto looks like an experiment because groups are compared; there is, however, a key difference--no manipulation of the independent variable. With ex post facto research, the difference between groups on the independent variable occurs independent of the researcher. For example, suppose a researcher contacts a school's principal and asks for two teachers, one who uses cooperative learning and one who uses lecture. The researcher's goal is to compare students' scores on a test to determine which method produces better achievement. This is very similar to the example given above for experimental research, but the key difference is that the researcher did not manipulate the independent variable. The researcher did not determine which class, or which teacher, would use cooperative learning or lecture. Rather, the researcher asked which teachers use which instructional strategy, and then selected the groups for comparisons.
Another example of ex post facto is the analysis of differences in any quantitative outcome by sex (male vs. female). For example, if one is interested in learning whether differences exist between males and females in ITBS scores, that is an ex post facto study, since the independent variable cannot be manipulated and since there are group comparisons.
So the keys to an ex post facto study are group comparisons and non-manipulated independent variables. Groups may be randomly formed in ex post facto research, such as through random sampling of males and females, but randomly formed groups alone are not enough to turn an ex post facto study into a true experimental study.
Correlational
A correlational study is the examination of relationships among two or more quantitative variables. Both the independent and dependent variables will be quantitative. It is possible to have multiple independent variables and possibly multiple dependent variables. For example, I wish to know which of the following independent variables (High School GPA, SAT scores, HS Rank) predict the following dependent variables (GRE mathematics, GRE verbal, college GPA).
Sometimes a distinction is made between types of correlational studies. A predictive study is done simply to learn which, among a set of variables, best predicts the dependent variable. The goal here is simply to maximize prediction. A second type of study is the relationship study. With relationship studies, the goal is to understand, as best as possible, the variables that are theoretically related to a dependent variable. With this type of study, researchers are interested in testing and confirming theories or hypotheses concerning relationships among variables.
Matrix of Distinguishing Characteristics Among Quantitative Research Methods
The key differences among the four types of quantitative studies are outlined below in the matrix. Understanding this matrix will assist you in determining which methods are used in most quantitative research.
|                             | True Experimental | Quasi-Experimental | Ex Post Facto                  | Correlational                  |
|-----------------------------|-------------------|--------------------|--------------------------------|--------------------------------|
| Causal relationships (1)    | Can Establish     | Only Identify      | Only Identify                  | Only Identify                  |
| Randomly formed groups (2)  | Yes               | No                 | Maybe, through random sampling | Maybe, through random sampling |
| Manipulation (3)            | Yes               | Yes                | No                             | No                             |
| Group comparisons (4)       | Usually Yes       | Usually Yes        | Yes                            | No                             |
1. One can only establish the existence of causal relationships through repeated experimentation, i.e., replication of an experiment. A single experiment cannot be used to establish either the presence or absence of a relationship between two or more variables.
2. Note the emphasis on randomly formed groups, not randomly selected groups. To have a true experiment, one does not need to have randomly selected groups, but one must have randomly formed groups.
3. Manipulation is the single characteristic that differentiates experimental from non-experimental research.
4. The characteristic of group comparisons represents a trivial and archaic distinction between ex post facto and correlational research. In practice, this characteristic is only reflected in the scale of the independent variable used. For ex post facto studies, the independent variable will be nominal, while for correlational studies the independent variable will be ordinal, interval, or ratio.
Experimental research—often considered to be the ‘gold standard’ in research designs—is one of the most rigorous of all research designs. In this design, one or more independent variables are manipulated by the researcher (as treatments), subjects are randomly assigned to different treatment levels (random assignment), and the results of the treatments on outcomes (dependent variables) are observed. The unique strength of experimental research is its internal validity (causality) due to its ability to link cause and effect through treatment manipulation, while controlling for the spurious effect of extraneous variables.
Experimental research is best suited for explanatory research—rather than for descriptive or exploratory research—where the goal of the study is to examine cause-effect relationships. It also works well for research that involves a relatively limited and well-defined set of independent variables that can either be manipulated or controlled. Experimental research can be conducted in laboratory or field settings. Laboratory experiments , conducted in laboratory (artificial) settings, tend to be high in internal validity, but this comes at the cost of low external validity (generalisability), because the artificial (laboratory) setting in which the study is conducted may not reflect the real world. Field experiments are conducted in field settings such as in a real organisation, and are high in both internal and external validity. But such experiments are relatively rare, because of the difficulties associated with manipulating treatments and controlling for extraneous effects in a field setting.
Experimental research can be grouped into two broad categories: true experimental designs and quasi-experimental designs. Both designs require treatment manipulation, but while true experiments also require random assignment, quasi-experiments do not. Sometimes, we also refer to non-experimental research, which is not really a research design, but an all-inclusive term that includes all types of research that do not employ treatment manipulation or random assignment, such as survey research, observational research, and correlational studies.
Treatment and control groups. In experimental research, some subjects are administered one or more experimental stimulus called a treatment (the treatment group ) while other subjects are not given such a stimulus (the control group ). The treatment may be considered successful if subjects in the treatment group rate more favourably on outcome variables than control group subjects. Multiple levels of experimental stimulus may be administered, in which case, there may be more than one treatment group. For example, in order to test the effects of a new drug intended to treat a certain medical condition like dementia, if a sample of dementia patients is randomly divided into three groups, with the first group receiving a high dosage of the drug, the second group receiving a low dosage, and the third group receiving a placebo such as a sugar pill (control group), then the first two groups are experimental groups and the third group is a control group. After administering the drug for a period of time, if the condition of the experimental group subjects improved significantly more than the control group subjects, we can say that the drug is effective. We can also compare the conditions of the high and low dosage experimental groups to determine if the high dose is more effective than the low dose.
Treatment manipulation. Treatments are the unique feature of experimental research that sets this design apart from all other research methods. Treatment manipulation helps control for the ‘cause’ in cause-effect relationships. Naturally, the validity of experimental research depends on how well the treatment was manipulated. Treatment manipulation must be checked using pretests and pilot tests prior to the experimental study. Any measurements conducted before the treatment is administered are called pretest measures , while those conducted after the treatment are posttest measures .
Random selection and assignment. Random selection is the process of randomly drawing a sample from a population or a sampling frame. This approach is typically employed in survey research, and ensures that each unit in the population has a positive chance of being selected into the sample. Random assignment, however, is a process of randomly assigning subjects to experimental or control groups. This is a standard practice in true experimental research to ensure that treatment groups are similar (equivalent) to each other and to the control group prior to treatment administration. Random selection is related to sampling, and is therefore more closely related to the external validity (generalisability) of findings. However, random assignment is related to design, and is therefore most related to internal validity. It is possible to have both random selection and random assignment in well-designed experimental research, but quasi-experimental research involves neither random selection nor random assignment.
Threats to internal validity. Although experimental designs are considered more rigorous than other research methods in terms of the internal validity of their inferences (by virtue of their ability to control causes through treatment manipulation), they are not immune to internal validity threats. Some of these threats to internal validity are described below, within the context of a study of the impact of a special remedial math tutoring program for improving the math abilities of high school students.
History threat is the possibility that the observed effects (dependent variables) are caused by extraneous or historical events rather than by the experimental treatment. For instance, students’ post-remedial math score improvement may have been caused by their preparation for a math exam at their school, rather than the remedial math program.
Maturation threat refers to the possibility that observed effects are caused by natural maturation of subjects (e.g., a general improvement in their intellectual ability to understand complex concepts) rather than the experimental treatment.
Testing threat is a threat in pre-post designs where subjects’ posttest responses are conditioned by their pretest responses. For instance, if students remember their answers from the pretest evaluation, they may tend to repeat them in the posttest exam. Not conducting a pretest can help avoid this threat.
Instrumentation threat , which also occurs in pre-post designs, refers to the possibility that the difference between pretest and posttest scores is not due to the remedial math program, but due to changes in the administered test, such as the posttest having a higher or lower degree of difficulty than the pretest.
Mortality threat refers to the possibility that subjects may be dropping out of the study at differential rates between the treatment and control groups due to a systematic reason, such that the dropouts were mostly students who scored low on the pretest. If the low-performing students drop out, the results of the posttest will be artificially inflated by the preponderance of high-performing students.
Regression threat —also called a regression to the mean—refers to the statistical tendency of a group’s overall performance to regress toward the mean during a posttest rather than in the anticipated direction. For instance, if subjects scored high on a pretest, they will have a tendency to score lower on the posttest (closer to the mean) because their high scores (away from the mean) during the pretest were possibly a statistical aberration. This problem tends to be more prevalent in non-random samples and when the two measures are imperfectly correlated.
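A small simulation with hypothetical score distributions shows the phenomenon: when two imperfectly correlated measures are taken and only the highest pretest scorers are selected, their posttest mean drifts back toward the overall mean with no treatment at all:

```python
import random

random.seed(42)

# Simulate pretest and posttest scores that are imperfectly correlated:
# each score = stable ability + independent measurement noise, with no
# treatment applied between the two tests.
n = 10000
ability = [random.gauss(100, 10) for _ in range(n)]
pretest = [a + random.gauss(0, 10) for a in ability]
posttest = [a + random.gauss(0, 10) for a in ability]

# Select the top 10% of pretest scorers and compare their mean scores.
cutoff = sorted(pretest)[int(0.9 * n)]
high = [i for i in range(n) if pretest[i] >= cutoff]
mean_pre = sum(pretest[i] for i in high) / len(high)
mean_post = sum(posttest[i] for i in high) / len(high)

# The group's posttest mean falls back toward 100 even though nothing changed.
print(f"top decile at pretest: {mean_pre:.1f} -> {mean_post:.1f} at posttest")
```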
Pretest-posttest control group design . In this design, subjects are randomly assigned to treatment and control groups, subjected to an initial (pretest) measurement of the dependent variables of interest, the treatment group is administered a treatment (representing the independent variable of interest), and the dependent variables measured again (posttest). The notation of this design is shown in Figure 10.1.
Statistical analysis of this design involves a simple analysis of variance (ANOVA) between the treatment and control groups. The pretest-posttest design handles several threats to internal validity, such as maturation, testing, and regression, since these threats can be expected to influence both treatment and control groups in a similar (random) manner. The selection threat is controlled via random assignment. However, additional threats to internal validity may exist. For instance, mortality can be a problem if there are differential dropout rates between the two groups, and the pretest measurement may bias the posttest measurement—especially if the pretest introduces unusual topics or content.
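For illustration, the two-group ANOVA can be computed directly from sums of squares. The scores below are simulated and hypothetical, not drawn from any study described here:

```python
import random

random.seed(1)

def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA: between-group mean square
    divided by within-group mean square."""
    means = [sum(g) / len(g) for g in groups]
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = len(groups) - 1, n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical posttest scores after random assignment; the simulated
# treatment shifts the mean up by 5 points.
control = [random.gauss(70, 8) for _ in range(30)]
treatment = [random.gauss(75, 8) for _ in range(30)]
print(f"F(1, 58) = {one_way_anova_f(control, treatment):.2f}")
```

With two groups, this F statistic is equivalent to the square of the two-sample t statistic, which is why a t-test is often reported instead.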
Posttest-only control group design . This design is a simpler version of the pretest-posttest design where pretest measurements are omitted. The design notation is shown in Figure 10.2.
The treatment effect is measured simply as the difference in the posttest scores between the two groups (i.e., the treatment group mean minus the control group mean).
The appropriate statistical analysis of this design is also a two-group analysis of variance (ANOVA). The simplicity of this design makes it more attractive than the pretest-posttest design in terms of internal validity. This design controls for maturation, testing, regression, selection, and pretest-posttest interaction, though the mortality threat may continue to exist.
Because the pretest measure is not a measurement of the dependent variable, but rather a covariate, the treatment effect is measured as the difference in the posttest scores between the treatment and control groups, after adjusting for the covariate.
Due to the presence of covariates, the right statistical analysis of this design is a two-group analysis of covariance (ANCOVA). This design has all the advantages of the posttest-only design, but with improved internal validity due to the controlling of covariates. Covariance designs can also be extended to the pretest-posttest control group design.
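As a sketch of the covariance logic, an ANCOVA can be expressed as a linear model with the pretest entered as a covariate. The data below are simulated, and the effect size and coefficients are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: the pretest is a covariate, and the simulated
# treatment adds a true effect of 4 points to the posttest.
n = 100
pretest = rng.normal(50, 10, n)
group = np.repeat([0, 1], n // 2)                 # 0 = control, 1 = treatment
posttest = 10 + 0.8 * pretest + 4 * group + rng.normal(0, 5, n)

# ANCOVA expressed as a linear model: posttest ~ intercept + group + pretest.
X = np.column_stack([np.ones(n), group, pretest])
beta, *_ = np.linalg.lstsq(X, posttest, rcond=None)

# beta[1] is the covariate-adjusted treatment effect (close to the true 4).
print(f"adjusted treatment effect: {beta[1]:.2f}")
```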
Two-group designs are inadequate if your research requires manipulation of two or more independent variables (treatments). In such cases, you would need four or higher-group designs. Such designs, quite popular in experimental research, are commonly called factorial designs. Each independent variable in this design is called a factor , and each subdivision of a factor is called a level . Factorial designs enable the researcher to examine not only the individual effect of each treatment on the dependent variables (called main effects), but also their joint effect (called interaction effects).
In a factorial design, a main effect is said to exist if the dependent variable shows a significant difference between multiple levels of one factor, at all levels of other factors. No change in the dependent variable across factor levels is the null case (baseline) against which main effects are evaluated. In the above example, you may see a main effect of instructional type, instructional time, or both on learning outcomes. An interaction effect exists when the effect of differences in one factor depends upon the level of a second factor. In our example, if the effect of instructional type on learning outcomes is greater for three hours/week of instructional time than for one and a half hours/week, then we can say that there is an interaction effect between instructional type and instructional time on learning outcomes. Note that when interaction effects are significant they dominate, and it is not meaningful to interpret the main effects.
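The distinction between simple effects and an interaction can be made concrete with hypothetical cell means for the instructional type by instructional time example:

```python
# Hypothetical 2x2 cell means: instructional type crossed with
# instructional time (hours/week).
cell_means = {
    ("lecture", 1.5): 70.0, ("lecture", 3.0): 72.0,
    ("cooperative", 1.5): 74.0, ("cooperative", 3.0): 84.0,
}

# Simple effect of instructional type at each level of instructional time:
effect_at_low = cell_means[("cooperative", 1.5)] - cell_means[("lecture", 1.5)]
effect_at_high = cell_means[("cooperative", 3.0)] - cell_means[("lecture", 3.0)]

# A nonzero difference between the simple effects is the signature of an
# interaction: the effect of type depends on the level of time.
interaction = effect_at_high - effect_at_low
print(effect_at_low, effect_at_high, interaction)  # 4.0 12.0 8.0
```

Here cooperative learning helps at both time levels, but by unequal amounts, so interpreting a single "main effect of type" would be misleading.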
Hybrid designs are those that are formed by combining features of more established designs. Three such hybrid designs are the randomised blocks design, the Solomon four-group design, and the switched replications design.
Randomised block design. This is a variation of the posttest-only or pretest-posttest control group design where the subject population can be grouped into relatively homogeneous subgroups (called blocks ) within which the experiment is replicated. For instance, if you want to replicate the same posttest-only design among university students and full-time working professionals (two homogeneous blocks), subjects in both blocks are randomly split between the treatment group (receiving the same treatment) and the control group (see Figure 10.5). The purpose of this design is to reduce the ‘noise’ or variance in data that may be attributable to differences between the blocks so that the actual effect of interest can be detected more accurately.
Solomon four-group design . In this design, the sample is divided into two treatment groups and two control groups. One treatment group and one control group receive the pretest, and the other two groups do not. This design represents a combination of posttest-only and pretest-posttest control group design, and is intended to test for the potential biasing effect of pretest measurement on posttest measures that tends to occur in pretest-posttest designs, but not in posttest-only designs. The design notation is shown in Figure 10.6.
Switched replication design . This is a two-group design implemented in two phases with three waves of measurement. The treatment group in the first phase serves as the control group in the second phase, and the control group in the first phase becomes the treatment group in the second phase, as illustrated in Figure 10.7. In other words, the original design is repeated or replicated temporally with treatment/control roles switched between the two groups. By the end of the study, all participants will have received the treatment either during the first or the second phase. This design is most feasible in organisational contexts where organisational programs (e.g., employee training) are implemented in a phased manner or are repeated at regular intervals.
Quasi-experimental designs are almost identical to true experimental designs, but lacking one key ingredient: random assignment. For instance, one entire class section or one organisation is used as the treatment group, while another section of the same class or a different organisation in the same industry is used as the control group. This lack of random assignment potentially results in groups that are non-equivalent, such as one group possessing greater mastery of certain content than the other group, say by virtue of having a better teacher in a previous semester, which introduces the possibility of selection bias . Quasi-experimental designs are therefore inferior to true experimental designs in internal validity due to the presence of a variety of selection related threats such as selection-maturation threat (the treatment and control groups maturing at different rates), selection-history threat (the treatment and control groups being differentially impacted by extraneous or historical events), selection-regression threat (the treatment and control groups regressing toward the mean between pretest and posttest at different rates), selection-instrumentation threat (the treatment and control groups responding differently to the measurement), selection-testing (the treatment and control groups responding differently to the pretest), and selection-mortality (the treatment and control groups demonstrating differential dropout rates). Given these selection threats, it is generally preferable to avoid quasi-experimental designs to the greatest extent possible.
In addition, there are quite a few unique non-equivalent designs without corresponding true experimental design cousins. Some of the more useful of these designs are discussed next.
Regression discontinuity (RD) design . This is a non-equivalent pretest-posttest design where subjects are assigned to the treatment or control group based on a cut-off score on a preprogram measure. For instance, patients who are severely ill may be assigned to a treatment group to test the efficacy of a new drug or treatment protocol and those who are mildly ill are assigned to the control group. In another example, students who are lagging behind on standardised test scores may be selected for a remedial curriculum program intended to improve their performance, while those who score high on such tests are not selected for the remedial program.
Because of the use of a cut-off score, it is possible that the observed results may be a function of the cut-off score rather than the treatment, which introduces a new threat to internal validity. However, using the cut-off score also ensures that limited or costly resources are distributed to people who need them the most, rather than randomly across a population, while simultaneously allowing a quasi-experimental treatment. The control group scores in the RD design do not serve as a benchmark for comparing treatment group scores, given the systematic non-equivalence between the two groups. Rather, if there is no discontinuity between pretest and posttest scores in the control group, but such a discontinuity persists in the treatment group, then this discontinuity is viewed as evidence of the treatment effect.
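As an illustration of how the discontinuity at the cut-off can be estimated, the following is a hedged sketch that simulates an RD study and fits a separate least-squares line on each side of the cut-off; the cut-off value, effect size, and linear outcome model are all assumptions invented for the example, not part of the original text.

```python
import random

random.seed(1)

CUTOFF = 50.0       # assignment rule: pretest below 50 -> treatment (e.g. remedial program)
TRUE_EFFECT = 5.0   # assumed treatment effect, for illustration

pre = [random.uniform(20, 80) for _ in range(400)]
post = []
for x in pre:
    treated = x < CUTOFF
    # Outcome follows the pretest linearly; treatment shifts it by TRUE_EFFECT.
    post.append(0.8 * x + (TRUE_EFFECT if treated else 0.0) + random.gauss(0, 1))

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

left = [(x, y) for x, y in zip(pre, post) if x < CUTOFF]    # treated side
right = [(x, y) for x, y in zip(pre, post) if x >= CUTOFF]  # control side

a_l, b_l = fit_line([p[0] for p in left], [p[1] for p in left])
a_r, b_r = fit_line([p[0] for p in right], [p[1] for p in right])

# The treatment effect is read off as the gap between the two fitted lines at the cut-off.
effect = (a_l + b_l * CUTOFF) - (a_r + b_r * CUTOFF)
print(round(effect, 2))
```

Note how the control group's fitted line is used only to anchor the counterfactual trend at the cut-off, not as a direct benchmark for treatment-group scores, mirroring the logic described above.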
Proxy pretest design . This design, shown in Figure 10.11, looks very similar to the standard NEGD (pretest-posttest) design, with one critical difference: the pretest score is collected after the treatment is administered. A typical application of this design is when a researcher is brought in to test the efficacy of a program (e.g., an educational program) after the program has already started and pretest data is not available. Under such circumstances, the best option for the researcher is often to use a different prerecorded measure, such as students’ grade point average before the start of the program, as a proxy for pretest data. A variation of the proxy pretest design is to use subjects’ posttest recollection of pretest data, which may be subject to recall bias, but nevertheless may provide a measure of perceived gain or change in the dependent variable.
Separate pretest-posttest samples design . This design is useful if it is not possible to collect pretest and posttest data from the same subjects for some reason. As shown in Figure 10.12, there are four groups in this design, but two groups come from a single non-equivalent group, while the other two groups come from a different non-equivalent group. For instance, say you want to test customer satisfaction with a new online service that is implemented in one city but not in another. In this case, customers in the first city serve as the treatment group and those in the second city constitute the control group. If it is not possible to obtain pretest and posttest measures from the same customers, you can measure customer satisfaction at one point in time, implement the new service program, and measure customer satisfaction (with a different set of customers) after the program is implemented. Customer satisfaction is also measured in the control group at the same times as in the treatment group, but without the new program implementation. The design is not particularly strong, because you cannot examine the change in any specific customer’s satisfaction score before and after the implementation; you can only compare average customer satisfaction scores. Despite the lower internal validity, this design may still be a useful way of collecting quasi-experimental data when pretest and posttest data are not available from the same subjects.
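A minimal sketch of the only analysis this design permits — comparing group averages across separate pretest and posttest samples — might look like the following. The satisfaction scores, sample sizes, and effect size are invented for illustration.

```python
import random
import statistics

random.seed(2)

# City 1 (treatment): different customers are surveyed before and after the new service.
city1_pre  = [random.gauss(6.0, 1.0) for _ in range(200)]   # wave 1 sample
city1_post = [random.gauss(6.8, 1.0) for _ in range(200)]   # wave 2 sample (new service live)

# City 2 (control): surveyed at the same two times, with no new service.
city2_pre  = [random.gauss(6.1, 1.0) for _ in range(200)]
city2_post = [random.gauss(6.2, 1.0) for _ in range(200)]

# Because the pre and post samples contain different people, only group
# averages -- not individual change scores -- can be compared.
gain_treat = statistics.fmean(city1_post) - statistics.fmean(city1_pre)
gain_ctrl  = statistics.fmean(city2_post) - statistics.fmean(city2_pre)
effect = gain_treat - gain_ctrl
print(round(effect, 2))
```

The control city's gain absorbs any city-wide trend (seasonality, economic shifts), which is why the treatment city's gain is reported relative to it rather than on its own.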
An interesting variation of the NEDV (non-equivalent dependent variable) design is a pattern-matching NEDV design , which employs multiple outcome variables and a theory that explains how much each variable will be affected by the treatment. The researcher can then examine if the theoretical prediction is matched in actual observations. This pattern-matching technique—based on the degree of correspondence between theoretical and observed patterns—is a powerful way of alleviating internal validity concerns in the original NEDV design.
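One simple way to quantify the theory-observation correspondence is to correlate the predicted effect sizes with the observed ones. The outcome variables and numbers below are entirely hypothetical, and a correlation is only one of several pattern-matching summaries a researcher might choose.

```python
import statistics

# Hypothetical example: theory ranks how strongly a new teaching method should
# affect five outcome measures (larger = more affected by the treatment).
predicted = {"algebra": 0.8, "geometry": 0.6, "reading": 0.1, "spelling": 0.0, "attendance": 0.2}
observed  = {"algebra": 0.7, "geometry": 0.5, "reading": 0.2, "spelling": 0.1, "attendance": 0.1}

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

keys = sorted(predicted)
r = pearson([predicted[k] for k in keys], [observed[k] for k in keys])

# A high correspondence between theoretical and observed patterns supports
# a treatment effect rather than a selection artefact.
print(round(r, 2))
```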
Experimental research is one of the most difficult of research designs, and should not be taken lightly. This type of research is often beset with a multitude of methodological problems. First, though experimental research requires theories for framing hypotheses for testing, much of current experimental research is atheoretical. Without theories, the hypotheses being tested tend to be ad hoc, possibly illogical, and meaningless. Second, many of the measurement instruments used in experimental research are not tested for reliability and validity, and are incomparable across studies. Consequently, results generated using such instruments are also incomparable. Third, experimental research often uses inappropriate research designs, such as irrelevant dependent variables, no interaction effects, no experimental controls, and non-equivalent stimuli across treatment groups. Findings from such studies tend to lack internal validity and are highly suspect. Fourth, the treatments (tasks) used in experimental research may be diverse, incomparable, and inconsistent across studies, and sometimes inappropriate for the subject population. For instance, undergraduate student subjects are often asked to pretend that they are marketing managers and asked to perform a complex budget allocation task in which they have no experience or expertise. The use of such inappropriate tasks introduces new threats to internal validity (i.e., subjects’ performance may be an artefact of the content or difficulty of the task setting), generates findings that are non-interpretable and meaningless, and makes integration of findings across studies impossible.
The design of proper experimental treatments is a very important task in experimental design, because the treatment is the raison d’etre of the experimental method, and must never be rushed or neglected. To design an adequate and appropriate task, researchers should use prevalidated tasks if available, conduct treatment manipulation checks to check for the adequacy of such tasks (by debriefing subjects after performing the assigned task), conduct pilot tests (repeatedly, if necessary), and if in doubt, use tasks that are simple and familiar for the respondent sample rather than tasks that are complex or unfamiliar.
In summary, this chapter introduced key concepts in the experimental design research method and introduced a variety of true experimental and quasi-experimental designs. Although these designs vary widely in internal validity, designs with less internal validity should not be overlooked and may sometimes be useful under specific circumstances and empirical contingencies.
Social Science Research: Principles, Methods and Practices (Revised edition) Copyright © 2019 by Anol Bhattacherjee is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.
Experimental studies: Experiments, Randomized controlled trials (RCTs); Observational studies: Non-experimental studies, Non-manipulation studies, Naturalistic studies
The experimental study is a powerful methodology for testing causal relations between one or more explanatory variables (i.e., independent variables) and one or more outcome variables (i.e., dependent variables). In order to accomplish this goal, experiments have to meet three basic criteria: (a) experimental manipulation (variation) of the independent variable(s), (b) randomization – the participants are randomly assigned to one of the experimental conditions, and (c) experimental control for the effect of third variables by eliminating them or keeping them constant.
In observational studies, investigators observe or assess individuals without manipulation or intervention. Observational studies are used for assessing the mean levels, the natural variation, and the structure of variables, as well as...
Pinquart, M. (2021). Experimental Studies and Observational Studies. In: Gu, D., Dupre, M.E. (eds) Encyclopedia of Gerontology and Population Aging. Springer, Cham. https://doi.org/10.1007/978-3-030-22009-9_573
Published on December 3, 2019 by Rebecca Bevans . Revised on June 21, 2023.
Experiments are used to study causal relationships . You manipulate one or more independent variables and measure their effect on one or more dependent variables.
Experimental design creates a set of procedures to systematically test a hypothesis . A good experimental design requires a strong understanding of the system you are studying.
There are five key steps in designing an experiment:

1. Define your variables.
2. Write your hypothesis.
3. Design your experimental treatments.
4. Assign your subjects to treatment groups.
5. Measure your dependent variable.

For valid conclusions, you also need to select a representative sample and control any extraneous variables that might influence your results. Doing so minimizes several types of research bias, particularly sampling bias , survivorship bias , and attrition bias . If random assignment of participants to control and treatment groups is impossible, unethical, or highly difficult, consider an observational study instead.
You should begin with a specific research question . We will work with two research question examples, one from health sciences and one from ecology:
To translate your research question into an experimental hypothesis, you need to define the main variables and make predictions about how they are related.
Start by simply listing the independent and dependent variables .
| Research question | Independent variable | Dependent variable |
|---|---|---|
| Phone use and sleep | Minutes of phone use before sleep | Hours of sleep per night |
| Temperature and soil respiration | Air temperature just above the soil surface | CO2 respired from soil |
Then you need to think about possible extraneous and confounding variables and consider how you might control them in your experiment.
| Research question | Extraneous variable | How to control it |
|---|---|---|
| Phone use and sleep | Natural variation in sleep patterns among individuals | Measure the average difference between sleep with phone use and sleep without phone use, rather than the average amount of sleep per treatment group |
| Temperature and soil respiration | Soil moisture also affects respiration, and moisture can decrease with increasing temperature | Monitor soil moisture and add water to make sure that soil moisture is consistent across all treatment plots |
Finally, you can put these variables together into a diagram. Use arrows to show the possible relationships between variables and include signs to show the expected direction of the relationships.
Here we predict that increasing temperature will increase soil respiration and decrease soil moisture, while decreasing soil moisture will lead to decreased soil respiration.
Now that you have a strong conceptual understanding of the system you are studying, you should be able to write a specific, testable hypothesis that addresses your research question.
| Research question | Null hypothesis (H₀) | Alternate hypothesis (Hₐ) |
|---|---|---|
| Phone use and sleep | Phone use before sleep does not correlate with the amount of sleep a person gets. | Increasing phone use before sleep leads to a decrease in sleep. |
| Temperature and soil respiration | Air temperature does not correlate with soil respiration. | Increased air temperature leads to increased soil respiration. |
The next steps will describe how to design a controlled experiment . In a controlled experiment, you must be able to:
If your study system doesn’t match these criteria, there are other types of research you can use to answer your research question.
How you manipulate the independent variable can affect the experiment’s external validity – that is, the extent to which the results can be generalized and applied to the broader world.
First, you may need to decide how widely to vary your independent variable.
Second, you may need to choose how finely to vary your independent variable. Sometimes this choice is made for you by your experimental system, but often you will need to decide, and this will affect how much you can infer from your results.
How you apply your experimental treatments to your test subjects is crucial for obtaining valid and reliable results.
First, you need to consider the study size : how many individuals will be included in the experiment? In general, the more subjects you include, the greater your experiment’s statistical power , which determines how much confidence you can have in your results.
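The link between study size and statistical power can be illustrated by simulation. The sketch below estimates power for a two-group mean comparison at two sample sizes using a simple z-style test on group means; the effect size (0.5 SD), the normal approximation, and the simulation counts are assumptions of the example rather than recommendations.

```python
import random
import statistics

random.seed(3)

def power(n_per_group, effect, sd=1.0, crit_z=1.96, sims=2000):
    """Estimate power by simulation: the fraction of runs in which a
    two-sample z-style test on group means exceeds the critical value."""
    hits = 0
    for _ in range(sims):
        a = [random.gauss(0.0, sd) for _ in range(n_per_group)]
        b = [random.gauss(effect, sd) for _ in range(n_per_group)]
        se = (sd ** 2 / n_per_group + sd ** 2 / n_per_group) ** 0.5
        z = (statistics.fmean(b) - statistics.fmean(a)) / se
        if abs(z) > crit_z:
            hits += 1
    return hits / sims

# For the same medium effect (0.5 SD), power grows with sample size.
small = power(20, 0.5)
large = power(100, 0.5)
print(small, large)
```

With 20 subjects per group a real medium-sized effect is detected only about a third of the time, while with 100 per group detection becomes very likely — which is exactly why underpowered studies are wasteful.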
Then you need to randomly assign your subjects to treatment groups . Each group receives a different level of the treatment (e.g. no phone use, low phone use, high phone use).
You should also include a control group , which receives no treatment. The control group tells us what would have happened to your test subjects without any experimental intervention.
When assigning your subjects to groups, there are two main choices you need to make:
An experiment can be completely randomized or randomized within blocks (aka strata):
| Research question | Completely randomized design | Randomized block design |
|---|---|---|
| Phone use and sleep | Subjects are all randomly assigned a level of phone use using a random number generator. | Subjects are first grouped by age, and then phone use treatments are randomly assigned within these groups. |
| Temperature and soil respiration | Warming treatments are assigned to soil plots at random by using a number generator to generate map coordinates within the study area. | Soils are first grouped by average rainfall, and then treatment plots are randomly assigned within these groups. |
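The two randomization schemes described above can be sketched in a few lines of code; the subject attributes, the three phone-use levels, and the age-band blocking rule are illustrative assumptions.

```python
import random

random.seed(4)

subjects = [{"id": i, "age": random.randint(18, 70)} for i in range(12)]
levels = ["none", "low", "high"]

def completely_randomized(subjects, levels):
    """Shuffle everyone, then deal subjects round-robin into equal groups."""
    pool = subjects[:]
    random.shuffle(pool)
    return {lvl: pool[i::len(levels)] for i, lvl in enumerate(levels)}

def randomized_blocks(subjects, levels, block_key):
    """Group subjects into blocks (here: age bands), then randomize within each block."""
    blocks = {}
    for s in subjects:
        blocks.setdefault(block_key(s), []).append(s)
    assignment = {lvl: [] for lvl in levels}
    for members in blocks.values():
        random.shuffle(members)
        for i, s in enumerate(members):
            assignment[levels[i % len(levels)]].append(s)
    return assignment

crd = completely_randomized(subjects, levels)
rbd = randomized_blocks(subjects, levels, block_key=lambda s: s["age"] // 25)
print({k: len(v) for k, v in crd.items()})
```

Blocking guarantees that each age band is spread across all treatment levels, which a completely randomized design only achieves on average.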
Sometimes randomization isn’t practical or ethical , so researchers create partially-random or even non-random designs. An experimental design where treatments aren’t randomly assigned is called a quasi-experimental design .
In a between-subjects design (also known as an independent measures design or classic ANOVA design), individuals receive only one of the possible levels of an experimental treatment.
In medical or social research, you might also use matched pairs within your between-subjects design to make sure that each treatment group contains the same variety of test subjects in the same proportions.
In a within-subjects design (also known as a repeated measures design), every individual receives each of the experimental treatments consecutively, and their responses to each treatment are measured.
Within-subjects or repeated measures can also refer to an experimental design where an effect emerges over time, and individual responses are measured over time in order to measure this effect as it emerges.
Counterbalancing (randomizing or reversing the order of treatments among subjects) is often used in within-subjects designs to ensure that the order of treatment application doesn’t influence the results of the experiment.
| Research question | Between-subjects (independent measures) design | Within-subjects (repeated measures) design |
|---|---|---|
| Phone use and sleep | Subjects are randomly assigned a level of phone use (none, low, or high) and follow that level of phone use throughout the experiment. | Subjects are assigned consecutively to zero, low, and high levels of phone use throughout the experiment, and the order in which they follow these treatments is randomized. |
| Temperature and soil respiration | Warming treatments are assigned to soil plots at random and the soils are kept at this temperature throughout the experiment. | Every plot receives each warming treatment (1, 3, 5, 8, and 10°C above ambient temperatures) consecutively over the course of the experiment, and the order in which they receive these treatments is randomized. |
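Counterbalancing in a within-subjects design is often done with a Latin square, in which each treatment appears exactly once in every position of the order. A minimal sketch, using the phone-use levels from the running example as the treatments:

```python
def latin_square(treatments):
    """Cyclic Latin square: each treatment appears once per row (subject group)
    and once per column (position in the order)."""
    n = len(treatments)
    return [[treatments[(i + j) % n] for j in range(n)] for i in range(n)]

orders = latin_square(["none", "low", "high"])
for row in orders:
    print(row)
```

A simple cyclic square like this balances treatment *positions* but not first-order carryover (each treatment always follows the same predecessor); fully balanced designs exist for that stricter requirement.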
Finally, you need to decide how you’ll collect data on your dependent variable outcomes. You should aim for reliable and valid measurements that minimize research bias or error.
Some variables, like temperature, can be objectively measured with scientific instruments. Others may need to be operationalized to turn them into measurable observations.
How precisely you measure your dependent variable also affects the kinds of statistical analysis you can use on your data.
Experiments are always context-dependent, and a good experimental design will take into account all of the unique considerations of your study system to produce information that is both valid and relevant to your research question.
If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.
Experimental design means planning a set of procedures to investigate a relationship between variables . To design a controlled experiment, you need:
When designing the experiment, you decide:
Experimental design is essential to the internal and external validity of your experiment.
The key difference between observational studies and experimental designs is that a well-done observational study does not influence the responses of participants, while experiments do have some sort of treatment condition applied to at least some participants by random assignment .
A confounding variable , also called a confounder or confounding factor, is a third variable in a study examining a potential cause-and-effect relationship.
A confounding variable is related to both the supposed cause and the supposed effect of the study. It can be difficult to separate the true effect of the independent variable from the effect of the confounding variable.
In your research design , it’s important to identify potential confounding variables and plan how you will reduce their impact.
In a between-subjects design , every participant experiences only one condition, and researchers assess group differences between participants in various conditions.
In a within-subjects design , each participant experiences all conditions, and researchers test the same participants repeatedly for differences between conditions.
The word “between” means that you’re comparing different conditions between groups, while the word “within” means you’re comparing different conditions within the same group.
An experimental group, also known as a treatment group, receives the treatment whose effect researchers wish to study, whereas a control group does not. They should be identical in all other ways.
Bevans, R. (2023, June 21). Guide to Experimental Design | Overview, 5 steps & Examples. Scribbr. Retrieved September 16, 2024, from https://www.scribbr.com/methodology/experimental-design/
Saul McLeod, PhD
Editor-in-Chief for Simply Psychology
BSc (Hons) Psychology, MRes, PhD, University of Manchester
Saul McLeod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.
Olivia Guy-Evans, MSc
Associate Editor for Simply Psychology
BSc (Hons) Psychology, MSc Psychology of Education
Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.
The experimental method involves the manipulation of variables to establish cause-and-effect relationships. The key features are controlled methods and the random allocation of participants into controlled and experimental groups .
An experiment is an investigation in which a hypothesis is scientifically tested. An independent variable (the cause) is manipulated in an experiment, and the dependent variable (the effect) is measured; any extraneous variables are controlled.
An advantage is that experiments should be objective. The researcher’s views and opinions should not affect a study’s results. This is good as it makes the data more valid and less biased.
There are three types of experiments you need to know:
A laboratory experiment in psychology is a research method in which the experimenter manipulates one or more independent variables and measures the effects on the dependent variable under controlled conditions.
A laboratory experiment is conducted under highly controlled conditions (not necessarily a laboratory) where accurate measurements are possible.
The researcher uses a standardized procedure to determine where the experiment will take place, at what time, with which participants, and in what circumstances.
Participants are randomly allocated to each independent variable group.
Examples are Milgram’s experiment on obedience and Loftus and Palmer’s car crash study .
A field experiment is a research method in psychology that takes place in a natural, real-world setting. It is similar to a laboratory experiment in that the experimenter manipulates one or more independent variables and measures the effects on the dependent variable.
However, in a field experiment, the participants are unaware they are being studied, and the experimenter has less control over the extraneous variables .
Field experiments are often used to study social phenomena, such as altruism, obedience, and persuasion. They are also used to test the effectiveness of interventions in real-world settings, such as educational programs and public health campaigns.
An example is Holfing’s hospital study on obedience .
A natural experiment in psychology is a research method in which the experimenter observes the effects of a naturally occurring event or situation on the dependent variable without manipulating any variables.
Natural experiments are conducted in the everyday (i.e., real-life) environment of the participants, but here, the experimenter has no control over the independent variable as it occurs naturally in real life.
Natural experiments are often used to study psychological phenomena that would be difficult or unethical to study in a laboratory setting, such as the effects of natural disasters, policy changes, or social movements.
For example, Hodges and Tizard’s attachment research (1989) compared the long-term development of children who have been adopted, fostered, or returned to their mothers with a control group of children who had spent all their lives in their biological families.
Here is a fictional example of a natural experiment in psychology:
Researchers might compare academic achievement rates among students born before and after a major policy change that increased funding for education.
In this case, the independent variable is the timing of the policy change, and the dependent variable is academic achievement. The researchers would not be able to manipulate the independent variable, but they could observe its effects on the dependent variable.
Ecological validity: The degree to which an investigation represents real-life experiences.

Experimenter effects: The ways that the experimenter can accidentally influence the participant through their appearance or behavior.

Demand characteristics: The clues in an experiment that lead the participants to think they know what the researcher is looking for (e.g., the experimenter’s body language).

Independent variable (IV): The variable the experimenter manipulates (i.e., changes), assumed to have a direct effect on the dependent variable.

Dependent variable (DV): The variable the experimenter measures. This is the outcome (i.e., the result) of a study.

Extraneous variables (EV): All variables which are not independent variables but could affect the results (DV) of the experiment. EVs should be controlled where possible.

Confounding variables: Variable(s) that have affected the results (DV), apart from the IV. A confounding variable could be an extraneous variable that has not been controlled.

Random allocation: Randomly allocating participants to independent variable conditions means that all participants should have an equal chance of participating in each condition. The principle of random allocation is to avoid bias in how the experiment is carried out and to limit the effects of participant variables.

Order effects: Changes in participants’ performance due to their repeating the same or similar test more than once. Examples of order effects include: (i) the practice effect, an improvement in performance on a task due to repetition, for example because of familiarity with the task; and (ii) the fatigue effect, a decrease in performance on a task due to repetition, for example because of boredom or tiredness.
Articulated object manipulation is ubiquitous in daily life. In this paper, we present DexSim2Real², a novel robot learning framework for goal-conditioned articulated object manipulation using both two-finger grippers and multi-finger dexterous hands. The key to our framework is constructing an explicit world model of unseen articulated objects through active one-step interactions. This explicit world model enables sampling-based model predictive control to plan trajectories achieving different manipulation goals without needing human demonstrations or reinforcement learning. The framework first predicts an interaction motion using an affordance estimation network trained on self-supervised interaction data or videos of human manipulation from the internet. After executing this interaction on the real robot, the framework constructs a digital twin of the articulated object in simulation based on the two point clouds captured before and after the interaction. For dexterous multi-finger manipulation, we propose to utilize eigengrasps to reduce the high-dimensional action space, enabling more efficient trajectory searching. Extensive experiments validate the framework’s effectiveness for precise articulated object manipulation in both simulation and the real world using a two-finger gripper and a 16-DoF dexterous hand. The robust generalizability of the explicit world model also enables advanced manipulation strategies, such as manipulating with different tools.
I. Introduction
Articulated object manipulation is a fundamental and challenging problem in robotics. Compared with pick-and-place tasks, where only the start and final poses of the robot end effector are constrained, articulated object manipulation requires the end effector to move along certain trajectories, making the problem significantly more complex. Most existing works utilize a neural network to learn the correlation between object states and correct actions, employing reinforcement learning (RL) or imitation learning (IL) to train the network [ 1 , 2 , 3 ] . However, since the state distribution of articulated objects is higher-dimensional and more complex than that of rigid objects, it is difficult for the network to learn such a correlation, even with hundreds of successful demonstrations and millions of interactions [ 4 , 5 ] .
For humans, manipulation involves not only action responding to perception, as is the case with policy networks, but also motor imagery and mental simulation: humans can imagine the consequences of an action before execution and plan the action trajectory accordingly [ 6 ] . To model the world more accurately, humans can also actively interact with the environment, changing its state and gathering additional information, a process known as interactive perception [ 7 , 8 ] .
In this paper, we propose a robot learning framework called DexSim2Real² to achieve precise goal-conditioned manipulation of articulated objects using two-finger grippers and multi-finger dexterous hands, with a physics simulator serving as the robot’s mental model. Fig. 1 provides a brief overview of our framework. Given a single-view RGBD image of the articulated object in its initial state, the framework first learns an affordance estimation network from self-supervised interaction in simulation or from egocentric videos of human-object interactions. The network predicts a one-step motion of the robot end effector. We learn the affordance first because affordance estimation depends only on the object and therefore generalizes better to novel objects; moreover, the one-step interaction does not require fine manipulation with the dexterous hand. Next, we execute the predicted action on the real robot to change the object’s state and capture another RGBD image after the interaction. Then, we train a module to construct an explicit world model of the articulated object: we transform the two RGBD images into two point clouds and generate a digital twin of the articulated object in simulation. Finally, using the explicit world model we have built, we utilize sampling-based model predictive control (MPC) to plan a trajectory that achieves goal-conditioned manipulation tasks.
While dexterous manipulation with multi-finger hands enables more flexible, efficient and robust manipulation, the high-dimensional action space presents significant challenges for MPC. To handle this problem, we propose to employ eigengrasp [ 9 ] to reduce the operational dimensions of the dexterous hand, enabling more efficient and successful searching. While eigengrasp has been widely studied for robot grasping [ 10 , 11 , 12 ] , its application in dexterous manipulation remains under-explored. Since our method constructs an explicit world model of the articulated object, we can accurately predict its motion upon contact with the dexterous hand. This allows us to search for a feasible dexterous manipulation trajectory.
This article is an extension of our previous ICRA work, Sim2Real² [ 13 ] . There are two main additional contributions in this work:
(1) We broaden the framework’s scope from manipulation with a two-finger gripper to multi-finger dexterous manipulation. To address the challenge introduced by the high-dimensional action space of the dexterous hand, we propose to utilize eigengrasps to reduce the dimension, leading to more efficient and successful manipulation. We conduct extensive experiments both in simulation and on a real robot to validate our method’s effectiveness for dexterous manipulation and the usefulness of its different modules.
(2) In our previous work, we used self-supervised interaction in simulation to generate training data for affordance estimation, which requires interactable 3D assets that remain scarce. To eliminate this dependency and enhance our framework’s scalability, we propose to learn the affordance from egocentric human manipulation videos, which are large-scale and freely accessible. However, since trajectories in videos live in 2D pixel space, we propose a spatial projection method to generate 3D robot motions from 2D trajectory predictions.
The remainder of this paper is structured as follows: Related works are reviewed in Section II . Our proposed robot learning framework is detailed in Section III . Experimental setup and results are presented in Section IV . Finally, conclusions, limitations, and future directions are discussed in Section V .
II-A. Dexterous Manipulation
Compared with two-finger grippers, multi-finger dexterous hands can manipulate a broader range of objects with more human-like dexterous actions [ 14 ] . Traditional model-based approaches formulate dexterous manipulation as a planning problem and generate trajectories through search and optimization [ 15 , 16 , 17 , 18 ] . These methods require accurate 3D shapes of the manipulated object and the hand, which limits their applicability to unseen objects.
In contrast, data-driven methods learn manipulation policies through imitation learning and reinforcement learning [ 19 , 20 , 21 , 22 , 23 , 24 ] . In [ 21 ] , a single-camera teleoperation system is developed for 3D demonstration trajectory collection, significantly reducing equipment cost. Nevertheless, the time-consuming nature of human demonstration and the space required for scene setup still limit the scalability of imitation learning. RL eliminates the need for demonstrations and offers better scalability. Most existing RL methods learn a policy that directly maps the observation to the joint angles of the dexterous hand [ 23 , 24 , 22 ] . However, the high-dimensional action space reduces learning efficiency and often produces unnatural hand motions that cannot be executed on real robot hands. In [ 12 ] , eigengrasps [ 9 ] are used to reduce the dimension of the action space for functional grasping. Experimental results show that eigengrasps lead to more stable and physically realistic hand motion for robot grasping; however, more advanced manipulation policies are not studied in that work.
In our work, we combine the advantages of model-based methods and data-driven methods by first learning a generalizable world model construction module and then using the model to search for a feasible trajectory for dexterous manipulation. Furthermore, we adopt eigengrasps to accelerate the searching process and generate more reasonable hand motions that can be directly executed on real robots.
Building an accurate and generalizable transition model of the environment capable of reacting to agent interactions has been a long-standing problem in optimal control and model-based RL [ 25 , 26 ] . Some existing methods model the dynamic system in a lower-dimensional state space, reducing computation and simplifying the transition model [ 27 , 28 , 29 ] . However, this approach discards the environment’s spatial structure, which limits the model’s generalizability to novel interactions.
With increasing computational power and larger network architectures, image-based and video-based world models have gained increasing attention [ 30 , 31 , 32 , 33 ] . In [ 33 ] , a U-Net-based video diffusion model predicts future observation video sequences from past observations and actions. While it shows a strong ability to emulate real-world manipulation and navigation environments, it requires an extremely large-scale dataset and substantial computational resources for training, because the network contains minimal prior knowledge of the environment. Additionally, the inference speed of the large network limits its feasibility for MPC.
In our work, we focus on articulated object manipulation, so we introduce prior knowledge of the environment by using an explicit physics model. This allows us to reduce the number of samples required for model construction to one. Moreover, the explicit physics model’s generalizability guarantees that, although we use only a simple action to collect the sample, the built model can be used for long-horizon, complex trajectory planning composed of unseen robot actions.
In the context of articulated objects, affordances dictate how movable parts can be interacted with by a robot to achieve a desired configuration, providing a valuable guide for articulated object manipulation. Therefore, affordance learning has been widely studied in the literature. Deng et al. built a benchmark for visual object affordance understanding via manual annotation [ 34 ] . Cui et al. explored learning affordances using point supervision [ 35 ] . While these supervised learning methods can yield accurate affordance predictions, the cost of the manual annotation process limits their scalability.
Another line of research focuses on learning the affordances through interactions in simulation [ 36 , 1 , 3 ] . Where2act [ 36 ] first collects random offline interaction trajectories and then samples online interaction data points for training data generation to facilitate affordance learning. However, the key bottleneck of simulation-based methods is the requirement for 3D articulated object assets that can be accurately interacted with and simulated. Unfortunately, most existing 3D object datasets only include static CAD models, which cannot be used for physics simulation [ 37 , 38 ] .
Videos of human-object interactions are free, large-scale, and diverse, making them an ideal data source for robot learning [ 39 , 40 , 41 ] . In VRB [ 40 ] , the contact point and post-contact trajectory are first extracted from videos of human manipulation and then used to supervise the training of the affordance model. However, the predicted affordance is only a 2D coordinate and direction in the image, which cannot be directly used for robot execution. Therefore, we propose to generate the robot interaction direction in 3D physical space by synthesizing a virtual image from the RGBD data and computing the 3D robot motion as the intersection of the two VRB predictions in 3D space.
Physics simulation plays a pivotal role in manipulation policy learning, offering large-scale parallelism, reduced training costs, and avoidance of potential damage to robots and researchers [ 42 , 43 , 44 , 45 ] . Most existing methods utilize RL for policy learning in simulation and then deploy the learned policy on a real robot [ 46 , 47 , 48 ] . DexPoint [ 23 ] uses the concatenation of observed point clouds and an imagined hand point cloud as input to learn a dexterous manipulation policy. However, since the neural network does not contain any prior knowledge of the environment, a large amount of interaction data is required to achieve accuracy and generalizability. In contrast, we propose to first build the explicit world model of the target object and then employ MPC to generate manipulation trajectories based on the model of that single object instance. By sidestepping the diversity of objects, we substantially reduce the required interactions and improve manipulation accuracy in the real world.
The goal of our work is to manipulate articulated objects to specified joint states with various robot end-effectors in the real world, including two-finger grippers and multi-finger dexterous hands. To better align with actual application scenarios, we employ a single depth sensor to acquire a partial point cloud of the object as the observation. Fig. 2 shows an overview of our framework, which consists of three modules: Interactive Perception (Section III-A ), Explicit World Model Construction (Section III-B ), and Sampling-based Model Predictive Control (Section III-C ).
A single observation of an articulated object cannot provide enough information to reveal its full structure. For example, when humans first look at a kitchen door, it is hard to tell whether it has a rotating hinge or a sliding hinge. However, after the door is moved, humans can use the information from the two observations to infer the type and location of the hinge. Inspired by this, the Interactive Perception module proposes an action to alter the joint state of the articulated object based on learned affordance. This action is then executed on the object in the real world, resulting in two frames of point clouds: one before the interaction and one after.
With the two point clouds, the Explicit World Model Construction module (Section III-B ) infers the shape and the kinematic structure of the articulated object to construct a digital model. The digital model can be loaded into a physics simulator for the robot to interact with, forming an explicit world model of the environment.
The constructed world model can be used to search for a trajectory of control inputs that changes the state of the articulated object from $s_{\text{initial}}$ to a target state $s_{\text{target}}$ using Sampling-based Model Predictive Control, introduced in Section III-C . With the model of a specific object, we can efficiently plan a trajectory using sampling-based MPC to manipulate the object precisely, rather than learning a generalizable policy.
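The three-module flow described above can be sketched as a short pipeline. Every callable here is a hypothetical stand-in for one of the framework's components, not an actual API:

```python
def dexsim2real2(rgbd_0, affordance, execute, build_world_model, mpc_plan, s_target):
    """High-level flow of the framework (sketch; all callables are
    hypothetical stand-ins for the paper's modules)."""
    action = affordance(rgbd_0)                 # 1) interactive perception: one-step action
    rgbd_1 = execute(action)                    # execute on the real robot, observe again
    world = build_world_model(rgbd_0, rgbd_1)   # 2) digital twin from the two observations
    return mpc_plan(world, s_target)            # 3) sampling-based MPC plans the trajectory
```

The point of the structure is that only step 1 requires a learned, generalizable component; steps 2 and 3 operate on the specific object instance.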
At the beginning, the articulated object is placed statically within the scene, and the robot has only a single-frame observation of it. Understanding the articulation structure and surface geometry of each part of the object from this limited view is challenging. However, by actively interacting with the object and altering its state, additional information can be gathered to enhance the understanding of its structure. It is worth noting that the interaction in this step does not require precision.
To achieve this goal, it is essential to learn to predict the affordance based on the initial single-frame observation. In our work, we first learn the affordance through self-supervised interaction in simulation. However, simulation requires interactable 3D assets, which are still relatively scarce. Therefore, we further study learning affordances from real-world human manipulation videos, which are readily available and large-scale.
By extensively interacting with articulated objects in the simulation, actions that change the state of the articulated object to some extent can be automatically labeled as successful. Using these automatically labeled observation-action pairs, neural networks can be trained to predict candidate actions that can change the object’s state based on the initial observation of the object.
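The automatic labeling rule can be sketched as follows; the state-change threshold used here is an assumed illustrative value, not one taken from the paper:

```python
def label_interaction(s_before, s_after, delta_min=0.01):
    """Auto-label a simulated interaction as successful if it changed the
    articulated joint state by more than a small threshold (assumed value)."""
    return abs(s_after - s_before) > delta_min

# e.g. an action that opened a drawer by 5 cm would be a positive sample,
# while one that left the joint state unchanged would be negative
```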
For affordance learning in this method, we use Where2Act [ 36 ] . This algorithm includes an Actionability Scoring Module, which predicts an actionability score $a_p$ for every point. A higher $a_p$ indicates a higher likelihood that an action executed at that point will move the part. Additionally, the Action Proposal Module suggests actions for a specific point. The Action Scoring Module then predicts the success likelihood of these proposed actions.
In Where2Act, only a flying gripper is considered, and primitive actions are parameterized by the gripper pose in $SE(3)$ space. Although this simplification eases learning, it does not account for the robot’s kinematic structure and therefore complicates real-world execution, as motion planning may not find feasible solutions for the proposed actions.
To address this problem, we select the $n_p$ points with the highest actionability scores as candidate points. For each candidate point, we choose the $n_a$ actions with the highest success likelihood scores from the proposed actions. We then use motion planning to attempt to generate joint trajectories for these actions sequentially until a successful one is found. Empirically, we find that this method improves the motion planner’s success rate, because the action with the highest success likelihood often lies outside the robot’s dexterous workspace.
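This fallback search over candidate points and actions can be sketched as below. The scoring functions and planner are hypothetical stand-ins for Where2Act's modules and a real motion planner:

```python
def select_and_plan(points, actionability, propose, score, plan, n_p=10, n_a=5):
    """Try candidate (point, action) pairs in decreasing score order until
    motion planning finds a feasible arm trajectory (sketch; the callables
    stand in for Where2Act's scoring modules and a real planner)."""
    top_points = sorted(points, key=actionability, reverse=True)[:n_p]
    for p in top_points:
        actions = sorted(propose(p), key=lambda a: score(p, a), reverse=True)[:n_a]
        for a in actions:
            traj = plan(p, a)        # assume the planner returns None on failure
            if traj is not None:
                return p, a, traj
    return None
```

The early-exit structure mirrors the observation above: rather than committing to the single highest-scoring action, the search falls through to the next candidate whenever planning fails.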
Acquiring 3D affordance representations through self-supervised interactions in simulation is promising because it does not rely on labeled data. However, a key limitation exists: the method’s success hinges on interactable models in simulation. Unfortunately, the availability of simulated datasets of articulated objects is limited, hindering training data generation.
To address this limitation, we propose another approach that leverages real-world egocentric videos of humans interacting with objects. This complementary data source allows us to overcome the limitations of simulation-based learning and broaden the scope of our affordance representation system. Specifically, we utilize the Vision-Robotics Bridge (VRB) [ 40 ] to predict the affordance of articulated objects. VRB introduces an affordance model that learns from human videos by extracting the contact region and post-contact wrist trajectory from each video; these cues serve as supervision signals for training. Given an RGB image of an object, the VRB model generates two key outputs: a contact heatmap highlighting the regions where contact occurs, and a 2D vector representing the post-contact trajectory within the image. Both outputs live in 2D space, but effective interaction between robots and objects in the real world requires a 3D manipulation strategy. We therefore convert the 2D affordance generated by the model into a valid 3D spatial vector and contact region.
Fig. 3 illustrates how we generate a 3D trajectory for real robot manipulation from 2D affordances. Firstly, we capture an RGB image $\boldsymbol{I}_0$ and a 3D point cloud $\bm{\mathcal{P}}_0$ using the mounted RGBD camera; $\boldsymbol{H}_0\in\mathbb{R}^{4\times 4}$ is its transformation matrix relative to the robot coordinate system. Secondly, we set up a virtual camera whose relative transformation matrix is $\boldsymbol{H}\in\mathbb{R}^{4\times 4}$. Since the depth of each pixel in $\boldsymbol{I}_0$ is known, we can generate the virtual RGB image $\boldsymbol{I}_1$ by image warping.
Thirdly, we use $\boldsymbol{I}_0$ and $\boldsymbol{I}_1$ as inputs to the affordance model and generate contact points $\boldsymbol{c}_0=(u_0,v_0)$ and $\boldsymbol{c}_1=(u_1,v_1)$ and post-contact trajectories $\boldsymbol{\tau}_0=(u_0'-u_0,\ v_0'-v_0)$ and $\boldsymbol{\tau}_1=(u_1'-u_1,\ v_1'-v_1)$. The camera intrinsic matrix is $\boldsymbol{K}$, the contact point in the mounted camera frame is $\boldsymbol{p}_c\in\mathbb{R}^3$, and the 3D post-contact vector in the camera frame is $\boldsymbol{\tau}_c\in\mathbb{R}^3$. Fourthly, we calculate the 3D contact point and post-contact vector. We use contact point $\boldsymbol{c}_0$ to acquire the 3D contact point $\boldsymbol{p}_c\in\mathbb{R}^3$ in the robot base frame:
$$\begin{bmatrix}\boldsymbol{p}_c\\ 1\end{bmatrix} = \boldsymbol{H}_0\begin{bmatrix} z_c\,\boldsymbol{K}^{-1}\,[u_0,\ v_0,\ 1]^{T}\\ 1\end{bmatrix} \qquad (1)$$
where $z_c$ represents the depth of $\boldsymbol{p}_c$. We use the camera’s intrinsic matrix to transform $\boldsymbol{c}_0$ into a point in the mounted camera frame, then use the mounted camera’s extrinsic matrix to transform it into the 3D point $\boldsymbol{p}_c$.
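This pixel-to-base-frame lifting is an inverse-intrinsics back-projection followed by the extrinsic transform; a minimal sketch, where the camera parameters are made-up example values:

```python
import numpy as np

def backproject(c, z, K, H):
    """Lift a pixel c=(u, v) with depth z into the robot base frame,
    using intrinsics K (3x3) and camera-to-base transform H (4x4)."""
    u, v = c
    p_cam = z * np.linalg.inv(K) @ np.array([u, v, 1.0])  # point in camera frame
    p_hom = H @ np.append(p_cam, 1.0)                     # homogeneous transform
    return p_hom[:3]

# example intrinsics (principal point 320, 240) and an example camera pose
K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])
H = np.eye(4); H[:3, 3] = [0.5, 0.0, 0.4]
p_c = backproject((320, 240), 1.0, K, H)  # pixel at the principal point, 1 m deep
```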
However, generating a 3D post-contact vector from 2D information is more difficult. We can regard each 2D post-contact vector as the projection of a 3D vector onto its image plane. For each 2D vector, there exist countless 3D vectors whose projection onto the image plane coincides with it; these vectors all lie in the same “projection plane”. Given that two different 2D vectors have been generated, we can use the intersection line of the two planes to represent the 3D post-contact vector.
Specifically, our method of calculating the 3D post-contact vector is shown in Fig. 3 . We denote the projection planes of $\boldsymbol{I}_0$ and $\boldsymbol{I}_1$ as $\boldsymbol{S}_0$ and $\boldsymbol{S}_1$, respectively. For $\boldsymbol{S}_0$, we use $\boldsymbol{\varphi}_0$ and $\boldsymbol{\varphi}_0'$ to represent the projection plane. $\boldsymbol{\varphi}_0$ represents one possible 3D vector in projection plane $\boldsymbol{S}_0$; its starting point is $\boldsymbol{p}_c$, while its ending point can be calculated with:
$$\begin{bmatrix}\boldsymbol{p}_c'\\ 1\end{bmatrix} = \boldsymbol{H}_0\begin{bmatrix} z_c\,\boldsymbol{K}^{-1}\,[u_0',\ v_0',\ 1]^{T}\\ 1\end{bmatrix} \qquad (2)$$
It is worth noting that, within the camera frame, $\boldsymbol{p}_c'$ and $\boldsymbol{p}_c$ share the same depth. $\boldsymbol{\varphi}_0'$ starts from the origin of the camera frame and ends at $\boldsymbol{p}_c$:
$$\boldsymbol{\varphi}_0 = \boldsymbol{p}_c' - \boldsymbol{p}_c \qquad (3)$$
$$\boldsymbol{\varphi}_0' = \boldsymbol{p}_c - \boldsymbol{o}_{c0} \qquad (4)$$
where $\boldsymbol{o}_{c0}$ is the coordinate of the camera frame’s origin in the robot base frame. Then we calculate the normal vector of $\boldsymbol{S}_0$: $\boldsymbol{n}_0=\boldsymbol{\varphi}_0\times\boldsymbol{\varphi}_0'$. We can calculate $\boldsymbol{n}_1$ in the same way. Finally, we generate the 3D post-contact vector in the robot base frame: $\boldsymbol{\tau}_c=\boldsymbol{n}_0\times\boldsymbol{n}_1$.
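The two cross products described above translate directly into code; the example vectors in the test below are synthetic, chosen so that the two projection planes intersect along a known direction:

```python
import numpy as np

def post_contact_3d(phi0, phi0p, phi1, phi1p):
    """Intersect the two projection planes: each plane's normal is
    n_i = phi_i x phi_i', and tau = n0 x n1 is the intersection line,
    i.e. the 3D post-contact direction (returned as a unit vector)."""
    n0 = np.cross(phi0, phi0p)
    n1 = np.cross(phi1, phi1p)
    tau = np.cross(n0, n1)
    return tau / np.linalg.norm(tau)
```

Note the sign of the result is ambiguous (either direction along the intersection line); in practice it would be disambiguated using the 2D trajectory direction.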
Finally, we use motion planning to conduct the one-step interaction with the articulated object. The motion planning process has two phases: we first move the hand to the contact point, and then we move the hand a short distance along the direction of the post-contact vector.
Building an explicit model of an articulated object is difficult because the model can be constructed only if both the geometries of all parts and the kinematic relationships between connected parts are recovered.
In our work, we make two assumptions about the articulated objects: (1) the articulated object contains only a single prismatic or revolute joint; (2) the base link of the articulated object is fixed.
We choose Ditto [ 49 ] to construct the physics model explicitly. Given the visual observations before and after the interaction ($\bm{\mathcal{P}}_0$ and $\bm{\mathcal{P}}_1$), Ditto uses structured feature grids and a unified implicit neural representation to construct part-level digital twins of articulated objects. Unlike the original work, which uses a multi-view fused point cloud, we use a single-view point cloud as input, which is more consistent with real robot application settings. Furthermore, we simulate the depth sensor’s noise when generating training data to narrow the domain gap [ 50 ] . After training Ditto on simulated data, we apply the trained model to the real two-frame point clouds to generate the implicit neural representation and extract the meshes. The explicit physics model is represented in the Unified Robot Description Format (URDF), which can be easily loaded into widely used multi-body physics simulators, such as SAPIEN [ 51 ] .
The surface geometries of the real-world object are usually complex, thus the extracted meshes can be non-convex. We further perform convex decomposition using VHACD [ 52 ] before importing the meshes to the physics simulator, which is essential for realistic physics simulation of robot interaction.
Given the explicit physics model and a target joint state $s_{target}$ of the articulated object, the agent needs to search for a trajectory that changes the current joint state $s_{initial}=s_1$ to $s_{target}$. The expected relative joint movement is $\Delta s_{target}=s_{target}-s_{initial}$. Because of the complex contact between the robot end-effector and the constructed object model, an informative gradient of the objective function can hardly be acquired. Therefore, we employ sampling-based model predictive control, a zeroth-order method, to search for an optimal trajectory. Sampling-based MPC algorithms differ in the zeroth-order optimization method used, such as the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [ 53 ] , the Cross-Entropy Method (CEM) [ 54 ] , and Model Predictive Path Integral Control (MPPI) [ 55 ] .
Among these methods, we select the iCEM method [ 56 ] to search for a feasible long-horizon trajectory to complete the task due to its simplicity and effectiveness. We briefly describe how we apply the iCEM method in the following paragraph.
Let $T\in\mathbb{N}^{+}$ denote the maximum number of time steps in a trajectory. At each time step $t$ ($t<T$), the action of the robot $\boldsymbol{a}_t\in\mathbb{R}^d$ is the incremental value of the joint position, where $d$ is the number of degrees of freedom (DOF) of the robot. The population size $N$ denotes the number of samples drawn in each CEM iteration. The planning horizon $h$ determines the number of time steps the robot plans into the future at each time step. The top $K$ samples according to reward compose an elite set, which is used to fit the means and variances of a new Gaussian distribution. Please refer to [ 56 ] for details of the algorithm.
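The sample-fit-resample loop can be sketched as a plain CEM planner on a toy one-DOF joint model. This is a simplified sketch of the idea only: it omits iCEM's colored-noise sampling, population decay, and elite reuse, and the toy dynamics stand in for the explicit world model:

```python
import numpy as np

def cem_plan(step, s0, s_target, horizon=15, pop=64, elites=8, iters=20, d=1, seed=0):
    """Simplified CEM planner: sample N action sequences from a Gaussian,
    roll each out in the model, refit the Gaussian to the top-K elite
    sequences, and repeat."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros((horizon, d)), np.ones((horizon, d))
    for _ in range(iters):
        samples = rng.normal(mu, sigma, size=(pop, horizon, d))
        rewards = np.empty(pop)
        for i, seq in enumerate(samples):
            s = s0
            for a in seq:                     # rollout in the world model
                s = step(s, a)
            rewards[i] = -abs(s - s_target)   # dense "approaching" reward
        elite = samples[np.argsort(rewards)[-elites:]]
        mu = elite.mean(axis=0)
        sigma = elite.std(axis=0) + 1e-6      # keep a little exploration
    return mu                                  # planned mean action sequence

# toy 1-DOF joint: each action increments the joint state within its limits
step = lambda s, a: float(np.clip(s + 0.1 * a[0], 0.0, 1.5))
plan = cem_plan(step, s0=0.0, s_target=1.0)
```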
At each time step $t$, the agent generates an action for the robot $\bm{a}_{t}\in\mathbb{R}^{d}$, where $d$ is the dimension of the action space. For 2-finger gripper tasks, $d=8$: the 7 DOF of the robot arm plus the 1 DOF of the gripper. For dexterous hands, however, $d=23$: the 7 DOF of the robot arm plus the 16 DOF of the hand. The computational cost of iCEM is multiplied by the high dimensionality of the action space, so directly searching in the original joint space of the multi-finger dexterous hand is not feasible. Moreover, the high-dimensional space of the dexterous hand may lead to unnatural postures. It therefore becomes essential to reduce the action space within the iCEM algorithm when using dexterous hands.
With the eigengrasp representation, the action becomes $\bm{a}_{t}\in\mathbb{R}^{7+m}$. The joint angles of the hand $\bm{q}_{h}$ are computed as a linear combination of the $m$ eigenvectors:
$$\bm{q}_{h}=\sum_{i=1}^{m}\alpha_{i}\,\bm{e}_{i}\qquad(5)$$
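In code, the action-space reduction amounts to mapping the $m$ low-dimensional coefficients back to the 16 hand joint angles through the eigengrasp basis (a sketch; the mean-posture offset, joint-limit clipping, and all variable names are assumptions layered on the linear combination above):

```python
import numpy as np

def eigengrasp_to_joints(alpha, basis, q_mean, q_min, q_max):
    """Map m eigengrasp coefficients to 16 hand joint angles.

    alpha:  (m,)    low-dimensional hand action from the planner
    basis:  (m, 16) top-m eigenvectors of the grasp-posture dataset
    q_mean: (16,)   mean grasp posture (assumed offset of the linear model)
    """
    q_h = q_mean + alpha @ basis        # linear combination of eigenvectors
    return np.clip(q_h, q_min, q_max)   # respect the hand's joint limits

# The planner then searches over a (7 + m)-dim action instead of 7 + 16.
q = eigengrasp_to_joints(np.array([0.5, -0.2]), np.eye(2, 16),
                         np.zeros(16), -1.0, 1.0)
```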
To speed up the search process, we use dense rewards to guide the trajectory optimization:
For the two-finger gripper, the reward function consists of the following terms:
(1) success reward
where $s_{t}$ denotes the joint state at the current time step $t$, and $\epsilon$ is a predefined threshold.
(2) approaching reward
This reward encourages $s_{t}$ to converge to $s_{target}$.
(3) contact reward
This reward encourages the robot to make first contact with the object in the correct direction and to remain in contact with the object while moving the part. It also penalizes collisions involving parts other than the fingertip and the target part of the object.
(4) distance reward
(5) regularization reward
This reward is a regularization term that discourages the robot from moving too fast or into an unreasonable configuration. $a_{i}$ and $v_{i}$ denote the acceleration and velocity of the $i$-th joint, respectively.
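The weighted combination of the five terms above can be sketched as follows (the individual term formulas are illustrative reconstructions consistent with the descriptions, not the paper's exact definitions; the weights reuse the symbols reported in the experiments):

```python
import numpy as np

def gripper_reward(s_t, s_target, d_tip, in_contact, bad_collision,
                   acc, vel, eps=0.005,
                   w_s=20, w_t=50, w_contact=10, w_collision=60,
                   w_d=10, w_a=0.01, w_v=0.03):
    """Dense reward for the 2-finger gripper (illustrative form)."""
    r_success = w_s if abs(s_t - s_target) < eps else 0.0   # (1) success
    r_target = -w_t * abs(s_t - s_target)                   # (2) approaching
    r_contact = (w_contact if in_contact else 0.0) \
        - (w_collision if bad_collision else 0.0)           # (3) contact
    r_dist = -w_d * d_tip              # (4) fingertip-to-part distance
    r_reg = -w_a * np.sum(np.square(acc)) \
        - w_v * np.sum(np.square(vel))                      # (5) regularization
    return r_success + r_target + r_contact + r_dist + r_reg

# A state at the goal, in contact, should score higher than one far away.
r_good = gripper_reward(0.30, 0.30, 0.0, True, False, np.zeros(8), np.zeros(8))
r_bad = gripper_reward(0.00, 0.30, 0.2, False, True, np.zeros(8), np.zeros(8))
```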
For the dexterous hand, apart from the success reward $r_{success}$ and the approaching reward $r_{target}$, which remain consistent with the 2-finger gripper's reward function, the other three terms are as follows:
(1) contact reward
This reward function encourages the dexterous hand to cage much of the target link while searching for the trajectory. With this reward, the dexterous hand can quickly find a stable grasping position on the target link and remain in contact with the object while moving the part.
(2) distance reward
(3) regularization reward
This reward discourages the robot from moving too fast by restricting the joints' velocities, and it also penalizes position error of the end link using the Cartesian error. $v_{i}$ denotes the velocity of the $i$-th joint, and $e_{p}$ denotes the Cartesian error of the end effector.
Once the manipulation trajectory is generated, we execute the trajectory on the real robot.
In this section, we evaluate the precision and effectiveness of the proposed method for manipulating articulated objects for both two-finger grippers and dexterous hands. We first conduct a large number of real-world articulated object manipulation experiments and quantitatively compare the performance. Then we design 4 ablation studies to verify the effectiveness of different modules of our method. Finally, we validate the operational advantage of the dexterous hand against the two-finger gripper by comparing the task execution efficiency in simulation.
Fig. 4 shows the real-world experimental setup. A 7-DOF robot arm (ROKAE xMate3Pro) is used, and an RGBD camera (Intel RealSense D415) captures the visual input. The robot arm base is fixed to the table. Two kinds of end effectors are used: a 1-DOF 2-finger gripper (Robotiq 2F-140) and a 16-DOF 4-finger dexterous hand (Allegro Hand).
We choose 3 categories of common articulated objects for the experiments: drawers, faucets and laptops, as shown in Fig. 4. For the drawer, we assume that only one part needs to be operated if there is more than one movable part. Besides, we only consider the case in which the handle of the faucet rotates in the horizontal direction. The articulated object is randomly placed on the table with its base link fixed, and $s_{0}$ is randomly set. We randomly select $\Delta s_{target}$ such that it does not exceed the joint limit and covers both directions of possible movement.
To remove the influence of the background, we crop the object point cloud out of the scene using a bounding box. It is worth noting that we locate the camera on the right side of the robot rather than the front. This setting is better aligned with real application scenarios while increasing point cloud occlusion and manipulation difficulty. We further build the robot in simulation using the CAD models. We use SAPIEN [ 51 ] as the physics simulator to collect training data for the Explicit Physics Model Construction module and create simulation environments for the Sampling-based Model Predictive Control module.
$[-60^{\circ},+60^{\circ})$ and $[15^{\circ},45^{\circ})$. 10000 samples are collected for each category. We downsample the object point clouds to 8192 points. The 3 categories are trained jointly.
Eigengrasp Dataset Construction. To build the dataset for eigengrasp computation, we utilize DexGraspNet [57] to generate a collection of random grasping postures for the Allegro Hand. The dataset includes 60800 grasp postures across 474 objects. We then compute the eigengrasps from this data. Fig. 5 shows the accumulated ratios of different eigengrasp dimensions. Unless otherwise specified, we use eigengrasp dimension $m=2$ for the dexterous manipulation experiments.
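Computing eigengrasps from such a grasp dataset reduces, in sketch form, to a PCA over the stacked hand joint vectors, with the accumulated ratios of Fig. 5 corresponding to cumulative eigenvalue fractions (a hedged sketch under these assumptions, not the paper's exact pipeline):

```python
import numpy as np

def compute_eigengrasps(Q, m=2):
    """PCA over grasp postures Q of shape (num_grasps, 16).

    Returns the mean posture, the top-m principal axes (eigengrasps),
    and the cumulative explained-variance ratio per dimension.
    """
    q_mean = Q.mean(axis=0)
    # SVD of the centered data gives the principal axes as rows of Vt.
    _, svals, Vt = np.linalg.svd(Q - q_mean, full_matrices=False)
    var = svals ** 2
    cum_ratio = np.cumsum(var) / var.sum()
    return q_mean, Vt[:m], cum_ratio

# Synthetic check: postures lying near a 2-D subspace of the 16-D joint space.
rng = np.random.default_rng(0)
Q = rng.normal(size=(1000, 2)) @ rng.normal(size=(2, 16)) \
    + 0.01 * rng.normal(size=(1000, 16))
q_mean, basis, cum_ratio = compute_eigengrasps(Q, m=2)
```

For data with this structure, two eigengrasp dimensions already capture nearly all of the variance, which mirrors the motivation for choosing a small $m$.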
IV-C.1 Real-World Articulated Object Manipulation
For parameters of the Interactive Perception module, we choose $n_{p}=10$ and $n_{a}=10$. For parameters of the Sampling-based Model Predictive Control module, we find that $T=50$, $N=300$, $h=10$ and $K=20$ are able to complete all the tasks. The range of the incremental joint position value is set to $[-0.05, 0.05]$. The parameters in the reward function are tuned manually based on experience in the simulation environment. We set $\omega_{s}=20$, $\epsilon=0.005$ (m or rad), $\omega_{t}=50$, $\omega_{contact}=10$, $\omega_{collision}=60$, $\omega_{d}=10$, $\omega_{a}=0.01$ and $\omega_{v}=0.03$. We use 20 processes for sampling in simulation on a computer with an Intel Core i7-12700 CPU and an NVIDIA 3080Ti GPU. It takes about 4 minutes to find a feasible trajectory.
We conduct about 30 experiments for each category. After the trajectory is executed in the real world, we measure the real joint movement $\Delta s_{real}=s_{real}-s_{initial}$ and compare it with the target joint movement $\Delta s_{target}=s_{target}-s_{initial}$. We compute the error $\delta=\Delta s_{real}-\Delta s_{target}$ and the relative error $\delta_{r}=\delta/\Delta s_{target}\times 100\%$. Results of all the experiments can be found in Fig. 6, and statistical results can be found in Table I. Trajectories of both opening and closing the laptop are shown in Fig. 7.
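The two error metrics follow directly from the measured and target joint movements (a trivial sketch; array names are assumptions):

```python
import numpy as np

def manipulation_errors(delta_s_real, delta_s_target):
    """Absolute error delta and relative error delta_r (in percent)."""
    delta = np.asarray(delta_s_real) - np.asarray(delta_s_target)
    delta_r = delta / np.asarray(delta_s_target) * 100.0
    return delta, delta_r

# e.g. the drawer opened 9.5 cm against a 10 cm goal:
d, dr = manipulation_errors([0.095], [0.10])
```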
Among all 3 categories, the drawer has the lowest $\lvert\delta_{r}\rvert$ and the faucet has the highest $\lvert\delta_{r}\rvert$ according to Table I. This is reasonable because the size of the faucet is relatively small, so a minor inaccuracy in model construction or trajectory execution results in a large error in the joint state. About $70\%$ of manipulations achieve $\lvert\delta_{r}\rvert<30\%$ for drawers and laptops, which demonstrates the accuracy of our method.
| Category | Drawer | Laptop | Faucet |
| --- | --- | --- | --- |
| Number of manipulations | 31 | 32 | 30 |
| Number of manipulations s.t. $\lvert\delta_{r}\rvert$ < 10% | 12 | 7 | 0 |
| Number of manipulations s.t. $\lvert\delta_{r}\rvert$ < 30% | 22 | 20 | 9 |
| Number of manipulations s.t. $\lvert\delta_{r}\rvert$ < 50% | 28 | 26 | 19 |
| Avg $\lvert\delta\rvert$ | 1.15 cm | 5.69° | 10.37° |
| Avg $\lvert\delta_{r}\rvert$ | 21.81% | 27.26% | 56.21% |
Errors may be caused by the following factors:
The constructed mesh is not accurate enough, especially for parts that are occluded. For example, the inside face of the drawer front cannot be observed by the RGBD camera, so when the digital twin is constructed, the drawer front is thicker than the real one. This causes the results of drawer-opening tasks (average $\lvert\delta\rvert$ of $2\,\mathrm{cm}$) to be worse than those of closing tasks (average $\lvert\delta\rvert$ of $0.5\,\mathrm{cm}$). It is worth noting that there is a relative error of over $400\%$ in the faucet-turning tasks. This happens because the robot touches the part close to the joint axis first (which does not occur in the simulation), causing a huge rotation of the handle.
The dynamic properties of the real articulated objects are complicated. For example, the elastic deformation of laptops is not modeled in the simulation.
The kinematic structure of a real articulated object is not ideal. For example, there might be gaps in the drawer rails, which turns the original prismatic joint into a joint with several DOFs.
The reward function in the sampling-based model predictive control module is designed to guide the robot to complete the task. To examine the impact of each term of the reward function, we conduct an ablation study. There are 5 terms in the reward function, so 6 groups of experiments are conducted to reveal each term's influence relative to the full reward function. The first group runs iCEM with the full reward function as in Section III-C. Each of the other 5 groups drops one term of the full reward function. In each group, 5 tasks are conducted to make the results more general. A task is considered failed if not completed within 50 time steps. Fig. 8 summarizes the experimental results.
The experiments using the full reward function are superior in both success rate and steps to succeed, except for the experiments without $r_{reg}$. However, the trajectories found without $r_{reg}$ are not suitable for real-world execution, because the robot tends to move into unusual configurations that could be dangerous. Without $r_{dist}$, the robot cannot complete the task because the horizon is too short to achieve a positive reward. Omitting $r_{target}$, $r_{success}$, or $r_{contact}$ results in lower success rates, and even when successful, the robot requires more steps to complete the task.
IV-D.1 Real-World Articulated Object Manipulation
For each of the 3 categories, we choose one object for real-object manipulation experiments. Considering the FOV of the RGBD camera as well as the workspace of motion planning, we randomly set the location and initial joint state $s_{0}$ of the articulated objects on the table within a certain range, such that the object is in the workspace of the manipulator. We randomly select $\Delta s_{target}$ such that it does not exceed the joint limit and covers both directions of possible movement.
For each category, we conduct 30 experiments. For parameters of the Sampling-based Model Predictive Control module, we find that $T=50$, $N=100$ and $h=10$ lead to fast searching as well as good performance. For the parameters in the reward function, we make adjustments based on the results of the simulation experiments. We set $\omega_{s}=20$, $\omega_{t}=50$, $\omega_{contact}=10$, $\omega_{d}=10$, $\omega_{c}=0.001$ and $\omega_{v}=0.01$. We use eigengrasp dimension $m=2$ for real-world manipulation. We use 10 processes for sampling in simulation on a computer with an Intel Core i7-12700 CPU and an NVIDIA 3080Ti GPU. It takes about 2.5 minutes to find a feasible trajectory.
Results of all the experiments can be found in Fig. 9, and statistical results can be found in Table II. Trajectories of opening and closing a drawer are shown in Fig. 10. Similar to manipulation with the 2-finger gripper, the drawer has the lowest $\lvert\delta_{r}\rvert$ and the faucet has the highest $\lvert\delta_{r}\rvert$ according to Table II.
| Category | Drawer | Laptop | Faucet |
| --- | --- | --- | --- |
| Number of manipulations | 32 | 31 | 26 |
| Number of manipulations s.t. $\lvert\delta_{r}\rvert$ < 20% | 14 | 12 | 4 |
| Number of manipulations s.t. $\lvert\delta_{r}\rvert$ < 40% | 22 | 23 | 18 |
| Number of manipulations s.t. $\lvert\delta_{r}\rvert$ < 60% | 27 | 27 | 22 |
| Avg $\lvert\delta\rvert$ | 1.90 cm | 6.92° | 9.48° |
| Avg $\lvert\delta_{r}\rvert$ | 28.25% | 30.76% | 45.72% |
For the ablation study of dexterous hand manipulation, we investigate the impact of several key factors. Specifically, we analyze the influences of eigengrasp dimensions on Sampling-based Model Predictive Control, study how pixel projection affects the Interactive Perception module and also explore the influences of different reward functions.
Computation Time. For each task, we generate 30 different trajectories and compare the average time per step, as shown in Fig. 13. Using eigengrasp dimension $m=2$ results in approximately 1 second less per step compared to the full 16 dimensions. Consequently, it takes nearly 1 minute less to find a feasible trajectory.
Pixel Projection. In Section III-A2, we propose a pixel projection method that leverages both the RGB image and the depth information of an object to transform a 2D post-contact vector into a 3D robot trajectory. To evaluate the necessity of the pixel projection approach, we compare it with randomly generated vectors based solely on the 2D affordance. Specifically, we select one object per category and generate three random vectors for each object. The results, shown in Fig. 14, demonstrate that the vector synthesized by pixel projection is better suited for executing one-step interactions than the randomly generated direction vectors.
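Lifting a 2D pixel and its post-contact vector into 3D follows the standard pinhole back-projection with the measured depth; a sketch assuming known camera intrinsics (the intrinsic values, function names, and two-pixel vector representation are assumptions, not the paper's exact formulation):

```python
import numpy as np

def pixel_to_3d(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth to a camera-frame 3D point."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.array([x, y, depth])

def post_contact_vector_3d(p0_uv, p1_uv, depth_map, K):
    """Lift a 2D post-contact vector (start and end pixels) to a 3D direction."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    p0 = pixel_to_3d(*p0_uv, depth_map[p0_uv[1], p0_uv[0]], fx, fy, cx, cy)
    p1 = pixel_to_3d(*p1_uv, depth_map[p1_uv[1], p1_uv[0]], fx, fy, cx, cy)
    v = p1 - p0
    return v / (np.linalg.norm(v) + 1e-9)   # unit 3D motion direction

# A horizontal pixel vector on a flat 1 m depth plane maps to the camera x-axis.
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
direction = post_contact_vector_3d((320, 240), (380, 240),
                                   np.ones((480, 640)), K)
```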
Reward Function. In Section III-C2, we design the reward function in the sampling-based model predictive control module for dexterous hand manipulation tasks. Five reward terms are designed: $r_{success}$, $r_{target}$, $r_{contact}$, $r_{dist}$ and $r_{reg}$. We conduct 5 groups of experiments to reveal each term's influence relative to the full reward function. The first group runs iCEM with the full reward function as in Section III-C. Each of the other 4 groups drops one term of the full reward function. To make the ablation results more generalizable, we conduct 3 tasks for each group: opening a laptop, closing a laptop and turning a faucet. In each task, we randomize the position of the object, the initial joint angles of the robot, and the target qpos of the object to generate 30 different trajectories by running iCEM. A task not completed within 50 time steps is considered failed. Fig. 15 summarizes the experimental results.
The experiments using the full reward function consistently outperform the others in terms of both success rate and steps to completion, except for the experiments without $r_{contact}$. It is worth noting that the reward function without $r_{contact}$ even exhibits a surprising advantage in the number of steps to success in task 2. This unexpected result may be attributed to the absence of the constraints imposed by the human-like hand posture encouraged by $r_{contact}$: without this component, the iCEM algorithm may explore unconventional hand postures to interact with the object. On the other hand, omitting $r_{dist}$ from the reward function makes the tasks impossible for the robot to accomplish, because the short planning horizon prevents it from accumulating positive rewards. Similarly, excluding $r_{target}$ and $r_{success}$ leads to decreased success rates, and in successful cases the robot requires additional steps to accomplish the task.
In this section, we validate the advantages of the dexterous hand over the two-finger gripper through experiments on five tasks. For each task, we randomize the object's position and the robot's initial configuration 10 times. We then run the iCEM algorithm using both the Allegro Hand and the Robotiq gripper, using the number of steps to complete the task as the metric.
Fig. 16 summarizes the comparison results between the dexterous hand and the two-finger gripper. Except for the laptop opening task, the dexterous hand consistently requires fewer steps on average. The anomaly in the laptop opening task can be attributed to its simplicity, as it does not require precise contact between the end effector and the object. Fig. 17 visualizes the trajectories for the laptop closing task, showing that our method is able to find a shorter trajectory for the dexterous hand by utilizing its additional degrees of freedom to close the laptop efficiently.
The Interactive Perception module is designed to improve the accuracy of the constructed world model by utilizing the two different point clouds captured before and after interaction. To evaluate its necessity, we train another model using a single-frame point cloud as the network input. For each category of objects, we select one real object and compare the modeling results of two-frame and single-frame point cloud inputs. Fig. 18 shows the comparison results. The findings demonstrate that actively interacting with the movable part of the object and altering its state allows us to build a transition model with more accurate segmentation of movable parts and joint axis estimation, which is necessary for precise manipulation.
By utilizing a physics simulation as the explicit world model, our method ensures generalizability to unseen actions. This allows for easy extension to advanced manipulation skills, such as manipulation with tools. As shown in Fig. 19 , when the drawer is located out of the dexterous range of the robot or the gap between the drawer front and body is too small, the gripper alone cannot open it. In such cases, the robot can employ nearby tools to complete the task.
To demonstrate our method's tool-using capability, we use two different tools for the drawer-opening task. Benefiting from the explicit physics model, we can equip the robot with a tool to interact with the articulated object in the simulation. When using MPC to search for trajectories, we assume the tool is mounted on the robot's end effector. We simply replace the gripper tips with the tool in $r_{dist}$ when computing rewards. Remarkably, our method successfully finds a feasible trajectory with most parameters unchanged.
In this work, we present DexSim2Real$^{2}$, a novel robot learning framework designed for precise, goal-conditioned articulated object manipulation with two-finger grippers and dexterous hands. We first build the explicit world model of the target object in a physics simulator through active interaction and then use MPC to search for a long-horizon manipulation trajectory to achieve the desired manipulation goal. Quantitative evaluation of real object manipulation results verifies the effectiveness of our proposed framework for both kinds of end effectors.
For future work, we plan to integrate proprioceptive and tactile sensing during real-robot interaction to refine the constructed world model for more precise manipulation. 3D generative AI has seen great progress in recent years, and we plan to leverage such generative techniques to improve the geometry quality of the digital twin. Besides, a module that estimates the state of the object in real time would enable reactive manipulation. Lastly, we aim to expand the framework to mobile manipulation, objects with multiple movable parts and deformable objects, thereby broadening its applicability across various robotic tasks.
Learning Objectives
The prefix quasi means “resembling.” Thus quasi-experimental research is research that resembles experimental research but is not true experimental research. Although the independent variable is manipulated, participants are not randomly assigned to conditions or orders of conditions (Cook & Campbell, 1979) [1] . Because the independent variable is manipulated before the dependent variable is measured, quasi-experimental research eliminates the directionality problem. But because participants are not randomly assigned—making it likely that there are other differences between conditions—quasi-experimental research does not eliminate the problem of confounding variables. In terms of internal validity, therefore, quasi-experiments are generally somewhere between correlational studies and true experiments.
Quasi-experiments are most likely to be conducted in field settings in which random assignment is difficult or impossible. They are often conducted to evaluate the effectiveness of a treatment—perhaps a type of psychotherapy or an educational intervention. There are many different kinds of quasi-experiments, but we will discuss just a few of the most common ones here.
Recall that when participants in a between-subjects experiment are randomly assigned to conditions, the resulting groups are likely to be quite similar. In fact, researchers consider them to be equivalent. When participants are not randomly assigned to conditions, however, the resulting groups are likely to be dissimilar in some ways. For this reason, researchers consider them to be nonequivalent. A nonequivalent groups design , then, is a between-subjects design in which participants have not been randomly assigned to conditions.
Imagine, for example, a researcher who wants to evaluate a new method of teaching fractions to third graders. One way would be to conduct a study with a treatment group consisting of one class of third-grade students and a control group consisting of another class of third-grade students. This design would be a nonequivalent groups design because the students are not randomly assigned to classes by the researcher, which means there could be important differences between them. For example, the parents of higher achieving or more motivated students might have been more likely to request that their children be assigned to Ms. Williams’s class. Or the principal might have assigned the “troublemakers” to Mr. Jones’s class because he is a stronger disciplinarian. Of course, the teachers’ styles, and even the classroom environments, might be very different and might cause different levels of achievement or motivation among the students. If at the end of the study there was a difference in the two classes’ knowledge of fractions, it might have been caused by the difference between the teaching methods—but it might have been caused by any of these confounding variables.
Of course, researchers using a nonequivalent groups design can take steps to ensure that their groups are as similar as possible. In the present example, the researcher could try to select two classes at the same school, where the students in the two classes have similar scores on a standardized math test and the teachers are the same sex, are close in age, and have similar teaching styles. Taking such steps would increase the internal validity of the study because it would eliminate some of the most important confounding variables. But without true random assignment of the students to conditions, there remains the possibility of other important confounding variables that the researcher was not able to control.
In a pretest-posttest design , the dependent variable is measured once before the treatment is implemented and once after it is implemented. Imagine, for example, a researcher who is interested in the effectiveness of an antidrug education program on elementary school students’ attitudes toward illegal drugs. The researcher could measure the attitudes of students at a particular elementary school during one week, implement the antidrug program during the next week, and finally, measure their attitudes again the following week. The pretest-posttest design is much like a within-subjects experiment in which each participant is tested first under the control condition and then under the treatment condition. It is unlike a within-subjects experiment, however, in that the order of conditions is not counterbalanced because it typically is not possible for a participant to be tested in the treatment condition first and then in an “untreated” control condition.
If the average posttest score is better than the average pretest score, then it makes sense to conclude that the treatment might be responsible for the improvement. Unfortunately, one often cannot conclude this with a high degree of certainty because there may be other explanations for why the posttest scores are better. One category of alternative explanations goes under the name of history. Other things might have happened between the pretest and the posttest. Perhaps an antidrug program aired on television and many of the students watched it, or perhaps a celebrity died of a drug overdose and many of the students heard about it. Another category of alternative explanations goes under the name of maturation. Participants might have changed between the pretest and the posttest in ways that they were going to anyway because they are growing and learning. If it were a yearlong program, participants might become less impulsive or better reasoners and this might be responsible for the change.
Another alternative explanation for a change in the dependent variable in a pretest-posttest design is regression to the mean. This refers to the statistical fact that an individual who scores extremely on a variable on one occasion will tend to score less extremely on the next occasion. For example, a bowler with a long-term average of 150 who suddenly bowls a 220 will almost certainly score lower in the next game. Her score will “regress” toward her mean score of 150. Regression to the mean can be a problem when participants are selected for further study because of their extreme scores. Imagine, for example, that only students who scored especially low on a test of fractions are given a special training program and then retested. Regression to the mean all but guarantees that their scores will be higher even if the training program has no effect. A closely related concept—and an extremely important one in psychological research—is spontaneous remission. This is the tendency for many medical and psychological problems to improve over time without any form of treatment. The common cold is a good example. If one were to measure symptom severity in 100 common cold sufferers today, give them a bowl of chicken soup every day, and then measure their symptom severity again in a week, they would probably be much improved. This does not mean that the chicken soup was responsible for the improvement, however, because they would have been much improved without any treatment at all. The same is true of many psychological problems. A group of severely depressed people today is likely to be less depressed on average in 6 months. In reviewing the results of several studies of treatments for depression, researchers Michael Posternak and Ivan Miller found that participants in waitlist control conditions improved an average of 10 to 15% before they received any treatment at all (Posternak & Miller, 2001)[2].
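Regression to the mean is easy to demonstrate by simulation. The short sketch below (a hypothetical illustration, not from the chapter; the ability and noise values are invented) models each observed test score as a stable ability plus transient luck, selects the students who scored especially low on the pretest, and retests them with no treatment at all. Their group mean rises anyway, because their extreme pretest scores were partly bad luck.

```python
import random
import statistics

random.seed(42)

# Each observed score = stable ability + independent transient noise.
# The specific numbers here are invented for illustration only.
ability = [random.gauss(70, 10) for _ in range(1000)]

def observed(true_scores):
    """One testing occasion: ability plus fresh, independent noise."""
    return [a + random.gauss(0, 10) for a in true_scores]

pretest = observed(ability)
posttest = observed(ability)  # no training program of any kind

# Select the students who scored especially low on the pretest.
cutoff = sorted(pretest)[100]  # roughly the bottom 10%
low_idx = [i for i, s in enumerate(pretest) if s <= cutoff]

pre_mean = statistics.mean(pretest[i] for i in low_idx)
post_mean = statistics.mean(posttest[i] for i in low_idx)

print(f"bottom group, pretest mean:  {pre_mean:.1f}")
print(f"bottom group, posttest mean: {post_mean:.1f}")
# The posttest mean is noticeably higher even though nothing changed:
# the group was selected partly for its bad luck, which does not repeat.
```

Selecting on extreme scores and retesting will always show this pattern whenever measurements contain any transient noise, which is why a no-treatment comparison group is needed to rule it out.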
Thus one must generally be very cautious about inferring causality from pretest-posttest designs.
Early studies on the effectiveness of psychotherapy tended to use pretest-posttest designs. In a classic 1952 article, researcher Hans Eysenck summarized the results of 24 such studies showing that about two thirds of patients improved between the pretest and the posttest (Eysenck, 1952)[3]. But Eysenck also compared these results with archival data from state hospital and insurance company records showing that similar patients recovered at about the same rate without receiving psychotherapy. This parallel suggested to Eysenck that the improvement that patients showed in the pretest-posttest studies might be no more than spontaneous remission. Note that Eysenck did not conclude that psychotherapy was ineffective. He merely concluded that there was no evidence that it was, and he wrote of “the necessity of properly planned and executed experimental studies into this important field” (p. 323). You can read the entire article here:
The Effects of Psychotherapy: An Evaluation
Fortunately, many other researchers took up Eysenck’s challenge, and by 1980 hundreds of experiments had been conducted in which participants were randomly assigned to treatment and control conditions, and the results were summarized in a classic book by Mary Lee Smith, Gene Glass, and Thomas Miller (Smith, Glass, & Miller, 1980)[4]. They found that overall psychotherapy was quite effective, with about 80% of treatment participants improving more than the average control participant. Subsequent research has focused more on the conditions under which different types of psychotherapy are more or less effective.
A variant of the pretest-posttest design is the interrupted time-series design. A time series is a set of measurements taken at intervals over a period of time. For example, a manufacturing company might measure its workers’ productivity each week for a year. In an interrupted time-series design, a time series like this one is “interrupted” by a treatment. In one classic example, the treatment was the reduction of the work shifts in a factory from 10 hours to 8 hours (Cook & Campbell, 1979)[5]. Because productivity increased rather quickly after the shortening of the work shifts, and because it remained elevated for many months afterward, the researcher concluded that the shortening of the shifts caused the increase in productivity. Notice that the interrupted time-series design is like a pretest-posttest design in that it includes measurements of the dependent variable both before and after the treatment. It is unlike the pretest-posttest design, however, in that it includes multiple pretest and posttest measurements.
Figure 7.3 shows data from a hypothetical interrupted time-series study. The dependent variable is the number of student absences per week in a research methods course. The treatment is that the instructor begins publicly taking attendance each day so that students know that the instructor is aware of who is present and who is absent. The top panel of Figure 7.3 shows how the data might look if this treatment worked. There is a consistently high number of absences before the treatment, and there is an immediate and sustained drop in absences after the treatment. The bottom panel of Figure 7.3 shows how the data might look if this treatment did not work. On average, the number of absences after the treatment is about the same as the number before. This figure also illustrates an advantage of the interrupted time-series design over a simpler pretest-posttest design. If there had been only one measurement of absences before the treatment at Week 7 and one afterward at Week 8, then it would have looked as though the treatment were responsible for the reduction. The multiple measurements both before and after the treatment suggest that the reduction between Weeks 7 and 8 is nothing more than normal week-to-week variation.
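The advantage of multiple measurements can be shown with numbers in the spirit of the bottom panel of Figure 7.3. The weekly absence counts below are invented for illustration (they are not the actual data behind the figure): week 7 happens to be high and week 8 happens to be low, so a single pretest-posttest comparison suggests a large effect, while the means of the full pre- and post-treatment series reveal only ordinary week-to-week variation.

```python
# Hypothetical weekly absence counts for a "treatment did not work"
# scenario; all numbers are invented for illustration.
absences = [8, 6, 9, 7, 6, 8, 10,   # weeks 1-7: before taking attendance
            5, 8, 7, 9, 6, 8, 7]    # weeks 8-14: after

pre, post = absences[:7], absences[7:]

# A naive pretest-posttest comparison uses only weeks 7 and 8 ...
single_drop = pre[-1] - post[0]   # 10 - 5 = 5 fewer absences

# ... but the full series shows the "drop" is ordinary variation.
pre_mean = sum(pre) / len(pre)
post_mean = sum(post) / len(post)

print(f"week 7 vs. week 8 drop: {single_drop}")
print(f"pre-treatment mean:  {pre_mean:.2f}")
print(f"post-treatment mean: {post_mean:.2f}")
```

Averaged over many weeks, the pre- and post-treatment levels are nearly identical, which is exactly the inference the interrupted time-series design supports and the two-point design does not.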
A type of quasi-experimental design that is generally better than either the nonequivalent groups design or the pretest-posttest design is one that combines elements of both. There is a treatment group that is given a pretest, receives a treatment, and then is given a posttest. But at the same time there is a control group that is given a pretest, does not receive the treatment, and then is given a posttest. The question, then, is not simply whether participants who receive the treatment improve but whether they improve more than participants who do not receive the treatment.
Imagine, for example, that students in one school are given a pretest on their attitudes toward drugs, then are exposed to an antidrug program, and finally are given a posttest. Students in a similar school are given the pretest, not exposed to an antidrug program, and finally are given a posttest. Again, if students in the treatment condition become more negative toward drugs, this change in attitude could be an effect of the treatment, but it could also be a matter of history or maturation. If it really is an effect of the treatment, then students in the treatment condition should become more negative than students in the control condition. But if it is a matter of history (e.g., news of a celebrity drug overdose) or maturation (e.g., improved reasoning), then students in the two conditions would be likely to show similar amounts of change. This type of design does not completely eliminate the possibility of confounding variables, however. Something could occur at one of the schools but not the other (e.g., a student drug overdose), so students at the first school would be affected by it while students at the other school would not.
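The logic of this combined design is a simple difference-of-differences: subtract the control group's pretest-to-posttest change (which captures shared history and maturation) from the treatment group's change. The sketch below uses invented attitude scores for the two hypothetical schools; the numbers are ours, chosen only to make the arithmetic concrete.

```python
# Hypothetical mean attitude scores (higher = more negative toward
# drugs); all numbers are invented for illustration.
treatment = {"pre": 4.0, "post": 6.5}   # school with the antidrug program
control   = {"pre": 4.2, "post": 5.5}   # similar school, no program

change_treatment = treatment["post"] - treatment["pre"]  # 2.5
change_control   = control["post"] - control["pre"]      # 1.3

# History and maturation should shift both schools similarly, so the
# control group's change estimates those shared influences. The excess
# change in the treatment group is the design's estimate of the effect.
effect = change_treatment - change_control

print(f"treatment change: {change_treatment:.1f}")
print(f"control change:   {change_control:.1f}")
print(f"estimated treatment effect: {effect:.1f}")
```

If both schools had changed by the same amount, the estimated effect would be zero, which is how the design separates the treatment from history and maturation; as the chapter notes, it still cannot rule out events that strike one school but not the other.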
Finally, if participants in this kind of design are randomly assigned to conditions, it becomes a true experiment rather than a quasi-experiment. In fact, it is the kind of experiment that Eysenck called for—and that has now been conducted many times—to demonstrate the effectiveness of psychotherapy.
Research Methods in Psychology Copyright © 2015 by Paul C. Price, Rajiv Jhangiani, & I-Chant A. Chiang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.