Food insecurity adversely affects human health, which makes food security and nutrition crucial to improving people's health outcomes. Both food insecurity and health outcomes are on the policy agenda of the 2030 Sustainable Development Goals (SDGs). However, macro-level empirical studies of the relationship between food insecurity and health outcomes in sub-Saharan African (SSA) countries are lacking, even though the region is highly affected by food insecurity and its related health problems. Therefore, this study examines the impact of food insecurity on life expectancy and infant mortality in SSA countries.

The study covers 31 SSA countries, selected based on data availability. It uses secondary data collected online from the databases of the United Nations Development Programme (UNDP), the Food and Agricultural Organization (FAO), and the World Bank (WB), in the form of a yearly balanced panel from 2001 to 2018. The study employs multicountry panel data analysis and several estimation techniques: Driscoll-Kraay standard errors (DKSE), the generalized method of moments (GMM), fixed effects (FE), and the Granger causality test.

A 1% increase in the prevalence of undernourishment reduces life expectancy by 0.00348 percentage points (PPs), whereas life expectancy rises by 0.00317 PPs with every 1% increase in average dietary energy supply. A 1% rise in the prevalence of undernourishment increases infant mortality by 0.0119 PPs, whereas a 1% increase in average dietary energy supply reduces infant mortality by 0.0139 PPs.


Food insecurity harms health status in SSA countries, whereas food security improves it. This implies that, to meet SDG target 3.2, SSA countries should ensure food security.

Food security is essential to people's health and well-being [ 1 ]. The World Health Organization (WHO) argues that health is wealth and that poor health is an integral part of poverty; governments should therefore actively seek to preserve their people's lives and reduce unnecessary mortality and avoidable illness [ 2 ]. Lack of food is one of the factors that affect health outcomes. In this regard, the Food Research and Action Center noted that social determinants of health, such as poverty and food insecurity, are associated with some of the most severe and costly health problems in a nation [ 3 ].

According to the FAO, the International Fund for Agricultural Development (IFAD), and the World Food Programme (WFP), food insecurity is "A situation that exists when people lack secure access to sufficient amounts of safe and nutritious food for normal growth and development and an active and healthy life" ([ 4 ]; p50). Food security and nutrition are widely regarded as crucial to improving human health and development. Yet millions of people live with food insecurity, which is one of the main risks to human health. Around one in four people globally (1.9 billion people) were moderately or severely food insecure in 2017, with the greatest numbers in SSA and South Asia. Around 9.2% of the world's population was severely food insecure in 2018. Food insecurity is highest in SSA, where nearly one-third of the population is severely food insecure [ 5 ]. Similarly, 11% (820 million) of the world's population was undernourished in 2018, and SSA accounted for a substantial share [ 5 ]. Although the number of people affected by hunger had been decreasing globally since 1990, the number of people living in food insecurity has increased in recent years (especially since 2015), which makes achieving the SDG target of zero hunger by 2030 a huge challenge [ 6 ]. FAO et al. [ 7 ] estimated that one in four individuals in SSA was undernourished in 2017, and FAO et al. [ 8 ] found that the prevalence of undernourishment worsened between 2014 and 2018. Twenty percent of Africa's population, or 256 million people, are undernourished today, of whom 239 million are in SSA. Hidden hunger (micronutrient deficiency) is also one of the most severe types of malnutrition: one in three persons suffers from inadequacies related to hidden hunger, which affects two billion people worldwide [ 9 ], and its prevalence is high in SSA [ 10 , 11 ].

An important consequence of food insecurity is that around 9 million people die worldwide each year from hunger and hunger-related diseases, more than from Acquired Immunodeficiency Syndrome (AIDS), malaria, and tuberculosis combined [ 6 ]. Although the hunger crisis affects people of all genders and ages, children in Africa are particularly affected. Child malnutrition is widespread in Africa; it is a major factor in the high infant mortality rates and causes physical and mental developmental delays and disorders in SSA [ 12 ]. According to UN statistics, chronic malnutrition accounts for 165 million stunted or underweight children globally, around 75% of whom live in SSA and South Asia, and forty percent of children in SSA are affected. In SSA, about 3.2 million children under the age of five die each year, about half of all deaths in this age group worldwide. Globally, malnutrition causes the death of almost one child under the age of five every two minutes. The child mortality rate in SSA is among the highest in the world: about one in nine children die before the age of five [ 12 ].

Beyond its direct impact on health outcomes, food insecurity also contributes indirectly to disordered eating patterns, higher or lower blood cholesterol levels, lower serum albumin, lower hemoglobin and vitamin A levels, and poor physical and mental health [ 13 , 14 , 15 ]. Iodine, iron, and zinc deficiencies are the most commonly identified micronutrient deficiencies across all age groups, and vitamin A deficiency affects an estimated 190 million preschool children and 19 million pregnant women [ 16 ]. Although hidden hunger is often described as mainly affecting pregnant women, children, and teenagers, it affects people's health at all stages of life [ 17 ].

Given this evidence, researchers and policymakers should focus on food insecurity and health status. The SDGs, adopted in 2015, include ending hunger by 2030 among their primary targets. However, a growing number of people live with hunger and food insecurity, leading to millions of deaths. Hence, this study asks what impact food insecurity has on people's health outcomes in SSA countries. Moreover, despite the evidence linking food insecurity to poor health status, macro-level empirical studies of the impact of food insecurity on health status in SSA countries are lacking, which leaves a gap in the literature. Therefore, this study aims to examine the impact of food insecurity on life expectancy and infant mortality in SSA countries over the period 2001–2018 using panel mean regression approaches.

Theoretical and conceptual framework

Structural factors, such as climate, socio-economic and social conditions, and local food availability, affect people's food security. Food insecurity in turn affects people's health through nutritional, mental health, and behavioral channels [ 18 ]. Through the nutritional channel, food insecurity affects total caloric intake, diet quality, and nutritional status [ 19 , 20 , 21 ]. When food supplies are scarce, hunger and undernutrition may develop, potentially leading to wasting, stunting, and immunological deficiencies [ 22 ]. Food insecurity can also harm health through its effects on obesity, women's disordered eating patterns [ 23 ], and poor diet quality [ 24 ].

Regarding the mental health channel, Whitaker et al. [ 25 ] noted that food insecurity is associated with poor mental health (stress, sadness, and anxiety), which has in turn been linked to obesity and cardiovascular risk [ 26 ]. These mental health effects can worsen the health of people who are already sick and can contribute to disease acquisition [ 18 ]. The behavioral channel posits a connection between food insecurity and health practices that affect disease prevention, management, and treatment. For example, a lack of household food may push people toward choices that raise their risk of illness, such as relying heavily on cheap, calorie-dense, nutrient-poor foods or engaging in risky sexual behavior. In addition, once low-income individuals become sick, food insecurity and other competing demands for survival are linked to poorer access to, and adherence to, general medical treatment [ 27 , 28 , 29 , 30 ].

Food insecurity increases the likelihood of exposure to HIV and worsens the health of HIV-positive individuals [ 18 ]. Weiser et al. [ 31 ] found that food insecurity increases the likelihood of unsafe sexual activity, aggravating the spread of HIV. It can also raise the risk of transmission through unsafe infant feeding practices and poorer maternal health [ 32 ]. Among people living with HIV, food insecurity has been linked to decreased antiretroviral adherence, declines in physical health status, worse immunologic status [ 33 ], decreased viral suppression [ 34 , 35 ], increased incidence of serious illness [ 36 ], and increased mortality [ 37 ].

Given these theoretical relationships, and since this study focuses on the impact of food insecurity on health outcomes rather than on its causes, it adopts the conceptual framework of Weiser et al. [ 18 ], summarized in Fig. 1.

Fig. 1 A conceptual framework of food insecurity and health. Source: Modified and constructed by the author using the conceptual framework of Weiser et al. [ 18 ]. Permission was granted by Taylor & Francis to use their original Figs. 2.2, 2.3, and 2.4 to develop the above figure (permission number: 1072954).

Several micro-level studies associate food insecurity with poorer health, worse disease management, and a higher risk of premature mortality. For instance, food insecurity has been linked to poor self-reported health status [ 38 ], obesity [ 39 ], abnormal blood lipids [ 40 ], diabetes [ 24 , 40 ], gestational diabetes [ 41 ], increased perceived stress, depression, and anxiety among women [ 25 , 42 ], Human Immunodeficiency Virus (HIV) acquisition risk [ 43 , 44 , 45 ], childhood stunting [ 46 ], poor health [ 47 ], and mental health and behavioral problems [ 25 , 48 , 49 ].

The studies above are micro-level; since the scope of this study is macro-level, Table 1 summarizes only the existing macro-level empirical findings related to the current study.

Table 1 contains only a few studies, reflecting the limited macro-level empirical evidence. Even the existing macro-level studies have several limitations. Most either employed conventional estimation techniques or overlooked basic econometric tests, so their results and policy implications may mislead policy implementers. Except for Hameed et al. [ 53 ], most studies use data that are outdated or unbalanced, so their results and policy implications may be less relevant to current conditions and less accurate than those based on balanced data. Moreover, some studies used a single sampled country; with few countries and observations, the asymptotic properties of an estimator cannot be relied on [ 56 ]. This study therefore tries to fill these gaps by employing robust estimation techniques with initial diagnostic and post-estimation tests, basic panel econometric tests and robustness checks, updated data, and a large sample.

Study setting and participants

According to Smith and Meade [ 57 ], the highest rates of both food insecurity and severe food insecurity in 2017 were found in Sub-Saharan Africa (55% and 28%, respectively), followed by Latin America and the Caribbean (32% and 12%, respectively) and South Asia (30% and 13%). SSA countries also have the worst health outcomes compared with other regions; in 2020, for instance, the region had the lowest life expectancy [ 58 ] and the highest infant mortality [ 59 ]. Given this, the study's target population is SSA countries, chosen purposively. Although SSA comprises 49 of Africa's 55 countries, namely those lying entirely or partially south of the Sahara Desert, this study uses a sample of 31 SSA countries (Angola, Benin, Botswana, Burkina Faso, Cameroon, Cabo Verde, Chad, Congo Rep., Côte d'Ivoire, Ethiopia, Gabon, The Gambia, Ghana, Kenya, Lesotho, Liberia, Madagascar, Malawi, Mali, Mauritania, Mauritius, Mozambique, Namibia, Nigeria, Rwanda, Senegal, Sierra Leone, South Africa, Sudan, Tanzania, and Togo). The sampled countries were selected based on the availability of data for every variable in the empirical models from 2001 to 2018. Since SSA countries broadly suffer from food insecurity and related health problems, the sampled countries are considered appropriate and representative of the region. Moreover, the relatively large sample improves the precision of the estimators.

Data type, sources, and scope

This study uses secondary data collected online in December 2020 from the databases of the Food and Agricultural Organization (FAO), the United Nations Development Programme (UNDP), and the World Bank (WB) (see Table 2 ). The data form a yearly balanced panel from 2001 to 2018, a period that spans the Millennium Development Goals, the start of the SDGs, and other economic conditions, such as the rise of SSA countries' economies and the global financial crisis of the late 2000s. The study therefore reflects various global development programs and events, and its scope (sampled countries and period) is considered sufficient to represent SSA countries. With n = 31 countries and T = 18 years, the panel has n × T = 558 observations, which fulfills the large-sample criterion recommended by Kennedy [ 56 ].
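Because the panel is balanced, every country contributes exactly one observation per year, so the observation count is simply 31 × 18 = 558. The following is a minimal Python sketch of verifying this before estimation. It assumes the data are available as a list of (country, year) pairs; the helper name `check_balanced` and the placeholder country labels are hypothetical and not part of the study's STATA workflow.

```python
def check_balanced(records, first_year=2001, last_year=2018):
    """Return (is_balanced, n_obs) for a list of (country, year) pairs.

    The panel is balanced when every country appears exactly once in
    every year of the window [first_year, last_year].
    """
    years = set(range(first_year, last_year + 1))
    by_country = {}
    for country, year in records:
        by_country.setdefault(country, []).append(year)
    balanced = all(
        len(ys) == len(years) and set(ys) == years  # no gaps, no duplicates
        for ys in by_country.values()
    )
    return balanced, len(records)


# Placeholder labels standing in for the 31 sampled countries.
countries = [f"country_{i}" for i in range(31)]
panel = [(c, y) for c in countries for y in range(2001, 2019)]
print(check_balanced(panel))  # (True, 558)
```

In STATA, declaring the panel with `xtset` reports whether it is strongly balanced, which serves the same purpose.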

The empirical model

Model specification is vital for conducting basic panel data econometric tests and estimating the relationships among the target variables. Besides social factors, the study includes economic factors that determine people's health status. Moreover, it uses two proxy indicators each for food insecurity and health status; hence, it specifies the general model as follows:

The study uses four models to analyze the impact of food insecurity on health outcomes.

where LNLEXP and LNINFMOR (the dependent variables) are the natural logarithms of life expectancy at birth and of infant mortality, used as proxies for health outcomes. Similarly, PRUND and AVRDES are the prevalence of undernourishment and average dietary energy supply adequacy, the predictor variables used as proxies for food insecurity.

Moreover, to control for countries' socio-economic conditions and for time-varying factors that may drive changes in the dependent variables, the study includes the control variables GDPPC, GOVEXP, MNSCHOOL, and URBAN: GDPPC is GDP per capita, GOVEXP is domestic general government health expenditure, MNSCHOOL is mean years of schooling, and URBAN is urbanization. Further, \(n_{it}\), \(v_{it}\), \(\varepsilon_{it}\), and \(\mu_{it}\) are the stochastic error terms for country \(i\) in period \(t\). The parameters \(\alpha_0, \beta_0, \theta_0\), and \(\delta_0\) are intercept terms, and \(\alpha_1\)–\(\alpha_5\), \(\beta_1\)–\(\beta_5\), \(\theta_1\)–\(\theta_5\), and \(\delta_1\)–\(\delta_5\) are the long-run coefficients. Since health outcomes and food insecurity are each measured by two proxy indicators, the study estimates alternative models, which also serve as robustness checks of the main results. The models above, however, do not address heterogeneity; hence, the study accounts for unobserved cross-sectional and time heterogeneity by assuming a two-way error component structure for the disturbances:

\[ \varpi_{it} = \delta_i + \tau_t + \gamma_{it} \qquad (2) \]

In Eq. 2, \(\delta_i\) and \(\tau_t\) capture the unobservable individual (cross-section) and unobservable time heterogeneity, respectively, and \(\gamma_{it}\) is the remaining random error term. The error terms in models 1A–1D are therefore replaced by the right-hand side of Eq. 2.

Depending on whether the error components are assumed to be fixed or random, two kinds of models are evaluated: FE and RE. Equation (2) yields a two-way FE error component model, or simply an FE model, if \(\delta_i\) and \(\tau_t\) are assumed to be fixed parameters to be estimated and the random error component \(\gamma_{it}\) is independently and identically distributed with zero mean and constant variance (homoscedasticity).

Equation (2), on the other hand, yields a two-way RE error component model, or an RE model, if \(\delta_i\) and \(\tau_t\) are assumed to be random, like the remainder error term; that is, \(\delta_i\), \(\tau_t\), and \(\gamma_{it}\) are each independently and identically distributed with zero mean and constant variance, mutually independent, and independent of the explanatory variables [ 60 ].
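To make the fixed-effects reading of Eq. 2 concrete, the sketch below applies the two-way within transformation, \(z_{it} - \bar z_{i\cdot} - \bar z_{\cdot t} + \bar z_{\cdot\cdot}\), which removes any additive \(\delta_i\) and \(\tau_t\) in a balanced panel, and then runs OLS on the demeaned data. It is a minimal numpy illustration on simulated data, assuming the study's dimensions (31 countries, 18 years); the function names and data-generating values are hypothetical, it returns point estimates only (no standard errors), and the RE alternative (which needs GLS) is not shown.

```python
import numpy as np


def two_way_demean(z):
    """Two-way within transformation for a balanced panel.

    z has shape (N, T) or (N, T, K). Returns z_it - zbar_i. - zbar_.t + zbar_..,
    which removes any additive country effect delta_i and year effect tau_t.
    """
    z = np.asarray(z, dtype=float)
    country_mean = z.mean(axis=1, keepdims=True)     # zbar_i.
    year_mean = z.mean(axis=0, keepdims=True)        # zbar_.t
    grand_mean = z.mean(axis=(0, 1), keepdims=True)  # zbar_..
    return z - country_mean - year_mean + grand_mean


def two_way_fe(y, x):
    """Slope estimates of a two-way FE model via OLS on demeaned data.

    y: (N, T) outcome; x: (N, T, K) regressors.
    """
    n_units, n_periods, k = x.shape
    y_dm = two_way_demean(y).reshape(n_units * n_periods)
    x_dm = two_way_demean(x).reshape(n_units * n_periods, k)
    beta, *_ = np.linalg.lstsq(x_dm, y_dm, rcond=None)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, T = 31, 18
    delta = rng.normal(size=(N, 1))  # country effects
    tau = rng.normal(size=(1, T))    # year effects
    # Regressor correlated with the country effect, so pooled OLS would be biased.
    x = (rng.normal(size=(N, T)) + delta)[:, :, None]
    y = 1.0 + 2.0 * x[:, :, 0] + delta + tau + 0.1 * rng.normal(size=(N, T))
    print(two_way_fe(y, x))  # close to the true slope of 2.0
```

Dropping either the country or the year mean from `two_way_demean` gives the corresponding one-way FE transformation.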

Rather than considering both error components, \(\delta_i\) and \(\tau_t\), one can include only one of them at a time (fixed or random), yielding a one-way error component FE or RE model. The stochastic error term \(\varpi_{it}\) in Eq. 2 then becomes:

\[ \varpi_{it} = \delta_i + \gamma_{it} \quad \text{or} \quad \varpi_{it} = \tau_t + \gamma_{it} \]

Statistical analysis

This study reports descriptive statistics, correlation analysis, and initial diagnostic tests (cross-sectional and time-specific fixed effects, outliers and influential observations, multicollinearity, normality, heteroscedasticity, and serial correlation). It then conducts basic panel econometric tests and applies panel data estimation techniques. All analyses were performed in STATA version 15.

Descriptive statistics and correlation analysis

Descriptive statistics summarize the behavior of the variables in the model: the mean, standard deviation, minimum, maximum, skewness, and kurtosis. The study also uses Pearson correlation analysis to assess the degree of association between the variables.
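The sketch below shows how these summary measures are typically computed, as a minimal numpy illustration rather than the study's STATA output. Skewness and kurtosis use population moments, so a normal distribution has kurtosis 3 rather than 0; this matches the convention of STATA's `summarize, detail` and is consistent with the positive kurtosis values reported later. The function names are hypothetical.

```python
import numpy as np


def describe(x):
    """Mean, SD, min, max, skewness, and kurtosis of one variable.

    Skewness = m3 / m2**1.5 and kurtosis = m4 / m2**2 (population moments),
    so a normal distribution has kurtosis 3.
    """
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    m2, m3, m4 = (np.mean(dev**p) for p in (2, 3, 4))
    return {
        "mean": float(x.mean()),
        "sd": float(x.std(ddof=1)),  # sample standard deviation
        "min": float(x.min()),
        "max": float(x.max()),
        "skewness": float(m3 / m2**1.5),
        "kurtosis": float(m4 / m2**2),
    }


def pearson(x, y):
    """Pearson correlation coefficient between two variables."""
    return float(np.corrcoef(x, y)[0, 1])


print(describe([1, 2, 3, 4, 5]))      # symmetric data: skewness 0
print(pearson([1, 2, 3], [2, 4, 6]))  # perfectly linearly related
```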

Initial diagnosis

Cross-sectional and time-specific fixed effects

Because a panel data set comprises repeated observations on the same units gathered over many periods, differences can arise both over time and across cross-sectional units. Therefore, before estimation, this study considers unobserved heterogeneity in the models. A fundamental limitation of pooled regressions on cross-section, panel, or time series data is that they do not account for country and time heterogeneity [ 60 ]. These unobserved differences across countries and over time are crucial to how the error term is represented and how the model is evaluated. They can be captured by including both country and time dummies in the regression; estimation fails only if the number of parameters exceeds the number of observations [ 60 ], which is not the case here. With both country and time dummies, the slope coefficients are assumed constant while the intercept varies across countries and over time, yielding the two-way error components model. Accordingly, this study tests the null hypothesis that the intercepts are homogeneous across countries and over time.
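A standard way to test homogeneous intercepts is an F test comparing pooled OLS (the restricted model) with a regression that adds country and year dummies (the unrestricted model): \(F = [(SSR_R - SSR_U)/q] / [SSR_U/(n - k_U)]\). The numpy sketch below implements that textbook formulation on simulated data; it is an illustration under these assumptions, not the STATA routine used in the study, and the function names are hypothetical. The p-value step (for instance with `scipy.stats.f.sf`) is omitted to keep it dependency-free.

```python
import numpy as np


def ssr(X, y):
    """Sum of squared OLS residuals of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def poolability_f(y, x, country, year):
    """F statistic for H0: intercepts are equal across countries and years.

    y: (n,) outcome; x: (n, k) regressors; country, year: (n,) identifiers.
    Returns (F, numerator df, denominator df).
    """
    n = y.shape[0]
    X_r = np.column_stack([np.ones(n), x])  # restricted: pooled OLS
    # Dummies for every country and year except the first (avoids the dummy trap).
    c_dum = (country[:, None] == np.unique(country)[None, 1:]).astype(float)
    t_dum = (year[:, None] == np.unique(year)[None, 1:]).astype(float)
    X_u = np.column_stack([X_r, c_dum, t_dum])  # unrestricted: two-way dummies
    q = c_dum.shape[1] + t_dum.shape[1]
    df_u = n - X_u.shape[1]
    ssr_r, ssr_u = ssr(X_r, y), ssr(X_u, y)
    f_stat = ((ssr_r - ssr_u) / q) / (ssr_u / df_u)
    return f_stat, q, df_u


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, T = 31, 18
    country = np.repeat(np.arange(N), T)
    year = np.tile(np.arange(2001, 2001 + T), N)
    effects = rng.normal(size=N)[country] + rng.normal(size=T)[year - 2001]
    x = rng.normal(size=(N * T, 1))
    y = 1.0 + 0.5 * x[:, 0] + effects + 0.3 * rng.normal(size=N * T)
    print(poolability_f(y, x, country, year))  # large F: reject pooling
```

With 31 countries and 18 years the test has 30 + 17 = 47 restrictions.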

Detecting outliers and influential observations

In regression analysis, outliers and influential observations may bias the findings. The study therefore uses Cook's distance (Cook's D) to detect outliers and influential observations: Cook's D is computed for each observation and compared with a threshold to evaluate whether that observation has an outsized impact on the estimated model [ 61 ]. Cook's distance measures how much the fitted values of the whole model change when a given observation is deleted.
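Cook's distance can be computed without refitting the model n times, using the leverage \(h_{ii}\) from the hat matrix: \(D_i = \frac{e_i^2}{p\,s^2}\cdot\frac{h_{ii}}{(1-h_{ii})^2}\). The numpy sketch below applies this formula to a toy regression with one injected outlier and flags observations above the 4/N rule of thumb used in this study. The data and function name are illustrative assumptions.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance for each observation of an OLS fit of y on X.

    X must already contain a constant column. Uses
    D_i = e_i**2 / (p * s**2) * h_ii / (1 - h_ii)**2,
    where h_ii is the leverage (diagonal of the hat matrix).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)
    # Leverage: diagonal of H = X (X'X)^{-1} X'
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    return resid**2 / (p * s2) * h / (1.0 - h) ** 2


x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0
y[19] += 30.0  # inject one outlier at a high-leverage point
X = np.column_stack([np.ones_like(x), x])
d = cooks_distance(X, y)
flagged = np.flatnonzero(d > 4 / len(y))  # 4/N rule of thumb
print(flagged)  # [19]
```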

Normality, heteroscedasticity, multicollinearity, and serial correlation test

Before estimating the final regressions, the data were tested for normality, heteroscedasticity, multicollinearity, and serial correlation to examine the characteristics of the sample.

Regression models should be checked for non-normal error terms, because a lack of Gaussianity (normal distribution) can compromise the accuracy of estimation and testing techniques; the validity of inference techniques, specification tests, and forecasting also depends critically on the normality assumption [ 62 ]. Multicollinearity among the regressors, in turn, makes the estimates highly sensitive to minor changes in the data, destabilizes the regression model, and yields unreliable results. Therefore, this study tests normality using the command proposed by Alejo et al. [ 62 ] and multicollinearity using variance inflation factors (VIF).
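The VIF of regressor \(j\) is \(1/(1 - R_j^2)\), where \(R_j^2\) comes from regressing that regressor on a constant and all the other regressors; values below 5 are the threshold this study uses. The numpy sketch below is a minimal illustration of that definition (the function name and toy columns are hypothetical), contrasting uncorrelated columns with nearly collinear ones.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of the regressor matrix X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on a constant and all other columns.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


x1 = np.array([1.0, -1.0, 1.0, -1.0])
x2 = np.array([1.0, 1.0, -1.0, -1.0])
print(vif(np.column_stack([x1, x2])))             # uncorrelated: VIF of 1
print(vif(np.column_stack([x1, x1 + 0.1 * x2])))  # nearly collinear: VIF far above 5
```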

Most conventional panel data estimation methods assume homoscedastic individual error variances and a constant serial correlation (the same correlation between any two periods). In practice, however, error variances are often not constant across observations and errors are serially correlated in more general ways across periods, which limits the applicability of various panel data models. Heteroskedasticity and serial correlation are typically associated with cross-sectional and time series data, respectively, and panel data, which combine both dimensions, are not free from either problem; left unaddressed, they make the estimates inefficient and the conclusions drawn from them incorrect [ 63 ]. Therefore, this study uses the Wooldridge [ 63 ] test for serial correlation in linear panel models and the modified Wald test for groupwise heteroskedasticity.
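The idea behind Wooldridge's test is that first-differencing removes the individual effects; if the level errors are serially uncorrelated, the differenced residuals have a first-order autocorrelation of exactly -0.5. The numpy sketch below follows that two-step logic with a country-clustered variance. It is a simplified illustration on a balanced panel, not a reproduction of STATA's implementation: details such as the omitted constant and the lack of small-sample scaling are assumptions, and the function names are hypothetical.

```python
import numpy as np


def _ols(X, y):
    """OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def wooldridge_serial_test(y, x):
    """Sketch of Wooldridge's test for first-order serial correlation.

    y: (N, T) balanced-panel outcome; x: (N, T, K) regressors.
    Step 1: regress first differences of y on first differences of x.
    Step 2: regress each differenced residual on its own lag. If the level
    errors are serially uncorrelated, the slope in step 2 equals -0.5.
    Returns (slope, F statistic for H0: slope = -0.5), using a
    cluster-robust variance with countries as clusters.
    """
    N, T, K = x.shape
    dy = np.diff(y, axis=1).reshape(-1)     # (N*(T-1),)
    dx = np.diff(x, axis=1).reshape(-1, K)  # (N*(T-1), K)
    _, e = _ols(dx, dy)
    e = e.reshape(N, T - 1)
    e_now, e_lag = e[:, 1:], e[:, :-1]      # (N, T-2) each
    z = e_lag.reshape(-1)
    slope, u = _ols(z[:, None], e_now.reshape(-1))
    u = u.reshape(N, T - 2)
    # Sandwich variance of the single slope, clustering by country
    bread = 1.0 / (z @ z)
    meat = sum((e_lag[i] @ u[i]) ** 2 for i in range(N))
    var = bread * meat * bread
    f_stat = (slope[0] + 0.5) ** 2 / var
    return float(slope[0]), float(f_stat)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    N, T = 31, 18
    x = rng.normal(size=(N, T, 1))
    a = rng.normal(size=(N, 1))  # country effects
    y = 1.5 * x[:, :, 0] + a + rng.normal(size=(N, T))
    print(wooldridge_serial_test(y, x))  # slope near -0.5, small F
```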

Basic panel econometric tests

The basic panel data econometric tests are prerequisites for estimating panel data models. The three main tests are for cross-sectional dependence, unit roots, and cointegration.

Cross-sectional dependence (CD)

A growing body of panel data literature concludes that panel data models are likely to exhibit substantial CD in the errors, resulting from common shocks, unobserved components, spatial dependence, and idiosyncratic pairwise dependence. Although the impact of CD on estimation depends on several factors, it is more severe for dynamic panel estimators than for static models [ 64 ]. Moreover, Pesaran [ 65 ] notes that recessions and economic or financial crises can affect all countries even if they start in just one or two. Such events inevitably introduce interdependencies across cross-sectional units, their regressors, and the error terms. Hence, overlooking CD in panel data leads to biased estimates and spurious results [ 64 , 66 ]. The CD test also determines which panel unit root and cointegration tests should be applied. Examining CD is therefore vital in panel data econometrics.

The literature offers several tests for CD, such as the Breusch and Pagan [ 67 ] Lagrange multiplier (LM) test, the Pesaran [ 68 ] scaled LM test, the Pesaran [ 68 ] CD test, and the Baltagi et al. [ 69 ] bias-corrected scaled LM test (for more detail, see Tugcu and Tiwari [ 70 ]). Friedman [ 71 ] and Frees [ 72 , 73 ] propose further CD tests (for more detail, see De Hoyos and Sarafidis [ 64 ]). Among these, this study employs the Frees [ 72 ] and Pesaran [ 68 ] tests because, unlike the Breusch and Pagan [ 67 ] test, which is valid for fixed N as T tends to infinity, they are applicable when both N and T are large. Additionally, Frees' test is not affected by correlations of opposite sign cancelling each other out. The study also employs Friedman's [ 71 ] CD test where the above tests give mixed results.
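Pesaran's CD statistic averages the pairwise correlations of the residuals across all country pairs: \(CD = \sqrt{2T/(N(N-1))}\sum_{i<j}\hat\rho_{ij}\), which is asymptotically standard normal under cross-sectional independence. The numpy sketch below implements that formula for a balanced panel of residuals; it is an illustration with a hypothetical function name, and the Frees and Friedman variants are not shown (in STATA, the user-written `xtcsd` command described by De Hoyos and Sarafidis [ 64 ] offers all three).

```python
import numpy as np


def pesaran_cd(resid):
    """Pesaran's CD statistic for a balanced panel of residuals.

    resid: (N, T) array, one row per country. Under the null of
    cross-sectional independence, CD is asymptotically N(0, 1).
    """
    resid = np.asarray(resid, dtype=float)
    N, T = resid.shape
    rho = np.corrcoef(resid)              # N x N pairwise correlations
    upper = rho[np.triu_indices(N, k=1)]  # rho_ij for i < j
    return float(np.sqrt(2.0 * T / (N * (N - 1))) * upper.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(11)
    N, T = 31, 18
    common = rng.normal(size=T)  # shock shared by all countries
    correlated = common[None, :] + 0.5 * rng.normal(size=(N, T))
    independent = rng.normal(size=(N, T))
    print(pesaran_cd(correlated), pesaran_cd(independent))
```

Because positive and negative \(\hat\rho_{ij}\) enter the sum with their signs, they can offset each other, which is why the text pairs this test with Frees' squared-correlation version.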

Unit root test

The panel unit root and cointegration tests are common steps following the CD test. Generally, there are two types of panel unit root tests: (1) the first-generation panel unit root tests, such as Im et al. [ 74 ], Maddala and Wu [ 75 ], Choi [ 76 ], Levin et al. [ 77 ], Breitung [ 78 ] and Hadri [ 79 ], and (2) the second-generation panel unit root tests, such as [ 66 , 80 , 81 , 82 , 83 , 84 , 85 , 86 , 87 , 88 , 89 ].

The first-generation panel unit root tests have been criticized because they assume cross-sectional independence [ 90 , 91 , 92 , 93 ]. This assumption is restrictive and unrealistic, as macroeconomic time series exhibit significant cross-sectional correlation among countries in a panel [ 92 ], and co-movements of economies are observed in most macroeconomic applications of unit root tests [ 91 ]. Cross-sectional correlation of errors in economic panel data applications is likely to be the rule rather than the exception [ 93 ]. Moreover, applying first-generation unit root tests under CD can generate substantial size distortions [ 90 ], so that the null hypothesis of nonstationarity is rejected too often [ 66 , 94 ]. Second-generation panel unit root tests have therefore been proposed to take CD into account. Among them, this study employs Pesaran's [ 66 ] cross-sectionally augmented IPS panel unit root test (CIPS) for models 1A–1C . Unlike other unit root tests that allow for CD, such as Bai and Ng [ 80 ], Moon and Perron [ 87 ], and Phillips and Sul [ 84 ], Pesaran's [ 66 ] test is simple and transparent. It is also robust to time-series heteroskedasticity in the unobserved common factor [ 95 ]. And although Moon and Perron [ 87 ], Choi [ 96 ], and Pesaran [ 66 ] theoretically require large N and T, Pesaran's [ 66 ] test performs well in small samples [ 97 ]. The CIPS test thus accounts for CD and for heteroskedasticity in the unobserved common factor, and it suits both large and small samples. Since model 1D shows no CD, however, the study applies the first-generation Levin, Lin, and Chu (LLC), Im, Pesaran, and Shin (IPS), and Fisher augmented Dickey–Fuller (ADF) unit root tests to model 1D .
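The CIPS statistic averages unit-level cross-sectionally augmented Dickey–Fuller (CADF) t-statistics, where each country's ADF regression is augmented with the lagged level and first difference of the cross-sectional mean: \(\Delta y_{it} = a_i + b_i y_{i,t-1} + c_i \bar y_{t-1} + d_i \Delta\bar y_t + e_{it}\). The numpy sketch below implements that intercept-only version. It assumes no extra lag augmentation, omits Pesaran's truncation of extreme individual statistics, and does not compute critical values (these come from Pesaran's tables); the function names are hypothetical.

```python
import numpy as np


def cadf_t(y_i, ybar):
    """t-statistic on y_{i,t-1} in the CADF regression without extra lags:
    dy_it = a + b * y_{i,t-1} + c * ybar_{t-1} + d * dybar_t + e_it.
    """
    dy = np.diff(y_i)
    dybar = np.diff(ybar)
    X = np.column_stack([np.ones(dy.size), y_i[:-1], ybar[:-1], dybar])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    s2 = resid @ resid / (dy.size - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])


def cips(y):
    """Pesaran's CIPS statistic: average of the unit-level CADF t-statistics.

    y: (N, T) balanced panel of levels. More negative values favor
    stationarity; compare with Pesaran's tabulated critical values.
    """
    y = np.asarray(y, dtype=float)
    ybar = y.mean(axis=0)  # cross-sectional mean in each period
    return float(np.mean([cadf_t(row, ybar) for row in y]))


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    walks = np.cumsum(rng.normal(size=(31, 18)), axis=1)  # unit-root series
    noise = rng.normal(size=(31, 18))                     # stationary series
    print(cips(walks), cips(noise))  # the stationary panel is more negative
```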

Cointegration test

The most common panel cointegration tests under CD are Westerlund [ 98 ], Westerlund and Edgerton [ 99 ], Westerlund and Edgerton [ 100 ], Groen and Kleibergen [ 101 ], Westerlund's [ 102 ] Durbin-Hausman test, Gengenbach et al. [ 103 ], and Banerjee and Carrion-i-Silvestre [ 104 ]. However, with few exceptions, these tests are not implemented in STATA or are affected by insufficient observations. This study primarily uses the Westerlund [ 98 ] and Banerjee and Carrion-i-Silvestre [ 104 ] tests for models 1A–1C and, to resolve inconclusive results, also applies the McCoskey and Kao [ 105 ] test to model 1C . Westerlund's [ 98 ] test is chosen because most residual-based panel cointegration tests fail to reject the null hypothesis of no cointegration owing to the failure of the common-factor restriction [ 106 ], whereas Westerlund [ 98 ] does not require any common-factor restriction [ 107 ] and allows for a large degree of heterogeneity (e.g., individual-specific short-run dynamics, intercepts, linear trends, and slope parameters) [ 92 , 107 , 108 ]. Its command is also readily available in STATA. However, it suffers from insufficient observations, especially as the number of independent variables increases, which is why the study also employs the Banerjee and Carrion-i-Silvestre [ 104 ] and McCoskey and Kao [ 105 ] tests. When there is no CD, the two widely used Engle-Granger-based cointegration tests available in STATA are Pedroni [ 109 , 110 ] and Kao [ 111 ]. The Pedroni test has two advantages over Kao's: it can accommodate cross-sectional dependence in the form of common time effects, and it allows for heterogeneity through unit-specific parameters [ 112 ]. Hence, this study uses the Pedroni cointegration test for model 1D .

Panel data estimation techniques

The choice of panel estimation technique is determined mainly by the results of the basic panel econometric tests. Accordingly, this study mainly employs the Driscoll-Kraay [ 113 ] standard error (DKSE) estimator (for models 1A and 1B ), FE (for model 1C ), and two-step GMM (for model 1D ) to examine the impact of food insecurity on health outcomes, together with the Granger causality test. For robustness checks, it uses fully modified ordinary least squares (FMOLS), panel-corrected standard errors (PCSE), and feasible generalized least squares (FGLS) for models 1A and 1B , random effects (RE) for model 1C , and the panel dynamic fixed effects (DFE) technique for model 1D .

Although several panel estimation techniques allow for CD, most of them, such as the cross-section augmented autoregressive distributed lag (CS-ARDL), cross-section augmented distributed lag (CS-DL), common correlated effects pooled (CCEP), and common correlated effects mean group (CCEMG) estimators, require a large number of observations across both groups and periods. The continuously updated fully modified (CUP-FM) and continuously updated bias-corrected (CUP-BC) estimators are not implemented in STATA. Others, such as PCSE, FGLS, and seemingly unrelated regression (SUR), are suited to settings where T (the number of time periods) > N (the number of cross-sectional units) [ 114 , 115 ], whereas the DKSE estimator remains feasible when N > T [ 114 ]. Therefore, based on the CD and cointegration tests, availability in STATA, and the relative sizes of N and T, this study mainly employs DKSE regression for models 1A and 1B , the FE model for model 1C , and GMM for model 1D .
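Driscoll-Kraay standard errors remain usable with N > T because they only require a long enough time dimension: the score \(x_{it}e_{it}\) is summed over countries within each period, and a Newey-West (Bartlett) kernel is applied to that single time series, which makes the errors robust to cross-sectional dependence, heteroskedasticity, and autocorrelation. The numpy sketch below implements this for pooled OLS. The default lag rule \(\lfloor 4(T/100)^{2/9}\rfloor\) is the one reported for Hoechle's `xtscc` command; the sketch applies no small-sample correction and does not use the within transformation, so it may differ from the study's estimates. The function name and simulated data are hypothetical.

```python
import numpy as np


def driscoll_kraay_ols(y, X, time, lag=None):
    """Pooled OLS coefficients with Driscoll-Kraay standard errors.

    y: (n,) outcome; X: (n, k) regressors including a constant column;
    time: (n,) period identifiers. The score x_it * e_it is summed over
    countries within each period, and a Bartlett (Newey-West) kernel with
    `lag` lags is applied to that time series of sums.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    periods = np.unique(time)  # sorted period labels
    if lag is None:
        lag = int(np.floor(4 * (periods.size / 100) ** (2 / 9)))
    # h[t] = sum over countries of x_it * e_it, shape (T, k)
    h = np.array([X[time == p].T @ resid[time == p] for p in periods])
    S = h.T @ h  # lag-0 term
    for j in range(1, lag + 1):
        weight = 1.0 - j / (lag + 1.0)  # Bartlett weight
        gamma = h[j:].T @ h[:-j]        # sum_t h_t h_{t-j}'
        S += weight * (gamma + gamma.T)
    bread = np.linalg.inv(X.T @ X)
    cov = bread @ S @ bread
    return beta, np.sqrt(np.diag(cov))


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    N, T = 31, 18
    time = np.tile(np.arange(T), N)
    x = rng.normal(size=N * T)
    common = rng.normal(size=T)[time]  # shock shared by all countries
    y = 1.0 + 2.0 * x + common + 0.5 * rng.normal(size=N * T)
    X = np.column_stack([np.ones(N * T), x])
    print(driscoll_kraay_ols(y, X, time))
```

With T = 18 the default rule gives a lag of 2.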

To check the robustness of the main results, this study employs FMOLS, FGLS, and PCSE estimation for models 1A and 1B . Although the Hausman test favors FE for model 1C , the study also estimates RE for that model, because Firebaugh et al. [ 116 ] note that the RE and FE models perform best in panel data and because, unlike FE, RE treats individual differences as random. For model 1D , the study uses the panel DFE estimator (selected based on the Hausman test). Finally, robustness is also checked with an alternative specification in which the dependent variables enter without natural logs, and with the Granger causality test.

Table 3 shows that the overall mean of LNLEXP in the region is 4.063, which corresponds to a life expectancy of only about 58 years (\(e^{4.063} \approx 58.1\)). This is very low compared with other regions. LNLEXP ranges between 3.698 and 4.345 (roughly 40 to 77 years), implying high variation. Similarly, the mean value of LNINFMOR is 3.969, implying about 53 infant deaths per 1,000 live births in SSA countries, and LNINFMOR ranges between 2.525 and 4.919 (roughly 12 to 137 infant deaths per 1,000), implying high variation within the region. Because these averages exponentiate means of logs, they are geometric rather than arithmetic means. The mean prevalence of undernourishment (PRUND) is 21.26, indicating that about 21% of the population is undernourished. The mean value of AVRDES is 107.826; being greater than 100, it implies that the calorie supply would be adequate for all consumers if food were distributed according to individual requirements. Except for LNLEXP and LNINFMOR, all variables are positively skewed, and all variables have positive kurtosis, with values between 2.202 and 6.092.
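Converting the reported log values back to levels is a one-line exponentiation. The short Python sketch below does this for the means and ranges of the two outcome variables as stated in the text; the dictionary keys and helper name are illustrative. Applied to a mean of logs, `exp()` yields a geometric mean, which is never larger than the arithmetic mean of the underlying variable.

```python
import math

# Log values reported for the two outcome variables (means and ranges).
reported_logs = {
    "LNLEXP mean": 4.063,
    "LNLEXP min": 3.698,
    "LNLEXP max": 4.345,
    "LNINFMOR mean": 3.969,
    "LNINFMOR min": 2.525,
    "LNINFMOR max": 4.919,
}


def to_levels(logs):
    """Back-transform natural-log values to levels with exp().

    Applied to a mean of logs, exp() gives a geometric mean, not the
    arithmetic mean of the underlying variable.
    """
    return {name: math.exp(value) for name, value in logs.items()}


for name, level in to_levels(reported_logs).items():
    print(f"{name}: {level:.1f}")
```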

Table 3 also reports the pairwise correlations between variables; most are below the 0.7 rule-of-thumb threshold for a strong association [ 117 ]. However, the correlations between LNINFMOR and LNLEXP and between PRUNP and AVRDES exceed the threshold, which could signal multicollinearity. Nevertheless, neither pair appears together in any model, so multicollinearity is not a problem here.
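The screening logic described above (flag pairs with |r| > 0.7, then check whether a flagged pair ever enters the same model) can be sketched as follows. The model specifications are simplified to the outcome and the food-security variable of models 1A–1D as described in the results section; control variables are omitted, and the function names are hypothetical.

```python
import numpy as np

# Simplified, hypothetical specifications: outcome plus food-security regressor only
MODELS = [
    {"LNLEXP", "PRUNP"},     # model 1A
    {"LNLEXP", "AVRDES"},    # model 1B
    {"LNINFMOR", "PRUNP"},   # model 1C
    {"LNINFMOR", "AVRDES"},  # model 1D
]


def flag_high_correlations(data, names, threshold=0.7):
    """Return (var_a, var_b, r) for every pair whose |Pearson r| exceeds the threshold."""
    corr = np.corrcoef(data, rowvar=False)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                flagged.append((names[i], names[j], round(float(corr[i, j]), 3)))
    return flagged


def pairs_sharing_a_model(flagged, models):
    """Keep only flagged pairs that appear together in at least one model specification."""
    return [(a, b, r) for a, b, r in flagged if any(a in m and b in m for m in models)]
```

An empty result from `pairs_sharing_a_model(flagged, MODELS)` corresponds to the conclusion in the text: high correlations exist, but never within one regression.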

Table 4 tests whether the cross-section-specific and time-specific FE in the extended models ( models 1A-1D plus Eq. 2 ) are valid. The null hypothesis that unobserved heterogeneity is homogeneous across countries and over time is rejected at the 1% level, implying that the extended models are correctly specified. To check the two-way error component model against the pooled OLS estimator, this study also conducted a poolability test. The null hypothesis of homogeneous intercepts (poolability) is rejected at the 1% level; thus, the FE model is appropriate, and pooled OLS would be biased.
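One common form of the poolability test is a Chow-type F test that compares the residual sum of squares of pooled OLS (restricted: one common intercept) with that of the FE model (unrestricted: one intercept per country). The section does not state which variant the study used, so this sketch is an assumption, and the RSS values in the usage line are placeholders rather than results from Table 4.

```python
def poolability_f(rss_pooled, rss_fe, n_units, n_periods, n_regressors):
    """Chow-type F statistic for H0: all unit-specific intercepts are equal.

    Under H0 it follows F(n_units - 1, n_units * n_periods - n_units - n_regressors),
    where n_regressors counts slope coefficients only.
    """
    df_num = n_units - 1
    df_den = n_units * n_periods - n_units - n_regressors
    f_stat = ((rss_pooled - rss_fe) / df_num) / (rss_fe / df_den)
    return f_stat, df_num, df_den


if __name__ == "__main__":
    # Placeholder RSS values for a balanced panel with the study's dimensions (N = 31, T = 18)
    print(poolability_f(rss_pooled=200.0, rss_fe=100.0, n_units=31, n_periods=18, n_regressors=2))
```

A large F relative to the critical value of the F distribution with these degrees of freedom rejects poolability, which is the outcome reported in the text.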

Cook's distance (D) combines leverage and residual size to measure how strongly each observation influences the estimates. An observation is considered influential when D exceeds 4/N (N = number of observations), and D > 1 indicates a serious outlier problem. The Cook's D results of this study confirm the absence of an outlier problem (see supplementary file 1 ).
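Cook's distance for an OLS fit can be computed from the residuals and the diagonal of the hat matrix, D_i = e_i^2 h_ii / (p s^2 (1 - h_ii)^2), where p is the number of estimated coefficients and s^2 the residual variance. The sketch below applies the two cut-offs mentioned above (4/N and 1). It works on a plain OLS design matrix; applying it to the study's panel models would require the corresponding (for example, within-transformed) regressors, which is an assumption here.

```python
import numpy as np


def cooks_distance(y, X):
    """Cook's D for each observation of an OLS fit (X must include a constant column)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # diagonal of the hat matrix
    s2 = resid @ resid / (n - p)
    return (resid ** 2 / (p * s2)) * leverage / (1.0 - leverage) ** 2


def flag_influential(d):
    """Indices above the 4/N rule of thumb, and indices above the stricter D > 1 cut-off."""
    n = len(d)
    return np.flatnonzero(d > 4.0 / n), np.flatnonzero(d > 1.0)
```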

Normality, heteroscedasticity, serial correlation, and multicollinearity tests

The results in Table 5 indicate that the p-values of the joint normality test on e and u are above 0.01, so normality of the residuals cannot be rejected at the 1% level. For heteroscedasticity, the p-value of the chi-square statistic is below 0.01 in all models, so the null hypothesis of constant variance is rejected at the 1% level: the modified Wald test for groupwise heteroskedasticity in Table 5 rejects the null hypothesis of groupwise homoskedasticity with a p-value of 0.0000, indicating heteroscedastic residuals. Similarly, all models suffer from serial correlation, since the p-value of 0.0000 rejects the null hypothesis of no first-order serial correlation. Finally, the multicollinearity test reveals no multicollinearity problem, since all variance inflation factor (VIF) values are below 5.
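The VIF for regressor j is 1/(1 - R_j^2), where R_j^2 comes from regressing that regressor on all the other regressors plus a constant. A minimal NumPy version is shown below; it is exercised only on simulated data, and the cut-off of 5 is the one used in the text.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (regressors only, no constant column)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        centered = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)
```

Checking `np.all(vif(X) < 5)` reproduces the decision rule stated above.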

Cross-sectional dependence test

The results in Table 6 strongly reject the null hypothesis of cross-sectional independence for models 1A – 1C . For model 1D , however, the results are mixed: the Pesaran [ 68 ] test fails to reject the null hypothesis of no CD, while the Frees [ 72 ] test strongly rejects it. To break the tie, this study also applies the Friedman [ 71 ] CD test, which fails to reject the null hypothesis of cross-sectional independence. Because two of the three tests fail to reject it, model 1D, unlike the other models, shows no CD (see Table 6 ).
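Of the three CD tests mentioned, the Pesaran statistic is the simplest to reproduce. For a balanced panel of residuals with T periods and N units, CD = sqrt(2T / (N(N - 1))) multiplied by the sum of the pairwise residual correlations over i < j, which is approximately standard normal under cross-sectional independence. The sketch assumes the residuals are arranged as a T x N matrix; the Frees and Friedman statistics are not reproduced.

```python
import math

import numpy as np


def pesaran_cd(resid):
    """Pesaran CD statistic and two-sided p-value for a balanced T x N residual matrix."""
    T, N = resid.shape
    rho = np.corrcoef(resid, rowvar=False)          # N x N pairwise correlations
    upper = rho[np.triu_indices(N, k=1)]            # rho_ij for i < j
    cd = math.sqrt(2.0 * T / (N * (N - 1))) * float(upper.sum())
    p_value = math.erfc(abs(cd) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|CD|))
    return cd, p_value
```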

Unit root tests

Table 7 shows that all variables are significant at the 1% level either in levels (I(0)) or in first differences (I(1)), which implies that all variables are stationary either in levels or after first differencing. In other words, the null hypothesis of a unit root (non-stationarity) is rejected at the 1% significance level for all variables, either in levels or in first differences. Thus, we might expect a long-run relationship among these variables collectively, which the cointegration tests examine.
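The logic behind the unit-root step can be illustrated with the Dickey-Fuller regression Δy_t = a + γ y_{t-1} + e_t, whose t statistic on γ is strongly negative for a stationary series. The IPS test averages such unit-level statistics across the panel; the sketch below computes only that unstandardized average (t-bar). It omits augmentation lags, the IPS standardization, and critical values, and the exact test battery behind Table 7 is not described in this section, so this is an illustration rather than a reproduction.

```python
import numpy as np


def df_tstat(y):
    """t statistic on gamma in dy_t = a + gamma * y_{t-1} + e_t (Dickey-Fuller with constant, no lags)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    Z = np.column_stack([np.ones(len(dy)), y[:-1]])
    coef, *_ = np.linalg.lstsq(Z, dy, rcond=None)
    resid = dy - Z @ coef
    s2 = resid @ resid / (len(dy) - Z.shape[1])
    cov = s2 * np.linalg.inv(Z.T @ Z)
    return coef[1] / np.sqrt(cov[1, 1])


def ips_tbar(panel):
    """Unstandardized IPS-style t-bar: mean of unit-level DF t statistics for a T x N panel."""
    return float(np.mean([df_tstat(panel[:, i]) for i in range(panel.shape[1])]))
```

Applying `ips_tbar` to a level series and to its first difference mirrors the I(0) versus I(1) comparison in Table 7.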

Cointegration tests

The results in Table 8 show that both the Westerlund [ 98 ] and Banerjee and Carrion-i-Silvestre [ 104 ] cointegration tests strongly reject the null hypothesis of no cointegration in models 1A and 1B . However, model 1C yields a mixed result: the Banerjee and Carrion-i-Silvestre [ 104 ] test rejects the null hypothesis of no cointegration, whereas the Westerlund [ 98 ] test does not. Therefore, this study conducted further cointegration tests for model 1C . Although the Westerlund and Edgerton [ 99 ] test cannot be applied here because of insufficient observations, it is based on the McCoskey and Kao [ 105 ] LM test [ 118 ]. Thus, we can use the residual-based cointegration test for heterogeneous panels proposed by McCoskey and Kao [ 105 ]. This test requires an efficient estimator of the cointegrated relationship, for which the FMOLS and DOLS estimators are recommended. The residuals derived from FMOLS or DOLS are then tested for stationarity under the null hypothesis of no cointegration among the regressors. Because the McCoskey and Kao [ 105 ] test averages the individual LM statistics across cross-sections, it is in the spirit of the IPS test (Im et al. [ 74 ]) [ 119 ].

Although both FMOLS and DOLS are recommended for the residual-based cointegration test, DOLS outperforms FMOLS (for more detail, see Kao and Chiang [ 120 ]); therefore, this study uses residuals derived from DOLS. The result fails to reject the null hypothesis of no cointegration. Since two of the three tests (Westerlund [ 98 ] and McCoskey and Kao [ 105 ]) fail to reject the null hypothesis of no cointegration, we conclude that there is no long-run relationship among the variables in model 1C .

Since, unlike the other models, model 1D shows no CD, this study employs the Pedroni [ 109 ] and Kao [ 111 ] cointegration tests for model 1D . As in models 1A and 1B , the results strongly reject the null hypothesis of no cointegration, indicating that a long-run relationship exists among the variables in model 1D (see Table 8 ).

Panel data estimation results

Table 9 provides the long-run regression results of all models, using DKSE, FE, and two-step GMM as appropriate, along with the Granger causality test. The DKSE regression can be estimated in three ways: FE with DKSE, RE with DKSE, and pooled ordinary least squares/weighted least squares (pooled OLS/WLS) with DKSE. Hence, the most efficient specification was chosen using the Hausman test and the Breusch-Pagan LM test for RE (see supplementary file 2 ). As a result, this study employs FE with DKSE for models 1A and 1B . Further, given the Hausman test result and the absence of cointegration, and to deal with heterogeneity and spatial dependence in the dynamic panel, this study employs FE for model 1C (see supplementary file 2). Because model 1D shows no CD, exhibits cointegration, and has N > T, this study uses GMM for it. Moreover, according to Roodman [ 121 ], the GMM approach can address heteroskedasticity and autocorrelation problems. Furthermore, even though two-step GMM produces only short-run coefficients, long-run coefficients can be derived from them [ 122 , 123 ].
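For a dynamic specification such as y_it = ρ y_i,t-1 + β x_it + ..., the long-run coefficient implied by the short-run GMM estimates is β / (1 - ρ), provided |ρ| < 1. The sketch below computes it with a delta-method standard error. It assumes that model 1D includes the lagged dependent variable (the abbreviations list includes a lag of LNINFMOR) and that the coefficient variances and covariance are taken from the GMM output; the numbers in the usage line are placeholders.

```python
import math


def long_run_effect(rho, beta, var_rho, var_beta, cov_rho_beta=0.0):
    """Long-run effect beta / (1 - rho) from a dynamic panel, with a delta-method standard error."""
    if not abs(rho) < 1:
        raise ValueError("long-run effect requires |rho| < 1 (a stable dynamic model)")
    lr = beta / (1.0 - rho)
    d_beta = 1.0 / (1.0 - rho)          # dLR/dbeta
    d_rho = beta / (1.0 - rho) ** 2     # dLR/drho
    var_lr = (d_beta ** 2) * var_beta + (d_rho ** 2) * var_rho + 2.0 * d_beta * d_rho * cov_rho_beta
    return lr, math.sqrt(var_lr)


if __name__ == "__main__":
    # Placeholder values, not estimates from Table 9
    print(long_run_effect(rho=0.5, beta=-0.02, var_rho=0.01, var_beta=0.0001))
```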

The DKSE result of model 1A shows that a 1% increment in people's prevalence of undernourishment reduces their life expectancy by 0.00348 PPs (1 year or 366 days). In model 1C , a 1% rise in the prevalence of undernourishment increases infant mortality by 0.0119 PPs (1 year or 369 days). The DKSE estimates in model 1B reveal that people's life expectancy rises by 0.00317 PPs with every 1% increase in average dietary energy supply, and the GMM result for model 1D confirms that a 1% increment in average dietary energy supply reduces infant mortality by 0.0139 PPs. Moreover, this study conducted a panel Granger causality test to check whether food insecurity Granger-causes health outcomes. The null hypothesis that changes in people's prevalence of undernourishment and average dietary energy supply do not homogeneously cause health outcomes is rejected at the 1% significance level, implying that changes in food insecurity Granger-cause the health outcomes of SSA countries (see Table 9 ).
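The wording "does not homogeneously cause" matches the null hypothesis of the Dumitrescu-Hurlin panel Granger test, which averages unit-by-unit Wald statistics. Assuming that is the test behind Table 9, a minimal sketch of its building blocks follows: a bivariate Granger Wald statistic for one country and the unstandardized average W-bar across countries. The standardized Z statistics and their p-values are omitted, and the helper names are hypothetical.

```python
import numpy as np


def _lags(series, p):
    """Matrix whose k-th column is the series lagged k periods, rows aligned to t = p, ..., T-1."""
    T = len(series)
    return np.column_stack([series[p - k:T - k] for k in range(1, p + 1)])


def _rss(Y, Z):
    """Residual sum of squares of an OLS regression of Y on Z."""
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    resid = Y - Z @ coef
    return float(resid @ resid)


def granger_wald(y, x, p=1):
    """Wald statistic for H0: p lags of x add nothing to an AR(p) model of y (about chi2(p) under H0)."""
    T = len(y)
    Y = y[p:]
    restricted = np.column_stack([np.ones(T - p), _lags(y, p)])
    unrestricted = np.column_stack([restricted, _lags(x, p)])
    rss_r, rss_u = _rss(Y, restricted), _rss(Y, unrestricted)
    sigma2 = rss_u / (len(Y) - unrestricted.shape[1])
    return (rss_r - rss_u) / sigma2


def dh_wbar(panel_y, panel_x, p=1):
    """Unstandardized Dumitrescu-Hurlin W-bar: mean of unit-level Granger Wald statistics (T x N inputs)."""
    return float(np.mean([granger_wald(panel_y[:, i], panel_x[:, i], p)
                          for i in range(panel_y.shape[1])]))
```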

In addition to the main results, Table 9 reports post-estimation statistics to assess the consistency of the estimates. For the DKSE and FE models, model validity is assessed with R² and the F statistics. R² quantifies the proportion of the variance in the dependent variable explained by the independent variables and thus represents the model's goodness of fit. The results in Table 9 show that the explanatory variables explain more than 62% of the variance in the dependent variables. Cohen [ 125 ] classifies R² values of 2%, 13%, and 26% as small, medium, and large effects, respectively, in the social and behavioral sciences. Therefore, the explanatory power of this study's models is large. Similarly, the F statistics show that the independent variables are jointly significant. For the two-step system GMM, the results fail to reject the null hypotheses of no first-order (AR(1)) and no second-order (AR(2)) serial correlation, indicating no evidence of first- or second-order serial correlation. In addition, the Hansen [ 126 ] and Sargan [ 127 ] tests fail to reject the null hypothesis that the instruments are jointly valid, which implies that instrument proliferation does not weaken the model.
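Cohen's benchmarks for R² (0.02 small, 0.13 medium, 0.26 large) can be encoded directly; the related f² effect size is R² / (1 - R²). The label used for values below 0.02 is an arbitrary choice in this sketch.

```python
def cohen_r2_label(r2):
    """Label an R^2 value using Cohen's benchmarks: 0.02 small, 0.13 medium, 0.26 large."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("R^2 must lie in [0, 1]")
    if r2 >= 0.26:
        return "large"
    if r2 >= 0.13:
        return "medium"
    if r2 >= 0.02:
        return "small"
    return "below small"


def cohen_f2(r2):
    """Cohen's f^2 effect size implied by R^2."""
    return r2 / (1.0 - r2)
```

With the R² of more than 0.62 reported above, `cohen_r2_label` returns "large".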

Robustness checks

The author believes the above findings may not be sufficient for policy recommendations unless robustness checks are undertaken. Hence, the study re-estimated all models without taking the natural logarithm of the dependent variables (see Table 10 ). Consistent with the main results, the model 1A result reveals that individuals' prevalence of undernourishment significantly reduces their life expectancy in SSA countries: a 1% increase in the prevalence of undernourishment reduces life expectancy by 0.1924 PPs. Moreover, in model 1B , life expectancy rises by 0.1763 PPs with every 1% increase in average dietary energy supply. In model 1C , a rise in infants' prevalence of undernourishment has a positive and significant effect on their mortality rate in SSA countries: the FE result implies that a 1% rise increases the infant mortality rate by 0.9785 PPs. The GMM result for model 1D indicates that an improvement in average dietary energy supply significantly reduces infant mortality. Further, the Granger causality test rejects, at the 1% level, the null hypothesis that changes in the prevalence of undernourishment and average dietary energy supply do not homogeneously cause health outcomes. This implies that changes in food insecurity Granger-cause health outcomes in SSA countries (see Table 10 ).
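Because the main models use log-transformed outcomes while Table 10 uses levels, one way to compare them is to convert a log-scale coefficient into an implied change in the original units. Under the standard reading of a log-level model ln(Y) = ... + βX, a one-unit rise in X changes Y by 100(e^β - 1) percent. The sketch assumes that PRUNP enters in percentage points without a log, that life expectancy in the level model is measured in years, and it evaluates the change at the geometric mean implied by Table 3. Under these assumptions, the model 1A log coefficient of -0.00348 corresponds to about -0.35%, or roughly -0.20 years, which is close to the level-model estimate of 0.1924.

```python
import math


def pct_change_log_level(beta, dx=1.0):
    """Exact percentage change in Y for a dx-unit change in X when ln(Y) = ... + beta * X."""
    return 100.0 * (math.exp(beta * dx) - 1.0)


def change_in_levels(beta, y_reference, dx=1.0):
    """Implied change in Y's original units, evaluated at a reference level of Y."""
    return y_reference * pct_change_log_level(beta, dx) / 100.0


if __name__ == "__main__":
    beta_1a = -0.00348               # model 1A coefficient; negative sign taken from "reduces"
    lexp_geo_mean = math.exp(4.063)  # about 58.1 years, from the Table 3 mean of LNLEXP
    print(f"{pct_change_log_level(beta_1a):.3f}% per one-point rise in PRUNP")
    print(f"{change_in_levels(beta_1a, lexp_geo_mean):.3f} years (level model: -0.1924)")
```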

The study also conducted further robustness checks using the same dependent variables as in Table 9 but different estimation techniques. The results confirm that people's prevalence of undernourishment has a negative and significant effect on their life expectancy, whereas improvement in average dietary energy supply significantly increases life expectancy in SSA countries. Likewise, the prevalence of undernourishment contributes to infant mortality, while progress in average dietary energy supply significantly reduces it (see Table 11 ).

The main objective of this study is to examine the impact of food insecurity on the health outcomes of SSA countries. Accordingly, the DKSE result of model 1A confirms that a rise in people's prevalence of undernourishment significantly reduces their life expectancy in SSA countries, while the FE result for model 1C shows that an increase in the prevalence of undernourishment has a positive and significant impact on infant mortality. This indicates that the share of the population whose food intake is insufficient to meet dietary energy requirements is high, which reduces life expectancy and increases infant mortality in SSA countries. This result can be linked to insufficient food supply in SSA, owing to low production and yields, primitive tools, a lack of support for smallholder farms and of investment in infrastructure, and government policies. Besides, even where food is available, it is not distributed fairly according to individual requirements. Moreover, inadequate access to food and a lack of well-balanced diets lead to poor nutrition and chronic illnesses. In addition, many of these countries are affected by poverty, which makes it difficult for citizens to afford nutritious food. Together, these issues create an environment in which individuals are more likely to suffer malnutrition-related illnesses, resulting in lower life expectancy. The DKSE estimation result for model 1B reveals that improvement in average dietary energy supply positively affects people's life expectancy in SSA countries, and the GMM result for model 1D shows that it also reduces infant mortality.

Based on the above results, we can conclude that food insecurity harms the health outcomes of SSA nations. This is because the prevalence of undernourishment increases infant mortality by increasing vulnerability to, and the severity and duration of, infectious diseases such as diarrhea, pneumonia, malaria, and measles. Similarly, undernourishment can reduce life expectancy by increasing vulnerability to, and the severity and duration of, infectious diseases. In contrast, food security improves health outcomes: a rise in average dietary energy supply reduces infant mortality and increases life expectancy.

Several facts and theories support the above findings. For instance, consistent with the theoretical and conceptual framework section, food insecurity in SSA countries can affect health outcomes through nutritional, mental health, and behavioral channels. According to FAO et al. [ 128 ], the prevalence of undernourishment in Africa increased from 17.6% of the population in 2014 to 19.1% in 2019. This figure is more than twice the global average and the highest of all world regions. Similarly, SSA is the world region most at risk of food insecurity [ 129 ]. According to the Global Nutrition Report [ 130 ], anemia affects an estimated 39.325% of women of reproductive age, and some 13.825% of infants have a low weight at birth in the SSA region. Excluding Middle African countries (due to lack of data), the estimated average prevalence of exclusive breastfeeding among infants aged 0 to 5 months is 35.73%, which is lower than the global average of 44.0%. Moreover, SSA still experiences a malnutrition burden among children aged under five years. The average prevalence of overweight is 8.15%, which is higher than the global average of 5.7%. The prevalence of stunting is 30.825%, higher than the worldwide average of 22%. Likewise, the prevalence of wasting in SSA countries is 5.375%, which is higher than in most regions, such as Central Asia, Eastern Asia, Western Asia, Latin America and the Caribbean, and North America. The region's adult population also faces a malnutrition burden: an average of 9.375% of adult (aged 18 and over) women live with diabetes, compared with 8.25% of men. Meanwhile, 20.675% of women and 7.85% of men live with obesity.

According to Saltzman et al. [ 17 ], micronutrient deficiencies can affect people's health throughout the life cycle: in infancy, they cause low birth weight, higher mortality, and impaired mental development; in childhood, stunting, reduced mental capacity, frequent infections, reduced learning capacity, and higher mortality; in adolescence, stunting, reduced mental capacity, fatigue, and increased vulnerability to infection; in pregnant women, increased mortality and perinatal complications; in adulthood, reduced productivity, poor socio-economic status, malnutrition, and increased risk of chronic disease; and in the elderly, increased morbidity (including osteoporosis and mental impairment) and higher mortality.

Although this study attempts to fill existing gaps, it also has limitations. It examined the impact of food insecurity on infant mortality; however, their association is also reflected indirectly through other health outcomes. Hence, future studies could extend this work by examining the indirect effect of food insecurity on infant mortality, which would give a more in-depth picture of the relationships between the variables. Moreover, this study used the mortality of infants under one year of age; future studies could broaden the scope by decomposing infant mortality into neonatal and post-neonatal mortality and by examining under-five mortality.

Millions of people die every year from hunger and hunger-related diseases worldwide, especially in SSA countries. The link between food insecurity and health status is currently on the agendas of researchers and policymakers. However, macro-level evidence for the most affected countries, such as those in SSA, has received only limited attention. Therefore, this study examined the impact of food insecurity on life expectancy and infant mortality rates. The study mainly employs DKSE, FE, two-step GMM, and Granger causality approaches, along with other estimation techniques for robustness checks, over the period 2001–2018. The results confirm that food insecurity harms health outcomes, while food security improves the health status of SSA nations: a rise in undernourishment increases the infant mortality rate and reduces life expectancy, whereas an improvement in average dietary energy supply reduces infant mortality and increases life expectancy. Therefore, SSA countries need to guarantee access to food in both quality and quantity to improve health status. Development experts and political leaders agree that Africa's agricultural potential can feed the continent and improve socio-economic growth. Besides, more than half of the world's unused arable land is found in Africa. Therefore, effective utilization of natural resources is essential to achieve food security. Moreover, since most of the food in SSA is produced by smallholder farmers [ 131 ], who are also the most vulnerable to food insecurity and poverty [ 132 , 133 ], special focus and support should be given to smallholder farmers to enhance food self-sufficiency.
Further, increased investment in agricultural research; improved markets, infrastructure, and institutions; sound macroeconomic policies and political stability; and sub-regional strategies based on agroecological zones are crucial to overcoming food insecurity and improving health status. Finally, filling stomachs is not sufficient: a person's diet needs to be comprehensive and secure, balanced (including all necessary nutrients), and available and accessible. Therefore, SSA countries should ensure the availability, accessibility, utilization, and sustainability of food to achieve food and nutrition security.

Augmented Dickey–Fuller

Acquired Immunodeficiency Syndrome

Average Dietary Energy Supply

Common Correlated Effects Mean Group

Common Correlated Effects Pooled

Cross-Sectional Dependence

Cross-Sectionally Augmented Panel Unit Root Test

Cross-Section Augmented Autoregressive Distributed Lag

Cross-Section Augmented Distributed Lag

Continuously Updated Bias-Corrected

Continuously Updated Fully Modified

Dynamic Fixed Effect

Driscoll-Kraay Standard Errors

Dynamic Ordinary Least Squares

Error Correction Model

Food and Agriculture Organization

Fixed Effect

Feasible Generalised Least Squares

Fully Modified Ordinary Least Squares

Gross Domestic Product (GDP) per capita

Generalised Method of Moments

Domestic General Government Health Expenditure

Human Immunodeficiency Virus

Integration at First Difference

International Fund for Agricultural Development

Infant Mortality Rate

Im, Pesaran, Shin

Lag of Infant Mortality Rate

Lag of Natural Logarithm of Infant Mortality Rate

Life Expectancy at Birth

Levin, Lin, and Chu

Lagrange Multiplier

Natural Logarithm of Infant Mortality Rate

Natural Logarithm of Life Expectancy at Birth

Mean Years of Schooling

Ordinary Least Squares

Panel-Corrected Standard Error

Pooled Mean Group

Prevalence of Undernourishment

Random Effect

Sustainable Development Goals

Sub-Saharan African

Statistical Software

Seemingly Unrelated Regression


World Food Programme

World Health Organization

Weighted Least Squares


Step #5: Data Analysis and result

Data analysis can be done in two ways: qualitatively and quantitatively. The researcher will need to decide whether a qualitative method, a quantitative method, or a combination of both is needed. Depending on the unit of analysis of the data, he will know whether his hypothesis is supported or rejected. Analyzing the data is the most important part of supporting the hypothesis.

Step #6: Conclusion

A report will need to be written with the findings of the research. The researcher can present the theories and literature that support the research and can make suggestions or recommendations for further research on the topic.

Empirical research methodology cycle

A.D. de Groot, a famous Dutch psychologist and chess expert, conducted some of the most notable experiments using chess in the 1940s. During his study, he came up with a consistent cycle that is now widely used to conduct empirical research. It consists of five phases, each as important as the next. The empirical cycle captures the process of forming hypotheses about how certain subjects work or behave and then testing these hypotheses against empirical data in a systematic and rigorous way. It can be said to characterize the deductive approach to science. The empirical cycle is as follows:

  • Observation: In this phase, an idea for a hypothesis is sparked, and empirical data are gathered through observation. For example, a particular species of flower blooms in a different color only during a specific season.
  • Induction: Inductive reasoning is then used to form a general conclusion from the data gathered through observation. For example, as stated above, it is observed that the species of flower blooms in a different color during a specific season. A researcher may ask, "Does the temperature in that season cause the color change in the flower?" He can assume that it does, but this is mere conjecture, so an experiment needs to be set up to support the hypothesis. He therefore tags a few sets of flowers kept at a different temperature and observes whether they still change color.
  • Deduction: This phase helps the researcher deduce a conclusion from the experiment. The conclusion has to be based on logic and rationality to produce specific, unbiased results. For example, if the tagged flowers kept at a different temperature do not change color, then it can be concluded that temperature plays a role in changing the color of the bloom.
  • Testing: In this phase, the researcher returns to empirical methods to put the hypothesis to the test. The researcher now needs to make sense of the data and hence needs a statistical analysis plan to determine the relationship between temperature and bloom color. If the researcher finds that most flowers bloom in a different color when exposed to a certain temperature and the others do not when the temperature is different, he has found support for his hypothesis. Note that this is not proof, only support for the hypothesis.
  • Evaluation: This phase is often forgotten, but it is important for continuing to gain knowledge. In this phase, the researcher presents the data collected, the supporting argument, and the conclusion. The researcher also states the limitations of the experiment and the hypothesis and suggests how others can pick up the work and continue more in-depth research in the future.


There is a reason why empirical research is one of the most widely used methods: it has several advantages. Following are a few of them.

  • It is used to authenticate traditional research through various experiments and observations.
  • This research methodology makes the research being conducted more competent and authentic.
  • It enables a researcher to understand the dynamic changes that can happen and to change his strategy accordingly.
  • The level of control in such research is high, so the researcher can control multiple variables.
  • It plays a vital role in increasing internal validity.

Even though empirical research makes the research more competent and authentic, it does have a few disadvantages. Following are a few of them.

  • Such research needs patience, as it can be very time-consuming. The researcher has to collect data from multiple sources, and many parameters are involved, which makes the research time-consuming.
  • Most of the time, a researcher will need to conduct research at different locations or in different environments, which can be expensive.
  • Experiments must follow certain rules, so permissions are needed. Often, it is very difficult to get the permissions required to carry out different methods of this research.
  • Data collection can sometimes be a problem, as data have to be collected from a variety of sources through different methods.


Empirical research is important in today's world because most people believe only in what they can see, hear, or experience. It is used to validate multiple hypotheses, increase human knowledge, and keep advancing various fields.

For example, pharmaceutical companies use empirical research to try out a specific drug on controlled or random groups to study cause and effect. In this way, they find support for theories they have proposed about the drug. Such research is very important, as it can sometimes lead to a cure for a disease that has existed for many years. It is useful in science and many other fields, such as history, social sciences, and business.


With the advances of today's world, empirical research has become critical and a norm in many fields for supporting hypotheses and gaining knowledge. The methods mentioned above are very useful for carrying out such research. However, new methods will keep emerging as investigative questions become more specialized or change.




Identifying empirical articles.


What is Empirical Research?

An empirical research article reports the results of a study that uses data derived from actual observation or experimentation. Empirical research articles are examples of primary research. To learn more about the differences between primary and secondary research, see our related guide:

  • Primary and Secondary Sources

By the end of this guide, you will be able to:

  • Identify common elements of an empirical article
  • Use a variety of search strategies to search for empirical articles within the library collection

Look for the IMRaD layout in the article to help identify empirical research. Sometimes the sections will be labeled differently, but the content will be similar.

  • Introduction: why the article was written, research question or questions, hypothesis, literature review
  • Methods: the overall research design and implementation, description of sample, instruments used, how the authors measured their experiment
  • Results: output of the author's measurements, usually includes statistics of the author's findings
  • Discussion: the author's interpretation and conclusions about the results, limitations of study, suggestions for further research

Parts of an Empirical Research Article


The screenshots below identify the basic IMRaD structure of an empirical research article. 


The introduction contains a literature review and the study's research hypothesis.


The method section outlines the research design, participants, and measures used.



The results section contains statistical data (charts, graphs, tables, etc.) and research participant quotes.


The discussion section includes impacts, limitations, future considerations, and research.


Learn the IMRaD Layout: How to Identify an Empirical Article

This short video overviews the IMRaD method for identifying empirical research.

Introduction: What is Empirical Research?

Empirical research is based on observed and measured phenomena and derives knowledge from actual experience rather than from theory or belief. 

How do you know if a study is empirical? Read the subheadings within the article, book, or report and look for a description of the research "methodology."  Ask yourself: Could I recreate this study and test these results?

Key characteristics to look for:

  • Specific research questions to be answered
  • Definition of the population, behavior, or phenomena being studied
  • Description of the process used to study this population or phenomena, including selection criteria, controls, and testing instruments (such as surveys)

Another hint: some scholarly journals use a specific layout, called the "IMRaD" format, to communicate empirical research findings. Such articles typically have 4 components:

  • Introduction: sometimes called "literature review" -- what is currently known about the topic -- usually includes a theoretical framework and/or discussion of previous studies
  • Methodology: sometimes called "research design" -- how to recreate the study -- usually describes the population, research process, and analytical tools used in the present study
  • Results: sometimes called "findings" -- what was learned through the study -- usually appears as statistical data or as substantial quotations from research participants
  • Discussion: sometimes called "conclusion" or "implications" -- why the study is important -- usually describes how the research results influence professional practices or future studies

Reading and Evaluating Scholarly Materials

Reading research can be a challenge. However, the tutorials and videos below can help. They explain what scholarly articles look like, how to read them, and how to evaluate them:

  • CRAAP Checklist A frequently-used checklist that helps you examine the currency, relevance, authority, accuracy, and purpose of an information source.
  • IF I APPLY A newer model of evaluating sources which encourages you to think about your own biases as a reader, as well as concerns about the item you are reading.
  • Credo Video: How to Read Scholarly Materials (4 min.)
  • Credo Tutorial: How to Read Scholarly Materials
  • Credo Tutorial: Evaluating Information
  • Credo Video: Evaluating Statistics (4 min.)
  • Credo Tutorial: Evaluating for Diverse Points of View
Statement from Agriculture Secretary Tom Vilsack on the 2023 Household Food Security in the U.S. Report

WASHINGTON, Sept. 4, 2024 - Today, the U.S. Department of Agriculture’s Economic Research Service published its annual Household Food Security Report in the United States. The report shows that in 2023, while 86.5 percent of U.S. households were food secure throughout the entire year, the remaining 13.5 percent (18.0 million households) struggled with food availability, quality or variety at least some time during the year. Agriculture Secretary Tom Vilsack made the following statement regarding the report’s findings:

“The findings of today’s report are a direct outcome of Congressional actions that short-change our children’s future and erode the safety net that hard-working families rely on in hard times – whether that’s blocking expansion of the Child Tax Credit or doubling down to restrict access to the Supplemental Nutrition Assistance Program (SNAP). Policies like the expanded Child Tax Credit and Earned Income Tax Credit and enhanced SNAP benefits helped drive the poverty rate down to a record low of 8 percent in 2021. This is progress we should be working together to build on, not strip away.

Notably, food insecurity held steady from year to year among households with children. This tells us that programs to help feed kids work—including the National School Lunch Program, which gives many kids their healthiest meal of the day; the Special Supplemental Nutrition Program for Women, Infants and Children (WIC), which serves more than 6 million mothers, children and infants; and SNAP, which is our nation’s most powerful anti-hunger tool and one that more than half of food insecure households used last year—and they must be continued and strengthened. USDA will continue to encourage states to adopt the evidence-based SUN Bucks program, which launched in 2024 and helps feed kids in summer months when school is out and hunger rises. Bolstering participation in SNAP and fully funding WIC are also critical. The costs of not doing so are clear—we owe it to the next generation to give them the best possible start in life.

For anyone to go hungry in America is unacceptable. This report reaffirms that proposals to cut food assistance—including SNAP in the next Farm Bill—are misguided and out of step with the reality working families face.”

USDA is an equal opportunity provider, employer, and lender.


PLOS ONE

What motivates consumers to buy organic foods? Results of an empirical study in the United States

Raghava R. Gundala

1 Department of Business, University of Wisconsin-Parkside, Kenosha, Wisconsin, United States of America

Anupam Singh

2 Faculty of Economic Sciences and Management, Nicolaus Copernicus University, Torun, Poland

Associated Data

All relevant data are within the manuscript and its Supporting Information files.

Consumers perceive organic foods as more nutritious, natural, and environmentally friendly than non-organic or conventional foods. Since organic foods emerged, studies on consumer behavior toward them have contributed significantly to the market’s development. The present study aims to identify the factors affecting consumer buying behavior toward organic foods in the United States. Survey data were collected from 770 consumers in the Midwest, United States. ANOVA, multiple linear regression, factor analysis, independent t-tests, and hierarchical multiple regression analysis were used to analyze the collected primary data. This research confirms that health consciousness, consumer knowledge, perceived or subjective norms, and perception of price influence consumers’ attitudes toward buying organic foods. Availability is another factor that affects consumers’ purchase intentions. Age, education, and income are demographic factors that also affect consumers’ buying behavior. The findings can help marketers of organic foods design strategies to succeed in the fast-growing US organic foods market.


What is organic food?

Foods that are cultivated without the application of chemical pesticides can be called organic foods [ 1 ]. For animal-derived foods labeled organic (e.g., eggs, meat, milk, and milk products), the animal feed cannot include antibiotics or growth hormones [ 2 ]. Organic foods are not grown from genetically modified organisms. Furthermore, they are not processed using irradiation, industrial solvents, or synthetic food additives [ 3 ]. Because they are produced using ecologically sound methods, without chemical pesticides or fertilizers, organic foods are perceived as environmentally safe.

When the world’s population was low, almost all agriculture was organic and near-natural. However, these traditional practices, passed from one generation to the next, did not produce enough food to meet the demands of a rapidly increasing global population. This led to the "green revolution," in which farmers used technological interventions to maximize output and meet the growing need for food [ 4 ]. Unfortunately, this increase in food production also increased the use of chemical pesticides and fertilizers, causing environmental and health issues.

Consumers worldwide are now more concerned about the environment [ 5 ]. They are sensitive to information about products, processing, and brands that might affect the environment [ 6 ], and they perceive environmental issues as having a direct impact on their well-being. Consumers who are aware of environmentally degrading activities are willing to buy organic foods [ 7 ].

Heightened environmental awareness and consumers’ desire to buy organic foods have led to increased corporate investment in organic food production and marketing, prompting significant innovations in the organic food industry [ 8 ]. As a result, the organic food market is growing [ 9 ]. Effective campaigns have also raised environmental awareness, and as a result consumers are now ready to spend more on green products [ 10 ].

Furthermore, people’s living standards have improved significantly over the past few decades, and with them the demand for better lifestyles and food. The steady growth in organic food purchases is an emerging trend, and consumers want to learn what organic foods offer before making purchasing decisions [ 11 ].

Global organic food market

According to a recent report, the organic food market is expected to grow with a Compound Annual Growth Rate (CAGR) of 16% during 2015–2020. This growth might be due to consumers’ health concerns as they become aware of organic foods’ perceived health benefits. Further, rising income levels, changes in living standards, and government initiatives encourage the broader adoption of organic products [ 12 ].
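A CAGR compounds from year to year, so a 16% rate sustained over 2015–2020 means much more than 5 × 16% = 80% cumulative growth. The short Python sketch below works through that arithmetic. The function names are illustrative. Counting 2015–2020 as five annual compounding periods is an assumption about how the cited report [ 12 ] defines its window.

```python
def project(start_value, rate, years):
    """Value after compounding start_value at `rate` per year for `years` years."""
    return start_value * (1.0 + rate) ** years


def cagr(start_value, end_value, years):
    """Constant annual growth rate that turns start_value into end_value over `years` years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Assumption: 2015 -> 2020 is treated as five annual compounding periods.
multiplier = project(1.0, 0.16, 5)
print(f"A 16% CAGR over 5 years multiplies market size by {multiplier:.2f}")
```

On this reading, the projection implies that the market roughly doubles (about 2.1 times) over the period.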

Organic food market in the USA

In 2018, US organic market sales were US$47.86 billion, and the market grew by 6.3% from 2017 to 2018 [ 13 ]. In 2017, the US organic market hit a record US$45.2 billion in sales, a figure that includes both organic food and organic non-food products (see Fig 1 ). The organic food market is predicted to keep growing at a consistent pace as it matures. Demand for organic foods is flourishing as consumers seek nutritious, clean eating, which they perceive as good for their health and the environment.

Fig 1. Source: Statista.com.

Understanding consumer buying behavior toward organic foods is essential to pursue better marketing and management of the market. This can help us learn about the consumer decision-making process on organic foods and understand how consumers’ attitudes and beliefs impact their consumption patterns. In addition, studying consumers’ willingness to pay a premium price and their response to organic food advertisements [ 14 ] is necessary for companies to succeed in this growing market.

This study explores the factors influencing consumers’ buying behavior toward organic foods. Although many factors can affect consumer buying behavior, we chose health consciousness, knowledge, subjective norms, price, and availability based on Singh and Verma’s [ 1 ] study. Understanding these factors is vital for developing successful marketing strategies for these products.

Theory and research hypotheses

Earlier research on consumer buying behavior toward organic foods examined why people buy them. Although findings differ somewhat, the main reasons are product quality, concerns about environmental degradation, and health-related issues [ 15 ]. Subsequent studies on consumer buying behavior of organic foods confirmed this [ 16 ]. Consumers tend to perceive organic foods as healthier than conventional alternatives, and this perception is one of the most commonly cited reasons for purchasing them. Two studies [ 17 , 18 ] showed that consumers tend to have a positive attitude toward organic foods. However, environmental concerns may not be what drives their purchases. Instead, purchasing decisions are driven by the perceived health benefits the foods offer and by the desire to fit in with a social group, to try a new trend, or to differentiate oneself from others [ 19 ].

Health consciousness (HEC)

Consumer attitudes are significantly influenced by health consciousness [ 20 ], and consumers mainly purchase organic foods for their health benefits [ 21 ]. Several studies show that health factors significantly influence consumers’ willingness to buy organic foods [ 22 – 26 ]. One significant influence could be the deterioration of consumers’ health [ 22 ]; thus, consumers see their purchases of organic foods as an investment in good health. Bourn and Prescott [ 27 ] found that organic foods have a competitive advantage over conventional foods due to their nutritive attributes.

However, Fotopoulos and Krystallis [ 28 ] found that taste is another reason consumers buy organic foods. Moreover, although many studies report that perceived health benefits are the primary motivator, Tarkiainen and Sundqvist [ 29 ] and Michaelidou and Hassan [ 25 ] did not find them to be a compelling driver; in those studies, health benefits were the least significant influence on organic food purchases. Given these conflicting findings on the importance of health benefits, we examined our respondents’ views on this topic and formulated Hypothesis 1:

  • H1: Health consciousness has a positive impact on buying behavior toward organic foods.

Consumer knowledge (CK)

The Theory of Reasoned Action (TRA) helps explain how consumer behavior develops by exploring the motivational influences on how consumers behave [ 30 ], and it offers a basis for predicting consumer attitudes and behavior [ 31 ]. Liu [ 20 ] further argues that TRA is the best theory for predicting consumer behavior toward organic foods. Consumers want to know what they are buying and to satisfy their needs and wants; therefore, knowledge is essential in shaping consumer behavior toward foods.

Sapp [ 32 ] argued that knowledge involves a cognitive learning process. Consumer purchase intentions differ according to consumers’ levels of expertise [ 33 ], and consumers’ purchases of organic products cannot be separated from their knowledge and understanding of organic foods [ 34 , 35 ]. Recent research on consumer awareness and knowledge of organic foods found that consumer awareness in the rest of the world is low relative to Europe, where the higher awareness reflects a better-developed organic food market [ 3 , 36 – 39 ].

Studies also found that consumers’ knowledge of what is "organic" is inconsistent. For example, in one study, respondents assumed that organic foods are produced without pesticides, fertilizers, or growth regulators [ 40 ]. In a similar UK study by Hutchings and Greenhalgh [ 41 ], respondents thought that "organic" farming is free from chemicals and that its products are grown naturally; respondents also felt that organic foods are not intensively farmed.

Awareness and knowledge of organic foods are essential in consumers’ purchase decisions. Smith and Paladino [ 42 ] studied factors affecting organic food purchasing behavior and found that learning about social and environmental issues positively affects consumers’ purchase behavior. However, as the studies above show, consumers’ knowledge about organic foods is inconsistent. While consumers are likely to perceive organic foods as pure, natural, and healthy, this perception might rest on their belief that organic foods are free from pesticides and chemical fertilizers. To evaluate this, we proposed Hypothesis 2:

  • H2: Consumer buying behavior is positively associated with consumer knowledge of organic foods.

Perceived or subjective norms (PSN)

Ajzen [ 43 ] defines perceived or subjective norms (PSN) as "a perceived social pressure to perform or not to perform a behavior." Finlay et al. [ 44 ] describe subjective norms as individuals’ perceptions of what others believe they should do. Subjective norms affected consumer purchase behavior in research by Shimp and Kavas [ 45 ], Sheppard et al. [ 46 ], and Bagozzi et al. [ 47 ]. Chang [ 48 ] examined the link between subjective norms and attitudes toward behavior and found that subjective norms meaningfully shape those attitudes. From the above, we formulated Hypothesis 3:

  • H3: Perceived or subjective norms will positively influence consumer buying of organic foods.

Perception of the price (PP)

Organic foods are priced higher than conventional foods. Aertsens et al. [ 49 ] and Hughner et al. [ 16 ] confirmed that price is a significant barrier to choosing organic food. Padel and Midmore [ 50 ] and O’Doherty et al. [ 51 ] indicate that high prices are likely to impede future demand; thus, price is crucial in organic food marketing. Research has shown that consumers switch products because of high prices [ 52 ], and Gan et al. [ 53 ] found that higher prices reduce the likelihood of buying organic foods. However, Radman [ 54 ] concluded that some consumers have a positive attitude toward organic foods and are willing to pay a higher price, and Smith et al. [ 55 ] found that price does not significantly affect organic food purchases. Given these contradictory findings, we explored whether consumers’ perceptions of price are linked to their buying behavior toward organic foods, as stated in Hypothesis 4:

  • H4: Perceived price of organic foods is negatively associated with consumer buying.

Availability of organic foods (AV)

Availability is one factor that encourages the purchase of organic foods [ 56 ]. Makatouni [ 24 ] reiterated that limited availability could be a barrier to consuming organic foods. Tarkiainen and Sundqvist [ 29 ] showed that easy availability of organic foods positively affected purchase behavior. In a survey by Young et al. [ 57 ], consumers preferred readily available products and did not want to spend time searching for organic products.

Recently, however, retailers across the country have noticed the growing popularity of organic foods and have been adding them to their shelves. Increased marketing of organic foods by large retail outlets and specialty stores has made them accessible to more consumers [ 58 ]. This raises the question of whether availability has a positive impact on purchase behavior, which we test with Hypothesis 5:

  • H5: Availability of organic foods increases consumer buying behavior.

Purchase intention and actual buying behavior (PI and AB)

The theory of planned behavior suggests that behavior is a function of intentions and perceived behavioral control. Sheppard et al. [ 31 ] showed evidence of a relationship between choices and actions across different types of buying behavior, and Ajzen [ 43 ] stated that intentions or willingness could significantly predict actual buying behavior. However, Tanner and Kast [ 59 ] and Vermeir and Verbeke [ 60 ] found discrepancies between consumers’ favorable attitudes and their actual purchase behavior, and Hughner et al. [ 16 ] found that, even though consumers have a positive attitude toward purchasing organic foods, very few of them buy these foods. Nevertheless, researchers generally hold that attitudes and actions are related, in line with the study of Wheale and Hinton [ 61 ], and attitudes toward organically grown food products might positively and significantly affect purchase behavior [ 62 ]. We therefore assume that the purchase of organic food results from an intent to purchase.

The attitude-behavior gap is the gap between consumers’ favorable attitudes toward organic foods and their actual purchase behavior: a positive attitude toward organic products might not always lead to a purchase. Many factors, such as price, availability, and social influence, can create a discrepancy among consumer attitudes, purchase intentions, and actual buying behavior. We test the effects of the influencing factors (HEC, CK, PSN, PP, AV) on purchase intention (PI) and actual buying behavior (AB).

  • H6: Consumer attitudes toward organic foods mediate the association between influencing factors and purchase intention.
  • H7: Consumer attitudes and purchase intentions mediate the association between influencing factors and actual buying behavior.

Sociodemographic factors

Behavior is not influenced by attitudes alone; many other factors influence it. For example, Voon et al. [ 62 ] found that sociodemographic factors influence buying behavior. One significant factor is gender: Lockie et al. [ 63 ] confirm that women are more likely than men to have positive attitudes toward organic foods. Similarly, adolescent girls are more favorably disposed toward organic products than boys [ 64 ].

Research has found that age also influences the purchase of organic foods. For example, Misra et al. [ 65 ] show that older individuals may be willing to buy organic foods for health-related reasons. However, Cranfield and Magnusson [ 66 ] found that younger consumers are more likely to pay premiums of over 6% to ensure that food products are pesticide-free, and Rimal et al. [ 67 ] found that older individuals are less likely than younger individuals to buy organic foods. Likewise, younger people and women consider organic foods more essential and include them in their purchases [ 68 , 69 ].

Income is another demographic characteristic considered crucial in influencing the purchase of organic food. Two studies, by Govindasamy and Italia [ 68 ] and Loureiro et al. [ 70 ], found that higher-income households purchase organic products more frequently. Likewise, Voon et al.’s [ 62 ] research found that household income is positively related to organic food purchases. Further, women aged 30–45 with children and a higher disposable income include organic foods in their purchases [ 58 ].

Research by Cunningham [ 38 ] and O’Donovan and McCarthy [ 71 ] found a positive relationship between education and organic food consumption. The same holds in Dettmann and Dimitri’s [ 58 ] work, which shows that individuals with a higher education level are more likely to purchase organic foods than those with a lower education level. Aryal et al. [ 72 ] likewise showed that education might influence the purchase of organic products.

Contrary to the research cited above, some studies found a negative correlation [ 73 , 74 ], which the analysis of Arbindra et al. [ 75 ] also confirms: they report a statistically significant relationship between organic food purchase patterns and levels of education.

Since there are different findings in the literature, we test the influence of demographic factors on buying, and the following hypotheses are formulated:

  • H8a: Buying behavior toward organic foods differs significantly by consumer age.
  • H8b: Buying behavior toward organic foods differs significantly by gender.
  • H8c: Buying behavior toward organic foods differs significantly by income.
  • H8d: Buying behavior toward organic foods differs significantly by education.

Research method

Primary data were collected using a questionnaire developed from prior studies [ 1 , 76 – 80 ]. The questionnaire has two sections. The first section contains questions about organic product purchase behavior, with responses measured on a 5-point Likert scale. The second section includes questions on respondents’ demographic information (see S1 Appendix ).

The questionnaire was pilot tested on 50 respondents to ensure the clarity of questions and responses, and changes were made where necessary based on the pilot feedback. Convenience and snowball sampling methods were used. The survey was distributed online to individuals known to the researcher and to the students taking a Market Research course during Spring 2019, and these individuals were asked to pass the survey on to their friends and family members. Snowball sampling was used to generate as many responses as possible during May–August 2019. Respondents were invited to participate via email. The email told potential participants that by clicking on the survey link they voluntarily agreed to participate, and that they could stop participating at any time by simply closing the browser, in which case their responses would not be saved. A total of 770 responses were received; after screening the questionnaires for completeness, 502 were retained for further analysis. The study was approved by the Institutional Review Board of the University of Wisconsin-Stout, as it involved surveying consumers with their consent. Further, the data were analyzed anonymously.

Results and discussion

The respondents’ demographic profile is reported in Table 1 . Men make up 58% of the respondents and women the remaining 42%. The plurality of respondents (37%) are 31–40 years old. By education, the largest group (35%) is Graduate, followed by Undergraduate (28%) and Postgraduate/Ph.D. (21%). The plurality of respondents (27%) have an annual income above $100,000. The largest share of respondents (38%) live in households of 1–2 members, closely followed by households of 3–4 members (37%).

Characteristic | N (%)
Gender
Male | 291 (58)
Female | 211 (42)
Age
18–30 years | 105 (21)
31–40 years | 186 (37)
41–50 years | 116 (23)
51–60 years | 60 (12)
Above 60 years | 35 (7)
Education
High school | 80 (16)
Undergraduate | 141 (28)
Graduate | 176 (35)
Postgraduate/Ph.D. | 105 (21)
Annual income
Less than $40,000 | 45 (9)
$40,001 to $60,000 | 80 (16)
$60,001 to $80,000 | 116 (23)
$80,001 to $100,000 | 126 (25)
Above $100,000 | 135 (27)
Family size
1–2 | 191 (38)
3–4 | 186 (37)
5 or more | 125 (25)
Occupation
Student | 55 (11)
Work full-time | 186 (37)
Self-employed | 171 (34)
Retired | 90 (18)

Reasons for purchase of organic foods

Respondents were asked whether they had ever bought organic food products, and 55.6% said yes. These respondents were then asked further questions about their purchases. Regarding purchase frequency, 51.8% said they purchase organic food products weekly, 26% at least once a month, and 21.6% less often than once a month.

Respondents cited health consciousness as the primary reason for purchasing organic food, followed by non-use of pesticides, lower pesticide residues, environmentally friendly production, and perceived freshness (see Fig 2 ). Health consciousness was the main reason for 48% of respondents, followed by pesticide-free production (19%) and environmentally friendly production (15%).


To identify the factors influencing attitudes toward organic foods, principal components analysis (PCA) with varimax rotation is conducted. Before the factor analysis, the Kaiser-Meyer-Olkin (KMO) measure and Bartlett’s test of sphericity are used to check the suitability of the data. The KMO measure of sampling adequacy is 0.82, which exceeds the cut-off value of 0.60. Bartlett’s test of sphericity is also significant (χ² = 2,082, df = 132, p < .001), indicating that the inter-item correlations are sufficient for PCA. Together, the KMO and Bartlett’s test results indicate that the data are suitable for factor analysis [ 81 ]. Table 2 reports the results, including scale reliability: each factor has a Cronbach’s alpha (α) at or above the threshold value of 0.70 [ 82 ].

Construct | Cronbach’s α | Variance explained (%)
Health Consciousness | 0.78 | 37.12
Consumers’ Knowledge | 0.79 | 7.63
Perceived or Subjective Norms | 0.87 | 5.60
Perception of Price | 0.84 | 5.38
Availability of Organic Foods | 0.70 | 3.76
Purchase Intention | 0.79 | 2.13
Actual Buying Behavior | 0.82 | 1.69
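Cronbach’s alpha, the reliability criterion used above, compares the sum of the individual item variances with the variance of respondents’ total scores. The sketch below computes it from item-level Likert responses with the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of totals). The example responses are hypothetical. The sketch says nothing about which software the authors used.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items answered by the same respondents.

    items[i][j] is respondent j's score on item i (e.g. a 5-point Likert rating).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Each respondent's total score across all items of the construct.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical two-item construct answered by four respondents.
example = [[1, 2, 3, 4], [2, 2, 4, 4]]
print(round(cronbach_alpha(example), 3))
```

Items that move together make the total-score variance large relative to the item variances, which pushes α toward 1.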

Multiple linear regression analysis is performed to test hypotheses H1–H5, assessing the impact of health consciousness, consumer knowledge, perceived or subjective norms, availability, and perception of price on consumer attitude (AT). As shown in Table 3 , HEC, CK, PSN, PP, and AV together explain 33% of the variance in AT (F(5, 177) = 32.51, p < .001, R² = 0.33).

Predictor | Min | Max | Mean | SD | β | Regression Analysis | Collinearity

Notes: R² = 0.33, F(5, 177) = 32.51.

According to the results, H1 (β = 0.37, p = .016), H2 (β = 0.47, p < .001), H3 (β = 0.34, p = .015), and H4 (β = 0.36, p = .001) are supported, as their β values are positive and significant. However, H5 is not supported, as its coefficient is statistically non-significant (β = 0.29, p = .117). The findings confirm that health consciousness, consumer knowledge, perceived or subjective norms, and perception of price affect respondents’ attitudes toward organic foods, whereas availability has no significant impact on consumers’ attitudes, at least in our sample.
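The regression of AT on the five influencing factors is ordinary least squares (OLS) with an intercept. The Python code below is a minimal, dependency-free sketch and is not the authors’ software. It solves the normal equations directly and derives R² and the overall F statistic, F = (R²/k) / ((1 − R²)/(n − k − 1)), for k predictors and n respondents. Solving the normal equations is adequate for a few well-conditioned predictors, but statistical packages generally use more numerically robust decompositions. The data are hypothetical. Per-coefficient p-values are omitted because they need the t distribution.

```python
def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_fit(predictors, y):
    """OLS with an intercept; returns (coefficients, R^2, F). coefficients[0] is the intercept."""
    rows = [[1.0] + list(p) for p in predictors]
    p = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve_linear(xtx, xty)

    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot

    k, n = p - 1, len(y)  # number of predictors, number of respondents
    # Overall F(k, n - k - 1) test that all slopes are zero; infinite for a perfect fit.
    f_stat = float("inf") if r2 >= 1 else (r2 / k) / ((1 - r2) / (n - k - 1))
    return beta, r2, f_stat


# Hypothetical data: two predictors, six respondents.
X = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 2], [3, 0]]
y = [3.1, -0.1, 2.05, 3.95, 1.0, 7.0]
beta, r2, f_stat = ols_fit(X, y)
print([round(b, 3) for b in beta], round(r2, 3), round(f_stat, 1))
```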

Hierarchical regression was applied to test whether attitude (AT) mediates the association between the influencing factors (HEC, CK, PSN, PP, and AV) and purchase intention. Mediation was assessed using Baron and Kenny’s [ 83 ] approach, which requires the following conditions. First, the independent variable (IV) must affect the dependent variable (DV). Second, the IV must significantly influence the mediating variable. Third, the mediating variable must affect the DV when the IV is controlled for. When all three conditions are met, full mediation is confirmed if the IV no longer affects the DV once the mediator is controlled for; partial mediation occurs when the effect of the IV on the DV is reduced, but not eliminated, after the mediator is controlled for. The results indicate that all β values for the effects on AT are positive and significant: HEC (β = 0.17, p < .001), CK (β = 0.29, p < .030), PSN (β = 0.33, p < .020), PP (β = 0.39, p < .010), and AV (β = 0.24, p < .050; see Table 4 ). Baron and Kenny’s criteria are met, confirming the presence of mediation. Thus, H6, which predicts that attitude mediates the relationship between the influencing factors and PI, is supported.

Predictor | Step 1 | Step 2 | Collinearity
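Baron and Kenny’s steps can be illustrated for the simplest case: one influencing factor X, one mediator M (for example, attitude), and one outcome Y (for example, purchase intention). The sketch below estimates the path coefficients from sample covariances:

- c comes from regressing Y on X.
- a comes from regressing M on X.
- c′ and b come from regressing Y on both X and M.

This is only an illustration under simplifying assumptions. The study’s models contain several predictors. Baron and Kenny’s criteria rely on significance tests, which the hypothetical `classify` helper replaces with a crude magnitude threshold `tol`.

```python
from statistics import mean


def cov(u, v):
    """Sample covariance of two equally long sequences."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def baron_kenny(x, m, y):
    """Path coefficients for one predictor X, one mediator M, and one outcome Y.

    Step 1: Y on X        -> c  (total effect)
    Step 2: M on X        -> a
    Step 3: Y on X and M  -> c_prime (direct effect) and b (mediator effect)
    """
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    c = sxy / sxx
    a = sxm / sxx
    det = sxx * smm - sxm ** 2  # zero if X and M are perfectly collinear
    c_prime = (smm * sxy - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    return {"c": c, "a": a, "b": b, "c_prime": c_prime}


def classify(paths, tol=0.05):
    """Hypothetical helper: labels mediation by coefficient size only (no significance tests)."""
    if min(abs(paths["c"]), abs(paths["a"]), abs(paths["b"])) <= tol:
        return "no mediation"
    if abs(paths["c_prime"]) <= tol:
        return "full mediation"
    if abs(paths["c_prime"]) < abs(paths["c"]):
        return "partial mediation"
    return "no mediation"


# Hypothetical scores: M depends on X, and Y depends on M plus a direct effect of X.
x = [1, 2, 3, 4, 5, 6]
m = [2, 4, 7, 8, 10, 13]
y = [3 * mv + xv for mv, xv in zip(m, x)]
paths = baron_kenny(x, m, y)
print(paths, classify(paths))
```

For OLS estimated on the same data, the total effect decomposes exactly as c = c′ + a·b. The indirect effect a·b is therefore exactly the amount by which the X coefficient shrinks once M is controlled for.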

According to the results reported in Table 5 , H7, which states that the influencing factors have a positive effect on actual buying behavior through the mediating effect of attitude and purchase intention, is supported: AT (β = 0.24, p < .040) and PI (β = 0.26, p < .020) both have positive and significant effects on consumers’ actual buying behavior. Furthermore, AT and PI mediate the association between the influencing factors and AB, since the regression coefficients of HEC, CK, PSN, PP, and AV are reduced when the effects of AT and PI are controlled for.

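Predictor | Step 1 | Step 2 | Step 3 | Collinearity (Tolerance, VIF)
HEC | 0.27 4.28 | 0.63 5.54 | 0.22 5.44 | 0.45 2.22
CK | 0.35 3.25 | 0.44 4.77 | 0.43 4.22 | 0.35 2.86
PSN | 0.39 4.18 | 0.34 2.33 | 0.33 4.19 | 0.41 2.44
PP | 0.48 5.77 | 0.55 5.21 | 0.45 6.75 | 0.39 2.56
AV | 0.43 0.33 | 0.27 4.81 | 0.33 7.67 | 0.43 2.33
AT | | 0.46 5.23 | 0.24 5.39 | 0.49 2.04
PI | | | 0.26 7.67 | 0.34 2.94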

Notes: * p < .05; ** p < .001.

Demographic differences in the actual buying behavior

An independent-samples t-test is conducted to determine whether actual buying behavior differs by gender. Levene’s test ( Table 6 ) gives a p-value of .153 for gender, which is greater than .05, so the assumption of equal variances holds and the equal-variances t-test is appropriate. The two-tailed p-value of .081 exceeds the .05 significance level, indicating that the difference between the group means (mean difference = -0.19 with equal variances assumed; -0.16 without) is not significant. H8b is therefore not supported.

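Levene’s Test for Equality of Variances (F, Sig.) | T-Test for Equality of Means (remaining columns)
Assumption | F | Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI of the Difference, Lower | 95% CI of the Difference, Upper
Equal variances assumed | 2.61 | .153 | -1.21 | 500 | .081 | -.19 | .07 | -.32 | -.01
Equal variances not assumed | | | -1.08 | 472.55 | .069 | -.16 | .08 | -.32 | -.01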
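The Levene and t statistics in Table 6 can be sketched with Python’s standard library. The code below computes three statistics on hypothetical scores:

- Levene’s W from absolute deviations around each group mean. This is the original Levene form; SciPy’s `scipy.stats.levene` defaults to the median-centred Brown–Forsythe variant.
- The pooled-variance t used when equal variances are assumed.
- Welch’s t with Welch–Satterthwaite degrees of freedom, which explains fractional df such as 472.55.

Converting these statistics into p-values would additionally require the F and t distributions (for example, from SciPy).

```python
from math import sqrt
from statistics import mean, variance


def levene_w(*groups):
    """Levene's W statistic, using absolute deviations from each group's mean."""
    z = [[abs(v - mean(g)) for v in g] for g in groups]
    k = len(groups)
    n_total = sum(len(zi) for zi in z)
    z_group_means = [mean(zi) for zi in z]
    z_grand_mean = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand_mean) ** 2 for zi, zm in zip(z, z_group_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_group_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


def pooled_t(a, b):
    """Student's t assuming equal variances; returns (t, df)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(a, b):
    """Welch's t without the equal-variance assumption; returns (t, df)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite degrees of freedom are generally fractional.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical actual-buying scores for two small groups of respondents.
men = [3.2, 3.8, 4.1, 2.9, 3.5, 3.9]
women = [3.6, 4.2, 4.4, 3.1, 3.8, 4.0, 4.5]
print("Levene W:", round(levene_w(men, women), 3))
print("pooled t, df:", pooled_t(men, women))
print("Welch t, df:", welch_t(men, women))
```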

Table 7A shows the results of the one-way ANOVA test. Respondents’ age (F = 7.01; p = .023) has a statistically significant effect on actual buying behavior; thus, H8a is supported. To identify which age groups differ, the least significant difference (LSD) test is conducted. The LSD results ( Table 7B ) indicate that respondents aged 41–50 years score significantly higher than the other age groups.

A. Age groups: ANOVA test. B. LSD test for respondents’ age groups.

 Between Groups7.1843.367.01.023
 Within Groups156.13498.41
Actual Buying Behavior, LSD test. Columns: (J) age group | Mean difference (I - J) | Sig.
(I) 41–50 years
18–30 years | 0.30 | .020
31–40 years | 0.21 | .000
51–60 years | 0.32 | .000
Above 60 years | 0.43 | .000

1. p-values are rounded off to three decimal places.

2. Statistical significance is tested at p < 0.05.
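The one-way ANOVAs and follow-up LSD comparisons in Tables 7–9 share one pattern. The ANOVA F compares between-group with within-group variability. Fisher’s LSD then runs an ordinary t comparison for each pair of groups, using the ANOVA’s pooled within-group mean square. The standard-library sketch below shows both steps on hypothetical scores. It stops at the test statistics, since p-values require the F and t distributions, and it applies no multiple-comparison correction (LSD itself does not).

```python
from statistics import mean


def one_way_anova(*groups):
    """One-way ANOVA. Returns (F, df_between, df_within, ms_within)."""
    group_means = [mean(g) for g in groups]
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)
    ss_between = sum(len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means))
    ss_within = sum((v - gm) ** 2 for g, gm in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n_total - k
    ms_within = ss_within / df_within
    f_stat = (ss_between / df_between) / ms_within
    return f_stat, df_between, df_within, ms_within


def lsd_t(group_i, group_j, ms_within):
    """Fisher's LSD t statistic for one pair of groups (df = the ANOVA's df_within)."""
    diff = mean(group_i) - mean(group_j)
    return diff / (ms_within * (1 / len(group_i) + 1 / len(group_j))) ** 0.5


# Hypothetical actual-buying scores for three age groups.
young, middle, older = [3.1, 3.4, 3.0, 3.6], [4.0, 4.3, 3.9, 4.4], [3.3, 3.5, 3.2, 3.7]
f_stat, df_b, df_w, ms_w = one_way_anova(young, middle, older)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
print(f"LSD t (middle vs young) = {lsd_t(middle, young, ms_w):.2f}")
```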

Hypothesis H8c is supported, as the ANOVA test reveals that annual income (F = 8.22; p = .011) significantly affects actual buying behavior (see Table 8A ). Further, the LSD test for income ( Table 8B ) indicates that respondents with annual incomes above US$80,000 score higher on actual purchase than those with incomes below US$80,000.

A. Annual income: ANOVA test. B. Annual income: LSD test.

Between Groups15.1054.228.22.011
Within Groups144.864970.35
Actual Buying Behavior, LSD test. Columns: (J) annual income (US$) | Mean difference (I - J) | Sig.
(I) 40,001 to 60,000
Less than 40,000 | 0.47 | .000
40,001 to 60,000 | 0.43 | .011
80,001 to 100,000 | 0.37 | .023
(I) 80,001 to 100,000
Less than 40,000 | 0.61 | .042
40,001 to 60,000 | 0.44 | .000
60,001 to 80,000 | 0.41 | .000
above 100,000 | 0.42 | .046
(I) above 100,000
Less than 40,000 | 0.60 | .018
40,001 to 60,000 | 0.53 | .000
60,001 to 80,000 | 0.47 | .000
80,001 to 100,000 | 0.41 | .030

According to Table 9A , education level (F = 7.05; p = .001) affects consumers’ buying behavior toward organic foods. The LSD test ( Table 9B ) further shows that consumers holding postgraduate/Ph.D. degrees score higher on AB than consumers with only a high school diploma or undergraduates. The test also shows that graduate degree holders score higher on AB than consumers with only a high school diploma.

A. Education levels: ANOVA Test. B. Education levels: LSD Test.

Between Groups11.8143.787.05.001
Within Groups137.314980.39
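Actual Buying Behavior, LSD test. Columns: (I) education level | (J) education level | Mean difference (I - J) | Sig.
Graduate | High School | 0.40 | .000
Postgraduate/Ph.D. | High School | 0.53 | .000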


This study tested Singh and Verma’s [ 1 ] model on US consumers. We first investigated the factors influencing consumer attitudes and then studied how these factors and attitudes together affect consumers’ actual buying behavior. The relationship between consumers’ intention to purchase and their actual purchases has long been debated: previous studies suggest that actual purchase behavior does not always follow from purchase intention, as consumers sometimes intend to buy but fail to do so. This study therefore also examined the impact of demographic variables (gender, income, education, and age) on consumers’ actual buying. The study confirms that health consciousness, consumer knowledge, perception of price, and subjective norms influence consumer attitudes, although the effect of availability on attitudes was not significant in our sample. In addition, attitudes and purchase intentions were found to mediate the relationship between the influencing factors and actual buying behavior toward organic foods.

Further, the t-test and ANOVA results provided a more in-depth understanding of the relationships between demographic factors and actual buying. LSD tests were conducted to identify which subgroups of a demographic variable differ significantly from their counterparts. The findings suggest that gender does not affect the actual buying of organic foods, whereas income, age, and education do. The LSD test shows that consumers aged 41–50 are more likely to buy organic foods than those in other age groups. Not surprisingly, income is another critical determinant of actual buying decisions. This may indicate that organic food buying increases with income (i.e., the higher the income level, the more likely the consumer is to buy organic foods). The findings indicate the same trend for education: higher levels of education correspond to a higher likelihood of purchasing organic foods. This could be because education increases consumers’ knowledge, and informed consumers may be more health-conscious and more aware of the benefits of organic foods. Previous studies have reported different reasons for buying organic foods in developed and developing countries; nevertheless, our findings are similar to those of recent work in developed countries. Health consciousness, food safety, environmentally friendly procedures, consumers’ knowledge of organic foods, perceived or subjective norms, availability of organic foods, and demographic factors such as gender, education, and income are the most substantial reasons for buying organic food, irrespective of the country (developed or developing) [ 1 , 3 , 25 ].
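The ANOVA-then-LSD procedure described above can be illustrated with a short sketch. It assumes Fisher's least significant difference (LSD): after a significant omnibus F, each pair of groups is compared using the pooled within-group mean square from the ANOVA. The group names and scores below are hypothetical and are not the study's data, and the critical t value must be supplied for the relevant degrees of freedom (2.262 for df = 9 at α = .05 in this toy example).

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def one_way_anova(groups):
    """One-way ANOVA for a dict mapping group name -> list of scores.

    Returns (F, df_between, df_within, ms_within).
    """
    all_scores = [x for scores in groups.values() for x in scores]
    grand_mean = mean(all_scores)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    # Between-group sum of squares: group size times squared deviation of the group mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)
    ms_within = ss_within / df_within
    f_stat = (ss_between / df_between) / ms_within
    return f_stat, df_between, df_within, ms_within


def fisher_lsd(groups, ms_within, t_crit):
    """Fisher's LSD pairwise comparisons (meaningful only after a significant F).

    A pair differs significantly when |mean_i - mean_j| exceeds
    t_crit * sqrt(MS_within * (1/n_i + 1/n_j)).
    Returns a list of (group_i, group_j, mean_difference, significant).
    """
    results = []
    for a, b in combinations(groups, 2):
        diff = mean(groups[a]) - mean(groups[b])
        lsd = t_crit * sqrt(ms_within * (1 / len(groups[a]) + 1 / len(groups[b])))
        results.append((a, b, diff, abs(diff) > lsd))
    return results


# Hypothetical actual-buying scores by education level (illustration only).
groups = {
    "High School": [2.0, 2.5, 3.0, 2.5],
    "Graduate": [3.0, 3.5, 3.0, 3.5],
    "Postgraduate/Ph.D.": [3.5, 4.0, 4.5, 4.0],
}
f_stat, df_b, df_w, ms_w = one_way_anova(groups)
comparisons = fisher_lsd(groups, ms_w, t_crit=2.262)  # two-sided t(0.975, df = 9)
```

With real survey data, `t_crit` would come from a t table or, for example, `scipy.stats.t.ppf(0.975, df_w)`. Because LSD does not adjust for multiple comparisons, it is usually applied only when the omnibus F test is significant.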


The findings of this research may guide companies dealing with organic foods. Companies could craft marketing strategies that increase consumers’ awareness of the benefits of organic food consumption; providing additional information about these benefits may help convince consumers to make the purchase. The study may also help retailers segment their consumers by demographics, understand the factors likely to influence consumers’ organic food purchases, and design strategies to increase their sales. Since availability (access) is one factor in buying decisions, retailers should reach out to local shops and areas to enhance market coverage. As subjective norms are another significant factor, marketers should promote organic food consumption through family, celebrities, and society.

This study offers important implications but also has some limitations. First, only direct factors related to consumer purchase decisions were measured. Second, the data were collected through an online survey that students and researchers forwarded to others, so the sample may be a snowball sample, and outcomes based on such samples should be generalized with caution. Further research in this area may consider advertisements, federal and state regulations, and consumption patterns of organic foods. More studies of organic food consumption in different regions and with larger samples would help validate our findings.

The Covid-19 pandemic has affected all aspects of the population’s daily life, in particular dietary habits [ 85 ]. However, the present study did not investigate how perceptions of Covid-19 affect the adoption of healthy food habits, so further research in this area should consider post-pandemic behavior. Recent studies suggest that parental attitudes affect dietary habits [ 84 – 86 ]. Therefore, future research should also consider how parental attitudes influence the purchase of organic foods.

Supporting information

S1 appendix, funding statement.

The author(s) received no specific funding for this work.

Data Availability

  • PLoS One. 2021; 16(9): e0257288.

Julio R. Flamini, M.D. / Clinical Integrative Research Center of Atlanta
MARCS-CMS 691123, August 20, 2024

5887 Glenridge Drive, Suite 140
Atlanta, GA 30328
United States


Dear Dr. Flamini:

This Warning Letter informs you of objectionable conditions observed during the U.S. Food and Drug Administration (FDA) inspection conducted at your clinical site between February 28 and March 11, 2024. Investigator Koffi Amegadje, representing FDA, reviewed your conduct of the following clinical investigations:

  • Protocol (b)(4), “(b)(4),” of the investigational drug (b)(4), performed for (b)(4)

This inspection was conducted as a part of FDA’s Bioresearch Monitoring Program, which includes inspections designed to evaluate the conduct of research and to help ensure that the rights, safety, and welfare of human subjects have been protected.

At the conclusion of the inspection, Investigator Amegadje presented and discussed with you the Form FDA 483, Inspectional Observations. We acknowledge receipt of your March 25, 2024, written response to the Form FDA 483.

From our review of the FDA Establishment Inspection Report, the documents submitted with that report, and your March 25, 2024, written response, it appears that you did not adhere to the applicable statutory requirements in the Federal Food, Drug, and Cosmetic Act (FD&C Act) and applicable regulations contained in Title 21 of the Code of Federal Regulations, part 312 (21 CFR 312), governing the conduct of clinical investigations and the protection of human subjects. We wish to emphasize the following:

You failed to ensure that the investigation was conducted according to the investigational plan [21 CFR 312.60].

As a clinical investigator, you are required to ensure that your clinical studies are conducted in accordance with the investigational plan. The investigational plan for Protocol (b)(4) required the investigational drug (b)(4) to be administered to subjects according to their weight and in accordance with the protocol’s titration schedule for the 20-week Titration Phase. During the 20-week Titration Phase, the protocol required a daily dose of (b)(4), with the dose of (b)(4) to increase according to the titration schedule at each study visit, which occurred every two weeks. According to the protocol’s titration schedule, subjects at Visit 2 who were 12 to <18 years old were to receive an (b)(4) of (b)(4) at a daily dose of 0.2 mg/kg, not to exceed 12.5 mg per day.

You failed to adhere to these requirements. Specifically, at Visit 2 (on May 15, 2023), Subject (b)(6), a 15-year-old female, weighed 59.1 kg. According to the protocol, this subject should have received a dose of 1.2 mL (~12 mg) per day. However, on May 16, 2023, Subject (b)(6) was dispensed a bottle of (b)(4) of (b)(4) with a dosing instruction of 12.0 mL per day, and she received a total dose of 120 mg of (b)(4) per day for 7 days (from May 17, 2023, to May 23, 2023). As a result, Subject (b)(6) received doses of (b)(4) that exceeded the weight-based dose required by the protocol-specified titration schedule. Subject (b)(6) received approximately 10 times the maximum daily dose and was exposed to an increased risk of adverse events, such as Drug Reaction with Eosinophilia and Systemic Symptoms syndrome.
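The dosing arithmetic in this paragraph can be checked with a short sketch. It assumes a solution concentration of 10 mg/mL, which is inferred from the letter's own figures (1.2 mL ≈ 12 mg and 12.0 mL = 120 mg) rather than stated in it. The `within_protocol_dose` guard and its 10% tolerance are hypothetical illustrations of the kind of safety check the response says the dispensing algorithm lacked, not the site's or sponsor's actual procedure.

```python
def protocol_dose_mg(weight_kg, mg_per_kg=0.2, cap_mg=12.5):
    """Visit 2 weight-based daily dose (0.2 mg/kg), capped at 12.5 mg per day."""
    return min(weight_kg * mg_per_kg, cap_mg)


def dispensed_dose_mg(volume_ml, concentration_mg_per_ml):
    """Daily dose implied by a dosing instruction given in mL per day."""
    return volume_ml * concentration_mg_per_ml


def within_protocol_dose(volume_ml, weight_kg, concentration_mg_per_ml, tolerance=0.10):
    """Hypothetical dispensing guard: reject instructions above the protocol dose plus a tolerance."""
    return dispensed_dose_mg(volume_ml, concentration_mg_per_ml) <= (
        protocol_dose_mg(weight_kg) * (1 + tolerance)
    )


CONCENTRATION = 10.0  # mg/mL; assumption inferred from 1.2 mL ~ 12 mg
expected = protocol_dose_mg(59.1)                  # about 11.82 mg/day
received = dispensed_dose_mg(12.0, CONCENTRATION)  # 120 mg/day
ratio_to_weight_dose = received / expected         # about 10.15
ratio_to_cap = received / 12.5                     # 9.6
```

Both ratios are consistent with the letter's statement that the subject received approximately 10 times the intended daily dose; with the assumed concentration, the 1.2 mL instruction passes the guard and the 12.0 mL instruction fails it.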

In your March 25, 2024, written response to the Form FDA 483, you stated that no adverse events were reported during or following this incident. You stated that once the site became aware of the overdose, immediate action was taken to address safety concerns. The subject’s Legally Authorized Representative was promptly contacted and confirmed that the subject was doing well, without experiencing any adverse events. You stated that instructions were provided to reduce the drug dosage to the correct amount, and an unscheduled visit was conducted the following day to ensure compliance. You further stated that at the time of the occurrence, no electronic dose calculator was provided by the sponsor, and the electronic dispensing algorithm lacked safety guards to prevent errors.

As a Corrective and Preventive Action plan (CAPA), you stated that a Standard Operating Procedure (SOP) has been established to address medication dispensation procedures, which includes components such as: (1) collaboration between clinical investigator(s) and clinical research coordinators to ensure accurate dose calculations based on weight; (2) verification that the recommended dose adheres to protocol guidelines; and (3) provision of detailed instructions and written documentation of dosing instructions to patients. You stated that these measures are designed to enhance the accuracy and integrity of medication dispensation processes, to minimize the risk of dosage errors, and to ensure the safety and well-being of study participants.

While we acknowledge the corrective and preventive actions that your site has taken, your response is inadequate because you did not include sufficient details about your corrective action plan. For example, you did not provide sufficient details regarding the procedures implemented at your site to prevent similar violations in the future. Given the significance of the protocol violation involving a pediatric subject, we request follow-up documentation regarding the procedure implemented at your site to ensure compliance with study protocol dosing procedures. Without this information, we are unable to determine if your corrective action plan is adequate to prevent similar violations in the future.

We emphasize that as the clinical investigator, it is your responsibility to ensure that studies are conducted in accordance with the investigational plan, to protect the rights, safety, and welfare of subjects and to ensure the integrity of study data. Your failure to conduct the clinical study in accordance with the protocol resulted in an overdose of (b)(4) to a pediatric subject. This conduct raises significant concerns about your protection of the study subjects enrolled at your site, and also raises concerns about the validity and integrity of the data collected at your site.

This letter is not intended to be an all-inclusive list of deficiencies with your clinical study of an investigational drug. It is your responsibility to ensure adherence to each requirement of the law and relevant FDA regulations. You should address any deficiencies and establish procedures to ensure that any ongoing or future studies comply with FDA regulations.

This letter notifies you of our findings and provides you with an opportunity to address the deficiencies noted above. Within 15 business days of your receipt of this letter, you should notify this office in writing of the actions you have taken to prevent similar violations in the future. Failure to address this matter adequately may lead to regulatory action. If you believe that you have complied with the FD&C Act and relevant regulations, please include your reasoning and any supporting information for our consideration.

Should you have any questions or concerns regarding this letter or the inspection, please email FDA at [email protected]. Your written response and any pertinent documentation should be addressed to:

Brittany L. Garr-Colón, M.P.H.
Branch Chief (Acting)
Compliance Enforcement Branch
Division of Enforcement and Postmarketing Safety
Office of Scientific Investigations
Office of Compliance
Center for Drug Evaluation and Research
U.S. Food and Drug Administration
Building 51, Room 5352
10903 New Hampshire Avenue
Silver Spring, MD 20993

Sincerely yours,
{See appended electronic signature page}
David C. Burrow, Pharm.D., J.D.
Director
Office of Scientific Investigations
Office of Compliance
Center for Drug Evaluation and Research
U.S. Food and Drug Administration

This is a representation of an electronic record that was signed electronically. Following this are manifestations of any and all electronic signatures for this electronic record.

Detecting sexism in social media: an empirical analysis of linguistic patterns and strategies

  • Published: 03 September 2024



  • Francisco Rodríguez-Sánchez 1 ,
  • Jorge Carrillo-de-Albornoz 1 , 2 &
  • Laura Plaza 1 , 2  

With the rise of social networks, there has been a marked increase in offensive content targeting women, ranging from overt acts of hatred to subtler, often overlooked forms of sexism. The EXIST (sEXism Identification in Social neTworks) competition, initiated in 2021, aimed to advance research in automatically identifying these forms of online sexism. However, the results revealed the multifaceted nature of sexism and emphasized the need for robust systems to detect and classify such content. In this study, we provide an extensive analysis of sexism, highlighting its characteristics and diverse manifestations across multiple languages on social networks. To achieve this objective, we conducted a detailed analysis of the EXIST dataset to evaluate its capacity to represent various types of sexism. Moreover, we analyzed the systems submitted to the EXIST competition to identify the most effective methodologies and resources for the automated detection of sexism. We employed statistical methods to discern textual patterns related to different categories of sexism, such as stereotyping, misogyny, and sexual violence. Additionally, we investigated linguistic variations in categories of sexism across different languages and platforms. Our results suggest that the EXIST dataset covers a broad spectrum of sexist expressions, from the explicit to the subtle. We observe significant differences in the portrayal of sexism across languages; English texts predominantly feature sexual connotations, whereas Spanish texts tend to reflect neosexism. Across both languages, objectification and misogyny prove to be the most challenging to detect, a difficulty we attribute to the varied vocabulary associated with these forms of sexism. Additionally, we demonstrate that models trained on platforms like Twitter can effectively identify sexist content on less-regulated platforms such as Gab.
Building on these insights, we introduce a transformer-based system with data augmentation techniques that outperforms competition benchmarks. Our work contributes to the field by enhancing the understanding of online sexism and advancing the technological capabilities for its detection.
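The abstract does not name the statistic used to find textual patterns for each category of sexism. As one common choice, the sketch below computes the z-scored log-odds ratio with an informative Dirichlet prior (Monroe, Colaresi and Quinn, 2008) to rank the words that distinguish one category's texts from another's. It is a swapped-in illustration, not necessarily the authors' method; using the pooled counts of both corpora as the prior, whitespace tokenization, and the toy tokens are all assumptions.

```python
from collections import Counter
from math import log, sqrt


def log_odds_z(tokens_a, tokens_b, prior_scale=1.0):
    """z-scored log-odds ratio with an informative Dirichlet prior.

    Positive scores mark words over-represented in corpus A relative to B.
    Assumption: the prior is the pooled A+B counts times prior_scale; in practice
    a larger background corpus is often used instead.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    prior = {w: prior_scale * c for w, c in (counts_a + counts_b).items()}
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    alpha_0 = sum(prior.values())
    scores = {}
    for w, alpha_w in prior.items():
        y_a, y_b = counts_a[w], counts_b[w]
        # Smoothed log-odds of w in each corpus, then their difference.
        delta = log((y_a + alpha_w) / (n_a + alpha_0 - y_a - alpha_w)) - log(
            (y_b + alpha_w) / (n_b + alpha_0 - y_b - alpha_w)
        )
        # Approximate variance of delta; dividing by its square root gives a z-score.
        variance = 1 / (y_a + alpha_w) + 1 / (y_b + alpha_w)
        scores[w] = delta / sqrt(variance)
    return scores


# Toy tokens standing in for two label categories (hypothetical).
category_a = "kitchen kitchen cook the".split()
category_b = "pay equal the the".split()
scores = log_odds_z(category_a, category_b)
top_word = max(scores, key=scores.get)  # "kitchen"
```

The prior shrinks scores for rare words, so a word seen once in one category ranks below a word seen repeatedly, which is why this statistic is often preferred over raw frequency differences for short social media texts.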


Data Availability

Data will be made available on request.



Campuzano MV (2019) Force and inertia: a systematic review of women’s leadership in male-dominated organizational cultures in the United States. Hum Resour Dev Rev 18(4):437–469


Mandel H, Semyonov M (2014) Gender pay gap and employment sector: sources of earnings disparities in the United States, 1970–2010. Demography 51(5):1597–1618. Accessed Apr 08 2022

Dosil M, Jaureguizar J, Bernaras E, Sbicigo J (2020) Teen dating violence, sexism, and resilience: a multivariate analysis. Int J Environ Res Public Health 17(8):2652

Rodríguez-Sánchez F, Carrillo-de-Albornoz J, Plaza L, Gonzalo J, Rosso P, Comet M, Donoso T (2021) Overview of EXIST 2021: sexism identification in social networks. Proces Leng Nat 67:195–207


Fersini E, Rosso P, Anzovino M (2018) Overview of the task on automatic misogyny identification at IberEval 2018. IberEval@SEPLN 2150:214–228

Pamungkas EW, Basile V, Patti V (2020) Misogyny detection in Twitter: a multilingual and cross-domain study. Inf Process Manage 57(6):102360

Guest E, Vidgen B, Mittos A, Sastry N, Tyson G, Margetts H (2021) An expert annotated dataset for the detection of online misogyny. In: Proceedings of the 16th conference of the European chapter of the association for computational linguistics: main volume, pp 1336–1350

Rodríguez-Sánchez F, Carrillo-de-Albornoz J, Plaza L, Mendieta-Aragón A, Marco-Remón G, Makeienko M, Plaza M, Gonzalo J, Spina D, Rosso P (2022) Overview of EXIST 2022: sexism identification in social networks. Proces Leng Nat 69:229–240

Waseem Z (2016) Are you a racist or am I seeing things? annotator influence on hate speech detection on Twitter. In: Proceedings of the first workshop on NLP and computational social science. Association for Computational Linguistics, Austin, Texas, pp 138–142. https://doi.org/10.18653/v1/W16-5618 . https://aclanthology.org/W16-5618

Waseem Z, Hovy D (2016) Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In: Proceedings of the NAACL Student Research Workshop. Association for Computational Linguistics, San Diego, California, pp 88–93. https://doi.org/10.18653/v1/N16-2013 . https://www.aclweb.org/anthology/N16-2013

Frenda S, Ghanem B, Montes-y-Gómez M, Rosso P (2019) Online hate speech against women: automatic identification of misogyny and sexism on Twitter. J Intell Fuzzy Syst 36(5):4743–4752

Anzovino M, Fersini E, Rosso P (2018) Automatic identification and classification of misogynistic language on Twitter. In: Natural language processing and information systems

Rodríguez-Sánchez F, Carrillo-de-Albornoz J, Plaza L (2020) Automatic classification of sexism in social networks: an empirical study on Twitter data. IEEE Access 8:219563–219576. https://doi.org/10.1109/ACCESS.2020.3042604

Zeinert P, Inie N, Derczynski L (2021) Annotating online misogyny. In: Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (vol 1: Long Papers), pp 3181–3197

Basile V, Bosco C, Fersini E, Nozza D, Patti V, Pardo FMR, Rosso P, Sanguinetti M et al (2019) SemEval-2019 task 5: multilingual detection of hate speech against immigrants and women in Twitter. In: 13th International workshop on semantic evaluation. Association for Computational Linguistics, pp 54–63

Jiang A, Yang X, Liu Y, Zubiaga A (2021) SWSR: a Chinese dataset and lexicon for online sexism detection. CoRR abs/2108.03070

Höfels DC, Çöltekin Ç, Mădroane ID (2022) CoRoSeOf: an annotated corpus of Romanian sexist and offensive tweets. In: Proceedings of the thirteenth language resources and evaluation conference, pp 2269–2281

Canós JS (2018) Misogyny identification through SVM at IberEval 2018. In: IberEval@SEPLN

Nina-Alcocer V (2018) AMI at IberEval2018 automatic misogyny identification in Spanish and English tweets. In: IberEval@SEPLN

Frenda S, Ghanem B, Montes M (2018) Exploration of misogyny in Spanish and English tweets. In: IberEval@SEPLN

Paetzold GH, Zampieri M, Malmasi S (2019) UTFPR at SemEval-2019 task 5: hate speech identification with recurrent neural networks. In: Proceedings of the 13th international workshop on semantic evaluation. Association for Computational Linguistics, Minneapolis, Minnesota, USA, pp 519–523. https://doi.org/10.18653/v1/S19-2093 . https://aclanthology.org/S19-2093

West C (1995) Critical race theory: the key writings that formed the movement. The New Press

Rostami M, Oussalah M, Farrahi V (2022) A novel time-aware food recommender-system based on deep learning and graph clustering. IEEE Access 10:52508–52524

Rostami M, Muhammad U, Forouzandeh S, Berahmand K, Farrahi V, Oussalah M (2022) An effective explainable food recommendation using deep image clustering and community detection. Intell Syst Appl 16:200157

Bassignana E, Basile V, Patti V (2018) HurtLex: a multilingual lexicon of words to hurt. In: 5th Italian conference on computational linguistics, CLiC-it 2018, vol 2253, pp 1–6. CEUR-WS

Davidson T, Warmsley D, Macy M, Weber I (2017) Automated hate speech detection and the problem of offensive language. In: Proceedings of the 11th international AAAI conference on web and social media. ICWSM ’17, pp 512–515

Hartvigsen T, Gabriel S, Palangi H, Sap M, Ray D, Kamar E (2022) ToxiGen: a large-scale machine-generated dataset for adversarial and implicit hate speech detection. In: Muresan S, Nakov P, Villavicencio A (eds) Proceedings of the 60th annual meeting of the association for computational linguistics (vol 1: Long Papers). Association for Computational Linguistics, Dublin, Ireland, pp 3309–3326. https://doi.org/10.18653/v1/2022.acl-long.234 . https://aclanthology.org/2022.acl-long.234

Achananuparp P, Hu X, Shen X (2008) The evaluation of sentence similarity measures. In: International conference on data warehousing and knowledge discovery. Springer, pp 305–316

Zannettou S, Bradlyn B, De Cristofaro E, Kwak H, Sirivianos M, Stringhini G, Blackburn J (2018) What is Gab: a bastion of free speech or an alt-right echo chamber. In: Companion proceedings of the web conference 2018, pp 1007–1014

Wilson J (2016) Gab: alt-right’s social media alternative attracts users banned from Twitter. The Guardian. https://www.theguardian.com/media/2016/nov/17/gab-alt-right-social-media-twitter

Montes MEA (2021) Proceedings of the Iberian languages evaluation forum (IberLEF 2021). In: CEUR Workshop proceedings

Paula A, Silva R, Schlicht I (2021) Sexism prediction in Spanish and English tweets using monolingual and multilingual BERT and ensemble models. Proces Leng Nat

Cañete J, Chaperon G, Fuentes R, Pérez J (2020) Spanish pre-trained BERT model and evaluation data. PML4DC at ICLR 2020

Martínez-Cámara E, Díaz-Galiano M, García-Cumbreras M, García-Vega M, Villena-Román J (2017) Overview of TASS 2017. IberEval@SEPLN 1896:13–21

Mnassri K, Rajapaksha P, Farahbakhsh R, Crespi N (2022) BERT-based ensemble approaches for hate speech detection. https://doi.org/10.48550/ARXIV.2209.06505 . https://arxiv.org/abs/2209.06505

He P, Liu X, Gao J, Chen W (2020) DeBERTa: decoding-enhanced BERT with disentangled attention. arXiv:2006.03654

Fandiño AG, Estapé JA, Pàmies M, Palao JL, Ocampo JS, Carrino CP, Oller CA, Penagos CR, Agirre AG, Villegas M (2022) MarIA: Spanish language models. Proces Leng Nat 68. https://doi.org/10.26342/2022-68-3

Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) RoBERTa: a robustly optimized BERT pretraining approach. arXiv:1907.11692

Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: Bengio Y, LeCun Y (eds) 3rd International conference on learning representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference track proceedings. http://arxiv.org/abs/1412.6980

Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Desmaison A, Antiga L, Lerer A (2017) Automatic differentiation in PyTorch

Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, Cistac P, Rault T, Louf R, Funtowicz M, Brew J (2019) HuggingFace’s Transformers: state-of-the-art natural language processing. CoRR abs/1910.03771

Vaca-Serrano A (2022) Detecting and classifying sexism by ensembling transformers models. Proces Leng Nat 2:1

Nguyen DQ, Vu T, Nguyen AT (2020) BERTweet: a pre-trained language model for English tweets. arXiv:2005.10200

Rosa J, Ponferrada EG, Romero M, Villegas P, Prado Salas PG, Grandury M (2022) BERTIN: efficient pre-training of a Spanish language model using perplexity sampling. Proces Leng Nat 68:13–23

Pérez JM, Furman DA, Alemany LA, Luque F (2021) RoBERTuito: a pre-trained language model for social media text in Spanish. arXiv:2111.09453

Plaza L, Carrillo-de-Albornoz J, Morante R, Amigó E, Gonzalo J, Spina D, Rosso P (2023) Overview of EXIST 2023: learning with disagreement for sexism identification and characterization (extended overview). Working Notes of CLEF



This work was supported by the Spanish Ministry of Science and Innovation under the project “FairTransNLP: Midiendo y Cuantificando el sesgo y la justicia en sistemas de PLN” (PID2021-124361OB-C32), funded by MCIN/AEI/10.13039/501100011033 and by ERDF, EU A way of making Europe.

Author information

Authors and affiliations.

UNED NLP & IR Group, 16 Juan del Rosal, Madrid, 28040, Spain

Francisco Rodríguez-Sánchez, Jorge Carrillo-de-Albornoz & Laura Plaza

RMIT University, 124 La Trobe St, Melbourne, VIC, 3000, Australia

Jorge Carrillo-de-Albornoz & Laura Plaza



All authors contributed equally to this work.

Corresponding authors

Correspondence to Francisco Rodríguez-Sánchez , Jorge Carrillo-de-Albornoz or Laura Plaza .

Ethics declarations

Competing interests.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical and Informed Consent for Data Used

The data utilized in this study was developed by the authors specifically for research purposes within the context of the EXIST competition [ 4 ]. Given its original creation and management by the authors, there are no concerns related to external data collection or participant consent. All necessary ethical considerations, including ensuring the anonymity and confidentiality of all participants or contributors, were strictly adhered to during data collection and processing.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Rodríguez-Sánchez, F., Carrillo-de-Albornoz, J. & Plaza, L. Detecting sexism in social media: an empirical analysis of linguistic patterns and strategies. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05795-2


Accepted : 16 August 2024

Published : 03 September 2024

DOI : https://doi.org/10.1007/s10489-024-05795-2


  • Online sexism
  • Sexism categorization
  • Social media
  • Sexism dataset
