Measuring environmental concern through international surveys: A study of cross-cultural equivalence with item response theory and confirmatory factor analysis
Introduction
In response to the growing threat of environmental problems and the resulting global efforts to raise awareness and mitigate their negative impact, it is necessary to understand the factors that could strengthen people's pro-environmental dispositions and behaviors. One such factor is environmental concern (EC), which describes "the degree to which people are aware of problems regarding the environment and support efforts to solve them" (Dunlap & Jones, 2002, p. 485). EC has been widely studied to identify the factors associated with it (e.g., personal and cultural values; Dietz, Fitzgerald, & Shwom, 2005; Dunlap & York, 2008) and to understand how it relates to pro-environmental behavior (e.g., Eom, Kim, Sherman, & Ishii, 2016; Tam & Chan, 2017).
As environmental problems have expanded to the global sphere (Dunlap, Van Liere, Mertig, & Jones, 2000), EC has become a relevant subject of cross-cultural research over the last decades. This scholarship has generally relied on data from large-scale international surveys (Olofsson & Öhman, 2006) such as the World Values Survey and the International Social Survey Programme (ISSP). Although the questionnaires of these surveys were not developed to measure EC in particular, researchers have adopted items from them to operationalize it and have arrived at relevant findings on cross-cultural differences. Nevertheless, given the various threats to validity in cross-cultural research (see van de Vijver & Hambleton, 1996), developing optimal measures of EC in this context must be a top priority: low-quality measures produce biased results that can lead to misleading conclusions about cross-cultural differences. Consequently, researchers interested in investigating EC cross-culturally need to be aware of the particularities of this type of research and be prepared to handle them appropriately to "[maximize] the validity of inferences" (van de Vijver & Leung, 2010, p. 17).
Despite this, research on the adequacy of EC measurement instruments for cross-cultural research is still scarce, especially for those derived from international surveys. Only recently have these measures begun to be put through tests of measurement equivalence (e.g., Marquart-Pyatt, 2015; Mayerl, 2016), but these studies often employ homogeneous methodologies of psychometric evaluation and rarely discuss the problems of the measures or the actions that could improve them. For that reason, our study adopted a different methodological approach to explore a previously used operationalization of EC derived from the ISSP Environment modules. Specifically, we employed item response theory (IRT) and multigroup confirmatory factor analysis (MCFA) to assess the quality and equivalence of the target measures, and then examined the possible sources of bias affecting them. In addition, our study centered on a concrete comparison between the UK and Taiwan to provide a comprehensive analysis that we hope motivates researchers to perform more thoughtful evaluations of the measures they employ in cross-cultural research.
Achieving equivalence by diminishing conceptual and methodological biases should be a priority when conducting cross-cultural research. In this context, equivalence refers to the degree of comparability of measurement outcomes between cultural groups, while bias encompasses all factors that threaten the validity of cross-cultural comparisons (van de Vijver & Hambleton, 1996; van de Vijver & Leung, 2010). It is normally assumed that bias is the main cause of measurement nonequivalence. When this occurs, between-group differences in average scores or in the correlation of the instrument with external factors are likely to be artificial and potentially deceiving (Reise, Widaman, & Pugh, 1993, p. 552). Cross-cultural research is particularly vulnerable to measurement nonequivalence because there is less control over confounding factors that could be more easily accounted for in research within a single culture (van de Vijver & Leung, 2010). This is even more pressing when cultures are distant in aspects like language and history, since confounding factors may increase with cultural distance. For example, the conceptualization of EC may be more similar between societies that have been historically more interconnected than otherwise (e.g., the UK and France vs. the UK and Vietnam) because they are more likely to have associated similar behaviors and cognitions with this construct.
On this basis, cross-cultural researchers need to address various forms of equivalence and bias in order to support their findings. Regarding equivalence, four major types or nested levels are commonly discussed: construct, structural, metric, and scalar equivalence. Construct equivalence concerns the extent to which a construct exists and has the same meaning across the cultures studied; structural equivalence means that the construct's dimensionality is identical across cultures and the same indicators can be used to measure it; metric equivalence describes the extent to which the intervals of the measurement scale have the same meaning within each culture; and scalar equivalence implies that the cultural groups share a common scale (Fischer & Fontaine, 2010; Sireci, 2010; van de Vijver & Leung, 2010). These types of equivalence are usually evaluated hierarchically from lower to higher levels, and metric equivalence is considered the minimum requirement for cross-group comparisons. Specifically, indirect comparisons in the form of correlates and causal effects are possible under metric equivalence (Mayerl & Best, 2019), whereas direct comparisons of means are only justified when scalar equivalence is demonstrated (van de Vijver & Leung, 2010).
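The practical consequence of this hierarchy can be illustrated with a small simulation (ours, not derived from the ISSP data; all parameter values are hypothetical): when one item's intercept differs across groups, a form of scalar nonequivalence, composite means diverge even though the latent trait means are identical.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Latent trait: identical distribution in both groups (no true mean difference)
theta_a = rng.normal(0.0, 1.0, n)
theta_b = rng.normal(0.0, 1.0, n)

loadings = np.array([0.8, 0.7, 0.9])      # equal loadings: metric equivalence holds
intercepts_a = np.array([3.0, 3.0, 3.0])  # group A intercepts
intercepts_b = np.array([3.0, 3.0, 3.6])  # item 3 intercept shifted in group B

def simulate_items(theta, loadings, intercepts, rng):
    """Generate continuous item responses from a one-factor model."""
    noise = rng.normal(0.0, 0.5, (theta.size, loadings.size))
    return intercepts + np.outer(theta, loadings) + noise

mean_a = simulate_items(theta_a, loadings, intercepts_a, rng).mean(axis=1).mean()
mean_b = simulate_items(theta_b, loadings, intercepts_b, rng).mean(axis=1).mean()

# Group B's composite mean is inflated by roughly 0.6 / 3 = 0.2 points,
# a spurious "cultural difference" created entirely by the biased item.
print(f"Group A composite mean: {mean_a:.2f}")
print(f"Group B composite mean: {mean_b:.2f}")
```

This is why the literature cited above restricts direct mean comparisons to measures with demonstrated scalar equivalence.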
Similar to equivalence, bias can be categorized into different types, such as construct, method, and item bias (van de Vijver & Hambleton, 1996). Construct bias is a direct threat to construct equivalence; it occurs when a construct's conceptualization is inconsistent across groups (Sireci, 2010), for example, when a behavior is associated with the construct in one culture but not in another. Method bias encompasses threats to validity involving all aspects of sampling and study administration (van de Vijver & Leung, 2010), for instance, when a measure is administered as an interview in one culture but as a self-completion instrument in another. Lastly, item bias is present when there is a systematic difference in the responses to an item among groups matched on trait level (i.e., differential item functioning), and this difference is caused by a factor unrelated to the measured construct (e.g., poor item translation; Sireci, 2010).
Many methods exist for dealing with equivalence and bias (see International Test Commission, 2017; van de Vijver & Leung, 2010). This article focuses on a posteriori approaches, that is, methods of data analysis applied after study completion. One such method is MCFA, which has been the main procedure used to validate survey-based measures of EC. MCFA is specifically devised for testing measurement equivalence at the scale level. It evaluates in a stepwise fashion the fit of increasingly constrained latent variable models to the data of two or more groups, where each model is directly linked to a specific type of equivalence (Milfont & Fischer, 2010). In the standard procedure, MCFA is used to assess equivalence from the structural to the scalar level. The model that tests structural equivalence is normally defined as the baseline, and higher levels of equivalence (metric and then scalar) are tested by placing parameter constraints on this model. Researchers can determine that the measure shows a certain level of equivalence by demonstrating that the model in question fits the data well and, in the case of metric and scalar equivalence, that its fit is not considerably worse than that of the preceding, less constrained model (Davidov, Meuleman, Cieciuch, Schmidt, & Billiet, 2014; Fischer & Fontaine, 2010). When a certain type of equivalence is rejected, a partial form of it can be tested (i.e., partial equivalence), namely, a condition in which some but not all parameters of interest are equivalent across groups. It has been argued that partial metric or scalar equivalence satisfies the requirement for making valid cross-group comparisons (Reise et al., 1993; Vandenberg & Lance, 2000), but there is evidence against this assertion (Davidov et al., 2014; Steinmetz, 2013).
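The stepwise decision logic of this procedure can be sketched as follows. The fit statistics are hypothetical (they are not values reported in this study); the comparison uses the chi-square difference test together with the widely used heuristic that a drop in CFI of more than .01 signals a meaningful loss of fit.

```python
from scipy.stats import chi2

# Hypothetical fit results for three nested invariance models
# (structural/configural -> metric -> scalar); illustrative values only.
models = {
    "configural": {"chisq": 210.4, "df": 52, "cfi": 0.962},
    "metric":     {"chisq": 224.9, "df": 60, "cfi": 0.959},
    "scalar":     {"chisq": 312.7, "df": 68, "cfi": 0.921},
}

def compare(restricted, free):
    """Chi-square difference test plus the Delta-CFI heuristic."""
    d_chisq = restricted["chisq"] - free["chisq"]
    d_df = restricted["df"] - free["df"]
    p = chi2.sf(d_chisq, d_df)
    d_cfi = free["cfi"] - restricted["cfi"]  # drop in CFI under constraints
    return d_chisq, d_df, p, d_cfi

for level, base in [("metric", "configural"), ("scalar", "metric")]:
    d_chisq, d_df, p, d_cfi = compare(models[level], models[base])
    verdict = "tenable" if d_cfi <= 0.01 else "rejected"
    print(f"{level}: dChi2({d_df}) = {d_chisq:.1f}, p = {p:.3f}, "
          f"dCFI = {d_cfi:.3f} -> {verdict}")
```

With these illustrative numbers the metric constraints survive the comparison while the scalar constraints do not, mirroring the pattern most often reported for survey-based EC measures.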
Moreover, partial equivalence is often evaluated in an exploratory, post hoc manner prone to capitalization on chance, and there is currently no consensus on guidelines to assess or handle it (Davidov et al., 2014; Vandenberg & Lance, 2000), which is why it is not recommended when the main goal is psychometric evaluation (Brown, 2006, p. 301).
Some disadvantages of MCFA are that it can lead to biased findings when underlying assumptions (e.g., multivariate normality) are violated, and that it may fail to identify nonequivalence under certain conditions (e.g., extreme response bias; Kankaraš, Vermunt, & Moors, 2011). Likewise, modeling options and graphical tools are limited in the MCFA framework (Raju, Laffitte, & Byrne, 2002; Tay, Meade, & Cao, 2015). In this article, IRT methods are presented as a complement to MCFA in the study of equivalence that can also compensate for some of these limitations, as discussed in more detail below.
Researchers have frequently used the resources of international surveys to study EC cross-culturally. These investigations have employed single or combined data from multiple survey versions of the same or different research programs, of which the ISSP and the World Values Survey are the most cited. In this field, EC has been conceptualized as a single- or multi-dimensional latent variable and operationalized in diverse ways, including willingness to make monetary sacrifices (Hao, 2016; Nawrotzki & Pampel, 2013), threat perception of environmental problems (Knight & Messer, 2012; Pisano & Lubell, 2017), and general environmental views (Marquart-Pyatt, 2012; Tam & Chan, 2017). EC has been investigated in relation to country-level factors, among which national wealth (e.g., Dunlap & York, 2008; Franzen & Meyer, 2010), post-materialism (e.g., Mayerl & Best, 2018; Pampel, 2014), cultural values (e.g., Eom et al., 2016; Tam & Chan, 2017, 2018), and environmental conditions (e.g., Knight & Messer, 2012) stand out. Many of these studies have also investigated the relation of EC to individual-level factors such as demographic variables (e.g., Haller & Hadler, 2008; Nawrotzki & Pampel, 2013), individualist or collectivist orientations (e.g., Olofsson & Öhman, 2006), and post-materialist values (e.g., Gelissen, 2007).
Since these studies are mostly correlational, the basic requirement for valid inferences is metric equivalence. Some of them also report group means (e.g., Franzen & Vogl, 2013; Nawrotzki & Pampel, 2013), which would require either that scalar equivalence is demonstrated or that the authors acknowledge the inaccuracy of mean comparisons in its absence. However, measurement equivalence is seldom discussed in these studies, and only a few have tested it directly. Among these (see Table 1), identical methods have been used to assess equivalence (i.e., CFA and MCFA), showing that the measures generally display metric but not scalar equivalence. A shortcoming of these studies is that they do not normally discuss to what extent the measures are adequate representations of EC across cultures, nor do they substantially explore possible biasing factors influencing the quality and equivalence of the measures; these are among the aspects examined in this article.
IRT is an approach for evaluating and developing psychometric measures that encompasses a diverse family of mathematical models and tools. It has been extensively developed in the context of achievement tests, but over the years it has been applied to measures of attitudes, values, and individual differences (e.g., attachment styles, Fraley, Waller, & Brennan, 2000; trait mindfulness, Pelham et al., 2019; environmental attitude, Zhu & Lu, 2017). The distinctive feature of IRT, unlike CFA, is its strong emphasis on comprehensive analysis at the item level. IRT models link item characteristics (i.e., item parameters) and the latent trait (i.e., the unobserved characteristic measured by the instrument) with the probabilities of selecting a certain response to an item (Brown, 2006; Zhu & Lu, 2017). Moreover, because item and person parameters (i.e., the latent trait) can be estimated separately (Muñiz & Hambleton, 1992), IRT allows a nuanced analysis of how response probabilities and measurement precision vary by trait level.
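For a Likert item, this link between trait and response probabilities is typically formalized with a polytomous model such as Samejima's graded response model, in which a discrimination parameter and ordered thresholds determine the probability of each response category. A minimal sketch with hypothetical item parameters:

```python
import numpy as np

def grm_probs(theta, a, b):
    """Graded response model: probability of each response category for a
    Likert item with discrimination a and ordered thresholds b (length k-1)."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    # Cumulative probabilities P(X >= category k | theta), one 2PL curve per threshold
    cum = 1 / (1 + np.exp(-a * (theta[:, None] - np.asarray(b)[None, :])))
    upper = np.hstack([np.ones((theta.size, 1)), cum])
    lower = np.hstack([cum, np.zeros((theta.size, 1))])
    return upper - lower  # category probabilities; each row sums to 1

# Hypothetical 5-point item: a = 1.5, four ordered thresholds
a, b = 1.5, [-2.0, -0.7, 0.4, 1.8]
for theta in (-1.0, 0.0, 1.0):
    print(theta, np.round(grm_probs(theta, a, b)[0], 3))
```

As the trait level rises, probability mass shifts from the lower to the upper categories, which is exactly the item-level behavior that the graphical tools discussed below visualize.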
Given this strong emphasis on item-level analysis, IRT methods are commonly used to gauge differential item functioning (DIF). Accordingly, various statistical procedures have been devised to assess DIF within IRT (see Tay et al., 2015), which in conjunction with qualitative analyses enable the evaluation of item bias. IRT also provides several advantages for evaluating other aspects of measurement equivalence. For example, IRT analyses generate graphical tools for visualizing a population's response behavior to the items (e.g., item response functions), which can be used to identify biasing factors such as response styles (e.g., central tendency and extreme responding; Tay et al., 2015). In addition, IRT models can estimate coefficients akin to Cronbach's alpha for every trait level (Fraley et al., 2000), which may indicate how representative the items are for measuring the underlying construct across cultural groups (i.e., construct equivalence).
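These trait-level reliability coefficients derive from item information. The sketch below uses the two-parameter logistic (2PL) model with hypothetical item parameters: item information is a²P(1 − P), test information is the sum across items, and conditional reliability can be approximated as I(θ)/(I(θ) + 1) when the latent variance is fixed to 1.

```python
import numpy as np

def irf_2pl(theta, a, b):
    """2PL item response function P(theta)."""
    return 1 / (1 + np.exp(-a * (theta - b)))

def item_info(theta, a, b):
    """Fisher information of a 2PL item: I(theta) = a^2 * P * (1 - P)."""
    p = irf_2pl(theta, a, b)
    return a**2 * p * (1 - p)

theta = np.linspace(-3, 3, 121)
items = [(1.6, -1.0), (1.2, 0.0), (0.8, 1.5)]  # hypothetical (a, b) pairs
test_info = sum(item_info(theta, a, b) for a, b in items)

# Conditional reliability: rel(theta) = I(theta) / (I(theta) + 1)
reliability = test_info / (test_info + 1)
print(f"reliability at theta = 0:  {reliability[theta == 0][0]:.2f}")
print(f"reliability at theta = +3: {reliability[-1]:.2f}")
```

With these parameters, precision peaks near the middle of the trait range and drops at the extremes, illustrating how a scale can represent the construct well for average respondents yet poorly for those with very high or very low concern.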
IRT methods can complement the work done on the cross-cultural validation of survey-based measures of EC for the following reasons. First, as is most common in CFA studies (Brown, 2006; Jackson, Gillaspy, & Purc-Stephenson, 2009), previous investigations normally employed a standard type of MCFA that assumes continuous data and uses the maximum likelihood estimator (i.e., Marquart-Pyatt, 2015; Mayerl, 2016; Mayerl & Best, 2019). Strictly speaking, this is inadequate because the measures of EC use ordinal scales (e.g., 5-point Likert), and treating categorical data as continuous in CFA can lead to biased parameter estimates and increased rates of model rejection, especially when the data are non-normal (Brown, 2006; Kline, 2010). IRT models are unaffected by this problem because they are designed to handle nominal and ordinal data (Tay et al., 2015). For the same reason, IRT can estimate item parameters that the standard MCFA cannot (e.g., threshold parameters), which may help identify DIF more accurately (Kankaraš et al., 2011). Previous studies also did not investigate the sources of bias that lead to measurement nonequivalence, perhaps because the information provided by MCFA is limited in this respect. By contrast, IRT offers diverse modeling options and tools that may facilitate not only the identification of biasing factors but also the design of specific actions to improve the measurement instruments.
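The distortion introduced by treating ordinal responses as continuous can be shown with a short simulation (illustrative only; the thresholds and the true correlation are invented): coarsening two correlated normal variables into skewed 5-point scales attenuates the observed Pearson correlation.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000

# Two correlated continuous "true" variables (rho = 0.6)
latent = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=n)
x, y = latent[:, 0], latent[:, 1]

# Coarsen each into a skewed 5-point Likert response (unequal thresholds),
# then treat the category codes 1..5 as if they were continuous scores.
cuts = [-0.3, 0.4, 1.0, 1.6]
x5 = np.digitize(x, cuts) + 1
y5 = np.digitize(y, cuts) + 1

r_true = np.corrcoef(x, y)[0, 1]
r_likert = np.corrcoef(x5, y5)[0, 1]
print(f"continuous r: {r_true:.3f}, Likert-as-continuous r: {r_likert:.3f}")
```

The attenuation grows as the category distribution becomes more skewed, which is why estimators designed for ordinal data (or IRT models) are preferable for Likert-type EC items.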
This study draws on frameworks in the literature on cross-cultural equivalence (see van de Vijver & Leung, 2010) to make a comprehensive evaluation of a survey-based operationalization of EC. In particular, the operationalization developed by Marquart-Pyatt (2012) from the ISSP Environment modules was investigated, which comprises composite measures of two dimensions of EC: the pro-environmental views scale (PEVS) and the environmental awareness scale (EAS). Unlike earlier studies, IRT methods and MCFA were used jointly to study the measures' cross-cultural equivalence, and a comparative assessment was developed by contrasting Taiwan and the UK. A methodological principle of cross-cultural research underpinned the decision to compare these two cultures: if X is theorized to shape differences in Y, then two cultures that vary in X are studied. That is, Taiwan and the UK were chosen because they differ in relevant factors that have been associated with differences in EC. Specifically, the UK (GDP per capita: 40,390 USD) is a more affluent society than Taiwan (GDP per capita: 25,530 USD; International Monetary Fund, 2020), has a greater proportion of post-materialists (Inglehart, 2008), and is predominantly individualist while Taiwan is collectivist (Hofstede, 2001). Consequently, if measurement nonequivalence were found, this would challenge the conclusions of studies that have used the measures in question to compare these cultures. Additionally, Mayerl (2016) found that his measure of EC had poor fit indices and the lowest convergent validity in Taiwan. Given that the target measures overlap with that of Mayerl (2016), the reasons behind the poor measurement performance observed in Taiwan could also be investigated.
Data and sample
The data of the ISSP 2010: Environment III module (ISSP Research Group, 2012) were employed in this investigation. The ISSP is a cross-national, cross-sectional survey project conducted annually around the world since 1984 to investigate focal topics of the social sciences. Environment III is the latest module exclusively focused on environmental issues; it was fielded between 2009 and 2013 across 36 societies.
Dimensionality and evaluation of measurement equivalence with MCFA
To evaluate structural equivalence and the assumption of unidimensionality, the baseline model for each measure was specified by regressing all indicators onto one factor. Further modifications to this model were introduced on the basis of evidence of ill fit, which included unacceptable fit indices, large residuals, and substantial modification indices (Brown, 2006). These modifications were theory-driven or based on previous research. CFAs were first run in the reference group (the UK) and
Discussion
The cross-cultural equivalence of two distinct measures of EC derived from the ISSP Environment modules was explored in this paper. Unlike previous studies, we made a concrete comparison between Taiwan and the UK and used CFA and IRT approaches jointly to develop a detailed evaluation of the target measures. Our findings demonstrated that both the PEVS and the EAS exhibit measurement nonequivalence, but the extent of nonequivalence differs between the two. And while they both perform
Funding acknowledgement
This work was supported by the National Social Science Foundation of China under Grant No. 18BSH122 and the Fundamental Research Funds for the Central Universities in China under Grant No. 010914370122. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
CRediT authorship contribution statement
Julián D. Rodríguez-Casallas: Conceptualization, Methodology, Software, Formal analysis, Writing - original draft, Writing - review & editing, Visualization. Wei Luo: Methodology, Software, Formal analysis. Liuna Geng: Conceptualization, Writing - review & editing, Supervision, Project administration, Funding acquisition.
Declaration of competing interest
None.
References

- Two decades of measuring environmental attitudes: A comparative analysis of 33 countries. Global Environmental Change (2013).
- Contextual influences on environmental concerns cross-nationally: A multilevel investigation. Social Science Research (2012).
- Understanding item parameters in personality scales: An explanatory item response modeling approach. Personality and Individual Differences (2018).
- A simulation study provided sample size guidance for differential item functioning (DIF) studies using short scales. Journal of Clinical Epidemiology (2009).
- Environmental concern has a weaker association with pro-environmental behavior in some societies than others: A cross-cultural psychology perspective. Journal of Environmental Psychology (2017).
- Generalized trust narrows the gap between environmental concern and pro-environmental behavior: Multilevel evidence. Global Environmental Change: Human and Policy Dimensions (2018).
- Re-evaluation of the New Ecological Paradigm scale using item response theory. Journal of Environmental Psychology (2017).
- Exploring the robustness of a unidimensional item response theory model with empirically multidimensional data. Applied Measurement in Education (2017).
- The basics of item response theory (2001).
- Modeling acquiescence in measurement models for two balanced sets of items. Structural Equation Modeling (2000).
- The cultural fairness of the 12-item General Health Questionnaire among diverse adolescents. Psychological Assessment.
- Confirmatory factor analysis for applied research.
- Mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software.
- Cross-national variation of gender differences in environmental concern: Testing the sociocultural hindrance hypothesis. Environment and Behavior.
- Lordif: An R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo simulations. Journal of Statistical Software.
- Differential item functioning analysis with ordinal logistic regression techniques: DIFdetect and difwithpar. Medical Care.
- A comparison of three sets of criteria for determining the presence of differential item functioning using ordinal logistic regression. Quality of Life Research.
- Measurement equivalence in cross-national research. Annual Review of Sociology.
- Environmental values. Annual Review of Environment and Resources.
- Collaborative and iterative translation: An alternative approach to back translation. Journal of International Marketing.
- Application of unidimensional item response theory models to multidimensional data. Applied Psychological Measurement.
- Environmental concern: Conceptual and measurement issues.
- New trends in measuring environmental attitudes: Measuring endorsement of the new ecological paradigm: A revised NEP scale. Journal of Social Issues.
- The globalization of environmental concern and the limits of the postmaterialist values explanation: Evidence from four multinational surveys. The Sociological Quarterly.
- Cultural variability in the link between environmental concern and support for environmental action. Psychological Science.
- Standardization to account for cross-cultural response bias: A classification of score adjustment procedures and review of research in JCCP. Journal of Cross-Cultural Psychology.
- Methods for investigating structural equivalence.
- An item response theory analysis of self-report measures of adult attachment. Journal of Personality and Social Psychology.
- Environmental attitudes in cross-national perspective: A multilevel analysis of the ISSP 1993 and 2000. European Sociological Review.
- Explaining popular support for environmental protection: A multilevel analysis of 50 nations. Environment and Behavior.
- Multiple group IRT measurement invariance analysis of the forms of the Self-Criticising/Attacking and Self-Reassuring Scale in thirteen international samples. Journal of Rational-Emotive and Cognitive-Behavior Therapy.
- Dispositions to act in favor of the environment: Fatalism and readiness to make sacrifices in a cross-national perspective. Sociological Forum.
- Good practices for identifying differential item functioning. Medical Care.
- A panel regression study on multiple predictors of environmental concern for 82 countries across seven years. Social Science Quarterly.
- Culture's consequences: Comparing values, behaviors, institutions and organizations across nations.
- Changing values among western publics from 1970 to 2006. West European Politics.
- GDP per capita, current prices.
- The ITC guidelines for translating and adapting tests.
- International Social Survey Programme: Environment III. GESIS Data Archive.
- Reporting practices in confirmatory factor analysis: An overview and some recommendations. Psychological Methods.
- Evaluating Type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education.
- Survey response styles across cultures.
1 Present address: Students Affairs Department, Hunan University, No. 2 Lushan South Road, Yuelu District, Changsha, Hunan, 410082, China.