Introduction

The world is in urgent need of competent professionals to contribute to societal transformations towards sustainability (Gordon et al. 2019), and educational institutions ought to prepare students for these roles (Barth 2016; Franco et al. 2019). In response to this challenge, there has been a proliferation of sustainability science programs (O'Byrne et al. 2015), which increasingly define the learning objectives for their students in terms of sustainability competencies (Salovaara et al. 2020). Competencies are "complex combination[s] of knowledge, skills, understanding, values, attitudes and desire which lead to effective, embodied human action in the world" (Crick 2008). There is increasing agreement on the set of key competencies in sustainability (Brundiers et al. 2020), namely systems-thinking, futures-thinking, values-thinking, strategic-thinking, and interpersonal competencies (Wiek et al. 2011). Similarly, scholars and educators have started to converge on effective and efficient pedagogies to develop these competencies (Brundiers et al. 2010; Frisk and Larson 2011; Barth and Michelsen 2013).

Yet, the practice of assessing students’ sustainability competencies is still in its infancy (Waltner et al. 2019). A broad range of assessment tools are currently in use for both research and instructional purposes (Cebrián Bernat et al. 2019). However, these tools are rarely selected with clear and informed intention, largely due to a lack of guidance in the literature (Besong and Holland 2015). Despite a growing body of research describing innovative pedagogies (Hallinger and Chatpinyakoop 2019), there is a shortage of empirical evidence of whether and in what ways these pedagogies are successful in developing students’ sustainability competencies (Osagie et al. 2016; Mindt and Rieckmann 2017; Garrecht et al. 2018). Meanwhile, course instructors, curriculum designers, and program directors lack the means to effectively assess whether or not they are successfully educating sustainability professionals through their courses and programs, which is a core purpose of assessment (Kuh et al. 2014). This is a significant gap when it comes to constructive alignment (Biggs 1996) and putting all critical components of sustainability (science) education in place (Fig. 1). As this figure illustrates, reliable and valid tools for assessing competencies, which is the focus of this article, fulfill an important function in supporting structured teaching efforts and student learning for sustainability.

Fig. 1 Framework which indicates the crucial role assessment plays in supporting student learning

Education science researchers have called out traditional methods of assessment as inadequate for measuring multi-dimensional and performance-oriented competencies (Frey and Hartig 2009). Traditional assessments are already challenging for experts to create and apply properly (Reckase 2017), and adequate assessment of competencies is even more so (Leutner et al. 2017). Nonetheless, much exploratory work on assessing competencies has begun (Hartig et al. 2007), though a review found that progress on competency assessment was limited, particularly in the non-cognitive dimensions (Zlatkin-Troitschanskaia et al. 2015). For sustainability competencies in particular, Barth (2009) provided a conceptual framing, and sporadic, if increasing, efforts to develop tools have been undertaken by individual instructors and researchers around the world (Cebrián Bernat et al. 2019). This growing body of research has yet to be brought together in a systematic review which compares the existing tools and provides guidance to instructors, researchers, and program directors.

This review article examines what tools are currently used for assessing students' sustainability competencies, as documented in the literature through the end of 2019. We conducted an in-depth analysis of a comprehensive sample of peer-reviewed publications (N = 75) and distilled a typology of assessment tools for sustainability competencies. We also evaluated the strengths and weaknesses of these tools and offer avenues for improvement. The article provides guidance to instructors, researchers, and program directors who are interested in using competencies assessment tools in more informed ways.

Research design

To review the literature on assessing students' sustainability competencies thus far, we systematically collected publications from SCOPUS, Web of Science, ERIC, and Google Scholar, published in English through 2019, resulting in a first pool of 3908 publications. Following Moher et al.'s (2009) and Fink's (2014) systematic review approaches, we then iteratively excluded publications by first reviewing the titles, then the abstracts, and finally the full texts. This yielded 75 publications focused on sustainability competencies assessments (see appendix for a full description of procedures). For this sample, Fig. 2 shows the steady growth of publications on sustainability competencies assessments over the last 10 years. However, they still represent less than 7% of the sustainability (science) education research field as reviewed in 2017 (Grosseck et al. 2019). The publications come from 35 outlets, yet research took place almost exclusively in OECD countries (93%) and at higher education institutions (87%). Sustainability/environmental degree programs, teacher training, general education, and business/management education were the most frequent focus areas of the studies. Research on assessment in sustainability (science) education appears to be in its emergent growth phase, trailing the pattern of research growth in sustainability science by about 15 years (Fang et al. 2018).

Fig. 2 Publications on sustainability competencies assessments per year in final sample (solid line is rolling 3-year average)

In reviewing the sampled literature, we identified 121 tools in total (many of the 75 reviewed studies used more than one tool), which we classified into eight distinct types of tools currently being used to assess students' sustainability competencies. To be clustered into a type, a tool had to have a record of several documented applications. We disregarded terminological differences in cases where authors used different names for the same tool. We first generalized the descriptions to cover all specific tools under each type and then standardized the descriptions to make the tools comparable (Table 1). We then analyzed each tool type independently and in contrast to the others using a set of common attributes (Table 2). We finally appraised the strengths and weaknesses of each tool type and explored potential improvements (Table 3). This appraisal was informed by insights on competencies assessments gleaned from the broader educational literature.

Table 1 Currently used tools for assessing students’ sustainability competencies (with frequency)
Table 2 Examples of each assessment tool with description and application
Table 3 Appraisal of the assessment tools organized by cluster

Typology of tools for competencies assessment

Instructors use a wide variety of tools for assessing students' sustainability competencies (121 in total were identified from this sample). They can nonetheless be clustered into eight major tool types currently in use (Table 1). Some of these types are quite broad (e.g., reflective writing), while others are narrower but also more refined (e.g., concept mapping). Many studies used more than one tool (n = 31), with scaled self-assessment being disproportionately represented among these (80%) when compared to the overall sample (56%). Generally, there were only a few cases where a single tool was developed over multiple publications. The exception was the scenario/case test type, where four tools were iteratively developed over 14 publications.

We first present examples of each tool (Table 2). These examples were chosen based on three criteria: (1) representativeness of the tool, (2) clarity of description in the publication (a frequent deficiency), and (3) use of the competency framework articulated by Wiek et al. (2011). We purposefully selected examples that use the same key competencies to enhance comparability between tools. In our sample, the Wiek et al. (2011) framework was the only one used across enough studies to make this possible, besides being highly influential on the broader field of sustainability (science) education as noted in other reviews (Grosseck et al. 2019). However, it is not possible to conduct a comprehensive meta-analysis of assessment results due to the diversity of what is being assessed, i.e., the specific sustainability competencies targeted.

The examples are drawn from a single source for each tool. They are described by two sets of characteristics: one for the tool itself and one for its application. The table can be read horizontally to give an overview of each example or vertically to enable comparison between tools for each characteristic. The different tools were each applied fairly widely (as represented by the captured characteristics), and the scope of applications described in Table 2 represents those within the overall sample well. For each tool, there was also quite a variety of application settings.

Having identified eight distinct assessment tool types, we reviewed each of the studies (full list in the "Appendix") again, particularly with respect to the research methods used, and conducted an analysis for each tool. The first result of this analysis was that the eight tools can be further clustered into three meta-types: self-perceiving-based assessment procedures, observation-based assessment procedures, and test-based assessment procedures (see Table 3). The critical characteristic of a tool that determines its cluster is who is doing the assessing of the students' competencies. For self-perceiving-based procedures (e.g., reflective writing), students assess their own competence level and/or development. In applying observation-based procedures, instructors or experts assess students' competencies. Test-based assessment procedures use a predefined set of criteria (or "correct" answers) to evaluate students' competencies. This distinction in who assesses students' competencies means that the tools within each cluster share much in common in terms of strengths and weaknesses.

Based on the analysis of the sample articles and a review of the broader education science literature, we compiled a distilled set of strengths, weaknesses, and best practices for each tool (Table 3). An exemplary citation is provided for each point whenever possible, typically representing many other sources. The column on current practice in Table 3 offers a generic description of each tool based on the full scope of examples, in contrast to the detailed but specific examples offered in Table 2.

Discussion

We conducted a systematic review of the growing body of published research on the assessment of sustainability competencies. This review identified a wide range of assessment tools currently in use (more than 120 specific tools). Yet, despite this diversity on the surface, we argue for a typology containing eight major tool types that can be further grouped into three clusters of assessment procedures (Table 3). The tool types we specify overlap meaningfully with those utilized by Nicolaou and Constantinou (2014) in their systematic review of assessing a competence closely related to sustainability (modeling in science). In-depth insights into the tools come via the examples included in Table 2 and through the appraisal summarized in Table 3.

There are clear signs of substantial investment in model and tool building (Waltner et al. 2019), multi-methodological triangulations (Kricsfalusy et al. 2018), and the piloting of innovative assessment tools (see box 1, below). However, this appraisal also reveals flaws in the current assessment practice in sustainability (science) education: there is too little connectivity across studies, in particular regarding agreement on outcomes; an over-reliance on scaled self-assessment; and general insufficiency of actual tool development. The implications of these flaws can be seen in Fig. 1—unclear learning objectives (1) or the lack of a baseline assessment (2) undermine the effectiveness of even well-developed assessment tools.


Other than the studies where the same research group builds on its previous work (the scenario/case test type), there are no obvious connections (e.g., citations) made across research efforts. Even in cases where the same competencies are assessed (e.g., Wiek et al. 2011) and the same assessment tool is applied (e.g., scaled self-assessment), new studies do not build on the tools previously used (e.g., Molderez and Fonseca 2018). The reviewed competency-like constructs currently used in assessments are often described so differently that comparison across assessments is impossible. Besides drawing on Wiek et al. (2011), a handful of studies explicitly proposed "new" competencies such as sustainability and social responsibility (SSR) (Albareda Tiana and Alférez Villarreal 2016); others leave it quite unclear what competencies were actually being assessed (e.g., Azeiteiro et al. 2015). Apart from making comparisons across assessments impossible, this ambiguity of learning outcomes undermines the recognition and career trajectories of graduates from sustainability (science) programs.

Scaled self-assessment was by far the most commonly chosen assessment tool (56% of cases); yet only rarely (Migliorini and Lieblein 2016) has the tool choice been justified. In their descriptive review, Cebrián Bernat et al. (2019) hypothesize that this type of tool is often selected because "it is less time-consuming, easy to distribute amongst a larger number of students, and in turn it provides a larger amount of information." Several authors make the case for its pedagogical uses in sustainability science (Galt et al. 2013), in line with educational scholars who have advocated for self-reflection as a tool for formative assessment (Andrade 2019). However, as a tool for robust, reliable, and valid measurement of sustainability competencies, self-assessment falls far too short to warrant such popularity. As Metzler and Kurz (2018) conclude in their report on educational assessment procedures, "data gleaned from easy measurement tell us little about the student learning that matters most."

Even among the assessment studies carefully selected for inclusion in this review, there is a tendency for the development of assessment tools to be an apparent afterthought. The main topics of the studies are the pedagogical approach, case description, or programmatic innovation; assessment as such is used to produce some empirical evidence to validate those initiatives' success. Little effort goes into tool development ahead of time or reflection afterwards. Yet there are many studies from the educational sciences (Barth and Michelsen 2013) that have rigorously developed assessment tools, which the practice of sustainability competencies assessment should adopt going forward. Some, such as the recent work of Mehren et al. (2018) on assessing systems thinking in geography, are highly relevant, yet they are not being learned from in sustainability science. We recommend four steps: first, develop a clear set of learning objectives/outcomes to be assessed, properly operationalized for the given context; second, provide a theoretical and empirical basis for selecting a particular assessment tool; third, articulate a psychometric model which links the learning outcomes to the tool to be used; fourth, pilot test the tool with a relevant sample population.

Many disciplines have adopted some form of sustainability (science) education, and instructors ought to look for assessment tools that fit their specific teaching situation. The experiences so far suggest that combining assessment tools may be the best way to address the shortcomings of any particular tool. For example, assessment tools with reasonable validity due to narrow learning objectives (e.g., Bögeholz et al. 2014) will likely have low reliability across contexts and content (Schuwirth and Van Der Vleuten 2011). Each assessment tool has inherent weaknesses even with proper development (which the typology helps to foresee); thus, triangulation should happen on two levels: within the clusters and between them. For example, combining scaled self-assessment with reflective writing (within a cluster) provides a more complete and meaningful picture of the students' views of their own competencies, while triangulating these results with a testing approach (between clusters) checks the validity of students' self-perception against an objective (if typically narrower) measure.

As mentioned above, individual cases of developing assessment tools seem quite promising. Beyond the sheer increase in the quantity of publications, some tools have been developed with rigor, along the lines of the four steps outlined above (e.g., Waltner et al. 2019). Additionally, it is critical to plan for ultimate deployment at a scale sufficient for the needs of sustainability (science) education (Arima 2009), a topic that Holdsworth et al. (2019b) have explicitly grappled with over a series of articles. Yet, for all the innovation that sustainability (science) education purports to offer pedagogically, the field so far has little to offer in terms of assessment. Inspiration could be drawn from many other educational fields (Leutner et al. 2017), in particular from medical education, with its innovative approaches to competency assessment (Lockyer et al. 2017). This is in line with other intriguing parallels between medical and sustainability (science) education. The recent in-vivo assessment described in box 1 drew its inspiration from the long and established practice of competencies assessment in medical education. Sustainability (science) education researchers and practitioners would do well to find inspiration in such corners.

Conclusions

This article offers a typology which provides guidance for instructors, researchers, and program directors interested in assessing students’ competencies in sustainability. This typology, based on a systematic review and synthesis of the academic literature through the end of 2019, goes beyond description to offer an appraisal of eight types of assessment tools. The analysis of their strengths, weaknesses, and best practices distills the key lessons from the 75 peer-reviewed publications included.

Reflective of the rest of the field of sustainability (science) education, there is a lack of explicit agreement on what is being assessed. This not only makes comparison of results impossible but also challenges comparisons of the process of assessment (i.e., the tools themselves). Perhaps because assessment is not the topic of primary research interest, the assessment tools are typically not well developed and are often inappropriately used. This is particularly true of scaled self-assessment, which, despite well-documented weaknesses, continues to dominate current assessment practice. In response to the lack of robust assessment tools, many instructors, researchers, and program directors have chosen to apply more than one, an approach that is likely to have value even when well-developed tools are used. The proposed typology provides a structure for the field as it is today. As more tools are developed and refined, we would expect more specific tools, such as concept mapping (specific to systems-thinking competence), to be distinguished within each of the broader categories. Ultimately, it would be the meta-types (e.g., self-perceiving) that would form the critical organizing structure. Despite a bumpy beginning, current trends are quite positive, as more rigor is being applied in combination with meaningful innovations.

Considering the need for broad sustainability (science) education, efforts ought to be accelerated. If education is going to contribute to the needed global transformations, the scholarly community needs to generate more evidence about "what works" for teaching and learning (evidence-supported practices), and this requires robust assessment tools. As we briefly touched on, sustainability (science) education researchers need to draw much more heavily on work being done in other education research fields. These efforts should extend beyond the research perspective to include coordination across the relevant parties: researchers, for example, need to focus on linking outcomes to the actual learning processes, while instructors may emphasize the formative aspect, and program directors may be concerned with objective and comparable measures for reporting. In these efforts, there is a need for innovative assessment approaches that more directly prepare students for their professional paths and the challenges they will be facing.