1 Introduction

Organizations and societies are increasingly challenged by complex and systemic disaster risk, which requires leaders to make strategic decisions under conditions of high uncertainty. In order to increase the flexibility of response and preparedness for concurrent, interacting, interconnected, and cascading events, cultural adjustments are required in both research and practice (Pescaroli and Alexander 2018). Authors such as Linkov et al. (2013) and Helbing (2013) have highlighted the need for new approaches that could help the assessment process evolve and become more suitable for application in multiple domains and disciplines. Since then, the state of the art has evolved significantly in terms of both theory and applications. A critical step was the increased prominence given to the understanding of risk and complexity in the Sendai Framework for Disaster Risk Reduction 2015–2030 (UNISDR 2015). This international agreement promoted further developments at the interface between academia, research, and practice. For example, the United Nations Office for Disaster Risk Reduction (UNDRR) released a series of practical guides to support the implementation of consistent measurements and datasets, including the “Words into Action” guidelines on national risk assessment (UNISDR 2017) and the Disaster Resilience Scorecard for Cities (UNISDR 2017). Similarly, new standards on continuity management and organizational resilience, such as those published by the British Standards Institution, the International Organization for Standardization, and the National Fire Protection Association, have included new elements such as the assessment of interdependencies and cascading effects (BS 65000:2014; NFPA 1600:2019; ISO 22316:2017; ISO 22301:2019). This has improved the principles used in benchmarking.

This is a fast-evolving field because tools such as resilience-measurement scales are the most basic point of interaction between academia, practitioners, and policymakers (Cutter and Derakhshan 2019). The academic literature proposes a range of methodologies for measuring disaster risk, resilience, vulnerability, or climate change impacts, including choices of variables and measurements that depend on the field of reference (Frazier et al. 2013; Linkov et al. 2013; Birkmann et al. 2014; Twigg 2015; Beccari 2016; Kelman et al. 2016; Gentile et al. 2019). However, the need for better standardization of indicators and reporting scales faces limitations, in particular in terms of what should be measured, how often, and by whom (Cutter and Derakhshan 2019). For example, stakeholders working on interconnected risks have suggested the need for cross-disciplinary improvements in defining operational and decision-making thresholds (Pescaroli 2018). The billions of gigabytes of information available online through channels such as social media have raised further challenges to the achievement of reliable quality standards (Kostkova et al. 2003; Alexander 2015). Moreover, end-user-oriented early warning systems, multi-criteria decision support systems, and scenario-building methods are some of the applied research fields in which updated benchmarking can make a difference and help manage complexity (Pescaroli and Alexander 2018; Cremen and Galasso 2020).

Despite the progress that has been made, the current body of literature seems to lack a simple-to-use model that could provide benchmarking between different forms of assessment. In other words, the state of the art is extensive, but it is missing a simple alternative framework of reference for cross-disciplinary assessments in contexts in which existing scales and standards may be unsuitable, challenging, or impossible to apply. The starting point of this article is a real-world need that emerged explicitly in our research on integrating organizational resilience and earthquake early warnings in Latin America. Further evidence and justification of this need came from the experience of collaborating with stakeholders in Europe and Japan on operational resilience to cascading and interconnected risk. In the following sections we briefly review the methodological background and propose the rationale for a new Likert scale-based response model for assessing operational gaps, organizational resilience, and disaster risk reduction capacity, as reported in Sect. 3. Our principal goal is to propose a simple-to-use, replicable, and adaptable tool that can be used across studies to anchor data collection and benchmark assessments in contexts characterized by low levels of knowledge, training, or awareness.

2 General Considerations on Rating Scales, Anchoring, and Benchmarking

Research and practice need reliable data. The literature has extensively discussed the role of quantitative and qualitative methodologies for collecting and analyzing evidence, with particular attention to the replicability of the data collection process (Bryman 2016). Similarly, planning and designing a study is an essential aspect of the development of mixed-method approaches (Creswell 2014). The common ground across disciplines is the need to anchor questionnaires and surveys to realistic points of reference in relation to the context in which the data are collected (see, for example, Alexander 2015; Ahmed et al. 2016; Hernández-Moreno and Alcántara-Ayala 2017; Pescaroli 2018; Gentile et al. 2019; Yore and Faure-Walker 2019). In the field of business continuity, a questionnaire is often the most important tool for gathering the information needed to conduct a business impact analysis and assess the level of organizational resilience (BSI 2014; ISO 2019). Similarly, questionnaires are used to identify possible “performance gaps” or “capacity gaps” that are reflected in perceived inadequate or undesirable organizational or operational states (Channon and Sammut-Bonnici 2014; UNISDR 2017).

There are two common steps to consider in this process, independent of the field or discipline. First, the questions have to be carefully thought through to ensure their consistency and clarity, and to be safeguarded, as much as possible, against bias (Creswell 2014; Bryman 2016). Second, each answer has to be anchored to a rating scale, which can be defined as “a closed-end question whose answer alternatives are graduated or organized to measure a continuous construct, such as an attitude, opinion, intention, perception, or preference” (Peterson 2013). At this point, scholars, researchers, and practitioners have to choose a scale of reference, the most straightforward option often being a Likert scale (Croasmun and Ostrom 2011; Bryman 2016). The next step would involve looking at benchmarking examples such as those given by Brown (2010) and Vagias (2006). Something similar may happen in fieldwork, for example when it is necessary to assess emergency response capacity, develop a business impact analysis, or benchmark organizational resilience at large (Alexander 2000, 2015; BSI 2014; ISO 2017).

However, in some cases contextual limitations suggest the need to shift the analysis to an approach that differs from the most commonly used scales. Indeed, in some fieldwork, Likert scales from 1 to 5 or 1 to 7 are simply not suited to the local cultural or social environment, or they cannot be rendered in labels that use ordinary language or common idioms. Similarly, it is important to consider the qualitative, explanatory nature of the answer options in a questionnaire, which must be treated as complementary to the numerical values of the scale: different interpretations of the numbers can confuse respondents if they are not supported by a qualitative rationale, as was identified, for example, by the National electronic Library for Health project in assessing the “quality” of its evidence base (Kostkova et al. 2003; Wiseman et al. 2008). For instance, the process of collecting data about capacity, impacts, or preparedness in operational contexts (such as local councils) may be hindered by a lack of resources or by a failure to embed the process, so anything more complex than strictly necessary might not be welcome. In such cases, the solution could be to implement scales from 1 to 3, or to use binary answers such as “yes or no.” However, this approach risks oversimplification, which might compromise the outcome of the project (Croasmun and Ostrom 2011; Bryman 2016). Our experience in the field and our dialogue with stakeholders suggest the need to move to a “hybrid” approach that prioritizes clarity, replicability, and consistency across different sectors. The next section therefore introduces a 0–3 scale that is more easily accessible to respondents, reduces the possibility of response bias, and ensures consistency across measurements and across the domains of disaster risk reduction, operational capacity measurement, and organizational resilience.

3 A New Scale-Based Assessment Model

A novel Likert scale-based assessment model for measuring organizational resilience and gaps in operational and disaster risk reduction capacities is reported in Table 1. It is organized according to the following structure.

Table 1 A Likert scale-based response model for benchmarking gaps in operational capacity, organizational resilience, and disaster risk reduction capacity

3.1 Value

The numerical values range from 0 to 3, with the category “don’t know” included as an additional option. This range reflects the primary assessment criteria used in the disaster resilience scorecard (UNISDR 2017) and in some of the example Likert scales offered by Brown (2010) and Vagias (2006). While developing this value scale, we recognized that answers may be affected by high uncertainty and low knowledge, and that respondents may tend to choose the middle value of a scale. As suggested in the review by Croasmun and Ostrom (2011), the risk of a biased response can be mitigated by providing a neutral response option, allowing respondents not to express an opinion if they do not have one. This risk was mitigated in our scale by including the category “don’t know” as an additional choice. Finally, the numerical value is supported by a visual association with the standard “traffic light color scheme” that is also used in the dissemination of warnings, in order to harmonize the meaning of the numbers for people from different backgrounds and disciplines (Kostkova et al. 2003; Wiseman et al. 2008).
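As a minimal sketch of how this value dimension could be encoded, the following Python fragment pairs the 0–3 scale and the “don’t know” option with a traffic-light color association. The member names and the specific value-to-color assignment are our illustrative assumptions; the authoritative labels and colors are those given in Table 1.

```python
from enum import Enum


class ScaleValue(Enum):
    """Possible answers on the proposed 0-3 scale, plus the neutral option."""

    DONT_KNOW = -1  # neutral option, excluded from numerical aggregation
    ABSENT = 0      # member names are illustrative; Table 1 defines the labels
    MINIMAL = 1
    PARTIAL = 2
    FULL = 3


# Illustrative association with the traffic-light color scheme mentioned in
# the text; the exact value-to-color assignment used in Table 1 may differ.
TRAFFIC_LIGHT = {
    ScaleValue.ABSENT: "red",
    ScaleValue.MINIMAL: "amber",
    ScaleValue.PARTIAL: "amber",
    ScaleValue.FULL: "green",
    ScaleValue.DONT_KNOW: "grey",  # not scored; flagged for qualitative follow-up
}

print(TRAFFIC_LIGHT[ScaleValue.PARTIAL])  # -> "amber"
```

Keeping “don’t know” as a distinct member, rather than folding it into the numerical range, makes it easy to exclude such answers from any aggregation while still recording them for follow-up.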

3.2 Category Labels

This section provides some generic examples of descriptive attributes that can be used to develop qualitative answers and ratings complementary to the numerical values of the scale. It is intended as a partial, synthetic list derived from the descriptive categories reported by Brown (2010) and Vagias (2006), and from a review we carried out of the attributes used in the categories of UNISDR (2017) and BSI (2014). In other words, the proposed scale provides enough guidance to be taken as a reference, independent of the question that has been formulated, but it does not claim to be comprehensive.

3.3 Gap Outcome

The gap outcome in Table 1 is intended to support consistency in the outcomes and results of gap analyses, defined in this short article as the process of assessing and comparing an organization’s objectives and expected outcomes in order to understand possible differences from the performance actually delivered (Channon and Sammut-Bonnici 2014). This is presented in terms of “inadequate/adequate” or “undesirable/desirable” operational or organizational states (Watkins et al. 2012). First, the states are identified according to the numerical values obtained on the scale. Second, they are anchored to qualitative examples of the Likert scale such as those given by Brown (2010) and Vagias (2006). They can also be anchored to the percentages reported in the resilience scorecards by UNISDR (2017). In this case, the numerical thresholds must be treated as generic reference points that need to be grounded in a specific context, which may either confirm or challenge their validity.
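A minimal sketch of how a gap outcome could be derived from a single answer is given below. The cut-off between “inadequate/undesirable” and “adequate/desirable” states and the indicative percentage bands are assumptions made for illustration; the authoritative anchoring is the one provided in Table 1 and in the UNISDR (2017) scorecard.

```python
from typing import Optional


def gap_outcome(value: Optional[int]) -> str:
    """Map a single 0-3 answer onto a gap outcome.

    Assumption for illustration only: 0-1 indicate an inadequate/undesirable
    state and 2-3 an adequate/desirable state; a "don't know" answer (None)
    is reported separately rather than scored.
    """
    if value is None:
        return "don't know - follow up qualitatively"
    if value <= 1:
        return "inadequate / undesirable state (gap identified)"
    return "adequate / desirable state (no major gap)"


# Indicative percentage bands in the spirit of the scorecard anchoring
# (UNISDR 2017); the thresholds shown here are illustrative, not prescriptive.
PERCENT_BANDS = {0: "0-25%", 1: "26-50%", 2: "51-75%", 3: "76-100%"}
```

In practice, any such threshold should be revisited against the specific context of the assessment, as noted above, rather than applied mechanically.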

3.4 Capacity Levels

This label provides a qualitative description of the capacity levels, which have been derived from the Capability Maturity Model Integration (CMMI) of Chrissis et al. (2003) and which are also integrated into the disaster resilience scorecard approach (UNISDR 2017). Capacity is intended as “the combination of all the strengths, attributes and resources available within an organization, community or society, to manage and reduce disaster risks and strengthen resilience” (UNDRR terminology, updated 2017).

3.5 Resilience Levels

The resilience levels have been simplified and adapted from the corresponding maturity levels reported in BS 65000:2014 (BSI 2014), shifting from a 0–5 scale to a simplified 0–3 scale. In the present work, the term “resilience” is defined as the “ability of an organization to absorb and adapt in a changing environment to enable it to deliver its objectives and to survive and prosper” (ISO 2017, p. 4).
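To make the simplification concrete, the sketch below shows one possible way of collapsing a 0–5 maturity rating of the kind used in BS 65000:2014 onto the 0–3 resilience levels of Table 1. The specific grouping is our illustrative assumption, not the mapping prescribed by the standard or by this article.

```python
# Illustrative collapse of a 0-5 maturity rating (in the style of
# BS 65000:2014) onto the simplified 0-3 resilience levels; the grouping
# shown here is an assumption made for the sake of the example.
MATURITY_TO_RESILIENCE = {
    0: 0,  # no resilience capability in place
    1: 1,  # early / ad hoc arrangements
    2: 1,
    3: 2,  # arrangements established but not yet fully embedded
    4: 2,
    5: 3,  # resilience embedded and continuously improved
}


def simplify_maturity(maturity_level: int) -> int:
    """Translate a 0-5 maturity score into a 0-3 resilience level."""
    if maturity_level not in MATURITY_TO_RESILIENCE:
        raise ValueError("maturity level must be an integer between 0 and 5")
    return MATURITY_TO_RESILIENCE[maturity_level]
```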

4 Conclusion

Our scale provides a replicable, direct model for benchmarking answers, built on the real needs of stakeholders and on feedback obtained from the field. However, like other tools and methodologies that represent the current state of the art, it has characteristics and limitations that need to be taken into account if it is to be applied effectively. It should be noted that the scale has been developed to provide a flexible assessment for contexts in which constraints prevent a wider and more complex analysis. In other words, the rationale of our approach is to maximize the reliability of answers by sacrificing, to some extent, the level of detail when anything more sophisticated is not achievable. This implies a lower achievable level of accuracy in the results. Two complementary considerations follow. First, ideally, our scale should be used as a basis for developing a more complex form of analysis. In the practice of assessing resilience, this may be derived from guidelines such as those mentioned previously (BSI 2014; UNISDR 2017; NFPA 2019). In multidisciplinary research, optimal results may require broader scales, such as those in the range 1 to 7 or 1 to 10 (Croasmun and Ostrom 2011). Second, a possible way to compensate for the scale’s reduced accuracy compared with alternatives is to plan carefully the integration of questionnaires with semistructured interviews or focus groups. This could profitably follow guidelines for mixed-method research such as those provided by Bryman (2016).

In conclusion, our approach does not claim to be exhaustive, but it provides a practical and flexible reference method for benchmarking that can be adapted to the context in which it is used. For example, in some cases the category labels may require both language translation and changes that reflect sensitivity to cultural variations in the use of terminology. Rather than a limitation, these elements can be seen as an additional strength that could promote testing and further evolution of our model in light of new experience derived from research and practice. The resulting model could then be extended to different disciplines.