Introduction

The overriding goal of this study is to provide a positive impulse for quality assessment of landslide susceptibility analysis by focusing not only on the calculation, but also on the evaluation process. We want to illustrate that the validation procedure of landslide susceptibility maps is not a burdensome obligation, but a chance to increase the reliability, transparency and—last but not least—the acceptance of such maps. We had the concern that quantitative validation scores of landslide susceptibility maps may in certain cases obscure the predictive accuracy. Therefore, the study aimed particularly at evaluating the predictive performance of landslide susceptibility maps retrospectively, and relating the results to validation practice.

Worldwide, 300 million people and an area of 3.7 million km² are prone to landslides (Dilley et al. 2005). 2620 fatal landslides occurred in the period from 2004 to 2010, claiming 32,322 victims (Petley 2012). The annual global damage from landslides is estimated at 18 billion euros (Haque et al. 2016). In Europe, 476 deadly landslides claimed a total of 1370 fatalities in the period from 1995 to 2014. The average annual damage is estimated at 4.7 billion euros in Europe (Haque et al. 2016) and 221 million euros in Germany (Klose et al. 2016). It is expected that the number of fatal landslides will further increase. The reasons discussed are thawing of permafrost (Gruber and Haeberli 2007; Etzelmüller and Frauenfelder 2009), the increasing number of heavy-rainfall events (Crozier 2010; Jakob and Lambert 2009), population growth, and a change in land use (Glade 2003; Huppert and Sparks 2006; Petley 2010; Petley et al. 2007; Promper et al. 2014).

Developing countries are particularly affected by landslides, especially in Asia and Central and South America (Petley 2012). Here, the frequent occurrence of severe landslide events due to detrimental geo-environmental settings coincides with insufficient landslide mitigation strategies (Eder et al. 2009; Anderson et al. 2011). Efficient landslide risk management is often impeded by various challenges facing local authorities, such as limited financial means or regions that are difficult to access (e.g., Rozos et al. 2011). At the same time, the development of powerful Geographic Information Systems (GIS), the ever-increasing resolution of remote sensing data, and increasing computing power open new perspectives in landslide prediction. In this context, statistical methods for analyzing landslide susceptibility have increasingly moved into the scientific focus, especially in landslide-prone areas in developing countries (e.g., Bousta and Ait Brahim 2018; Lee et al. 2018; Nsengiyumva et al. 2018; Awawdeh et al. 2018; Tasoglu et al. 2016). Statistical methods are applicable on a regional or even national scale with comparably low expenditure of money, time, and personnel (e.g., Yesilnacar and Topal 2005). Thus, the number of statistical landslide susceptibility investigations has continuously increased in recent years (Reichenbach et al. 2018). However, statistical analyses of landslide risks still attract little attention from authorities and regulators (Petschko et al. 2014). This can be explained by the sophisticated calculation processes, and also by the fact that susceptibility maps can lead to restrictions during planning processes.

To build up trust in statistical landslide analyses, it is crucial to demonstrate the quality of the final susceptibility map by a transparent validation. The standard evaluation approach is to analyze the agreement between the resulting map and observed data (Corominas et al. 2014). The observed data comprise the absence or presence of landslides in the investigation area of an independent test sample (Frattini et al. 2010). The most frequently applied validation techniques are cutoff-independent methods such as Receiver-Operating Characteristics (ROC) or Prediction Rate Curves (PRC) and Success Rate Curves (SRC) (Corominas et al. 2014; Frattini et al. 2010; Reichenbach et al. 2018). Despite the widespread application, several authors question the reliability and accuracy of the standard validation practice in statistical landslide prediction. According to Bell (2007), Brenning (2005), and Reichenbach et al. (2018), several authors evaluate the predictive performance of the model based only on a training dataset. However, this only gives information about the goodness of fit of the model, and not about its actual performance in predicting future landslide occurrence. Kalantar et al. (2017) further state that the way of partitioning the landslide inventory into test and training datasets highly influences the prediction results. According to Bell (2007), temporal partitioning can lead to inaccuracies in the case of a homogeneous investigation area. The same applies to landslide reactivations in the case of spatial partitioning. Steger et al. (2016a, b) concluded that the validation of a susceptibility map is also highly influenced by a bias in the landslide inventory. The result is that geomorphologically implausible landslide susceptibility maps can achieve very high validation scores (Steger et al. 2017). According to Neuhäuser and Terhorst (2007), Corominas et al. (2014), Guzzetti (2006), and Reichenbach et al. (2018), the most reliable way to assess the predictive performance of a landslide susceptibility map is to evaluate the predictive power retrospectively. This means that the predictive performance is validated based on a landslide event that occurred after the map was made. Since the evidence of validity has to be supplied together with the final map, this retrospective approach is only applicable in addition to the standard evaluation approach, once a sufficient number of landslides has occurred after the creation of the map. However, compared to the standard validation approach proposed by Chung and Fabbri (2003), the retrospective evaluation is characterized by a higher level of objectivity, accuracy and comprehensibility.

To the best of our knowledge, no attempt has been made so far to assess the quality of statistical landslide susceptibility maps retrospectively. The objective of this study is, therefore, to demonstrate this validation approach in a landslide-prone area in South Germany called the Swabian Alb. The study is carried out by evaluating the predictive performance of four different landslide susceptibility maps, published before the year 2013, against an event with several landslides which occurred in 2013. The performance of each map is qualitatively analyzed and differences in the predictive accuracy are evaluated. The results are discussed with respect to the current validation practice applied to 50 recent landslide susceptibility investigations. Finally, we suggest measures to further increase the reliability of and reliance on statistical landslide investigations.

Study area

The study area is located south-east of the city of Tübingen at the foot of the Jurassic escarpment of the Swabian Alb, a low mountain range in South West Germany (Fig. 1). Different types of mass movements are widespread across this region and were the subject of numerous investigations (Schädel and Stober 1988; Dongus 1977; Bibus 1999; Terhorst 1997, 2001; Kraut 1999; Bell 2007; Kreja and Terhorst 2005; Terhorst and Kreja 2009; Thiebes 2011). The total number of landslides is estimated to be 30,000 for the entire Swabian Alb (Bell 2007). With an affected area of 0.6 km², the most famous and best-documented landslide occurred in Mössingen in 1983 (Terhorst 2001; Schädel and Stober 1988).

Fig. 1 Location of the study area at the Jurassic cuesta escarpment of the Middle Swabian Alb

Even though landslides are a danger, several residential areas were developed at the landslide-prone foot of the Swabian Alb escarpment (Terhorst and Kreja 2009; Blöchl and Braun 2005), and property has been damaged as a result (Kreja and Terhorst 2005; Sass et al. 2008). The area of Mössingen–Öschingen, a town located in the Middle Swabian Alb south of the city of Tübingen, is particularly affected. To identify vulnerable residential areas, several authors have analyzed landslide susceptibility in this area (Thein 2000; Bell 2007; Neuhäuser and Terhorst 2007, 2009; Terhorst and Kreja 2009). Figure 2 illustrates the study area of Mössingen–Öschingen as well as the extent of past landslide susceptibility studies. These studies differ in terms of the applied method, the size of the area, and the kind and resolution of the input data (Table 1).

Fig. 2 Study area Mössingen–Öschingen in the Middle Swabian Alb. Past susceptibility studies are marked by the dotted lines, landslides triggered by the 2013 heavy-rainfall event are highlighted in red (base map from Google Maps, 2017)

Table 1 Landslide susceptibility studies in the area of Mössingen–Öschingen in the Middle Swabian Alb

In June 2013, a period of long-lasting heavy rainfall triggered about 20 landslides along the slopes of the Middle Swabian Alb. Using aerial photographs, field trips, and newspaper articles, five landslides were identified in the investigation area of Mössingen–Öschingen (Figs. 2, 3b–d). One landslide hit the “Landhaussiedlung”, a residential area with 30 houses built in the 1960s (Fig. 3d), where three houses had to be demolished after the event (Fig. 3c); the total damage is estimated at several million euros. The largest landslide occurred in the south of the study area (Buchberg, No. 4 in Figs. 2, 3b). Although no residential area was affected in this case, the loss of mainly agricultural land was estimated at about 100,000 euros (LGL 2014). The previous landslide susceptibility investigations in the study area were performed prior to the landslide event in 2013. Hence, this event can be used to assess the predictive performance of these previous studies retrospectively.

Fig. 3 a Mössingen landslide of 1983. b Largest landslide in the 2013 heavy-rainfall event located south of the town of Mössingen (Fig. 2: No. 4). c, d 2013 landslide event devastating the Öschingen residential area “Landhaussiedlung” (Fig. 2: No. 1). Photos: a A. Dieter in Kreja and Terhorst (2009), b, d Klaus Franke, c Paul Fleuchaus

Methodology

Workflow of establishing landslide susceptibility maps

Landslide susceptibility represents the identification of areas liable to be affected by landslides, given by a set of geo-environmental conditions (Guzzetti 2006). The fundamental principle behind landslide susceptibility analyses was initially stated by Brabb (1991): “The past and the present are the key to the future.” Hence, future landslides will be more likely to occur under geo-environmental settings associated with past or present slope failures (Guzzetti 2006). To predict future landslide susceptibility, different approaches are used, such as heuristic, deterministic or statistical methods (Soeters and van Westen 1996; Guzzetti et al. 1999; Bell 2007). With a growing availability of high-quality geomorphological data, with increasing processing power and powerful GIS, statistical methods have gained importance and have been the most frequently applied approaches during recent years. Statistical methods are characterized by the ability to investigate large areas with little material effort and relatively low investment in time compared to, for example, extensive site exploration with core drilling and laboratory experiments to calculate the stability of many slopes. Figure 4 illustrates the typical workflow of a statistical landslide analysis.

Fig. 4 Typical workflow of a statistical landslide analysis (TWI: Topographic Wetness Index; AUC: area under curve)

The initial step in statistical susceptibility investigations is the creation of a landslide inventory containing all past events of the analyzed landslide type (step 1). Second, factor maps are made for all factors affecting landslide susceptibility in the study area (step 2). Both factor maps and landslide inventory must be digitized and integrated into a GIS (step 3), and landslide susceptibility is calculated (step 4). The way of calculating landslide susceptibility depends on the chosen statistical method. In the past, various statistical methods were proposed, reviewed, and discussed by Guzzetti (2006), Bell (2007), van Westen et al. (2003), Carrara (1993), Guzzetti et al. (1999) and Reichenbach et al. (2018).

The reliability of a landslide susceptibility map depends on several factors, such as the quality and completeness of input data, expert knowledge, and investigation scale. Hence, a comprehensive quality assessment in the form of a validation is indispensable. Without this validation, the modeled maps have limited use (Chung and Fabbri 2003; Bell 2007). The cornerstone of most validation approaches is to split the landslide inventory into a training and a test dataset (step 5) (Corominas et al. 2014). The training dataset is used to create the susceptibility map. The test dataset (“unknown” landslides) is used to assess the predictive performance of the created susceptibility map (step 6). The landslide inventory can be split into training and test data based on three criteria: space, time and random partitions (Chung and Fabbri 2003). To assess the validation score, the most frequently used methods are the Receiver-Operating Characteristics (ROC) and the Prediction Rate Curve (PRC) (Corominas et al. 2014; Reichenbach et al. 2018). The area under the curve (AUC) can be used as a metric to evaluate the quality of the model.
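Steps 5 and 6 can be illustrated with a minimal sketch, assuming the inventory has already been reduced to one label per mapping unit (landslide present or absent) and the model returns one susceptibility score per unit. The function names `random_split` and `roc_auc` are hypothetical and are not taken from any of the cited studies. Only the random partition is shown; spatial and temporal partitions would select the test units by location or date instead. The AUC is computed through its rank interpretation: the probability that a randomly chosen landslide unit receives a higher score than a randomly chosen landslide-free unit.

```python
import numpy as np


def random_split(n_samples, test_fraction=0.3, seed=0):
    """Randomly partition sample indices into (train, test) index arrays."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(test_fraction * n_samples))
    return order[n_test:], order[:n_test]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1/True where a landslide was observed, 0/False otherwise.
    scores: modeled susceptibility for the same units.
    Ties between a landslide and a non-landslide unit count as one half.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("both landslide and non-landslide units are required")
    greater = np.count_nonzero(pos[:, None] > neg[None, :])
    ties = np.count_nonzero(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (pos.size * neg.size)


# Illustrative values only (not data from this study).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.4, 0.1]
auc = roc_auc(labels, scores)  # 8 of 9 landslide/non-landslide pairs are ranked correctly
```

Applied to the training units, this score describes the goodness of fit; only when applied to the held-out test units does it say anything about prediction. The pairwise comparison is quadratic in memory and is meant for illustration, not for large rasters.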

Retrospective evaluation of landslide susceptibility maps

Several authors proposed that the easiest and most reliable evaluation strategy is to “wait and see” whether the calculated maps predict future landslides correctly (Corominas et al. 2014; Neuhäuser and Terhorst 2007; Guzzetti 2006; Reichenbach et al. 2018). However, this idea is usually not feasible, as the validation is requested together with the calculated map. Nevertheless, for susceptibility maps published in the past, this “wait and see” strategy is indeed viable, as the “waiting” part is already done. To proceed with the validation (the “seeing” part), the only requirement is a set of landslides that occurred after the map was created. These landslides are used to assess the predictive performance of the model based on a real dataset. The advantages of a retrospective evaluation compared to the standard practice can be summarized as follows:

  • A high level of objectivity, since the retrospective evaluation can be performed by a person other than the producer of the map;

  • The landslide inventory used to evaluate the results is independent of the inventory used to calculate the map;

  • The landslide inventory used to evaluate the map can be of higher accuracy, as recent landslides can be more precisely mapped than older landslide events;

  • The reliability of the map can be demonstrated in a more plausible manner, which helps to gain acceptance of such statistical methods.

The retrospective evaluation can be done in a qualitative or quantitative manner. A qualitative validation rests on a knowledge-based, subjective judgment by the researcher; in the quantitative approach, susceptibility maps and landslide polygons are compared by calculating precision and sensitivity. Because raster information is lacking, past susceptibility maps of the area of Mössingen–Öschingen are validated qualitatively in this study. The validation is based on five landslides from the heavy-rainfall event in 2013 (Fig. 2). We evaluate whether the landslides are located in the zones to which the analyzed maps assigned the highest susceptibility values.
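Where raster data are available, the quantitative variant could be sketched as follows. This is an illustration under stated assumptions, not the procedure applied in this study: both the susceptibility map and the post-publication landslide polygons are assumed to have been rasterized onto the same grid, and the map is assumed to have been reduced to a binary "highest susceptibility class" mask. The function name `precision_sensitivity` is hypothetical.

```python
import numpy as np


def precision_sensitivity(high_susceptibility, landslide):
    """Compare a binary high-susceptibility mask with a binary landslide mask.

    precision   = share of high-susceptibility cells that actually failed
    sensitivity = share of landslide cells that lie in the high-susceptibility zone
    Returns NaN for a ratio whose denominator is zero.
    """
    high = np.asarray(high_susceptibility, dtype=bool)
    slide = np.asarray(landslide, dtype=bool)
    if high.shape != slide.shape:
        raise ValueError("both rasters must share the same grid")
    tp = np.count_nonzero(high & slide)    # predicted high, landslide occurred
    fp = np.count_nonzero(high & ~slide)   # predicted high, no landslide
    fn = np.count_nonzero(~high & slide)   # landslide outside the high zone
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, sensitivity


# Illustrative 2 x 3 grid (not data from this study).
high_zone = [[1, 1, 0],
             [0, 1, 0]]
new_slides = [[1, 0, 0],
              [0, 1, 1]]
precision, sensitivity = precision_sensitivity(high_zone, new_slides)
```

A low precision is expected after a single event, because many susceptible slopes will not yet have failed; sensitivity is therefore the more informative of the two figures for a retrospective check of this kind.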

Results and interpretation

Review of validation practice

According to Reichenbach et al. (2018), more than one-third of all studies did not perform a validation. However, since the study by Chung and Fabbri (2003), the share of properly validated studies has gradually increased (Bell 2007; Guzzetti 2006). Figure 5 gives an overview of the AUC values of 50 randomly sampled recent statistical landslide investigations as well as the statistical method applied. There is no correlation between the validation score and the statistical method applied. This confirms findings from previous studies that the predictive performance depends on the quality of the input data rather than on the method used (Steger et al. 2016a, b, 2017; Petschko et al. 2014).

Fig. 5 Area under Curve (AUC) values of Receiver-Operating Characteristic (ROC) curves, Success Rate Curves (SRC), and Prediction Rate Curves (PRC) from 50 peer-reviewed, statistical landslide susceptibility analyses

At the same time, most studies (73%) achieved an AUC value of at least 80%. Poor validation scores of less than 70% were not observed. Thus, either statistical methods consistently predict landslide occurrence very accurately, or ROC and PRC do not reflect the actual predictive performance correctly in all cases. Considering the impact of inaccurate input data, lacking expert knowledge or the over- and underestimation of causative factors, more attention should be paid to the fact that not only the creation of the landslide susceptibility map, but also the validation itself is prone to several inaccuracies and errors. These inaccuracies are not only attributed to the division of the dataset into test and training data (Chung and Fabbri 2003; Bell 2007; Kalantar et al. 2017), but also to the quality of the landslide inventory. The creation of the landslide inventory is usually the most time-consuming, subjective, and error-prone part in landslide susceptibility analysis (Carrara 1993; Mondini et al. 2014; Bell 2007; Galli et al. 2008; Guzzetti et al. 2000; Brardinoni et al. 2003; Petschko et al. 2014; Santangelo et al. 2015). Even though most recent studies performed a validation, the quality of the landslide inventory is often not scrutinized (Steger et al. 2017). However, validating susceptibility maps with an erroneous landslide inventory only gives information about the goodness of fit of a model, and results in misleading conclusions about the actual predictive power regarding future landslides. We, therefore, agree with Steger et al. (2016a, b, 2017) that high AUC values are necessary, but not sufficient to prove the ability of a model to predict landslide occurrence.

Retrospective evaluation of landslide susceptibility maps

Figure 6 presents the susceptibility maps of past statistical landslide investigations in the area around the two towns of Mössingen and Öschingen. The landslides caused by the heavy rainfalls in 2013 are marked in blue. The exact location of each susceptibility map in the study area is illustrated in Fig. 2. The following section analyzes the susceptibility maps with respect to their predictive power for the landslide events of 2013 and discusses causes for the differing results. It is important to note that the susceptibility maps are not based on the same susceptibility classes since the original classification ranks were not standardized. The analyzed maps also differ in terms of the size of the study area and the resolution of the input data.

Fig. 6 Digitized susceptibility maps by a Bell (2007), b Neuhäuser and Terhorst (2007), c Thein (2000), and d Terhorst and Kreja (2009). Please note that the susceptibility classes of the original maps were not standardized. Landslides in the heavy-rainfall event in 2013 are marked in blue (a, b, c: base map from Google Maps, 2017)

Bell (2007) calculated landslide susceptibility based on a logistic regression (LR) for the entire Swabian Alb (on a regional scale). Two of the landslides in 2013 are located within the map (Fig. 6a). Both landslides are located along slopes predicted as being highly susceptible to landslides. Compared to the other maps, however, nearly all semi-steep slopes of the escarpment were classified as highly susceptible to landslides. This can be explained by the fact that the calculations are mainly based on geomorphological factors. Geological factors such as the lithology, which have a significant impact on landslide activity (Kraut 1999; Terhorst 1997; Thein 2000; Kallinich 1999), were not considered (Bell 2007; Terhorst and Kreja 2009). The susceptibility map provided by Bell (2007) was validated by ROC with an AUC between 85 and 98%, which equals an excellent–outstanding predictive performance (Hosmer and Lemeshow 2000). However, considering the large study area, Bell’s (2007) susceptibility map is more suitable for locating landslide-prone areas in the Swabian Alb than predicting single landslide events.

Neuhäuser and Terhorst (2007) used the weights of evidence (WoE) method to calculate landslide susceptibility for the Middle Swabian Alb on a regional scale. Five of the landslides in 2013 are located within the study area (Fig. 6b). Three of them are within the highest susceptibility classes, whereas two of the landslides are located along slopes that were classified as having medium susceptibility. In particular, the devastating landslide at the “Öschinger Landhaussiedlung” (No. 1) was predicted very precisely. Compared to Bell (2007), one noticeable difference is the significantly smaller share of the highest susceptibility class. This can be explained by the smaller investigation scale and also by the integration of geological factors. The predictive performance is confirmed by an excellent validation score. The map was validated by the PRC and achieved an AUC value of > 90%.
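For readers unfamiliar with WoE, the following minimal sketch shows the textbook form of the weights for a single binary factor class (for example, one lithological unit). It is a generic illustration of the method and not the implementation of Neuhäuser and Terhorst (2007); the function name `weights_of_evidence` and the example values are assumptions. The positive weight compares how often the class occurs in landslide cells versus landslide-free cells, the negative weight does the same for the absence of the class, and their difference (the contrast) summarizes the strength of the association.

```python
import numpy as np


def weights_of_evidence(factor_present, landslide):
    """Return (W+, W-, contrast) for one binary factor class.

    W+ = ln( P(B | L) / P(B | not L) )
    W- = ln( P(not B | L) / P(not B | not L) )
    where B = factor class present, L = landslide present.
    """
    b = np.asarray(factor_present, dtype=bool).ravel()
    l = np.asarray(landslide, dtype=bool).ravel()
    if b.shape != l.shape:
        raise ValueError("both rasters must contain the same number of cells")
    n_b_l = np.count_nonzero(b & l)
    n_nb_l = np.count_nonzero(~b & l)
    n_b_nl = np.count_nonzero(b & ~l)
    n_nb_nl = np.count_nonzero(~b & ~l)
    if min(n_b_l, n_nb_l, n_b_nl, n_nb_nl) == 0:
        # The logarithms are undefined if any combination never occurs.
        raise ValueError("each factor/landslide combination must occur at least once")
    n_l = n_b_l + n_nb_l
    n_nl = n_b_nl + n_nb_nl
    w_plus = np.log((n_b_l / n_l) / (n_b_nl / n_nl))
    w_minus = np.log((n_nb_l / n_l) / (n_nb_nl / n_nl))
    return w_plus, w_minus, w_plus - w_minus


# Illustrative cells only (not data from any cited study).
factor = [1, 1, 1, 0, 0, 0, 0, 0]
slides = [1, 1, 0, 1, 0, 0, 0, 0]
w_plus, w_minus, contrast = weights_of_evidence(factor, slides)
```

In a full analysis, the weights of all factor classes present in a cell are summed to obtain its susceptibility value, which is one reason why omitting or adding a factor such as lithology changes the extent of the highest susceptibility class.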

Thein (2000) investigated landslide susceptibility on a local scale using LR. In this study, three of the 2013 landslides are located in the map. The landslide on the south side of the Farrenberg (No. 5) was well predicted (Fig. 6c). In contrast, the model showed poor predictive performance for both landslides (No. 3, 4) in the south-eastern part of the study area. All slopes in this area were calculated as stable even though the largest landslide occurred here. It is also interesting that very flat areas with a slope angle below 5° were classified as being highly prone to landslides. No validation was performed to verify the calculated susceptibility map. Of all maps analyzed, the study by Thein (2000) showed the lowest predictive accuracy. Residential area development planning based on this susceptibility map could have disastrous consequences. A qualitative assessment of the map would have also revealed that parts of the calculated probabilities are even geomorphologically implausible.

In contrast to the previously discussed studies, Terhorst and Kreja (2009) did not use a statistical, but a deterministic (infinite slope) approach based on hydrogeological and geomorphological data (Fig. 6d). Two of the landslides in 2013 are located within the study area. The occurrence of the small landslide (No. 2) in the south-eastern part of the study area was very well predicted by the model. This is also the case for the upper part of the landslide-affected area (main scarp and head) at the Öschingen residential area “Landhaussiedlung”. However, the foot of the landslide body was labeled as stable. This can be explained by the applied model that mainly focuses on shallow and translational landslides. The susceptibility map was only validated in a qualitative way.

In summary, the most damaging landslide (No. 1) was located in three of the four analyzed susceptibility maps and was classified in all of them as highly susceptible. The second landslide was also located in three study areas, but only two maps were able to predict it as highly susceptible. Landslides number 4 and 5 were located in two study areas each. However, only one map predicted them correctly. Thus, the analysis of past landslide susceptibility investigations in the area of Mössingen–Öschingen revealed significant differences in the predictive accuracy for the 2013 landslide events. This emphasizes the great importance of a sound validation, especially since only two of the four investigated studies validated their results. The discrepancies in predictive performance are not only attributed to different investigation scales, but also to the comprehensiveness and quality of the input data used.

Discussion

Our brief review of validation practice showed that virtually all published validations of landslide susceptibility maps reached high validation scores. In contrast, the retrospective validation presented in this study revealed significant discrepancies in the predictive accuracy of the four analyzed susceptibility maps with respect to the 2013 landslide event. From this, we conclude that high validation scores are necessary, but not sufficient to validate the predictive power of such maps. Of course, a retrospective evaluation cannot be provided together with the (just) developed susceptibility map. In this case, scrutinizing maps by local site exploration and deterministic slope stability analysis at representative sites could minimize this limitation. However, in the cases where landslides occurred after the publication of a map, retrospective evaluation is clearly worth the effort to illustrate their validity and reliability (if the map was able to satisfactorily predict landslides) or improve them (if not).

Even though the retrospective analysis allowed us to draw key conclusions, it is important to consider this study as a demonstration. This mainly concerns the small size of the landslide inventory used for the retrospective evaluation, which allows only a qualitative evaluation instead of a statistically robust analysis of the susceptibility maps. This highlights the bottleneck of the retrospective evaluation: the approach depends on the occurrence of a sufficient number of landslides within the investigation area after the creation of the original map. Since susceptibility maps are usually created for areas with high landslide activity, an increasing number of such datasets will become available over time. To also evaluate the predictive power quantitatively, future studies should consider using the original data of the analyzed maps to avoid accuracy losses from the digitizing process.

Despite the demonstration character of this study, the retrospective validation approach turned out to be a promising assessment tool. While the pressure to achieve a high validation score at least subconsciously influences the validation of a landslide prediction, the proposed approach allows an objective evaluation. In addition, with an imprecise or incomplete landslide inventory, susceptibility maps can still achieve very high validation scores despite a low predictive performance. Retrospective evaluation allows a clear statement of the quality of the susceptibility map regardless of the quality of the original input data. This is particularly the case as recent landslide events can be mapped very precisely. Furthermore, whereas most evaluations of susceptibility maps are statistical in nature, the retrospective approach enables an evaluation based on site-specific information. This “easy to see” approach is especially beneficial when dealing with authorities and landowners.

Landslides are a present and well-known hazard at the Swabian Alb. Apart from the fact that early warnings from the scientific community were not considered, the root cause of the problem is economically motivated development strategies of building areas along the semi-steep slopes of the Swabian Alb in the past (Kreja and Terhorst 2005; Sass et al. 2008). In some cases, approval of such building areas should not have been issued by the regulatory authorities. Nonetheless, early warnings by Neuhäuser and Terhorst (2007), Terhorst and Kreja (2009), and Bell (2007) received hardly any attention, neither by local nor by the state authorities. The case of Mössingen–Öschingen illustrates that effective landslide mitigation strategies require close cooperation between scientific and governmental institutions. Authorities should be more open to novel ideas and technologies, while scientists are obliged to take care of a comprehensible and sound susceptibility assessment. The differing susceptibility maps in the study area of Mössingen–Öschingen have certainly not promoted decisive actions from any decision makers.

Although not analyzed in the present study, the retrospective evaluation approach could also be a helpful tool in other natural risk assessment, such as floods, tsunamis, and wildfires. The only requirement is that a damage event occurred after publication of the susceptibility map.

Conclusions

In contrast to early statistical landslide investigations, the share of studies published together with a validation has increased significantly in recent years. However, several authors still validate their results without splitting the landslide inventory, and the impact of the landslide inventory on the predictive accuracy is hardly ever considered. High validation scores may disguise high uncertainties in landslide prediction. We, therefore, suggest evaluating the predictive performance not only quantitatively by validation scores, but also qualitatively through a critical plausibility check of the results.

Despite the great advantages of statistical methods, local authorities still mainly rely on deterministic or heuristic predictions. In the case of landslides in Mössingen–Öschingen, little attention was paid to warnings from statistical landslide susceptibility maps. One important step to build up trust in statistical methods is to present the resulting maps with annotations on how to interpret the results. It is not sufficient to justify the validity by a high validation score. In addition, a measure for the impact of each causative factor on calculated susceptibilities based on a sensitivity analysis would highly increase the transparency of landslide susceptibility maps.

Until now, little effort has been made to assess the prediction rate of past landslide susceptibility maps. The retrospective validation approach presented in this study revealed significant discrepancies in the predictive accuracy for the crucial 2013 landslide event in Mössingen–Öschingen. Both totally incorrect and very precise predictions were made in the previous studies. This quality assessment not only helps to reflect on the predictive performance of a susceptibility map, but also on the credibility of the validation. We are convinced that the proposed retrospective evaluation approach can help to increase the level of acceptance of statistical methods in the decision-making process of policy planners, as the reliability of such maps can be demonstrated in an objective and vivid manner.