1 Introduction

Rolling stock introductions deal with complex railway systems that comprise interacting digital, analog, physical, and human components engineered for safe and reliable railway transport. New rolling stock, also referred to as ‘next generation trains’, is characterized by an increasing convergence of information technologies and operational technologies. This convergence enables autonomous driving, new functionalities to achieve higher capacity, greater safety, and real-time health monitoring. Introducing new rolling stock into an already complex railway system is a major challenge for railway operators, as it involves many different business units and organizations at different stages of the introduction. An introduction takes several years to complete and has to deal with political influences, changing stakeholder demands, new technologies, and technical constraints which cannot be fully predicted in advance. Unfortunately, there is no single solution to overcome these challenges. Therefore, one can expect surprise events and, if these are not managed properly, fragile railway systems.

The theory of graceful extensibility has recently been introduced as the opposite of brittleness and can be defined as the ability of a system to extend its capacity to adapt when surprise events challenge its boundaries (Woods 2015). It provides a set of basic rules that govern adaptive systems. Its ideas and concepts have been introduced by Woods (2018) as proto-theorems which, as Woods (2018) suggests, need further empirical testing. This study is a first attempt to assess this new theory and its usefulness in coping with complex cyber-physical systems. Its contribution lies in exploring the explanatory power of the proto-theorems of graceful extensibility in an in-depth historical case study of a railway rolling stock introduction using pattern-matching analysis (Trochim 1989). Pattern-matching analysis involves the specification of a theoretical pattern, the acquisition of an observed pattern, and an attempt to match these two (Trochim 1989). Rolling stock introductions can be considered as the introduction of complex cyber-physical systems, which take, on average, 5 years to complete. By selecting the Fyra V250 case (V250), which has already been subject to several evaluations and reflections [see, for example, Silfhout and Berg (2014)], the authors attempt to identify patterns that may have resulted in failed sustained adaptability and that can provide practical guidance to future rolling stock and other critical asset introductions. The main focus of this study is on human factors and the decision-making processes on different organizational levels within and between organizations involved in the introduction process.

The remainder of this paper is structured as follows. Section 2 briefly introduces the theory of graceful extensibility and connects it to current Industry 4.0 challenges. Section 3 explains the research approach and summarizes the pattern-matching analysis technique. After the case introduction in Sect. 4, case results are presented in Sect. 5. This section concludes by stating the (mis)matches of the patterns of graceful extensibility and provides possible explanations, supported by relevant literature. As this study is part of a research project aimed at increasing reliability in rolling stock introductions, Sect. 6 includes the main conclusions and possible future research directions.

1.1 Dealing with surprise events in critical asset introductions by means of graceful extensibility

Industry 4.0 is currently a much-discussed topic that has the potential to affect entire industries by transforming the way goods are designed, manufactured, delivered, and paid for. The rapid adoption and application of pervasive digital technologies in several industries not only radically changes products and services, but also fundamentally reshapes organizations (Yoo et al. 2012). Hermann et al. (2016) identified four industry 4.0 components based on their review of academic and business publications: the concepts of cyber-physical systems (Akanmu and Anumba 2015), the Internet of things (Porter and Heppelmann 2014, p. 4), and the Internet of services (Andersson and Mattsson 2015) are closely linked. These concepts enable the so-called ‘smart factory’, which is based on the idea of a decentralized production system, in which “human beings, machines and resources communicate with each other as naturally as in a social network” (Kagermann et al. 2013, p. 19). However, as more heterogeneous modules, originally produced by diverse actors, are combined to create innovations, organizations increasingly run the risk of complex systemic failure or other forms of unintended consequences (Perrow 1984). This is also reflected in the observation made by Baheti and Gill (2011) who state that the diversity of models and formalisms in the development of cyber-physical systems at the component level poses a serious problem for verifying the overall correctness and safety of designs at the system level. Therefore, organizations should look for ways to deal with an increase in surprise events.

Projects dealing with complex systems, such as the introduction of new rolling stock, have certain characteristics that require consideration to be managed successfully. Understanding and dealing with surprise events and the unknown is a major challenge in project management. For example, Ramasesh and Browning (2014) present a conceptual framework for dealing with unknowns in project management. These unknowns could be foreseen but for various reasons (e.g., barriers to cognition) are not. Furthermore, in managing unforeseen events, Saunders et al. (2016) observed high-reliability practices [see, for example, Weick et al. (2008)] in their study of safety-critical projects. However, these practices are often fragile in nature and dependent on key individuals. The concept of system resilience is another approach in dealing with complex systems. Four lines of inquiry were identified to capture different senses of resilience and to reduce the risk of sudden failures in complex systems (Woods 2015): rebound, robustness, graceful extensibility, and architectures for sustained adaptability. Previous research has shown that the effort invested to improve fitness leads to systems that are robust to stressors they were designed to handle, yet fragile to unexpected events and design errors (Carlson and Doyle 2000, p. 2529). Improvements that optimize the system against certain criteria thus produce severe brittleness when surprise events occur. Brittleness is defined as the rapidity of a system’s performance decline when it nears or reaches one or more boundaries. Brittle systems experience rapid performance collapses, or failures, when events challenge boundaries (Woods 2015). The opposite of brittleness is Graceful Extensibility (GE), or how to extend adaptive capacity in the face of surprise events (Woods and Branlat 2011). In accordance with Woods (2018, p. 6), a surprise is here defined as: “Given bounds on adaptive capacity, there are events which will occur that fall near and outside the boundaries; thus, surprise is model surprise where base adaptive capacity represents a partial model of fitness”.

The theory of GE explains the contrast between successful and unsuccessful cases of sustained adaptability. Sustained adaptability refers to the ability to continue to adapt to changing environments, stakeholders, demands, contexts, and constraints (Woods 2018). The theory of GE is strongly linked to concepts in control systems. Control systems are in many ways a simple form of adaptive system, and theory specifies how to ensure stability (adequate adaptive performance) given well-defined targets and well-modeled disturbances. Graceful extensibility also is a play on the concept of software extensibility from software engineering. Software engineering emphasizes the need to design, in advance, properties that support the ability to extend capabilities later, without requiring major revisions to the basic architecture, as conditions, contexts, uses, risks, goals, and relationships change (Woods 2018).

Graceful extensibility is defined as the ability of a system to extend its capacity to adapt when surprise events challenge its boundaries (Woods 2018). See, for example, Wears et al. (2008) for how medical emergency rooms adapt to changing, and high, patient loads. At the heart of the theory of GE lies the fundamental concept of managing the risk of saturation by regulating the Capacity for Maneuver (CfM), both at the level of an adaptive unit and at the level of a network where neighboring adaptive units interact as the risk of saturation increases (Woods 2018). In prior research, Woods and Branlat (2011) identified three basic patterns of maladaptation: decompensation (lack of capacity to adapt when disturbances cascade), working at cross-purposes [local versus global (mal)adaptive behavior], and getting stuck in outdated behaviors when relying on past successes. The theory of GE is presented as 10 proto-theorems (S1–S10) divided into three subsets (Fig. 1) that express the fundamentals that govern adaptive systems. Proto-theorems in subset A (S1–S3) capture how the CfM is regulated to manage and reduce the risk of saturation. Subset B (S4–S6) addresses what is required in order for a layered network to sustain adaptability. It captures several basic processes which influence how adaptive units will act when a neighbor is at risk of saturation and whether units will act in ways that extend or constrict the CfM of the unit at risk. Subset C (S7–S10) captures how constraints, such as perspective bounds and mis-calibration of adaptive capacity, can be addressed. Expanding on the work on GE done by Woods (2018, p. 23), this study proposes to explore the explanatory potential of the theory by applying pattern-matching analysis to the 10 statements of GE.

Fig. 1 Connecting graceful extensibility (Woods 2018) to sustained adaptability in critical asset introductions dealing with surprise events
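The grouping of the 10 proto-theorems into three subsets can be written down as a small lookup, which may help when coded items are later assigned to a subset. The following is only a minimal sketch: the subset labels and statement ranges follow the text above, whereas the short theme descriptions are paraphrases and the names `SUBSETS` and `subset_of` are hypothetical.

```python
# Illustrative lookup only. Subset labels (A, B, C) and statement ranges
# follow the text; the theme strings are paraphrases, and the names
# SUBSETS and subset_of are hypothetical helpers, not part of the theory.
SUBSETS = {
    "A": {"theme": "managing risk of saturation", "statements": range(1, 4)},
    "B": {"theme": "networks of adaptive units", "statements": range(4, 7)},
    "C": {"theme": "addressing constraints", "statements": range(7, 11)},
}


def subset_of(statement: str) -> str:
    """Return the subset label for a proto-theorem such as 'S5'."""
    number = int(statement.lstrip("Ss"))
    for label, info in SUBSETS.items():
        if number in info["statements"]:
            return label
    raise ValueError(f"{statement!r} is not one of S1-S10")


if __name__ == "__main__":
    for s in ("S1", "S5", "S10"):
        print(s, "->", subset_of(s), "-", SUBSETS[subset_of(s)]["theme"])
```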

1.2 Research approach

This study explores the explanatory potential of graceful extensibility using pattern-matching analysis. A systematic research design was adhered to in a single confirmatory descriptive case study (Yin 2003). Pattern-matching analysis (Trochim 1989) at the very least involves the specification of a theoretical pattern, the acquisition of an observed pattern, and an attempt to match these two. What matters are the patterns of the outcomes, not the outcomes themselves. Trochim (1989, p. 357) describes pattern-matching techniques as distinct from traditional hypothesis testing in that “pattern matching encourages the use of more complex or detailed hypotheses and treat(s) observations from a multivariate rather than a univariate perspective”. In case-study research, pattern-matching techniques are designed to enhance the rigor of the study; if the empirically found patterns match the predicted ones, the findings can contribute to, and strengthen, the internal validity of the study, and can result in the confirmation of the propositions (Yin 2003). Furthermore, Yin (2003) emphasizes that, irrespective of design, data analysis using pattern matching is entirely appropriate for all case-study designs if its use is consistent with the purpose of the study and the research questions to be answered. Since qualitative research often lacks precision, an important suggestion is to avoid postulating very subtle patterns, so that pattern-matching results deal with gross matches or mismatches whose interpretation is less likely to be challenged (Yin 2003). Several factors need to be considered in the research design when using pattern-matching analysis as described by Trochim (1989, p. 357). These factors are: conceptualization of the theory, the level of generalization, the value of reanalysing historical data, treating relevant data as a whole rather than a collection of individual outcomes, and the procedures required to provide evidence for a match.

The V250 case, which will be further introduced in Sect. 4, was selected as an example of failed sustained adaptability. Train services were canceled after 2 months in operation, after the introduction had already been delayed for several years. The case includes many data sources since a parliamentary inquiry was also part of the evaluation of the V250 (Parliamentary Inquiry Committee Fyra 2015). Consistent with the approach developed by Yin (2003), each data source was initially collected and analyzed, resulting in 430 coded items (refer to Fig. 2). Sources (videos, published and unpublished reports, internal memos) were coded using the qualitative research software Atlas.ti. The first step in pattern matching is developing a proposition prior to undertaking the study (Trochim 1989). A theoretical pattern is a hypothesis about what is expected in the data. The observed pattern consists of the data that are used to examine the theoretical model. To the extent that the patterns match, one can conclude that the theory predicts the observed pattern and receives support. Based on the proposition which will be introduced in Sect. 5, focused coding resulted in 176 items. Data were analyzed on an organizational level (level of generalization) and theoretical patterns were selected in advance. The conceptualisation of the theoretical patterns was proposed by Woods (2018) and adopted in this study in the pattern-matching process.

Fig. 2 Data analysis and pattern-matching process V250 case-study design based on Trochim (1989)

Following Fig. 2, empirical patterns were constructed from the items arrived at by means of focused coding (176 items) and compared with the theoretical patterns as defined by the theory of GE. The sub-statements within each of the 10 proto-theorems were categorized as sub-patterns and compared to the empirical patterns identified in the case. In instances where patterns did not match, alternative explanations were explored and discussed with former key players in the V250 introduction. Part of the pattern-matching analysis (central in Fig. 2) has been included in the Appendices, so that readers have the opportunity to compare results for themselves. Furthermore, Sect. 5.2 includes a detailed example of the matching process in the case as presented in Fig. 2.
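The bookkeeping behind this matching step can be sketched in a few lines of code. This is a hedged illustration only, not the procedure or tooling used in the study: the judgement of whether an empirical pattern matches a theoretical sub-pattern remains a qualitative decision by the analyst, and the class name `SubPattern`, the functions `summarize` and `needs_follow_up`, and all row contents are hypothetical placeholders rather than case data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SubPattern:
    """One theoretical sub-pattern and its (hypothetical) matching outcome."""
    proto_theorem: str          # e.g. "S1"
    theoretical: str            # sub-statement derived from the theory
    empirical: Optional[str]    # pattern built from coded items, if any
    matched: Optional[bool]     # analyst's judgement; None = not yet assessed


def summarize(sub_patterns: List[SubPattern]) -> Dict[str, Dict[str, int]]:
    """Tally matches, mismatches and open items per proto-theorem."""
    summary: Dict[str, Dict[str, int]] = {}
    for sp in sub_patterns:
        counts = summary.setdefault(
            sp.proto_theorem, {"match": 0, "mismatch": 0, "open": 0}
        )
        if sp.matched is None:
            counts["open"] += 1
        elif sp.matched:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return summary


def needs_follow_up(sub_patterns: List[SubPattern]) -> List[SubPattern]:
    """Mismatches are the items for which alternative explanations are sought."""
    return [sp for sp in sub_patterns if sp.matched is False]


# Placeholder rows only; these are NOT findings from the V250 case.
rows = [
    SubPattern("S1", "placeholder sub-statement 1", "placeholder pattern", True),
    SubPattern("S1", "placeholder sub-statement 2", "placeholder pattern", False),
    SubPattern("S2", "placeholder sub-statement 3", None, None),
]

if __name__ == "__main__":
    print(summarize(rows))
    print([sp.theoretical for sp in needs_follow_up(rows)])
```

Keeping an explicit "open" state separates sub-patterns that have not yet been assessed from genuine mismatches, which are the ones discussed with former key players.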

1.3 Case introduction

To gain an understanding of the V250 introduction and its context, this section summarizes the timeline of the introduction and highlights the context in which surprise events occurred. As summarized by Johns (2006), an understanding of a context contributes to an understanding of the entities embedded within that context. It affects the cognition, affect, and behavior of individuals embedded within it. Context influences processes and interrelationships between constructs, as well as the meaning that people ascribe to events or themselves.

The main objective of the V250 introduction was introducing a high-speed train on the high-speed railway networks HSL Zuid (Netherlands) and Line 4 (Belgium) to connect to existing high-speed railway networks in Europe. The introduction of the V250 was characterized by the introduction of new digital train systems (e.g., the European Railway Traffic Management System, ERTMS), which needed to be integrated with other mechanical systems. Such integration has a great impact on the behavior of the joint human–machine system (Hollnagel and Cacciabue 1999) in meeting the demands from the environment and maintaining control. The main stakeholders in the introduction consisted of private and public entities which included the railway operators, the suppliers, the governments of Belgium and the Netherlands, the authorizing bodies (supervisors), the Designated Body, the Notified Body, the infrastructure managers, and the maintenance supplier. This is not a complete list, but serves as an indication of the large number of stakeholders and their interest in the V250 introduction. Figure 3 indicates the timeline of the introduction, with a lead time of over 11 years.

  • Phase 1: Concession contract for high-speed railway line. In 1996, in the context of the liberalization of the European Railway industry, the Dutch government ‘privatized’ the main railway operator in the Netherlands (Nederlandse Spoorwegen, NS), but remained its sole shareholder. Furthermore, train and track systems were legally separated. This emergent market orientation has led, among other things, to a strongly legalistic approach to the construction of the HSL Zuid and the acquisition of the V250 trains. Furthermore, requirements imposed by the government to implement a new (unproved) European safety system (ERTMS) in a cross-border high-speed infrastructure network (HSL Zuid and Line 4), using innovative high-speed trains, increased complexity.

  • Phase 2: Tender process and acquisition of rolling stock. The imposed requirements for rolling stock limited the scope for maneuver in the highly regulated tendering process. Due to the small number of trains and the high development costs per train, only one candidate contractor remained. The Dutch and Belgian railway operators signed a turnkey contract with this supplier. A turnkey contract is one under which the contractor is responsible for both the design and construction of rolling stock, ready for commercial use at the agreed price and by a fixed date. The main reason for a turnkey approach in the purchasing agreement was to outsource risks as both railway operators had little experience in designing and constructing high-speed rolling stock. However, this restricted the opportunity to monitor (and influence) the design, construction, and testing processes, which prevented an early anticipation of issues regarding maintaining and operating the V250 trains.

  • Phase 3: Design, construction, and testing. An overly optimistic estimate of the delivery times beforehand resulted in unrealistic planning in all phases. Detailed timetables of deliverables by the contractor were lacking, and as a result, assumptions were made. Eventually, this resulted in a delay of five years in the delivery of products and services by the contractor (Parliamentary Inquiry Committee Fyra 2015, p. 5). Furthermore, testing was delayed due to a lack of clear (testing) requirements for ERTMS and a late delivery of the infrastructure of HSL Zuid. As the ERTMS system was in its early development phase, updates were prescribed each time. This led to a great deal of uncertainty and delay in solving technical problems to establish a working and certified security system for the HSL Zuid.

  • Phase 4: Homologation (validation and certification). The process of homologation took place during the construction of the V250. The process was complex, because homologation of the train had to take place both in the Netherlands and in Belgium. Additionally, the European Technical Specifications for Interoperability (TSI) also had to be considered. The authorized body of the Netherlands did not inspect any physical trains and relied solely on the findings of the Notified Body which also did not inspect all trains (Parliamentary Inquiry Committee Fyra 2015). Furthermore, the terms and conditions under which rolling stock was transferred from contractor to railway operator were unclear due to different interpretations of the purchase agreement.

  • Phase 5: V250 in commercial operations. Commercial operations of the V250 trains started on December 9, 2012 between Amsterdam and Brussels. On January 17, 2013, all V250 trains were removed from service due to an incident in which one of the V250 trains had lost a base plate due to ice formation and the continuing incidents with other V250 trains. The lack of timely communication to passengers about the introduction of the new train service, combined with the simultaneous cancelation of the existing Benelux line, resulted in public outrage and high political pressure. Since unknown technical problems are one of the key characteristics of new rolling stock, reliable performance can never be guaranteed beforehand and a fallback scenario needs to be prepared. So-called ‘teething problems’ can occur as a result of unexpected defects in the system in commercial operations due to technical, organizational, or human failures. As people, trains, and infrastructure are locally distinctive, testing or simulation may never prevent these (introduction) challenges completely.

Fig. 3 Timeline of the V250 introduction (Parliamentary Inquiry Committee Fyra 2015)

1.4 Case results

This section presents the results of the case study using pattern-matching analysis. Two assumptions from the theory of graceful extensibility state that resources are always finite and change is ongoing. As a result, both risk and uncertainty are always present (Woods 2018). This requires Units of Adaptive Behavior (UABs) at multiple nested scales (e.g., processes, individuals, organizations, teams, and networks). The pattern-matching analysis in this study was performed on an organizational level. The unit of analysis was the V250 introduction, consisting of several UABs (operator, infrastructure manager, suppliers etc.) with different accountabilities and responsibilities, but all with the same final objective: safe and reliable passenger railway transport. As defined in Sect. 3, surprise events are those events that fall near and/or outside the boundaries of the adaptive capacity of a system (Woods 2018). Figure 4 illustrates the operationalization of surprise events in the context of the V250 introduction. Some surprise events fall near the boundaries of the adaptive capacity of a UAB (Fig. 4: X). Other surprise events fall outside the boundaries of a UAB and require extended adaptive capacity from that same UAB (Fig. 4: Y). Further surprise events fall outside the boundaries of a UAB, cannot be addressed by that same UAB, and require extended adaptive capacity from a second UAB (Fig. 4: Z). Additionally, surprise events that fall outside the boundaries of the introduction system can occur (Fig. 4: Q). The case showed several surprise events on all nested scales. Some examples near the boundaries (X) of the NS were the daily disruptions which could be solved by operations themselves using standardized scripts. A typical example outside the boundary (Y) was the additional capacity for train drivers to ensure availability if necessary. Examples of surprise events outside the boundary (Z) were the disruptions caused by failures in the railway tracks in addition to failures in rolling stock. These require adaptive capacity from the infrastructure manager. Surprise events outside the boundaries of the introduction system (Q) were, e.g., the changing political agreements of the Dutch and Belgian governments or the changing legislation with regard to ERTMS. If the Capacity for Maneuver (CfM) is limited, the train system becomes brittle and performance decreases.

Fig. 4 Surprise events in the context of the V250 introduction
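The four categories of surprise events (X, Y, Z, Q) amount to a simple decision rule, which can be sketched as follows. This is a deliberately simplified illustration under stated assumptions: the three boolean inputs and the names `Surprise` and `classify` are hypothetical, and in the case study such classifications rest on qualitative judgement rather than on fixed flags.

```python
from enum import Enum


class Surprise(Enum):
    """Categories of surprise events as labeled in Fig. 4."""
    X = "near the boundary of a UAB"
    Y = "outside the boundary; the same UAB extends its adaptive capacity"
    Z = "outside the boundary; a second UAB must extend adaptive capacity"
    Q = "outside the boundaries of the introduction system"


def classify(inside_system: bool,
             within_base_capacity: bool,
             own_extension_suffices: bool) -> Surprise:
    """Map three yes/no judgements (assumed inputs) to a Fig. 4 category."""
    if not inside_system:
        return Surprise.Q           # e.g. changing ERTMS legislation
    if within_base_capacity:
        return Surprise.X           # e.g. daily disruptions, standard scripts
    if own_extension_suffices:
        return Surprise.Y           # e.g. additional train driver capacity
    return Surprise.Z               # e.g. track failures, infrastructure manager


if __name__ == "__main__":
    # Daily disruption handled by operations with standardized scripts
    print(classify(True, True, True).name)
    # Track failure that requires the infrastructure manager
    print(classify(True, False, False).name)
```

The order of the checks mirrors the text: events outside the introduction system are separated first, and only then is the capacity of the individual UAB considered.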

In Sect. 5.1, the proposition is outlined. Following this, by comparing the theoretical outcome patterns, as put forward by the theory of graceful extensibility, to the empirical outcome patterns from the V250 case, (mis)matches were identified and will be presented in Sect. 5.2.

1.4.1 Proposition

The main proposition in this study based on earlier research of Woods (2018) was: complex rolling stock introductions can benefit from graceful extensibility to sustain adaptability as demands change as a result of surprise events challenging the boundaries of the system. If the assumption of the authors is correct, and similar patterns are found, the theory of graceful extensibility might also be applicable in other long-term complex critical asset introductions.

1.4.2 Pattern matching results

The theory of graceful extensibility entails the ability of a system to continuously extend its capacity to adapt when surprise events challenge its boundaries. It consists of 10 proto-theorems (S1–S10), categorized into three subsets as reported by Woods (2018). Following the research design as introduced in Sect. 3, the theoretical patterns of each proto-theorem were compared to empirically found patterns regarding sustained adaptability based on published and unpublished reports from the evaluation of the V250 introduction. The following three subsections summarize the qualitative results of the analysis using pattern-matching analysis. Detailed results have been included in the Appendices. Figure 5 shows an example of how the matching process was performed. The second column represents the theoretical sub-patterns and the fifth column shows the findings (coded items) including references to Atlas.ti. The third column translates the findings into an empirical pattern, which enables the match with the theoretical pattern. Based on this matching process, the fourth column states whether or not a match was observed.

Fig. 5 Example of pattern-matching analysis for theoretical pattern S1

1.4.3 Subset A: managing risk of saturation (S1–S3)

Based on the assumptions that resources are finite and change is ongoing, the adaptive capacity of any unit at any scale is finite. Therefore, all units have bounds on their range of adaptive behavior. This is referred to as Capacity for Maneuver (CfM) (S1). Events which fall outside the bounds will always occur and demand response. Otherwise, the unit is brittle and performance may decrease (S2). As all UABs risk saturation of their adaptive capacity, they require some means to modify or extend their adaptive capacity when demands threaten their base range of adaptive behavior (S3). Based on focused coding of the dataset, several patterns were identified for sustained adaptability. These patterns were mapped to the first subset of proto-theorems of graceful extensibility, which consist of three proto-theorems and underlying patterns (refer to Appendix A for detailed results of the mapping).

  • S1 All units have bounds on their adaptive capacity: Results show that the V250 introduction involved different UABs (e.g., infrastructure manager, operator, supplier, supervisor, consultant, and governments) which all (need to) contribute to ensure a reliable railway system. Boundaries on adaptive capacity were identified on different levels: technical, cultural, political, and inter- and intra-organizational. A typical example was cross-border failures, involving train and track systems from two different countries (The Netherlands and Belgium), where close cooperation is required to quickly address technical failures. This cannot be addressed by one UAB alone. As such, the concept of CfM was not observed. Patterns from the case showed the tendency to embrace the prevailing assumption that everything becomes fluid under pressure, resulting in an (overly) optimistic perspective in managing future (technical) failures. Results showed traditional risk management practices to be in-control.

  • S2 Events will occur outside the bounds and demand response: The V250 shows that new rolling stock introductions are often characterized by ‘teething problem’ challenges. Therefore, reliability remained unpredictable as surprise events challenged the boundaries of the system. The CfM decreased as a result of ‘teething problems’ in both technical and organizational systems. The (fragile) interfaces between track and train increased the risk of brittleness when the system operated near its boundaries. Results reflected the attempt to gradually increase complexity during trial operations. Nevertheless, as previous research also states (Woods 2016), trial operations can never completely simulate commercial operations. Specific issues can only be identified after intensive use of the equipment in operations. An example of this is the TRAXX Amsterdam-Breda (April 2011), where failures suddenly emerged after a week in operation. V250 train sets had different failure modes, so each train can be considered unique. Just like the V250, new rolling stock will always deviate from existing rolling stock in both operations and maintenance and demands appropriate responses. As one of the engineers stated: “If we were stuck with current technologies, we would still use steam locomotives.”

  • S3 Units modify and extend adaptive capacity: V250 results indicated an increase in effort and resources when the CfM decreased. The need for extended adaptive behavior was partly acknowledged by introducing a helpdesk for train drivers, additional support on the platforms, more capacity in tracks, and increase in turning points. A typical example was the absence of alternative fallback options in case of unreliable performance of the V250. Results showed the need for extended adaptive behavior, but this was often restricted by fixed strategies and plans. Furthermore, results showed patterns that indicated a slower pace of finding, deciding on, and implementing solutions than was required to meet demand when disruption increased. Eventually, this resulted in a cancelation of all V250 train services as negative public opinion and political pressures mounted and the capacity to adapt was decreasing. Subcontractors were not involved in an early stage of the introduction process, which increased the risk of saturating CfM in a later stage. Furthermore, responses to standard failures depended on the train drivers involved and were mitigated by educating train drivers using standard solutions for standard failures.

In summary, due to the nature of the railway system, UABs depend on each other when adapting to surprising events. In managing the risk of saturation of adaptive capacity, UABs in the V250 introduction modified their base adaptive capacity, but did not (fully) utilize the network for enabling extended adaptive capacity. Main inhibitors were the fixed strategy and associated implementation plans formalized in legal agreements. Due to a large number of (international) stakeholders and the legalistic approach, clear and timely agreements were lacking, which caused ambiguity and uncertainty among stakeholders during the introduction.

1.4.4 Subset B: networks of adaptive units (S4–S6)

As shown in statements S1–S3, graceful extensibility depends on how one UAB interacts with neighboring units in a network of interdependent units (subset B, refer to Fig. 1). No single unit can have sufficient range of adaptive behavior to manage the risk of saturation by itself. Therefore, synchronization across multiple UABs in a network is necessary (S4). Units in a network can monitor and influence other units in the network. Therefore, the risk of saturation can be shared within the network (S5). While independent units pursue their own goals and objectives, UABs generate points of pressure on other UABs which causes UABs to search for better operating points (S6). Appendix B includes the detailed mapping of theoretical and empirical patterns based on the case results.

  • S4 Synchronization of UABs: The complexity of train and track, multitude of stakeholders, safety and security issues, political interests, large investments, major risks, and fragmented factual expertise required alignment and coordination during the V250 introduction. Findings illustrated the complex interdependencies between infrastructure and operator (Train Track Integration), which showed the need for alignment and coordination. Part of the problems with the V250 introduction were related to the availability of conventional and high-speed track, communications between train and track, and the response time in case of major disruptions. Findings also showed the possible limitations caused by the liberalization of the railway system, reflected in a lack of shared interests, willingness to share Capacity for Maneuver, and an increase in formalized interfaces, often confirmed by legal agreements.

  • S5 Risk of saturation can be shared: Findings show underlying patterns of collecting and sharing monitoring data for optimizing the system. Data were mainly collected and analyzed locally, increasing the risk of misalignment in the railway network. The ERTMS system required strong alignment and integration of operational systems of train and track for reliable communications. Fallback scenarios were not aligned among stakeholders, and the main contractor did not share all information regarding defects and failures to facilitate problem-solving. Findings show incompatible modes of operation among stakeholders during the homologation process. Case results also show the need for railway operators to involve subcontractors early in the introduction process for better alignment during and after introduction.

  • S6 UABs search for better operating points under pressure: Findings show the underlying patterns of network pressures on UABs. Pressures from commercial interests and the media, triggered by the incident in which one of the V250 trains lost a base plate due to ice formation and by the continuing disruptions of other V250 trains, led to the full cancelation of the V250 train services on January 17, 2013 (Parliamentary Inquiry Committee Fyra 2015). From the beginning of the introduction, stakeholders (public and private) did not align their interests (financial, competitive, and political), (in)formal agreements were lacking, and pressure mounted continuously. Chosen design principles for rolling stock were effective for one stakeholder (the contractor), but ineffective for other stakeholders further down the introduction chain (maintenance and operations). Conflicts of interest existed as the government was the sole shareholder of the privatized railway operator (NS), but simultaneously promoted liberalization among railway operators due to the liberalization of the European Railway industry. Therefore, one could argue that the architectural principles of the railway system did not fully support alignment and coordination of UABs responding to varying pressures on trade-offs. For example, the Dutch government pushed for high financial gains as a shareholder, but at the same time also demanded highly reliable train services as defined by punctuality requirements in the concession agreement. The NS was focused on maintaining its strategic position in a competitive market as the main railway operator in the Netherlands (Parliamentary Inquiry Committee Fyra 2015), which contributed to the optimistic views in the business case to win the concession.

Although results from the case show a lack of alignment and coordination among UABs, the observed patterns correspond with similar ones found in other networks of adaptive units. As the results indicate, the main inhibitor for synchronization and sharing the risk of saturation among UABs was the lack of an integrated (holistic) perspective on the railway system (train and track) and the strict formal (legal and political) agreements between stakeholders (e.g., the turnkey approach in the purchase agreement as briefly described in Sect. 4).

Subset C: Outmaneuvering constraints (S7–S10).

Given the proto-theorems of networks of adaptive units, statements S7–S10 propose general constraints on the Capacity for Maneuver (CfM). There are two fundamental forms of adaptive capacity which allow for UABs to be viable: base- and extended adaptive capacity. Both are necessary, but inter-constrained (S7). UABs are local, have a certain position relative to the world and other units in the network: therefore, there is no best location in the network (S8). Furthermore, individual UABs each have their own perspective which is enriched by shifting and contrasting over multiple perspectives (S9). There are limits on models of adaptive capacity: therefore, mis-calibration is the norm and requires ongoing efforts from UABs to match actual capability (S10). Appendix C includes the detailed mapping of theoretical and empirical patterns based on the case results.

  • S7 Base and extended adaptive capacity: Findings show that a distinction between base- and extended capacity was not taken into consideration by all stakeholders or not synchronized across the network. For example, the Belgian railway operator mainly focused on reducing costs and eliminating non-profitable activities, even when performance was near saturation. Monitoring of redundant systems was often not implemented. This increased the risk of saturation as train drivers and operators were unaware of defects in primary systems. A more robust system anticipates the failures of components which may require adaptations from other components.

  • S8 No best location in the network: Findings show that certain UABs in the railway system caused many conflicts on a network-wide level due to local goals and interests. For example, the contractor's main objective was to produce and deliver rolling stock, not to solve issues. Railway operators, however, were more concerned with acquiring support from the contractor when issues arose. Cultural differences also complicated the relative positions of UABs in the V250 introduction, and their respective goals. Recommendations from the evaluation showed a strong preference for installing a central command to be in control, a so-called system integrator. The responsibilities of the inspectorate did not include ensuring that the entire railway system was able to provide reliable performance to railway passengers.

  • S9 Shifting perspectives: Findings show a lack of mutual understanding among UABs caused by different perspectives on several matters. A typical example from a technological point of view was the various interpretations of the ERTMS standards by contractors, which resulted in poor interfaces and communications between systems. Results show that it was impossible to implement ERTMS in the track without specification of the requirements of rolling stock systems to ensure compatibility and interoperability. Findings also show the involvement of a multicultural group of stakeholders consisting of public and private companies from at least four different countries and the need to identify the ‘DNA’ of involved stakeholders upfront for a better understanding. For example, the difficult collaboration between Dutch and Belgian operators and infrastructure managers with respect to solving technical failures in the test process was partly a result of different attitudes regarding anticipating, or reacting to failures when they occur. Findings also show the need to involve train drivers, train managers, cleaning staff, and mechanics early in the introduction process to develop knowledge and expertise to ensure reliability and usability when commercially operated. As the main conclusion of the parliamentary inquiry shows (Parliamentary Inquiry Committee Fyra 2015, p. 4), the perspective of railway passengers was overlooked, while other interests prevailed.

  • S10 Mis-calibration is the norm: Findings show patterns of over-optimism during all phases of the introduction. In hindsight, the call to start train services in December 2012, despite technical failures in trial operations and the winter season (risk of environmental influences), was too optimistic and reliability was at stake. Results show a pattern of strong pressures to start commercial operations, even if the train sets were not yet reliable. Insufficient awareness of train-track integration resulted in misalignment and a low rate of technical failures being resolved (e.g., ERTMS) in train and track, whereas a multidisciplinary approach to technical issues was required. Findings show reduced effort in exploring alternatives as fallback options in case of canceled train services due to constant rolling stock failures and increasing pressures. Workarounds were implemented to overcome system design failures. Data also show that these workarounds were not managed well and failures popped up periodically.

In summary, case results show patterns matching the theoretical patterns of S8–S10, but not the recognition of base and extended adaptive capacity (S7). The (formal) handovers from contractor to trial operations and from trial operations to commercial operations are also appropriate moments to reflect on the balance between base and extended adaptive capacity, as an increase in unexpected failures is likely to occur. Overlooking the perspective of railway passengers and their interests was a typical example of S9 and one of the key constraints on the CfM in this case.

1.5 Confirming patterns and alternative explanations

This section discusses the confirmed patterns and explores alternative explanations, supported by relevant concepts from literature. By comparing the data of the V250 case to the 24 sub-patterns of the 10 proto-theorems, matches and mismatches were identified (refer to Appendices). Figure 6 illustrates the groundedness of each proto-theorem based on the V250 dataset. The groundedness indicates the degree of correspondence of the proto-theorems with the dataset. For instance, Fig. 6 shows that the reflection of statements S1 and S8 in the dataset was limited (2%). In contrast, the reflection of S6 (UABs search for better operating points under pressure) in the dataset was high. Statistics show that 31% of the coded items were related to subset A (S1–S3), managing risk of saturation; 46% were related to subset B (S4–S6), networks of adaptive units; and 23% were related to subset C (S7–S10), outmaneuvering constraints. Although no conclusions can be drawn from these statistics, they show a broad reflection of the theoretical patterns in the case and the distribution among the three subsets.

Fig. 6 Degree of correspondence of GE patterns in V250 dataset (% of total number of 176 coded items)
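
The groundedness figures reported above are shares of the total number of coded items. A minimal sketch of that arithmetic follows; the per-subset counts are hypothetical, back-calculated from the reported percentages of 176 coded items, since the raw counts are not listed here.

```python
# Hypothetical counts per subset, back-calculated from the reported shares
# (31%, 46%, 23%) of 176 coded items; the raw counts are an assumption.
subset_counts = {
    "A (S1-S3) managing risk of saturation": 55,
    "B (S4-S6) networks of adaptive units": 81,
    "C (S7-S10) outmaneuvering constraints": 40,
}


def groundedness(counts):
    """Return each category's share of all coded items, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


shares = groundedness(subset_counts)
for name, pct in shares.items():
    print(f"{name}: {pct:.0f}%")
```

The same function could be applied per proto-theorem (S1 to S10) if the coded-item counts per statement were available.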

In general, case patterns show high resemblance with the 10 proto-theorems of the theory of GE, resulting in 21 matching sub-patterns and three sub-patterns that were not fully observed in the case (details are included in the Appendices). Sub-patterns are marked with a second (number) or third (letter) suffix.

  a) The parameter Capacity for Maneuver (CfM), which specifies how much of the range the unit has used and what remains to handle upcoming demands (S1.2), was not recognized as such in the case. This is understandable, as this (new) parameter currently lacks measurability.

  b) Risk becomes operationalized as some dynamic function of how CfM is being used and what remains compared to ongoing and possible future demands (S3.3a). Case results show the need to optimize (traditional) control practices (e.g., timely identification of (shared) risks), but also the limitations of control and planning.

  c) The theory explicitly recognizes that there are two basic kinds of adaptive value, one far from saturation (base adaptive capacity) and another that operates near saturation (extended adaptive capacity) (S7.1). Case results did not provide evidence for this distinction. However, saturation in complex rolling stock introductions differs from saturation in, e.g., commercial flights, where the scope for action in case of surprise events is limited. In the case study, the system became brittle, and eventually broke down, when the willingness to extend the CfM, for example by repairing the trains, was not broadly supported by the stakeholders involved in the network.

Case results (e.g., lack of integrated risk assessments, need for a system integrator, and more supervision) supported the need to optimize current control practices, but also outlined downsides of control, illustrated by, for example, many legal agreements and revisions of plans. This may imply the need for a more indeterministic perspective to avoid the ‘illusion of control’ (Langer 1975). The illusion of control refers to the notion that organizations are under the impression that they know more or less what is going to happen next. The focus on order and control is also reflected in most of organizational theory. Even before the work of Taylor (1914), management tended to assume that order is generally good, something to strive for, and that deviations from order, or disorder, are generally bad, and to be avoided (Shenhav 1995).

The Law of Requisite Variety (Ashby 1957) states that a controlling system can only control a system if it can generate the requisite variety to equal the variety generated by the system to control. This was restated by Beer (1985, p. 30): “only variety can absorb variety”. In other words, effective management control is only achieved when there is a balance between the control system, the controlled system, and the environmental system. However, management is caught between the desire to limit the variety of the organization (so as to control it) and the risk of limiting the variety of the organization to the extent that it cannot control its environment. Introna (1997) terms this the management control paradox. More control by the control system will limit the controlled system and thus may result in an inability to adapt to internal and external changes. One possible solution to the management control paradox is to locate control in the system, and hence, the system must control itself. In order for the organization to be structurally coupled with the environment, the concept of the manager as an ‘external controller’ must be eliminated (Introna 1997, p. 95). By shifting the perspective on planning from that of the observer to that of the involved, planning becomes crafting, as Mintzberg (1994) terms it, or tinkering, as put forward by Ciborra (1996). Planning shifts from trying to find the rationally best alternative to negotiating meanings, translating actions, building alliances, and fixing obligatory passage points.
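
The Law of Requisite Variety can be stated compactly. The following is a sketch in Ashby's logarithmic measure of variety; the symbols are notation assumed here for illustration, with $V_D$ the variety of disturbances, $V_R$ the variety of the regulator's responses, and $V_O$ the variety of outcomes:

```latex
V_O \;\geq\; V_D - V_R
```

Read this way, the variety of outcomes can only be reduced by increasing the variety available to the regulator. One possible reading of the management control paradox is that restricting the organization in order to control it lowers the $V_R$ the organization can bring to bear on its environment.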

Another solution is accepting the fact that no accurate predictions are possible for every state of the system. Although this is considered psychologically disturbing, as it shows a lack of control over future outcomes, it results in increased benefits as the illusion of control will be avoided (Makridakis and Taleb 2009, p. 842). The concept of antifragility (Taleb 2012) may offer new insights into preparing for an uncertain future by embracing disorder. Taleb refers to fragility as the way in which a system suffers from the variability of its environment beyond a certain pre-set threshold, while antifragility refers to when it benefits from this variability (Taleb and Douady 2013). Furthermore, Taleb argues that we have been ‘fragilizing’ our systems by denying those stressors and disorder, making them vulnerable to surprise events. Nevertheless, most contemporary organizations do not like volatility, randomness, uncertainty, disorder, errors, stressors, or chaos. Yet, as the case introduction shows, disruption and randomness are increasing, and new approaches are required, as also observed by Martinetti et al. (2018).

1.6 Guidelines for adopting graceful extensibility in complex systems requiring sustained adaptability

The theory of GE (Woods 2018) is still in its infancy. Nevertheless, as it is based on empirical findings from former research and supported by the V250 case, it might already be valuable for organizations managing complex cyber-physical systems and striving for sustained adaptability. As with all new theories, operationalizing this theory to be applied to daily work is not an easy task. This section proposes guidelines for adopting graceful extensibility. Guidelines were identified and validated by key members of the case organization, based on the results of the pattern-matching analysis. These should be considered a starting point for new complex systems seeking sustained adaptability:

  • Increase awareness of unexpected surprises in the network using historical (complex) projects and indicate the limitations of traditional risk management approaches;

  • Assess the need for (base and extended) adaptive capacity for each unit in the network, and the network as a whole, based on the complexity involved over the lifetime of the introduction;

  • Introduce the Capacity for Maneuver (CfM) as a parameter or key performance indicator for regulating the risks of saturation from an integrated network perspective. Compare integrated risk assessments to the assessment of the required adaptive capacity. This should lead to an initial understanding and shared agenda for action among UABs;

  • Periodically challenge the ability of units in the system to extend capacity to adapt when surprise events challenge its boundaries and mitigate risks if necessary.
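
As a purely illustrative aid to the third guideline, the sketch below treats CfM as a scalar and risk as expected demand relative to remaining CfM. This is a toy model and not a validated measure: the theory does not yet provide a way to measure CfM, and the unit names, numbers, functions, and the assumption that CfM can be pooled freely across the network are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """A unit of adaptive behavior (UAB) with a toy, scalar adaptive range."""
    name: str
    adaptive_range: float  # total range of adaptive behavior (hypothetical units)
    used: float            # part of the range already consumed

    @property
    def cfm(self) -> float:
        """Remaining Capacity for Maneuver: what is left of the range."""
        return max(self.adaptive_range - self.used, 0.0)


def saturation_risk(unit: Unit, expected_demand: float) -> float:
    """Toy risk score: expected demand relative to the unit's remaining CfM.

    A value at or above 1.0 means the unit would saturate on its own.
    """
    if unit.cfm == 0.0:
        return float("inf")
    return expected_demand / unit.cfm


def network_risk(units: list[Unit], demands: dict[str, float]) -> float:
    """Same ratio with CfM pooled across the network (assumes free sharing)."""
    total_cfm = sum(u.cfm for u in units)
    total_demand = sum(demands.values())
    return float("inf") if total_cfm == 0.0 else total_demand / total_cfm


# Hypothetical example values, not taken from the case data.
units = [Unit("operator", 10.0, 9.0), Unit("infrastructure manager", 10.0, 4.0)]
demands = {"operator": 3.0, "infrastructure manager": 1.0}

for u in units:
    print(u.name, saturation_risk(u, demands[u.name]))
print("network", network_risk(units, demands))
```

In this example the operator would saturate on its own, while the pooled network ratio stays below 1.0. This mirrors the idea that the risk of saturation can be shared (S5), provided neighboring units are willing to extend capacity.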

2 Conclusions and future research directions

This paper contributes to the field by assessing the explanatory power of the theory of graceful extensibility (GE) in a historical case study and provides guidelines for the operationalization of the theory in practice. Case results indicate that the majority of the theoretical and empirical patterns match, which provides evidence that the proposition is largely supported. The proposition was defined as: “Complex rolling stock introductions can benefit from graceful extensibility to sustain adaptability as demands change as a result of surprise events challenging the boundaries of the system”. However, the parameter Capacity for Maneuver (CfM), which is required to manage the risk of saturation both at the level of an adaptive unit and at the level of a network (Woods 2018), was not recognized. If organizations are ‘infected’ by the illusion of control and assume high levels of predictability, surprise events are hardly considered and the organizations see no need to explore CfM. Traditional control mechanisms are insufficient to deal with the increased complexity effectively. Therefore, the authors propose to adopt a more indeterministic approach alongside GE.

The usefulness of pattern matching in this study lies in supporting the authors’ assumption that graceful extensibility can support future introductions of complex cyber-physical systems in achieving sustained adaptability. However, as briefly touched on in Sect. 3, a research design based on pattern matching needs to consider several issues (Trochim 1989). Although most issues have been addressed by the researchers, two issues remain that require further explanation. First, the conceptualization was based on a single theory. As such, case results can only provide a (mis)match with the theory of GE: other theories on sustained adaptability were excluded. Second, the accuracy of the procedures required to provide evidence for a match may be limited by confirmation bias, as both open and focused coding were performed by one researcher. Coding was evaluated by the case organization, but not by a second independent researcher.

This study can be considered a first attempt to empirically evaluate the applicability of the new theory on graceful extensibility. The scope of this study was limited to a single in-depth case study. Further empirical research is required to (dis)confirm the proto-theorems of GE. The convergence between information and operational technologies is expected to further increase complexity of the railway system, resulting in more surprise events which need to be managed. This will require human–technical systems that are able to continuously adapt to new (technical) challenges and demands. Although the V250 case can be considered an unsuccessful case of sustained adaptability, it may serve future rolling stock introductions and other complex cyber-physical asset introductions in their push for sustained adaptability when dealing with surprise events.