1 Introduction

High-value capital assets, such as energy systems (for example, wind turbines), medical systems (for example, interventional X-ray machines), lithography machines in semiconductor fabrication plants, and baggage handling systems at airports, require maintenance throughout their (long) lifetimes. Such capital assets are crucial to the primary processes of their users/operators, and unexpected failures may have very significant negative impacts and even life-threatening consequences. In order to avoid or minimize failures, asset owners perform preventive maintenance activities, with the objective of retaining a system in, or restoring it to, a satisfactory operating condition. The costs of both these maintenance activities and the associated unscheduled downtimes represent one of the key drivers of an organization’s total costs. Such maintenance costs constitute up to 70% of the total value of the end product [4, 22], and this percentage is rapidly increasing [44]. Hence, there is great incentive for asset owners to optimize their maintenance planning.

The most common maintenance practices are corrective maintenance and planned maintenance. The former, as the name suggests, repairs the asset upon failure, while the latter follows a fixed service schedule for the field service engineers, with the objective of ensuring that the asset operates correctly and of avoiding unscheduled breakdowns and downtime. The cost of planned maintenance is relatively low in comparison to that of corrective maintenance, due to its planned, anticipated nature. Planned maintenance is characterized by scheduled downtimes (in contrast to the unscheduled downtime experienced at a failure, which leads to corrective maintenance) that occur at fixed intervals, say at instances \(\tau ,2\tau ,3\tau ,\ldots \) (for example, \(\tau = 6\) months). Such instances constitute the scheduled opportunities of preventive maintenance.

In the context of a network of assets, such as a wind park or a network of hospitals in close geographic proximity (from the viewpoint of the service provider), there is a second type (in addition to the above scheduled instances) of opportunity to perform preventive maintenance. In the event that a failure occurs, its corrective maintenance instance can be viewed as an unscheduled opportunity for preventive maintenance for the other assets in the network. In these instances, opportunistic maintenance can take place, with the respective instances constituting the unscheduled opportunities of preventive maintenance. This form of network dependency can be viewed on two levels: (i) the economic dependency between the various systems of a network, and (ii) the structural degradation and failure dependencies. Similarly to planned maintenance, opportunistic maintenance has a lower cost in comparison to that of corrective maintenance.

Incorporating opportunistic maintenance may also affect the scheduling of planned maintenance, as it might be beneficial to defer the next planned maintenance opportunity so that it takes place a period of length \(\tau \) after the occurrence of an opportunistic maintenance. The decision whether or not to defer planned maintenance after the occurrence of opportunistic maintenance may have a positive or negative effect on the total costs.

In maintenance, it is oftentimes assumed that a maintenance activity is perfect, i.e., it restores the system to a state of ‘as good as new.’ However, this assumption may not be true in practice. For instance, a misidentification of the root cause of the (imminent) failure can lead to an erroneous repair that does not resolve the actual issue, or some minor repair activity (such as exchange of parts, changes or adjustment of the settings, software update, lubrication or cleaning, etc.; see [34]) may not restore the system to a state of ‘as good as new.’ In the above-mentioned cases, it is more reasonable to assume that the system is restored to a state between ‘as bad as old’ and ‘as good as new.’ This concept will be referred to as imperfect maintenance. Evidently, this assumption impacts the resulting cost. Hence, knowledge of how successful a maintenance activity is should not be ignored in the maintenance planning.

In conclusion, asset owners are oftentimes faced with the following questions:

  (i) What is the advantage of incorporating planned maintenance in comparison to exercising only corrective maintenance?

  (ii) What is the benefit of sharing resources in the network (in the form of incorporating opportunistic maintenance in addition to the planned maintenance)?

  (iii) What is the influence of deferring the planned maintenance after the occurrence of opportunistic maintenance?

  (iv) What is the influence of imperfect maintenance on the maintenance planning and on the costs (long-run rate of cost)?

  (v) When should preventive maintenance be performed (so as to minimize the long-run rate of cost)?

1.1 Main contributions

We consider a stylized, yet representative, model that incorporates the above-mentioned characteristics, and we prove the existence of the optimal maintenance policy and we derive its structure. Furthermore, we compute an explicit expression for the long-run rate of cost, which can be easily used by asset owners and service providers so as to gain further insights into their practice and so as to compute the cost-benefits of changing their maintenance practice. More concretely, the main contributions of the paper are threefold: (1) We consider a semi-Markov decision process that incorporates planned and opportunistic maintenance, as well as imperfect maintenance. From the analysis of the semi-Markov decision process stems the characterization of the optimal policy as a control limit policy (threshold) depending on the time until the next planned maintenance opportunity. Moreover, using this approach, we are able to derive a closed-form expression for this control limit. (2) Considering the class of control limit policies (depending on the remaining time until the next planned maintenance), we derive, using the theory of regenerative processes, an explicit expression for the long-run rate of cost. (3) We consider data from the wind energy industry and provide, based on these values, concrete answers to Questions (i)–(v) mentioned above. More specifically, we analyze the benefit of using planned and opportunistic maintenance compared to only corrective maintenance. We also analyze the influence of deferring planned maintenance after the occurrence of opportunistic maintenance. Finally, we also highlight the cost savings that can be attained by reducing the probability of an imperfect maintenance.

1.2 Outline of this paper

The remainder of this paper is structured as follows: In Sect. 2, we review the related literature. In Sect. 3, we describe in detail the model at hand, which captures the condition of the asset and which incorporates imperfect maintenance at scheduled and unscheduled maintenance opportunities. Subsequently, in Sect. 4, we characterize the structure of the optimal policy for condition-based maintenance using the average cost criterion, see Sect. 4.1, and we compute the long-run rate of cost for any policy with the same structure as the optimal policy (i.e., the class of control limit policies depending on the remaining time until the next planned maintenance), see Sect. 4.2. In Sect. 5, we permit the deferral of planned maintenance after the occurrence of opportunistic maintenance, and we compute the long-run rate of cost. A numerical illustration is provided in Sect. 6, where, based on data from the wind energy industry, we compare the long-run rate of cost for various policies, we show the effect of imperfect maintenance, and the effect of deferring planned maintenance. Finally, Sect. 7 contains concluding remarks and highlights directions for future research.

2 Literature review

Maintenance optimization models have been extensively studied in the literature. Optimal maintenance policies aim to provide optimal system reliability/availability and safety performance at lowest possible maintenance costs [27]. Due to the fast development of sensing techniques in recent years, the state of a capital asset can be monitored or inspected at a much lower cost and in a continuous fashion, which facilitates condition-based maintenance. Condition-based maintenance recommends maintenance actions based on information collected through online monitoring of the capital asset and it can significantly reduce maintenance costs by decreasing the number of unnecessary maintenance operations; see, for example, Jardine et al. [10], Peng et al. [26] and Lam and Banjevic [18]. The condition-based maintenance model that we propose builds on the delay time model proposed by Christer [6] and Christer and Waller [5]. We refer the reader to Baker and Christer [2], Christer [7] and Wang [38], and, more recently, Wang [39] for an overview on delay time models. Not only are delay time models well-known in the literature, but they also very frequently appear in practice.

Practice-based research with real diagnostic data, such as data related to the spectrometry of oil (for example, [16, 21]) and data related to vibrations (for example, [40]), showed that it is usually sufficient, and even preferable from a modeling and decision-making perspective, to consider only two operational states. The first state is the perfect state, in which the system resides from the moment it is newly installed until a hidden defect occurs. After the occurrence of a hidden defect in the system until the occurrence of a failure (a period that is typically referred to as the delay time), the system resides in the second state, also referred to as the satisfactory state. Such a classification of the operational states has the property that maintenance actions are initiated only when the system has degraded to the state that can actually lead to a direct failure, i.e., the satisfactory state, but not when the system is functioning perfectly, i.e., the perfect state. The vast majority of the literature on delay time models is restricted to numerical methods or approximations to solve the models at hand, due to their underlying complexity. A few recent exceptions are Maillart and Pollock [20], Kim and Makis [17] and Van Oosterom et al. [36], who study two-state systems under periodic inspection, partial observability, and postponed replacement, respectively, and provide analytical results regarding the structure of the optimal policy. However, none of them consider the option of resource sharing in the network (in the form of opportunistic maintenance), nor do they incorporate the notion of imperfect repair.

Most delay time model analyses assume that the system after a maintenance action is restored to a state of ‘as good as new.’ Contrary to this assumption, in imperfect maintenance it is assumed that, upon preventive maintenance, the system lies in a state somewhere between ‘as good as new’ and ‘as bad as old.’ This is first introduced by Nakagawa [23, 24] and is called the (pq)-rule. Under the (pq)-rule, the system is returned to an ‘as good as new’ state (perfect preventive maintenance) with probability p and it is returned to the ‘as bad as old’ state (minimal preventive maintenance) with probability \(q = 1 - p\) after preventive maintenance. Clearly, the case \(p = 0\) corresponds to having no preventive maintenance. Also, from a practical point of view, imperfect maintenance can describe a large set of realistic maintenance actions [27].

When planning condition-based maintenance strategies, see, for example, Jardine et al. [10], Jardine and Tsang [11] and Prajapati et al. [28], a typical assumption in the literature is that the system at hand is monitored continuously and one can intervene and maintain the system at any given moment. However, due to accessibility reasons (for example, in the case of off-shore wind parks) or for cost reduction purposes, it is cost optimal and more practical to allow only for discrete time opportunities. The simplest among the discrete time opportunities are the periodic planned maintenance instances (also referred to as scheduled downs), with period, say, \(\tau \), that serve as a scheduled opportunity to do maintenance for a network of systems. Furthermore, unplanned maintenance instances (due to opportunistic maintenance) can be modeled as discrete instances occurring according to a multi-dimensional counting process.

For recent works related to opportunistic maintenance, the interested reader is referred to Zhu et al. [42, 43], Arts and Basten [1] and Kalosi et al. [14]. In Zhu et al. [43] and Zhu et al. [42], the authors consider a single-unit system and account for both scheduled and unscheduled opportunities. In these analyses, the authors model the age and the condition, respectively, of the system and derive, based on approximations, the long-run rate of cost under a given policy. In both papers, the arrivals of unscheduled opportunities are modeled according to a homogeneous Poisson process. This approximation is justified by the Palm–Khintchine theorem [15], which states that even if the failure times of some systems do not follow exponential distributions, the superposition of a sufficiently large number of independent renewal processes behaves asymptotically like a Poisson process. Arts and Basten [1] build further on Zhu et al. [42, 43], but they only consider scheduled maintenance opportunities (excluding unscheduled opportunities). Furthermore, Arts and Basten [1] assume that at a scheduled opportunity, the system is restored to a perfect condition (i.e., \(p=1\)), while at a failure they assume that the system is restored to a state which is stochastically identical to the state just prior to the system’s failure. In a recent conference paper, Kalosi et al. [14] looked at a model with both planned and unplanned maintenance opportunities, at which the system is restored to a perfect condition, showing some preliminary results that a control limit policy (depending on the remaining time until the next planned maintenance) is optimal.

In contrast to Arts and Basten [1] and to Zhu et al. [42, 43], in which the long-run rate of cost is computed for a given policy, we first characterize the structure of the optimal policy explicitly and thereafter, for the optimal policy class, we compute the long-run rate of cost. Furthermore, we include both scheduled and unscheduled maintenance opportunities. In contrast to Kalosi et al. [14], we extend the model by incorporating the (pq)-rule, making it more generic and realistic. Moreover, we are the first to analyze the influence of deferring planned maintenance and we illustrate the financial effects of the maintenance policy in a realistic context using data stemming from the wind industry.

3 Model description

We consider a single-unit system (equivalently, a component or asset) that is monitored continuously and whose condition is fully observable. We assume that the condition of the system degrades over time and that it can be modeled according to a delay time model. That is, the states are classified as perfect, satisfactory and failed. We shall refer to the state of perfect condition as state 2, the state of satisfactory condition as state 1 and the failure state as state 0. Furthermore, we assume that as soon as a system failure occurs, the system is instantaneously replaced by an ‘as good as new’ system. So, in the mathematical formulation of the model, we may assume, due to the instantaneous replacement at failure, that the model evolves between only states 1 and 2. The system spends an exponential amount of time with rate \(\mu _i\) in state i, \(i\in \{1,2\}\). The above model formulation implies that initially the system starts in state 2 (perfect state), then after an exponential amount of time with rate \(\mu _2\), the system deteriorates and the condition of the system goes to state 1 (satisfactory state). The system spends an exponential amount of time with rate \(\mu _1\) in state 1, after which a failure occurs. At a failure, the system is instantaneously replaced by an ‘as good as new’ system and the condition is restored to 2 (perfect state). A schematic evolution of the condition of the component and the corresponding times of transitions is depicted in Fig. 1.

Fig. 1 Schematic evolution of the condition of the component and the corresponding times of transitions
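The degradation dynamics described above can be illustrated with a short simulation. The following is a minimal sketch, not part of the original analysis: the function name `sample_degradation_path` and the parameters `horizon` and `seed` are hypothetical, and no maintenance opportunities are modeled, only the alternation between state 2 and state 1 with instantaneous replacement at failure.

```python
import random


def sample_degradation_path(mu1, mu2, horizon, seed=0):
    """Sample the transition epochs of the two-state delay time model.

    The system starts in state 2 (perfect), moves to state 1 (satisfactory)
    after an Exp(mu2) time, and fails after a further Exp(mu1) time, at which
    point it is instantaneously replaced and returns to state 2.
    Returns a list of (time, state entered) pairs up to `horizon`.
    """
    rng = random.Random(seed)
    t, state = 0.0, 2
    path = [(0.0, 2)]
    while True:
        rate = mu2 if state == 2 else mu1
        t += rng.expovariate(rate)  # exponential sojourn time in the current state
        if t > horizon:
            return path
        state = 1 if state == 2 else 2  # 2 -> 1 is a defect, 1 -> 2 is a failure plus replacement
        path.append((t, state))


path = sample_degradation_path(mu1=1.0, mu2=0.5, horizon=20.0, seed=42)
```

Every entry with state 2 after the first one marks a failure epoch, so counting those entries gives the number of corrective replacements on the sampled path.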

We assume that we have two types of opportunities at which we can perform preventive maintenance (PM) before failure: the scheduled and the unscheduled opportunities. The scheduled opportunities correspond to pre-arranged opportunities occurring according to a fixed schedule. These opportunities can be attributed to either service/maintenance agreements or to regulation imposition checks. We assume that the scheduled opportunities occur at epochs \(\tau ,2\tau ,3\tau ,\ldots \), with \(\tau >0\). This is also in accordance with what happens in practice as maintenance actions, once planned, are typically not rescheduled. The unscheduled opportunities correspond to random opportunities triggered by failures of other systems in close proximity. We assume that these unscheduled opportunities occur according to a Poisson process at rate \(\lambda \).

The unscheduled and scheduled opportunities, abbreviated as USO and SO, respectively, serve as opportunities to perform preventive maintenance. Such preventive maintenance is assumed to cost less than a corrective maintenance (CM) upon failure, which costs \(c_{\text {cm}}\). Moreover, incorporating a planning perspective, we may assume that the preventive maintenance cost at an SO, \(c_{\text {pm}}^{\text {so}}\), is less than or equal to the corresponding cost at a USO, say \(c_{\text {pm}}^{\text {uso}}\), that is \(0<c_{\text {pm}}^{\text {so}}\le c_{\text {pm}}^{\text {uso}}<c_{\text {cm}}\) (however, we also extend our analysis to the case \(c_{\text {pm}}^{\text {so}}> c_{\text {pm}}^{\text {uso}}\)). Following the (pq)-rule of Nakagawa [23, 24], we assume that after preventive maintenance a system is returned to the ‘as good as new’ state with probability \(p\in (0,1]\) and returned to the ‘as bad as old’ state (i.e., the remaining time until failure is unchanged) with probability \(q=1-p\).

Our aim is to determine a policy for when to perform preventive maintenance on the system based on its condition and the opportunity type, i.e., scheduled or unscheduled. More explicitly, we will need to formally define the state space, which refers to the condition of the system, the action space and the decision epochs. The state space is governed by the process depicting the condition of the system, i.e., the Markov chain evolving between the states \(\{1,2\}\). The action space consists of only two actions: perform preventive maintenance or do nothing. Lastly, the decision epochs are the SO and USO epochs. In Fig. 2, we depict the SO epochs by (\(*\)) and the USO epochs by (o).

Fig. 2 A sample path of the model
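To make the interplay of condition, opportunities and costs concrete, the model can be simulated under a simple decision rule. The sketch below is illustrative only and rests on assumptions that are not part of the text at this point: the function name `simulate_cost_rate` and its arguments are hypothetical, and the rule used (in state 1, maintain at every SO and at a USO only if the remaining time until the next SO exceeds a threshold `t_tilde`) is just one possible policy. Because all random times are exponential, the clocks are resampled after every event, which is justified by the memoryless property.

```python
import random


def simulate_cost_rate(mu1, mu2, lam, p, tau, t_tilde,
                       c_so, c_uso, c_cm, horizon, seed=0):
    """Estimate the cost per time unit over [0, horizon] by simulation."""
    rng = random.Random(seed)
    now, state, cost = 0.0, 2, 0.0
    next_so = tau
    while True:
        # Memoryless clocks: resample the competing exponential times.
        t_change = now + rng.expovariate(mu1 if state == 1 else mu2)
        t_uso = now + rng.expovariate(lam) if lam > 0 else float("inf")
        t_next = min(t_change, t_uso, next_so)
        if t_next > horizon:
            break
        now = t_next
        if t_next == next_so:            # scheduled opportunity
            if state == 1:
                cost += c_so
                if rng.random() < p:     # (pq)-rule: success with probability p
                    state = 2
            next_so += tau
        elif t_next == t_uso:            # unscheduled opportunity
            if state == 1 and next_so - now > t_tilde:
                cost += c_uso
                if rng.random() < p:
                    state = 2
        elif state == 2:                 # hidden defect occurs
            state = 1
        else:                            # failure: corrective replacement
            cost += c_cm
            state = 2
    return cost / horizon


estimate = simulate_cost_rate(1.0, 1.0, 1.0, 0.5, 1.0, 0.4,
                              1.0, 2.0, 10.0, horizon=2000.0, seed=1)
```

The estimate is a finite-horizon average and therefore only approximates a long-run rate; longer horizons or several independent seeds reduce the sampling error.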

Table 1 summarizes the abbreviations that we will use throughout the remainder of this paper.

Table 1 Overview of abbreviations

4 Optimal policy

The goal of this section is twofold: We first characterize the structure of the optimal average cost condition-based maintenance policy. We then derive an explicit form for the long-run rate of cost per time unit for any given policy that has the same structure as the optimal policy.

4.1 Average cost criterion

This section is devoted to the derivation of the optimal policy for when to perform preventive maintenance for the system at hand using the average cost criterion. To this purpose, we set up our problem as a (controlled) semi-Markov decision process. Due to the stochastic nature of the problem, it does not suffice to know the type of the decision epoch (SO or USO), but it is also required to keep track of the remaining time till the next SO. That time may impact our decision, i.e., the optimal policy may depend on the residual time till the next SO. Thus, for the full description of the condition (state) of the system, we use a triplet descriptor

$$\begin{aligned} {\mathcal {S}}=\left\{ (i,j,t):\ i\in \{1,2\}, \ j\in \{\text {SC},\text {USO}\},\ t\in (0,\tau )\right\} \cup \left\{ (i,\text {SO},0):\ i\in \{1,2\}\right\} , \end{aligned}$$

where i indicates the condition of the system. If \(j=\text {SC}\), then this means that the condition of the system is about to change and there is no decision associated with this epoch, while if \(j=\text {SO}\) or \(j=\text {USO}\), this means that this is a decision moment at either a scheduled (SO) or unscheduled opportunity (USO), respectively. Finally, the third element indicates the remaining time until the SO. Note that if \(j=\text {SO}\) then \(t=0\). The introduction of the remaining time until the upcoming SO in the full description of the condition of the system renders the model inhomogeneous, and for this reason we use techniques that stem from semi-Markov decision processes. Note here that the inclusion of the remaining time until the upcoming SO in the state, although it complicates the analysis, permits us to prove that there is an optimal policy in the class of deterministic stationary policies, cf. Propositions 1 and 3. At each decision epoch (depending on the values of \((i,j,t)\in {\mathcal {S}}\)), we can choose to perform preventive maintenance or do nothing, or in case of a failure to do corrective maintenance (CM), that is \({\mathcal {A}} =\{\text {perform PM, do nothing, perform CM}\}\), where \({\mathcal {A}}\) represents the overall action space.

Proposition 1

For the model at hand, the deterministic stationary policy is optimal for the average cost criterion.

A formal version of the above proposition, cf. Proposition 3, and its proof can be found in Appendix A, together with a full formal definition of the model in the context of semi-Markov decision processes. In addition to the theoretical validation that the above proposition offers on the existence and nature of the optimal maintenance policy, in the following theorem we compute the optimal policy.

Theorem 1

Under the assumption that \(c_{\text {pm}}^{\text {so}}< c_{\text {pm}}^{\text {uso}}\) and given the imperfect preventive maintenance probability \(1-p\in (0,1]\), the optimal policy under the average cost criterion is: For state 2, do nothing. For state 1, perform preventive maintenance at scheduled opportunities if \( \mu _1 c_{\text {cm}} > (\mu _1+\mu _2)\frac{c_{\text {pm}}^{\text {so}}}{p}\), and do nothing otherwise, and perform preventive maintenance at unscheduled opportunities for which the residual time until the next scheduled opportunity is in \([{\hat{t}},\tau )\), if \(\mu _1 c_{\text {cm}} > \left( \frac{c_{\text {pm}}^{\text {uso}}}{p} - \frac{c_{\text {pm}}^{\text {uso}} - c_{\text {pm}}^{\text {so}}}{e^{(\mu _1 +\mu _2)\tau }-1} \right) (\mu _1+\mu _2)\), and do nothing otherwise, where \({\hat{t}} = \min \{\tau ,\max \{0,t^*\}\}\), with \(t^*\) satisfying

$$\begin{aligned} \frac{c_{\text {pm}}^{\text {uso}}}{p}&= \frac{\mu _1c_{\text {cm}} + \lambda c_{\text {pm}}^{\text {uso} }}{\mu _1+\mu _2 + \lambda p } \nonumber \\&\quad +\, \left( \frac{-c_{\text {pm}}^{\text {so}}+\frac{\mu _1c_{\text {cm}}}{\mu _1+\mu _2} + \left( \frac{c_{\text {pm}}^{\text {uso}}}{p} - \frac{\mu _1 c_{\text {cm}}}{\mu _1 + \mu _2} \right) e^{(\mu _1+\mu _2)t^*}}{1-p} -\frac{\mu _1c_{\text {cm}} + \lambda c_{\text {pm}}^{\text {uso} }}{\mu _1+\mu _2 + \lambda p } \right) \nonumber \\&\quad \times \, e^{(\mu _1+\mu _2 + \lambda p )(\tau -t^*)}. \end{aligned}$$
(1)

Proof

See Appendices B and C. \(\square \)

For USOs, Theorem 1 establishes a control limit policy depending on the remaining time until the next SO: If the residual time until the next SO is smaller than \({\hat{t}}\), then it is optimal to not take the opportunity to perform preventive maintenance in state 1. This is intuitive in the sense that the urgency for preventive maintenance in state 1 at a USO should decrease as the cheaper opportunity at an SO is approaching.
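The conditions of Theorem 1 and the root \(t^*\) of Eq. (1) are straightforward to evaluate numerically. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the inputs are assumed to satisfy \(c_{\text {pm}}^{\text {so}}< c_{\text {pm}}^{\text {uso}}\) and \(0<p<1\), and the root is searched by plain bisection on \([0,\tau ]\) only, so that `None` is returned when Eq. (1) shows no sign change on that interval. The parameter values in the usage lines are made up for illustration.

```python
import math


def theorem1_conditions(c_so, c_uso, c_cm, mu1, mu2, p, tau):
    """Return (PM at SOs in state 1?, PM at qualifying USOs in state 1?)."""
    s = mu1 + mu2
    pm_at_so = mu1 * c_cm > s * c_so / p
    pm_at_uso = mu1 * c_cm > (c_uso / p - (c_uso - c_so) / (math.exp(s * tau) - 1)) * s
    return pm_at_so, pm_at_uso


def eq1_residual(t, c_so, c_uso, c_cm, mu1, mu2, lam, p, tau):
    """Right-hand side minus left-hand side of Eq. (1), evaluated at t* = t."""
    s = mu1 + mu2
    r = s + lam * p
    a_term = (mu1 * c_cm + lam * c_uso) / r
    b_term = mu1 * c_cm / s
    inner = (-c_so + b_term + (c_uso / p - b_term) * math.exp(s * t)) / (1 - p)
    return a_term + (inner - a_term) * math.exp(r * (tau - t)) - c_uso / p


def solve_t_star(c_so, c_uso, c_cm, mu1, mu2, lam, p, tau, tol=1e-12):
    """Bisection for a root of Eq. (1) in [0, tau]; None if no sign change."""
    args = (c_so, c_uso, c_cm, mu1, mu2, lam, p, tau)
    lo, hi = 0.0, tau
    f_lo, f_hi = eq1_residual(lo, *args), eq1_residual(hi, *args)
    if f_lo * f_hi > 0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = eq1_residual(mid, *args)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


# Illustrative (made-up) parameter values.
params = dict(c_so=1.0, c_uso=2.0, c_cm=10.0, mu1=1.0, mu2=1.0, lam=1.0, p=0.5, tau=1.0)
t_star = solve_t_star(**params)
t_hat = None if t_star is None else min(params["tau"], max(0.0, t_star))
```

Bisection is chosen here only for its simplicity; it assumes a single sign change of the residual on \([0,\tau ]\), which holds for the illustrative values above but is not claimed in general.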

Note that in the special case when preventive maintenance costs at SOs and USOs are equal, the optimal policy reduces to a stationary control limit policy, which is shown in Proposition 2.

Proposition 2

Under the assumption that \(c_{\text {pm}}^{\text {so}}=c_{\text {pm}}^{\text {uso}}=c_{\text {pm}}>0\) and given the imperfect preventive maintenance probability \(1-p\in (0,1]\), the optimal policy under the average cost criterion is: For state 2, do nothing. For state 1, perform preventive maintenance at both SOs and USOs if \( \mu _1 c_{\text {cm}}> (\mu _1+\mu _2) \frac{c_{\text {pm}}}{p}\), and do nothing otherwise.

Proof

The proof of this proposition is identical in structure to the proof of Case (i) in the proof of Theorem 1, and for this reason it is omitted. \(\square \)

One could also argue that the cost for preventive maintenance at a USO is actually less than the cost at an SO since there is already a cost attached to the opportunity at hand (for example, service engineers are already at a wind park and they can, at a small extra cost, repair other systems in close proximity as well). In this case, the optimal control policy also reduces to a stationary control limit policy, which is described in Theorem 2.

Theorem 2

Under the assumption that \(c_{\text {pm}}^{\text {so}}>c_{\text {pm}}^{\text {uso}}\) and given the imperfect preventive maintenance probability \(1-p\in (0,1]\), the optimal policy under the average cost criterion is: For state 2, do nothing. For state 1, perform preventive maintenance at an unscheduled opportunity if \( \mu _1 c_{\text {cm}} > (\mu _1+\mu _2)\frac{ c_{\text {pm}}^{\text {uso}}}{p} \), and do nothing otherwise, and perform preventive maintenance at an SO if \(\mu _1 c_{\text {cm}} > (\mu _1+\mu _2 )\frac{c_{\text {pm}}^{\text {so}}}{p}+ \lambda ({c_{\text {pm}}^{\text {so}}}-c_{\text {pm}}^{\text {uso}})\), and do nothing otherwise.

Proof

See Appendix D. \(\square \)
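The threshold conditions of Proposition 2 and Theorem 2 involve no time dependence and can be checked directly. The sketch below simply transcribes the two sets of inequalities; the function names and the numerical values are hypothetical and serve only as an illustration.

```python
def pm_rule_equal_costs(c_pm, c_cm, mu1, mu2, p):
    """Proposition 2: PM in state 1 at both SOs and USOs iff this holds."""
    return mu1 * c_cm > (mu1 + mu2) * c_pm / p


def pm_rule_cheaper_uso(c_so, c_uso, c_cm, mu1, mu2, lam, p):
    """Theorem 2 (c_so > c_uso): return (PM at SOs?, PM at USOs?) in state 1."""
    s = mu1 + mu2
    pm_at_uso = mu1 * c_cm > s * c_uso / p
    pm_at_so = mu1 * c_cm > s * c_so / p + lam * (c_so - c_uso)
    return pm_at_so, pm_at_uso


# Illustrative (made-up) values: a higher USO rate makes waiting for a
# cheaper USO more attractive than maintaining at the more expensive SO.
print(pm_rule_cheaper_uso(2.0, 1.0, 10.0, 1.0, 1.0, lam=1.0, p=0.5))
print(pm_rule_cheaper_uso(2.0, 1.0, 10.0, 1.0, 1.0, lam=3.0, p=0.5))
```

In the second call the term \(\lambda (c_{\text {pm}}^{\text {so}}-c_{\text {pm}}^{\text {uso}})\) raises the threshold for maintaining at an SO above \(\mu _1 c_{\text {cm}}\), so the SO is skipped while USOs are still used.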

4.2 Long-run rate of cost per time unit

In the previous section, we characterized the structure of the optimal policy using the average cost criterion. This policy can be viewed as a control limit policy, with the control limit depending on the time until the next SO. In this section, we consider such a policy and we compute the long-run rate of cost per time unit. More concretely, we consider a policy under which in state 2 we do not perform preventive maintenance (i.e., we do nothing), and in state 1 we always perform preventive maintenance at SOs and we perform preventive maintenance at USOs if the remaining time till the next SO is greater than \({\tilde{t}}\), for some given value \({\tilde{t}}\in (0,\tau )\). The results obtained in this section are directly applicable to the results of Sect. 4.1, by setting \({\tilde{t}}=t^*\), cf. Theorem 1.

For the computation of the long-run rate of cost per time unit, we employ the theory of regenerative-like processes, also called stationary-cycle processes, described in Section 2.19 of Serfozo [33]. For this purpose, we consider the inter-regeneration times created by the SOs \(\{\tau , 2\tau , 3\tau , \ldots \}\). For the cost computation, we assume that, at the SOs, the system is in state 1 or 2 according to a stationary probability \(p_1(0)\) and \(p_2(0)\), respectively. The long-run rate of cost per time unit is calculated as the expected total cost incurred between consecutive SOs divided by \(\tau \).

Let \(p_i(t)\) be the probability that the system is in state \(i \in \{1,2\}\) given that the time until the next SO is \(t\in [0,\tau )\). Then the long-run rate of cost per time unit for this control limit policy (depending on the remaining time until the next planned maintenance) for any given time threshold is given in the next theorem.

Theorem 3

Consider a given policy under which in state 2 we opt to do nothing, and in state 1 we repair at scheduled opportunities and at unscheduled opportunities for which the remaining time until the next scheduled opportunity is greater than \({\tilde{t}}\in (0,\tau )\), and we do nothing otherwise. Under this policy, the long-run rate of cost per time unit equals

$$\begin{aligned} \frac{c_{\text {pm}}^{\text {so}}p_1(0)+c_{\text {pm}}^{\text {uso}}\lambda \int _{{\tilde{t}}}^{\tau }p_1(t){{\,\mathrm{d \!}\,}}t + c_{\text {cm}} \mu _1 \int _{0}^{\tau } p_1(t) {{\,\mathrm{d \!}\,}}t}{\tau }, \end{aligned}$$
(2)

with

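$$\begin{aligned} p_1(t)&= \frac{\mu _2}{\mu _1+\mu _2+\lambda p} + C_2\, e^{(\mu _1+\mu _2+\lambda p)t},&t\in [{\tilde{t}},\tau ), \end{aligned}$$
(3)

$$\begin{aligned} p_1(t)&= \frac{\mu _2}{\mu _1+\mu _2} + C_1\, e^{(\mu _1+\mu _2)t},&t\in [0,{\tilde{t}}), \end{aligned}$$
(4)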

where the constants \(C_1\) and \(C_2\) are obtained as follows:

$$\begin{aligned} C_1&=C_2\, e^{\lambda p {\tilde{t}}}- \frac{\mu _2 }{ \mu _1+\mu _2} \frac{\lambda p}{ \mu _1+\mu _2 + \lambda p } e^{- (\mu _1+\mu _2) {\tilde{t}}},\\ C_2&=\frac{\frac{\mu _2}{\mu _1+\mu _2 } \left( 1-e^{-(\mu _1+\mu _2) {\tilde{t}}}\right) - \frac{\mu _2}{\mu _1+\mu _2 + \lambda p }\left( \frac{1}{1-p}-e^{-(\mu _1+\mu _2) {\tilde{t}}}\right) }{\frac{1}{1-p}e^{(\mu _1+\mu _2 + \lambda p )\tau }-e^{\lambda p {\tilde{t}}}}. \end{aligned}$$

Proof

The expected total cost incurred in one cycle consists of three parts (cf. Eq. (2)), which are related to the expected cost associated with preventive maintenance at SOs, with preventive maintenance at USOs and with corrective maintenance, respectively. It is now sufficient to derive \(p_i(t)\) for \(t\in [0,\tau )\), \(i \in \{1,2\}\).

For \(t\in [{\tilde{t}},\tau )\), the time-dependent behavior of \(p_1(t)\) is governed by

$$\begin{aligned} p_1(t)&= p_1(t+ {{\,\mathrm{d \!}\,}}t)(1-(\mu _1+\lambda p){{\,\mathrm{d \!}\,}}t) + p_2(t+{{\,\mathrm{d \!}\,}}t)\mu _2 {{\,\mathrm{d \!}\,}}t. \end{aligned}$$
(5)

Equation (5) is easily obtained by considering a small time interval of length \({{\,\mathrm{d \!}\,}}t\), and noticing that, when the remaining time until the next SO equals t, we are in state 1 either because of a transition from state 2, which occurs with infinitesimal probability \(\mu _2 {{\,\mathrm{d \!}\,}}t\), or because we have remained in state 1, which occurs with infinitesimal probability \(1-(\mu _1+\lambda p){{\,\mathrm{d \!}\,}}t\). Subtracting \(p_1(t+{{\,\mathrm{d \!}\,}}t)\) from both sides of Eq. (5), some straightforward computations yield

$$\begin{aligned} p_1(t+{{\,\mathrm{d \!}\,}}t)-p_1(t) = p_1(t+{{\,\mathrm{d \!}\,}}t)(\mu _1+\lambda p){{\,\mathrm{d \!}\,}}t - p_2(t+{{\,\mathrm{d \!}\,}}t)\mu _2 {{\,\mathrm{d \!}\,}}t. \end{aligned}$$

Dividing this expression by \({{\,\mathrm{d \!}\,}}t\) and letting \({{\,\mathrm{d \!}\,}}t\rightarrow 0\) results in

$$\begin{aligned} p_1^{'}(t) = p_1(t)(\mu _1+\lambda p) - p_2(t)\mu _2. \end{aligned}$$

Following a similar analysis for \(p_2(t)\) yields the following system of differential equations, for \(t\in [{\tilde{t}},\tau )\):

$$\begin{aligned}&\left[ \begin{array}{c} p_1'(t) \\ p_2'(t) \end{array} \right] = \begin{bmatrix} \mu _1 + \lambda p&-\mu _2 \\ -(\mu _1 + \lambda p)&\mu _2 \end{bmatrix} \times \left[ \begin{array}{c} p_1(t) \\ p_2(t) \end{array} \right] ,&t\in [{\tilde{t}},\tau ). \end{aligned}$$
(6)

Similarly, for \(t\in [0,{\tilde{t}})\) we have

$$\begin{aligned}&\left[ \begin{array}{c} p_1'(t) \\ p_2'(t) \end{array} \right] = \begin{bmatrix} \mu _1&-\mu _2 \\ -\mu _1&\mu _2 \end{bmatrix} \times \left[ \begin{array}{c} p_1(t) \\ p_2(t) \end{array} \right] ,&t\in [0,{\tilde{t}}). \end{aligned}$$
(7)

Solving the system of differential equations (6) and (7) leads to the desired solutions (3) and (4), respectively. In this process, we would need to compute four unknown constants. This is achieved by using: (i) the normalizing condition, i.e., \(p_1(t)+p_2(t)=1\) for all \(t\in [0,\tau )\), (ii) the continuity condition at \({\tilde{t}}\), i.e., \(\lim \limits _{t\rightarrow {\tilde{t}}^-} p_i(t)=p_i({\tilde{t}})\) for \(i\in \{1,2\}\), and (iii) the boundary condition at the SOs imposed by the policy and the imperfect maintenance probability, i.e., \((1-p)p_1(0) = \lim \limits _{t\rightarrow \tau ^-}p_1(t)\). \(\square \)
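The steps of the proof can be turned into a small numerical routine. The sketch below is an illustration under stated assumptions rather than the authors' implementation: it uses the normalizing condition to reduce (6) and (7) to scalar linear differential equations for \(p_1(t)\), whose solutions are a constant plus an exponential on each interval, and it determines the two remaining constants directly from the continuity condition at \({\tilde{t}}\) and the boundary condition \((1-p)p_1(0) = \lim _{t\rightarrow \tau ^-}p_1(t)\). All function and variable names are hypothetical.

```python
import math


def theorem3_quantities(mu1, mu2, lam, p, tau, t_tilde):
    """Return p1(.) and the integrals of p1 over [0, t_tilde] and [t_tilde, tau]."""
    s = mu1 + mu2            # decay rate without USO-triggered PM
    r = s + lam * p          # decay rate with USO-triggered PM
    a = mu2 / s              # constant part of p1 on [0, t_tilde)
    b = mu2 / r              # constant part of p1 on [t_tilde, tau)
    # Boundary condition at the SO combined with continuity at t_tilde.
    c2 = ((1 - p) * (a + (b - a) * math.exp(-s * t_tilde)) - b) / (
        math.exp(r * tau) - (1 - p) * math.exp(lam * p * t_tilde))
    # Continuity at t_tilde.
    c1 = c2 * math.exp(lam * p * t_tilde) + (b - a) * math.exp(-s * t_tilde)

    def p1(t):
        if t < t_tilde:
            return a + c1 * math.exp(s * t)
        return b + c2 * math.exp(r * t)

    int_low = a * t_tilde + c1 * (math.exp(s * t_tilde) - 1) / s
    int_high = b * (tau - t_tilde) + c2 * (math.exp(r * tau) - math.exp(r * t_tilde)) / r
    return p1, int_low, int_high


def long_run_cost_rate(c_so, c_uso, c_cm, mu1, mu2, lam, p, tau, t_tilde):
    """Evaluate Eq. (2) with the integrals computed in closed form."""
    p1, int_low, int_high = theorem3_quantities(mu1, mu2, lam, p, tau, t_tilde)
    total = c_so * p1(0.0) + c_uso * lam * int_high + c_cm * mu1 * (int_low + int_high)
    return total / tau


# Illustrative (made-up) parameter values.
rate = long_run_cost_rate(1.0, 2.0, 10.0, 1.0, 1.0, 1.0, 0.5, 1.0, 0.4)
```

Writing the boundary condition without dividing by \(1-p\) keeps the routine usable for \(p=1\) as well; evaluating `long_run_cost_rate` over a grid of `t_tilde` values is one simple way to compare thresholds.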

4.2.1 Special cases

In the case of only scheduled opportunities, which corresponds to the case \({\tilde{t}}\rightarrow \tau \) or, equivalently, to the case \(\lambda \rightarrow 0\), the probabilities \(p_i(t)\) for \(i \in \{1,2\}\) are derived from the system of linear differential equations in (7), the normalizing condition, i.e., \(p_1(t)+p_2(t)=1\) for all \(t\in [0,\tau )\), and the boundary condition at the SOs, i.e., \((1-p)p_1(0) = \lim \limits _{t\rightarrow \tau ^-}p_1(t)\). This yields

$$\begin{aligned} p_1(t) = \frac{\mu _2}{\mu _1+\mu _2} \left( 1 - \frac{p e^{(\mu _1+\mu _2)t}}{e^{(\mu _1+\mu _2)\tau }-1+p}\right) ,\ t\in [0,\tau ). \end{aligned}$$

Plugging the above result into Eq. (2), after appropriately considering in Eq. (2) only the costs related to preventive maintenance at SOs and corrective maintenance,

$$\begin{aligned} \frac{c_{\text {pm}}^{\text {so}}p_1(0)+ c_{\text {cm}} \mu _1 \int _{0}^{\tau } p_1(t) {{\,\mathrm{d \!}\,}}t}{\tau }, \end{aligned}$$

leads to the long-run rate of cost per time unit in the case of only SOs.
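The case of only SOs can be evaluated directly from the two displayed expressions. The sketch below assumes nothing beyond them; the integral of \(p_1(t)\) over \([0,\tau )\) is written out in closed form (a routine integration of the exponential term), and the function names are illustrative.

```python
import math

def p1_so_only(t, mu1, mu2, p, tau):
    """Closed-form p_1(t) for the case of only scheduled opportunities."""
    s = mu1 + mu2
    return mu2 / s * (1.0 - p * math.exp(s * t) / (math.exp(s * tau) - 1.0 + p))

def cost_rate_so_only(c_so, c_cm, mu1, mu2, p, tau):
    """Long-run rate of cost with PM at SOs and corrective maintenance only."""
    s = mu1 + mu2
    growth = math.exp(s * tau) - 1.0
    # Integral of p_1(t) over [0, tau), obtained by integrating the formula above.
    integral = mu2 / s * (tau - p * growth / (s * (growth + p)))
    pm_cost = c_so * p1_so_only(0.0, mu1, mu2, p, tau)
    cm_cost = c_cm * mu1 * integral
    return (pm_cost + cm_cost) / tau
```

A quick sanity check is that `p1_so_only(tau, ...)` equals `(1 - p) * p1_so_only(0, ...)`, which is the boundary condition at the SOs.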

In the case of perfect maintenance, i.e., in the case \(p=1\), the boundary condition at the SOs imposed by the policy and the imperfect maintenance probability in the proof of Theorem 3 reduces to \(\lim \limits _{t\rightarrow \tau ^-}p_1(t)=0\), as immediately after an SO the system is restored to state 2 with probability 1. This enables us to explicitly solve the systems of linear differential equations (6) and (7), yielding

$$\begin{aligned} p_1(t)&=\frac{\mu _2}{\mu _1+\mu _2} \\&\quad +\, \left( \frac{\mu _2}{\lambda +\mu _1+\mu _2}-\frac{\mu _2}{\mu _1+\mu _2}-\frac{\mu _2}{\lambda +\mu _1+\mu _2}e^{\left( \lambda +\mu _1+\mu _2\right) \left( {\tilde{t}}-\tau +\frac{\Lambda (t)}{\lambda }(t-{\tilde{t}})\right) }\right) \\&\quad \times \, e^{\frac{\lambda -\Lambda (t)}{\lambda }(\mu _1+\mu _2)(t-{\tilde{t}})}, \end{aligned}$$

where

$$\begin{aligned} \Lambda (t)={\left\{ \begin{array}{ll} 0,&{}\quad \text { if } \ 0\le t<{\tilde{t}},\\ \lambda ,&{}\quad \text { if }\ {\tilde{t}}\le t<\tau . \end{array}\right. } \end{aligned}$$

Combining this expression with Eq. (2) results in the long-run rate of cost per time unit in the case of perfect maintenance.
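The perfect-maintenance expression for \(p_1(t)\) is compact but easy to mistype, so a direct transcription can help when evaluating it. The sketch below implements the displayed formula together with \(\Lambda (t)\); it assumes \(\lambda >0\) (the formula divides by \(\lambda \)), and the function name is illustrative.

```python
import math

def p1_perfect(t, mu1, mu2, lam, t_tilde, tau):
    """p_1(t) for p = 1, transcribed from the displayed closed form (lam > 0)."""
    s = mu1 + mu2
    a = mu2 / s
    b = mu2 / (lam + s)
    big_lambda = lam if t >= t_tilde else 0.0  # Lambda(t) as defined in the text
    shift = t_tilde - tau + (big_lambda / lam) * (t - t_tilde)
    bracket = b - a - b * math.exp((lam + s) * shift)
    return a + bracket * math.exp((lam - big_lambda) / lam * s * (t - t_tilde))
```

Two properties can be checked numerically: the value at \(t=\tau \) is zero, in line with the reduced boundary condition, and the function is continuous at \({\tilde{t}}\).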

In the case of only unscheduled opportunities, which is equivalent to considering \(\tau \rightarrow \infty \), the condition of the system can be fully described using a double descriptor \({\mathcal {S}}=\left\{ (i,j):\ i\in \{1,2\}, \ j\in \{\text {SC},\text {USO}\}\right\} \) which is independent of time, and thus the new model formulation falls into the framework of regular Markov decision processes. It can be shown that, for state 2, the optimal policy is to do nothing, and, for state 1, the optimal policy is to repair if \(\frac{(\mu _1+\mu _2)c_{\text {pm}}^{\text {uso}}}{p} < \mu _1 c_{\text {cm}}\) and to do nothing otherwise. Furthermore, when repairing in state 1 is optimal, the average long-run rate of cost under the optimal policy is equal to

$$\begin{aligned} \frac{c_{\text {pm}}^{\text {uso}}\lambda \mu _2+c_{\text {cm}}\mu _1\mu _2}{\lambda p +\mu _1+\mu _2}. \end{aligned}$$

In the case of only corrective replacements, the long-run rate of cost is equal to

$$\begin{aligned} c_{\text {cm}}\frac{\mu _1\mu _2}{\mu _2+\mu _1}. \end{aligned}$$
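The decision rule and the two cost rates for the case \(\tau \rightarrow \infty \) are simple enough to evaluate directly. The following sketch assumes that, when the repair condition fails, the relevant cost rate is the one for only corrective replacements; this fallback is an interpretation made here, and the function names are illustrative.

```python
def corrective_only_rate(c_cm, mu1, mu2):
    """Long-run rate of cost with only corrective replacements."""
    return c_cm * mu1 * mu2 / (mu1 + mu2)

def repair_at_uso(c_uso, c_cm, mu1, mu2, p):
    """Repair rule in state 1 for the case of only unscheduled opportunities."""
    return (mu1 + mu2) * c_uso / p < mu1 * c_cm

def uso_only_rate(c_uso, c_cm, mu1, mu2, lam, p):
    """Cost rate under the rule above; falls back to corrective only (assumption)."""
    if repair_at_uso(c_uso, c_cm, mu1, mu2, p):
        return (c_uso * lam * mu2 + c_cm * mu1 * mu2) / (lam * p + mu1 + mu2)
    return corrective_only_rate(c_cm, mu1, mu2)

# Gearbox values of Sect. 6.1, with lam = 4 and c_uso = 2000 as one of the listed choices.
print(corrective_only_rate(300_000, 0.31, 0.31))
print(uso_only_rate(2_000, 300_000, 0.31, 0.31, 4.0, 0.6))
```

With the gearbox values, the first call reproduces the figure of 46,500 euros per year quoted in Sect. 6.1 for only corrective replacements.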

5 Deferring planned maintenance

In this section, we consider that upon a successful maintenance activity (preventive, at an SO or at a USO, or corrective), the upcoming planned maintenance is deferred for a period of length \(\tau \), i.e., at the instances of successful maintenance the remaining time till the next SO is set equal to \(\tau \). We are interested in computing the long-run rate of cost under deferred maintenance. In Sect. 6.3, we use the results of this section and of the previous sections to investigate the economic benefits of deferring planned maintenance.

Analogously to the analysis of Sect. 4.2, we derive the long-run rate of cost using renewal theory; see, for example, [31, Proposition 7.3, page 433]. In this case, we consider the renewal points to be the instances at which there was a successful maintenance activity, i.e., the SOs or USOs at which the preventive maintenance was perfect, or the epochs at which corrective maintenance is performed. Note that the underlying stochastic process that governs the condition of the system regenerates after each successful maintenance activity. That is, after each successful maintenance activity the underlying stochastic process is in state 2 with probability 1. The long-run rate of cost per time unit for a policy in the class of optimal policies is given in the next theorem. As the expressions appearing in the theorem do not simplify upon further computations, we choose to present them in the form of probabilities and expectations associated with the exponential distribution. These expressions are straightforward (though cumbersome) to compute and offer insight into each of the individual events participating in the final expression, cf. Eq. (8).

Theorem 4

Consider a given policy under which in state 2 we do nothing, and in state 1 we repair at scheduled opportunities and at unscheduled opportunities for which the remaining time until the next scheduled opportunity is greater than \({\tilde{t}}\in (0,\tau )\), and we do nothing otherwise. Furthermore, consider that planned maintenance is deferred after a successful maintenance. Under this setting, the long-run rate of cost per time unit equals

$$\begin{aligned} \frac{{\mathbb {E}}\left[ \text {Total cycle cost}\right] }{{\mathbb {E}}\left[ \text {Total cycle length}\right] } = \frac{{\mathbb {E}}\left[ C\!C\right] }{ \frac{1}{\mu _2} + {\mathbb {E}}\left[ C\!L\right] } = \frac{{\mathbb {E}}\left[ C\!C\, \mathbb {1}_{\{C\!L\, \le Y\}}\right] + {\mathbb {E}}\left[ C\!C\,\mathbb {1}_{\{C\!L\,> Y\}} \right] }{ \frac{1}{\mu _2} + {\mathbb {E}}\left[ C\!L\, \mathbb {1}_{\{C\!L\, \le Y\}}\right] + {\mathbb {E}}\left[ C\!L\, \mathbb {1}_{\{C\!L\, >Y\}}\right] }, \end{aligned}$$
(8)

with

$$\begin{aligned} {\mathbb {E}}\left[ C\!L\, \mathbb {1}_{\{C\!L\, \le Y\}} \right]&= {\mathbb {E}}\left[ C\!L\, \mathbb {1}_{\{\text {USO}{[\tau - Y,\tau -{\tilde{t}}]}\}}\right] + {\mathbb {E}}\left[ C\!L\, \mathbb {1}_{\{\text {SO}{[\tau -Y,\tau ]}\}}\right] \nonumber \\&\quad +\, {\mathbb {E}}\left[ C\!L\, \mathbb {1}_{\{\text {CM}{[\tau - Y,\tau ]}\}}\right] , \end{aligned}$$
(9)
$$\begin{aligned} {\mathbb {E}}\left[ C\!L\, \mathbb {1}_{\{C\!L\, >Y\}}\right]&= (1-p){\mathbb {P}}\left[ \text {SO}{[\tau -Y,\tau ]} \right] \Bigg ( {\mathbb {E}}[Y] +\frac{\tau (1-p) {\mathbb {P}}\left[ \text {SO}{[0,\tau ]} \,\right] }{1-(1-p) {\mathbb {P}}\left[ \text {SO}{[0,\tau ]} \,\right] } \nonumber \\&\quad +\, {\mathbb {E}}\left[ C\!L'\, \mathbb {1}_{\{C\!L'\le Y\}} \,|\, Y=\tau \right] \Bigg ),\end{aligned}$$
(10)
$$\begin{aligned} {\mathbb {E}}\left[ C\!C\, \mathbb {1}_{\{C\!L\, \le Y\}}\right]&= {\mathbb {E}}\left[ C\!C\,\mathbb {1}_{\{\text {USO}{[\tau -Y,\tau -{\tilde{t}}]}\}}\right] + {\mathbb {E}}\left[ C\!C\, \mathbb {1}_{\{ \text {SO}{[\tau -Y,\tau ]}\}}\right] \nonumber \\&\quad +\, {\mathbb {E}}\left[ C\!C\,\mathbb {1}_{\{\text {CM}{[\tau -Y,\tau ]}\}}\right] , \end{aligned}$$
(11)
$$\begin{aligned} {\mathbb {E}}\left[ C\!C\,\mathbb {1}_{\{C\!L\, > Y\}}\right]&= (1-p) {\mathbb {P}}\left[ \text {SO}{[\tau -Y,\tau ]} \right] \Bigg ({\mathbb {E}}\left[ C\!C\,\mathbb {1}_{\{\text {SO}{[\tau -Y,\tau ]}\}}\right] \nonumber \\&\quad +\, \frac{(\lambda (1-p)(\tau -{\tilde{t}})c_{\text {pm}}^{\text {uso}} +c_{\text {pm}}^{\text {so}} ) (1-p) {\mathbb {P}}\left[ \text {SO}{[0,\tau ]} \,\right] }{1-(1-p) {\mathbb {P}}\left[ \text {SO}{[0,\tau ]} \,\right] } \nonumber \\&\quad +\, {\mathbb {E}}\left[ C\!C\,\mathbb {1}_{\{C\!L'\, \le Y \}}\,|\, Y=\tau \right] \Bigg ), \end{aligned}$$
(12)

where the density of the truncated exponential random variable Y is given by

$$\begin{aligned} f_Y(y)&=\mu _2 \frac{ e^{-\mu _2(\tau -y)}}{1-e^{-\mu _2 \tau }}, \ y\in [0,\tau ), \end{aligned}$$
(13)

and for \(0\le y\le \tau \),

$$\begin{aligned} \mathbb {1}_{\{ \text {SO}{[\tau -y,\tau ]}\}}&{\mathop {=}\limits ^{d}}\mathbb {1}_{\{y<\min \{T_{\lambda p},T_{\mu _1}\}\}}+\mathbb {1}_{\{T_{\lambda p}< y<\min \{T_{\mu _1},{\tilde{t}}\}\}}\mathbb {1}_{\{y< {\tilde{t}}\}}\nonumber \\&\quad +\, \mathbb {1}_{\{y-{\tilde{t}}\le T_{\lambda p}<y, y\le T_{\mu _1}\}} \mathbb {1}_{\{y\ge {\tilde{t}}\}}, \end{aligned}$$
(14)
$$\begin{aligned} \mathbb {1}_{\{\text {USO}{[\tau -y,\tau -{\tilde{t}}]}\}}&{\mathop {=}\limits ^{d}}\mathbb {1}_{\{T_{\lambda p}<\min \{T_{\mu _1},y-{\tilde{t}}\}\}}\mathbb {1}_{\{y\ge {\tilde{t}}\}}, \end{aligned}$$
(15)
$$\begin{aligned} \mathbb {1}_{\{\text {CM}{[\tau -y,\tau ]}\}}&{\mathop {=}\limits ^{d}} \mathbb {1}_{\{T_{\mu _1}<\min \{y,T_{\lambda p}\}\}}+\mathbb {1}_{\{T_{\lambda p}<T_{\mu _1}<y\}}\mathbb {1}_{\{y<{\tilde{t}}\}} \nonumber \\&\quad +\, \mathbb {1}_{\{T_{\lambda p}<T_{\mu _1}<y,\, T_{\lambda p}\ge y-{\tilde{t}}\}}\mathbb {1}_{\{y\ge {\tilde{t}}\}},\end{aligned}$$
(16)
$$\begin{aligned} {\mathbb {E}}\left[ C\!L\, \mathbb {1}_{\{\text {USO}{[\tau - y,\tau -{\tilde{t}}]}\}}\right]&= {\mathbb {E}}\left[ T_{\lambda p} \mathbb {1}_{\{\text {USO}{[\tau - y,\tau -{\tilde{t}}]}\}} \right] , \end{aligned}$$
(17)
$$\begin{aligned} {\mathbb {E}}\left[ C\!L\, \mathbb {1}_{\{\text {SO}{[\tau - y,\tau ]}\}}\right]&= y p {\mathbb {P}}\left[ \text {SO}[\tau -y,\tau \,]\right] ,\end{aligned}$$
(18)
$$\begin{aligned} {\mathbb {E}}\left[ C\!L\, \mathbb {1}_{\{\text {CM}{[\tau - y,\tau ]}\}}\right]&= {\mathbb {E}}[T_{\mu _1}\mathbb {1}_{\{\text {CM}[\tau -y,\tau ]\}}] , \end{aligned}$$
(19)
$$\begin{aligned} {\mathbb {E}}\left[ C\!C\,\mathbb {1}_{\{\text {USO}{[\tau -y,\tau -{\tilde{t}}]}\}}\right]&= c_{\text {pm}}^{\text {uso}} \,{\mathbb {P}}\left[ \text {USO}{[\tau -y,\tau -{\tilde{t}}]}\,\right] + \lambda (1-p) c_{\text {pm}}^{\text {uso}} \nonumber \\&\quad \times \, {\mathbb {E}}\left[ T_{\lambda p} \mathbb {1}_{\{\text {USO}{[\tau -y,\tau -{\tilde{t}}]}\}} \right] , \end{aligned}$$
(20)
$$\begin{aligned} {\mathbb {E}}\left[ C\!C\,\mathbb {1}_{\{\text {SO}{[\tau -y,\tau ]}\}}\right]&= \Big (c_{\text {pm}}^{\text {so}} + \lambda (1-p) c_{\text {pm}}^{\text {uso}} \max \left\{ y-{\tilde{t}},0\right\} \Big ){\mathbb {P}}\left[ \text {SO}{[\tau -y,\tau ]}\,\right] , \end{aligned}$$
(21)
$$\begin{aligned} {\mathbb {E}}\left[ C\!C\,\mathbb {1}_{\{\text {CM}{[\tau -y,\tau ]}\}}\right]&= c_{\text {cm}} {\mathbb {P}}\left[ \text {CM}{[\tau -y,\tau ]}\right] \nonumber \\&\quad +\, \lambda (1-p) c_{\text {pm}}^{\text {uso}} {\mathbb {E}}\left[ \min \left\{ T_{\mu _1},\max \left\{ y-{\tilde{t}},0\right\} \right\} \,\mathbb {1}_{\{\text {CM}{[\tau -y,\tau ]}\}} \right] , \end{aligned}$$
(22)

where \(\mathbb {1}_{\{x\}}\) is an indicator function taking value 1 if event x occurs, and it is zero otherwise, \(T_{\mu _1}\sim \text {Exp}(\mu _1)\), \(T_{\lambda p}\sim \text {Exp}(\lambda p)\), \({\mathbb {P}}\left[ \,\cdot \,\right] ={\mathbb {E}}[\mathbb {1}_{\{\cdot \}}]\) for all events in Eqs. (14)–(16), and \(C\!L{\mathop {=}\limits ^{d}}C\!L'\).

Proof

See Appendix E. \(\square \)
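To make the events in Theorem 4 more concrete, the indicators in Eqs. (14)-(16) and the density (13) can be simulated. The sketch below is a Monte Carlo illustration only, not the computation used in the proof: it samples Y by inverting the distribution function implied by (13), draws \(T_{\lambda p}\) and \(T_{\mu _1}\) as independent exponentials, and evaluates the three indicators literally as written. The closed form for \({\mathbb {E}}[Y]\) is obtained here by integrating (13); all function names are illustrative.

```python
import math
import random

def sample_y(mu2, tau, rng):
    """Inverse-transform sample from the density f_Y in Eq. (13)."""
    u = rng.random()
    return math.log1p(u * math.expm1(mu2 * tau)) / mu2

def mean_y(mu2, tau):
    """E[Y], obtained by integrating y * f_Y(y) over [0, tau)."""
    return tau * math.exp(mu2 * tau) / math.expm1(mu2 * tau) - 1.0 / mu2

def indicators(y, t_lam, t_mu, t_tilde):
    """Indicators of the events SO, USO and CM, following Eqs. (14)-(16)."""
    so = (int(y < min(t_lam, t_mu))
          + int(t_lam < y < min(t_mu, t_tilde)) * int(y < t_tilde)
          + int(y - t_tilde <= t_lam < y and y <= t_mu) * int(y >= t_tilde))
    uso = int(t_lam < min(t_mu, y - t_tilde)) * int(y >= t_tilde)
    cm = (int(t_mu < min(y, t_lam))
          + int(t_lam < t_mu < y) * int(y < t_tilde)
          + int(t_lam < t_mu < y and t_lam >= y - t_tilde) * int(y >= t_tilde))
    return so, uso, cm

def estimate_event_probabilities(mu1, mu2, lam, p, t_tilde, tau, n=100_000, seed=1):
    """Monte Carlo estimates of P[SO], P[USO], P[CM], averaged over Y."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    for _ in range(n):
        y = sample_y(mu2, tau, rng)
        t_lam = rng.expovariate(lam * p)   # T_{lambda p} ~ Exp(lambda p)
        t_mu = rng.expovariate(mu1)        # T_{mu_1} ~ Exp(mu_1)
        for k, value in enumerate(indicators(y, t_lam, t_mu, t_tilde)):
            counts[k] += value
    return tuple(c / n for c in counts)
```

Since exactly one of the three events ends the first phase of a cycle, the three indicators sum to one for every sample, so the estimated probabilities add up to one.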

6 Numerical results

Using the results and the analyses of the previous sections, in this section we illustrate through a few well-chosen examples the effect of the various parameters on the long-run rate of cost. In these examples, we investigate the financial advantage of the optimal policy, when compared to other (suboptimal) policies. Furthermore, we highlight the financial benefit of perfect maintenance by comparing the long-run rate of cost for the perfect maintenance model (\(p=1\)) to that of the imperfect maintenance model (\(p\in (0,1)\)). Here, we also show the influence of imperfect maintenance on the maintenance planning. In addition, we illustrate the change introduced by the action of deferring planned maintenance after the occurrence of a successful maintenance. To illustrate the financial effects in a realistic context and to connect our analysis with practice, we use values and data stemming from the wind industry.

6.1 Comparison of the optimal policy to suboptimal policies

In this section we compute, in the context of the wind industry example, the long-run rate of cost under the optimal policy and we examine how it is affected by varying one by one the parameters \(\tau \), \(\lambda \) and \(c_{\text {pm}}^{\text {uso}}\), while keeping all other parameters fixed. For the determination of the values used in the numerical computations of this section, we consider the gearbox of a wind turbine. Statistics from a recent field study by Ribrant and Bertling [29] on Swedish wind parks in the period 1997–2005 showed that the gearbox is the most critical unit of a wind turbine. The notion of criticality is determined by the fact that a failure of the gearbox leads to the highest downtime when compared to all other wind turbine components, but also by the fact that this component has the highest failure rate among all wind turbine components [29, 34, 35]. Due to its extended downtime after a failure (which is captured in the corresponding maintenance cost), the corrective cost of a gearbox is relatively high compared to preventive maintenance costs; see, for example, Nilsson and Bertling [25]. Based on the values reported in the aforementioned studies, we set \(c_{\text {cm}}=300{,}000\), \(c_{\text {pm}}^{\text {so}}=1000\), \(\mu _2=0.31\), \(\mu _1=0.31\) and \(p=0.6\). In this case, the long-run rate of cost (in euros per year) in the case of only corrective replacements is equal to 46,500. Furthermore, motivated by the wind industry practice, we choose three different values for \(\tau \), that is \(\tau \in \{0.25, 0.5, 1\}\) (years). Next, we consider three different values for \(c_{\text {pm}}^{\text {uso}}\), i.e., \(c_{\text {pm}}^{\text {uso}} \in \{ 2000, 3000, 4000 \}\). Finally, with regard to \(\lambda \), we consider four different values, i.e., \(\lambda \in \{0.5, 1, 2, 4\}\).

In Table 2, we depict the long-run rate of cost for the above-mentioned values under four different policies. The first policy corresponds to replacements only at USOs (\(\pi _{\text {uso}}\)). The second policy corresponds to replacements only at SOs (\(\pi _{\text {so}}\)). The third policy is the optimal policy (\(\pi _{\text {opt}}\)), which is derived in Theorem 1. Note that it is numerically easier to obtain the optimal \({\tilde{t}}\) by minimizing the long-run rate of cost in Theorem 3, instead of from the closed-form expression in Theorem 1, as the latter requires finding the root of an equation. The fourth policy is the policy that is optimal under the assumption \(p=1\). This assumption is motivated by practice, as it is often difficult to determine the value of p exactly and it is typically assumed that after a maintenance the component is restored to a perfect state. This policy is denoted by \(\pi '_{\text {opt}}\).

Table 2 Long-run rate of cost varying \(\lambda \), \(\tau \) and \(c_{\text {pm}}^{\text {uso}}\), while keeping all other parameters fixed for four policies

In Table 2, we observe, across all instances, that incorporating planned maintenance can significantly reduce costs compared to only corrective maintenance, and that costs can be reduced even further by adding opportunistic maintenance. Intuitively, due to the cost structure, performing only planned maintenance at SOs considerably improves the long-run rate of cost when compared to performing only opportunistic maintenance at USOs. Finally, if we compare \(\pi _{\text {opt}}\) with \(\pi '_{\text {opt}}\) we do not, despite the low value of p, observe significant differences. From an operational management perspective, this suggests that, if decision makers do not have any knowledge about the value of p, and given a similar cost structure as in the gearbox case, assuming perfect maintenance will result in a long-run rate of cost that is close to optimal. This will be valid as long as the preventive maintenance cost (at both opportunities) is very small in comparison to the corrective maintenance cost, as is the case for the gearbox costs. As a rule of thumb, one can easily compute the expected number of maintenances (planned or opportunistic) required for a successful preventive maintenance and based on this compute the long-run rate of preventive maintenance cost (approximately of the order \(\max \{c_{\text {pm}}^{\text {so}},c_{\text {pm}}^{\text {uso}}\}/p\)) and compare it with the corrective cost. If the corrective cost is significantly higher, then one may assume that there is no significant difference between \(\pi _{\text {opt}}\) and \(\pi '_{\text {opt}}\), and as a consequence there is no significant difference in the values of the optimal policies under the imperfect and perfect maintenance. In the next section, we investigate the savings that can be obtained by improving the performance of a repair when a decision maker has some knowledge regarding the value of p.
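The rule of thumb can be written as a one-line computation. The sketch below assumes that maintenance attempts succeed independently with probability p, so that the expected number of attempts until a success is \(1/p\); the function name and the idea of reporting a ratio are illustrative choices, not part of the original analysis.

```python
def rule_of_thumb_ratio(c_so, c_uso, c_cm, p):
    """Approximate PM cost until a successful PM, relative to the corrective cost."""
    expected_attempts = 1.0 / p  # mean number of attempts, assuming independent trials
    pm_cost_until_success = max(c_so, c_uso) * expected_attempts
    return pm_cost_until_success / c_cm

# Gearbox cost structure (c_uso = 2000 chosen from the listed values) with p = 0.6.
print(rule_of_thumb_ratio(1_000, 2_000, 300_000, 0.6))
# Second cost structure used in Sect. 6.2, also with p = 0.6.
print(rule_of_thumb_ratio(26_500, 28_800, 75_500, 0.6))
```

A ratio far below one, as for the gearbox values, is the situation in which the text expects \(\pi _{\text {opt}}\) and \(\pi '_{\text {opt}}\) to perform similarly; a ratio of the same order as one signals that p may matter for the planning.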

6.2 Influence of imperfect maintenance

Let \(\pi ^{(p)}_{\text {opt}}\) represent the optimal policy as a function of the successful preventive maintenance probability p and let \(C(\pi ^{(p)}_{\text {opt}})\) denote the long-run rate of cost when the policy is \(\pi ^{(p)}_{\text {opt}}\). To demonstrate the effect of p on the rate of cost, we compute the relative extra cost of not having a perfect preventive maintenance as a function of p. This relative difference is denoted by \(\delta (p)\) and is equal to

$$\begin{aligned} \delta (p) = \frac{C(\pi ^{(p)}_{\text {opt}})-C(\pi ^{(1)}_{\text {opt}})}{C(\pi ^{(1)}_{\text {opt}})} \cdot 100 \%. \end{aligned}$$

\(\delta (p)\) indicates how much extra cost is incurred due to imperfect maintenance, and thus shows the benefit of improving the probability of executing a perfect maintenance.

In this numerical example, as before, we choose \(\mu _2=0.31\) and \(\mu _1=0.31\). Furthermore, we set \(\lambda = 4\) and \(\tau = 1\). Figure 3 shows \(\delta (p)\) for \(p \in [0.5,1]\) under two different cost structures (denoted by \(\delta (p)^1\) and \(\delta (p)^2\), respectively). Figure 4 depicts the corresponding optimal values for \({\tilde{t}}\) for both cost structures, denoted by \(t^1\) and \(t^2\), respectively. For \(\delta (p)^1\), we use the same cost structure as in the previous section, i.e., \(c_{\text {pm}}^{\text {so}} = 1000, c_{\text {pm}}^{\text {uso}}=2000\) and \(c_{\text {cm}} = 300{,}000\), whereas, for \(\delta (p)^2\), we consider \(c_{\text {pm}}^{\text {so}} = 26{,}500, c_{\text {pm}}^{\text {uso}}=28{,}800\) and \(c_{\text {cm}} = 75{,}500\). The choice for the preventive maintenance cost at SOs and USOs in the second cost structure is common in the lithography industry (see [42]). Based on Fig. 3, we can conclude that, under both cost structures, significant costs can be saved by improving the probability of executing a perfect preventive maintenance (for example, by training).

Fig. 3 \(\delta (p)^1\) and \(\delta (p)^2\) for \( p \in [0.5,1]\) with \(c_{\text {pm}}^{\text {so}} = 1000, c_{\text {pm}}^{\text {uso}}=2000\) and \(c_{\text {cm}} = 300{,}000\) for \(\delta (p)^1\), and \(c_{\text {pm}}^{\text {so}} = 26{,}500, c_{\text {pm}}^{\text {uso}}=28{,}800\) and \(c_{\text {cm}} = 75{,}500\) for \(\delta (p)^2\)

Fig. 4 \(t^{1}\) and \({t}^{2}\) for \( p \in [0.5,1]\) with \(c_{\text {pm}}^{\text {so}} = 1000, c_{\text {pm}}^{\text {uso}}=2000\) and \(c_{\text {cm}} = 300{,}000\) for \(t^1\), and \(c_{\text {pm}}^{\text {so}} = 26{,}500, c_{\text {pm}}^{\text {uso}}=28{,}800\) and \(c_{\text {cm}} = 75{,}500\) for \(t^2\)

The optimal value of \({\tilde{t}}\), denoted by \(t^{1}\) and \(t^{2}\) under the first and second cost structure, respectively, is equal to \(t^{1}\approx 0.08\) and \(t^{2}\approx 0.39\) in the case of perfect repairs. In Fig. 4, where we plot \(t^{1}\) and \(t^{2}\) as a function of p, we observe the following regarding the influence of p on the maintenance planning. If the preventive maintenance cost (at both opportunities) is very small compared to the cost of corrective maintenance, the order of the total preventive maintenance cost incurred until a successful preventive maintenance compared to the corrective maintenance cost is still maintained. Therefore, the maintenance planning changes little with p: for all values of \(p\in [0.5,1]\), the optimal policy is to almost always perform preventive maintenance at USOs. This also explains the small discrepancy between \(\pi _{\text {opt}}\) and \(\pi '_{\text {opt}}\) in Table 2. This is different in the case of the second cost structure, where the maintenance planning changes substantially as a function of p. Whereas in the perfect case the optimal policy is to perform preventive maintenance at a USO if the residual time until the next SO is larger than 0.39, for \(p \lessapprox 0.83\) it is optimal to never perform preventive maintenance at a USO. Here, the order of the total preventive maintenance cost incurred until a successful preventive maintenance compared to the corrective maintenance cost is not maintained.

Also, in the opposite cost structure, i.e., \(c_{\text {pm}}^{\text {uso}}<c_{\text {pm}}^{\text {so}}\) (similar examples can be found for \(c_{\text {pm}}^{\text {uso}}=c_{\text {pm}}^{\text {so}}\)), the maintenance planning can be influenced significantly by the imperfect repair probability. For instance, consider the setting with \(\mu _1=1.1, \mu _2 =0.9\), \(c_{\text {pm}}^{\text {so}} = 4500\), \(c_{\text {pm}}^{\text {uso}}=4000\), \(c_{\text {cm}}=10{,}000\), and \(\lambda =0.5\). In case of perfect repairs (i.e., \(p=1\)), the optimal policy is to perform preventive maintenance in state 1 at both SOs and USOs, and to do nothing otherwise (cf. Theorem 2). However, if \(0.72 \lessapprox p \lessapprox 0.83\), the optimal policy is to only perform preventive maintenance at USOs, and if \(p \lessapprox 0.72\), then the optimal policy is to never perform PM. This example illustrates the influence of the imperfect repair probability on the maintenance planning.
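As a rough cross-check that is not part of the original argument, one can evaluate the break-even probability implied by the repair rule for the case of only unscheduled opportunities in Sect. 4.2.1, i.e., \(\frac{(\mu _1+\mu _2)c_{\text {pm}}^{\text {uso}}}{p} < \mu _1 c_{\text {cm}}\), with the parameter values of this example. The symbol \(p^{*}\) is introduced here only for illustration, and since the rule was derived for \(\tau \rightarrow \infty \), for finite \(\tau \) it serves merely as a reference point:

$$\begin{aligned} p^{*} = \frac{(\mu _1+\mu _2)\,c_{\text {pm}}^{\text {uso}}}{\mu _1 c_{\text {cm}}} = \frac{2.0 \cdot 4000}{1.1 \cdot 10{,}000} = \frac{8000}{11{,}000} \approx 0.727. \end{aligned}$$

This value is of the same order as the lower threshold \(p \lessapprox 0.72\) reported above, below which preventive maintenance is never performed.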

6.3 Deferring of planned maintenance

In this section, we illustrate the change introduced by the action of deferring planned maintenance after the occurrence of a successful maintenance in three numerical examples that relate to the wind industry, the lithography industry, and to an artificially created example.

Figure 5 shows the long-run rate of cost for both the deferral and no deferral case for the example with data stemming from the wind industry. Again, with regard to the cost parameters, we use \(c_{\text {pm}}^{\text {so}} = 1000, c_{\text {pm}}^{\text {uso}}=2000\) and \(c_{\text {cm}} = 300{,}000\). With regard to the other parameters, we set \(\lambda = 4\), \(\tau = 1\), \(\mu _1 = 0.31\), \(\mu _2 = 0.31\) and \(p=0.6\). We observe that deferring the planned maintenance both significantly increases the long-run rate of cost under the optimal policy (an increase of 28.14% from 8468.87 to 10,852.15) and changes the optimal value of \({\tilde{t}}\) from 0.112 to 0.

Fig. 5 Cost rate in the case of deferral and of no deferral for the wind industry example. Optimal \({\tilde{t}}\) is equal to 0 and 0.112 for deferral and no deferral, respectively

Figure 6a, b depicts the long-run rate of cost for the deferral and the no deferral case, respectively, based on the values of the lithography industry example. We use the same cost parameters as in Sect. 6.2, that is, \(c_{\text {pm}}^{\text {so}} = 26{,}500, c_{\text {pm}}^{\text {uso}}=28{,}800\) and \(c_{\text {cm}} = 75{,}500\). The other parameters remain unchanged, i.e., \(\lambda = 4\), \(\tau = 1\), \(\mu _1 = 0.31\), \(\mu _2 = 0.31\) and \(p=0.6\). As in the numerical example for the wind industry, deferring the planned maintenance affects both the long-run rate of cost under the optimal policy (an increase of 6533.3% from 12,840.12 to 851,727.53) and the value of \({\tilde{t}}\) associated with the optimal policy (from 1 to 0.175). The drastic increase is due to the cost structure: the preventive maintenance costs (both at scheduled and unscheduled opportunities) are much closer to the corrective maintenance cost than in the wind industry example.

Fig. 6 Cost rate for the lithography industry example

To illustrate that the opposite effect (albeit to a much lesser degree than in the previous two examples) can also hold, we create an artificial example where we set \(c_{\text {pm}}^{\text {so}} = 5000, c_{\text {pm}}^{\text {uso}}=10{,}000\) and \(c_{\text {cm}} = 19{,}000\), and \(\lambda = 4\), \(\tau = 4\), \(\mu _1 = 1\), \(\mu _2 = 0.4\) and \(p=0.5\). Figure 7 depicts the long-run rate of cost for both the deferral and the no deferral case for this example. Here, we observe that, for all values of \({\tilde{t}}\), cost savings can be obtained by deferring planned maintenance after the occurrence of a successful opportunistic maintenance. More specifically, whereas the optimal value of \({\tilde{t}}\) is equal to 1 for both cases, the long-run rate of cost under the optimal policy decreases by 0.88% from 6458.97 to 6402.44 when deferring planned maintenance.

Fig. 7 Cost rate in the case of deferral and no deferral for the artificial example. Optimal \({\tilde{t}}\) is equal to 1 for both deferral and no deferral
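The percentage changes quoted for the three examples of this section follow from the reported cost rates by simple arithmetic. The short sketch below recomputes them from the values stated in the text (no deferral first, deferral second); the helper name is illustrative.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# (cost rate without deferral, cost rate with deferral), as reported in the text.
examples = {
    "wind": (8468.87, 10852.15),
    "lithography": (12840.12, 851727.53),
    "artificial": (6458.97, 6402.44),
}

for name, (no_deferral, deferral) in examples.items():
    print(f"{name}: {percent_change(no_deferral, deferral):+.2f}%")
```

A positive sign indicates that deferring planned maintenance increases the long-run rate of cost, and a negative sign that it decreases it.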

7 Conclusion

In this paper, we considered the maintenance policy for a three-state component degrading over time with corrective replacements at failures and preventive replacements at both scheduled and unscheduled opportunities under imperfect repair. By formulating this problem as a semi-Markov decision process, we were able to characterize the structure of the optimal maintenance policy as a control limit policy, where the control limit depends on the time until the next planned maintenance opportunity. Using this approach, a closed-form expression for the optimal control limit was derived. Within this class of control limit policies, we derived, using the theory of regenerative processes, an explicit expression for the long-run rate of cost. Using a similar approach based on renewal theory, we derived an expression for the long-run rate of cost in the case when planned maintenance is deferred after the occurrence of a successful opportunistic maintenance.

A cost comparison with suboptimal policies illustrated the benefits of optimizing the maintenance policy. Specifically, it was found that incorporating planned maintenance can significantly reduce costs compared to only corrective maintenance, and that costs can be reduced even further by adding opportunistic maintenance. Moreover, numerical results indicate that the extent of the impact of the perfect repair probability on the optimal policy depends on the underlying cost structure. It was also shown that substantial cost savings can be obtained by improving the perfect repair probability. Finally, our numerical examples indicate that the deferral of planned maintenance after the occurrence of a successful opportunistic maintenance may either increase or decrease the total cost.

There are a number of extensions and topics for future research. The most important direction is to consider the network dependency on the level of the structural degradation and failure dependencies, i.e., to consider a multi-dimensional process that captures the degradation of the various assets in the network. Such a future direction would be particularly interesting in the case of a small number of assets for which the Poisson approximation for the opportunistic maintenance may not be accurate. In addition, another very interesting research direction would be to consider a more general model in which the condition of the system degrades through \(N>2\) states. Next, in this analysis we have assumed that the condition of the system is fully observable. However, in many real applications, condition monitoring data such as spectrometric oil data or vibration data give only partial information about the underlying state of the system. From this perspective, it would be interesting to extend the model at hand to a partially observable model in which the condition monitoring data are stochastically related to the true system state. Finally, the results in this paper are valid for systems with hypo-exponentially distributed lifetimes. Future research could relax this assumption by considering a phase-type lifetime distribution.