Introduction

Since modular reconfigurable robots (MRRs) offer advantages such as structural flexibility, low cost, and excellent adaptability, they are often deployed in perilous and complex working environments, such as disaster rescue, deep space/sea exploration, smart manufacturing, and many other hazardous environments in which humans cannot intervene directly [1,2,3].

In recent years, MRR control approaches have attracted a great deal of attention, including centralized control [4, 5], distributed control [6, 7], and decentralized control [8, 9]. These approaches mainly tackle force/position control problems [10, 11], fault-tolerant control problems [12, 13], and so on. Although the above methods achieve good performance, the designed controllers always contain adjustable parameters, which increases the design difficulty and structural complexity. Thus, more attention should be paid to simplifying the control structure and reducing the computational burden.

As is well known, optimal control is one of the key topics in modern control theory: it not only ensures the stability of the control system but also attains a desired optimal performance. Owing to its strong self-learning and optimization abilities, adaptive dynamic programming (ADP) [14] was introduced and has been extensively investigated for optimal control, since it solves the Hamilton–Jacobi–Bellman equation (HJBE) without the “curse of dimensionality”. Accordingly, more and more ADP-based control methods have been investigated to deal with trajectory tracking [15, 16], zero-sum games [17], uncertainties [18], actuator saturation [19], etc. It is worth pointing out that reinforcement learning (RL) and ADP are in much the same spirit when dealing with optimal control problems; hence, RL is often regarded as a synonym for ADP. To date, many ADP and RL methods have been investigated [20,21,22,23]. Bai et al. designed an optimal control approach via the neural network (NN) technique and an RL algorithm to tackle the nonstrict-feedback control problem [24] and input saturation [25]. For systems with known dynamics, Shi et al. [26] proposed an optimal tracking control (OTC) approach to handle time-delay problems via integral RL and a value iteration method. In [27], a novel approximate OTC strategy was developed using an event-driven ADP algorithm. However, the aforementioned methods require accurate system dynamics, which are difficult to obtain in real industrial applications of MRRs. Recently, some model-free RL-based control methods have been presented; these approaches depend merely on the input and output measurement data of the controlled plant [28]. However, model-free methods require large amounts of online or offline data to train the NNs, which consumes computation and training time.

Furthermore, MRRs working in hazardous environments for a long time may suffer failures, which not only degrade system performance but may even damage the surrounding workspace. Among fault scenarios, actuator failure is regarded as one of the most challenging to handle, because an unknown actuator failure can easily cause serious deterioration of the control performance. Moreover, it is impractical to repair MRRs in hazardous environments. Hence, exploring a fault-tolerant control (FTC) method is imperative to guarantee that MRRs continue working reliably in the presence of actuator failures.

FTC strategies mainly include passive FTC (PFTC) and active FTC (AFTC). PFTC does not need a fault detection and identification (FDI) unit. Over the last few decades, many PFTC approaches have been presented, mainly based on quantitative feedback theory [29], linear matrix inequalities [30], and \({H}_{\infty }\) theory [31]. PFTC designs a fixed controller before a fault occurs, so it can only handle known faults [32]. Alternatively, AFTC effectively avoids this drawback: it obtains fault information via FDI and then readjusts or reconstructs the control law. Various AFTC approaches can be categorized into fault accommodation [33], fault reconfiguration [34], and fault compensation [35]. Owing to its better performance, AFTC has potential in robot manipulators [36], quadrotors [37], inverted pendulums [38], and other practical applications. Moreover, some FTC schemes have been developed through RL or ADP. Zhao et al. [32] employed the information of a fault observer to construct an improved cost function and utilized an online iteration algorithm to develop a novel FTC method for nonlinear systems. Fan and Yang [39] investigated an FTC strategy to handle time-varying actuator bias faults via ADP. In [40], an ADP-based stabilizing scheme for nonlinear systems with unknown actuator saturation was developed via NN compensation. However, these works solved stabilization problems rather than trajectory tracking, which is the problem relevant to MRRs.

To obtain rapid response and convergence, sliding mode-based control schemes have been presented. Owing to its low sensitivity and robustness to system uncertainties and external disturbances, sliding mode control (SMC) reduces the need for an accurate model and is applicable to control design in both normal and faulty conditions [41,42,43]. Hence, SMC methods have been widely applied to systems with strong nonlinearities, variable parameters, and external disturbances, such as aircraft systems [44], direct-current (DC) servomotors [45], multi-machine power systems [46], and MRRs [12].

Although previous ADP-based FTC methods can guarantee the stability of the faulty system, a faster control action is often required in practice. Thus, motivated by [47], this paper develops a sliding mode-based online fault compensation control (SMOFCC) scheme for MRRs with unknown actuator failures. For the fault-free case, a sliding mode-based approximate optimal control (SMAOC) is derived using an SM-based iterative controller and an adaptive robust term. When actuator failures occur, the SMOFCC is obtained by adding an online fault compensator to the SMAOC. The main contributions and novelties of this work are as follows.

  1. The scheme extends the ADP-based SMC method to the FTC problem for MRRs with unknown actuator failures, and online fault compensation is achieved without an FDI unit.

  2. The proposed SMOFCC scheme, which is composed of an SM-based iterative controller, an adaptive robust term, and a fault compensator, guarantees that the MRR system is asymptotically stable, rather than uniformly ultimately bounded (UUB) [3, 32, 39, 40].

  3. By employing the SMC technique, the developed SMOFCC has a faster control response than control based on tracking error feedback only [3].

The rest of this paper is organized as follows. The next section presents the problem statement. The subsequent section presents the SM-based control scheme for MRRs in the fault-free case. Then, an online fault compensator is developed to obtain the FTC law, and the stability analysis is provided. A numerical simulation then demonstrates the effectiveness of the SMOFCC. In the last section, a brief conclusion is drawn.

Problem statement

The n-DOF (degree of freedom) MRR system with unknown actuator failures can be described by

$$\begin{aligned} M(q){\ddot{q}} + C(q,{\dot{q}}){\dot{q}} + G(q) = u - f_a, \end{aligned}$$
(1)

where \(q \in {{\mathbb {R}}^n}\) denotes the vector of joint displacements, \(M(q) \in {{\mathbb {R}}^{n \times n}}\) denotes the nonsingular symmetric inertia matrix, \(C(q,{\dot{q}}){\dot{q}} \in {{\mathbb {R}}^n}\) denotes the Coriolis and centripetal force, \(G(q) \in {{\mathbb {R}}^n}\) denotes the gravity term, \(u \in {{\mathbb {R}}^n}\) denotes the joint input torque, and \(f_{a} \in {{\mathbb {R}}^n}\) represents the unknown additive actuator failure.

Defining the system state as \(x = {[{x_1},{x_2}]^{\mathsf {T}}} = {[q,{\dot{q}}]^{\mathsf {T}}}\), the MRR system (1) can be rewritten as

$$\begin{aligned} \left\{ \begin{array}{l} {{{\dot{x}}}_1} = {x_2}\\ {{{\dot{x}}}_2} = f(x) + g(x)(u -f_a) \\ y = {x_1}, \end{array} \right. \end{aligned}$$
(2)

where \(x \in {{\mathbb {R}}^{2n}}\) and \(y \in {{\mathbb {R}}^{n}}\) are the state and the output vectors, respectively, \(f(x)={{M}^{-1}}(q)[-C(q,{\dot{q}}){\dot{q}}-G(q)]\) and \(g(x)={{M}^{-1}}(q)\).

Assumption 1

The nonlinear functions \(f(\cdot )\) and \(g(\cdot )\) are Lipschitz continuous with \(f(0) = 0\), i.e., \(x=0\) is the equilibrium point of system (2), and system (2) is controllable.

Assumption 2

The desired reference trajectory \({q_d}\), the velocity vector \(\dot{q_d}\), and the acceleration vector \({\ddot{q_d}}\) are norm-bounded as [15]

$$\begin{aligned} \left\| {\begin{array}{*{20}{c}} {{q_d}}\\ {{{{\dot{q}}}_d}}\\ {{{{\ddot{q}}}_d}} \end{array}} \right\| \le {q_{\kappa }}, \end{aligned}$$

where \(q_{\kappa } >0 \) is a known constant.

Assumption 3

The actuator failure \(f_a\) is norm-bounded as \(\left\| f_a \right\| \le \varsigma _{M}\), where \(\varsigma _{M}\) is a positive constant.

For the fault-free case of the MRR system (2), i.e., \(f_a = 0\), we define the nominal system as

$$\begin{aligned} \left\{ \begin{array}{l} {{{\dot{x}}}_1} = {x_2}\\ {{{\dot{x}}}_2} = f(x) + g(x){u_0} \\ y = {x_1}, \end{array} \right. \end{aligned}$$
(3)

where \({u_0}\) is a SMAOC law.

The tracking error is defined as

$$\begin{aligned} {e} = {x} - {x_\vartheta }, \end{aligned}$$
(4)

where \({x_\vartheta } = {[{q_\vartheta },{{\dot{q}}_\vartheta }]^{{\mathsf {T}}}}\) is the desired reference trajectory. Thus, the time derivative of the tracking error (4) becomes

$$\begin{aligned} {{\dot{e}}} = {{\dot{x}}} - {{\dot{x}}_\vartheta }. \end{aligned}$$
(5)

To accelerate the convergence rate, we introduce the SM surface as

$$\begin{aligned} s = {{\dot{e}}} + \varLambda e, \end{aligned}$$
(6)

where \(\varLambda \) is a positive definite matrix.

The time derivative of (6) is

$$\begin{aligned} {\dot{s}}&=\ddot{e}+\varLambda {\dot{e}} \nonumber \\&=f(x)+g(x){u_0}-{{\ddot{x}}_{\vartheta }}+\varLambda {\dot{e}}\nonumber \\&=f(x)+g(x){u_0}+\varphi , \end{aligned}$$
(7)

where \(\varphi =-{{\ddot{x}}_{\vartheta }}+\varLambda {\dot{e}}\).
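The error signals (4)–(6) are direct vector computations; a minimal numerical sketch (the gain matrix and error values below are illustrative, not taken from the paper) is:

```python
import numpy as np

def sliding_surface(e, e_dot, Lam):
    """SM surface s = e_dot + Lam @ e, cf. Eq. (6)."""
    return e_dot + Lam @ e

# Hypothetical 2-DOF values for illustration only.
Lam = np.diag([5.0, 5.0])        # positive definite matrix Lambda
e = np.array([0.1, -0.05])       # position tracking error
e_dot = np.array([0.02, 0.01])   # velocity tracking error
s = sliding_surface(e, e_dot, Lam)
```

Driving s to zero enforces the first-order error dynamics \({\dot{e}} = -\varLambda e\), which is why the surface accelerates convergence relative to position feedback alone.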

To realize the approximate optimal control, the SM-based iterative controller \({u_s}\) is designed to drive the trajectory tracking error to the steady state. The cost function is defined as

$$\begin{aligned} J( s(t) )=\int _{0}^{\infty }{Z( s(t),{{u}_{s}}(t)){\mathrm {d}}t}, \end{aligned}$$
(8)

where \(Z(s,{u_s})={{s}^{{\mathsf {T}}}}Qs+u_{s}^{{\mathsf {T}}}R{{u}_{s}}\), \(J( s(t)) \ge 0\) for arbitrary s and \(u_s\), \(J(0) = 0\), and \(Q\in {{{\mathbb {R}}}^{n\times n}}\), \(R\in {{{\mathbb {R}}}^{m\times m}}\) are positive definite matrices.

Remark 1

From (6), the SM surface consists of both the position tracking error e and the velocity tracking error \({\dot{e}}\), rather than the position tracking error alone. Thus, compared with optimal control driven by position tracking error feedback only, optimal control with the SM signal achieves faster convergence and smaller overshoot. Furthermore, SMC has low sensitivity and strong robustness to system uncertainties and is easily implemented in practice.

Online fault compensation control design and stability analysis

Sliding mode-based HJBE

Definition 1

Considering the nominal MRR system (3), an SM-based iterative control strategy \(\mu (s)\in \varPsi (\varOmega )\) is said to be admissible with respect to (8) on a compact set \(\varOmega \) if \(\mu (s)\) is continuous on \(\varOmega \) with \(\mu (0)=0\), \(\mu (s)\) ensures the convergence of system (3) on \(\varOmega \), and J(s) is finite, \( \forall s\in \varOmega \) [3, 27, 32].

For each admissible control strategy \(\mu (s)\in \varPsi (\varOmega )\) of system (3), where \(\varPsi (\varOmega )\) is the set of admissible controls, if the cost function (8) is continuously differentiable, then the nonlinear Lyapunov equation can be derived as

$$\begin{aligned} 0=Z(s,{u_s})+{{(\nabla J(s))}^{\mathsf {T}}}{\dot{s}}, \end{aligned}$$
(9)

where \(\nabla J(s)=\frac{\partial J(s)}{\partial s}\).

The Hamiltonian is defined as

$$\begin{aligned} H( s,{{u}_{s}},\nabla J(s))=Z(s,{{u}_{s}})+{{( \nabla J(s))}^{{\mathsf {T}}}}{\dot{s}}, \end{aligned}$$
(10)

and the optimal cost function can be defined as

$$\begin{aligned} {{J}^{*}}( s(t))=\underset{{{u}_{s}}}{\mathop {\min }}\,\int _{0}^{\infty }{Z( s(t),{{u}_{s}}(t)){\mathrm {d}}t}. \end{aligned}$$
(11)

Based on the Bellman principle of optimization, \({{J}^{*}}(s)\) satisfies the HJBE

$$\begin{aligned} 0=\underset{{u_s}}{\mathop {\min }}\, H(s,{u_s},\nabla {{J}^{*}}(s)). \end{aligned}$$
(12)

Since \(\frac{\partial H( s,u_{s}^{*},\nabla {{J}^{*}}(s) )}{\partial u_{s}^{*}}=0\), the optimal control law can be obtained as

$$\begin{aligned} u_{s}^{*}(s)=-\frac{1}{2}{{R}^{-1}}{{g}^{{\mathsf {T}}}}(x)\nabla {{J}^{*}}(s). \end{aligned}$$
(13)

By equivalent transformation, (13) becomes

$$\begin{aligned} { \nabla {{J}^{*{\mathsf {T}}}}(s)}{g}(x)=-2{{ u_{s}^{*{\mathsf {T}}}(s)}}R. \end{aligned}$$
(14)

Online PI algorithm

According to [36, 37, 39, 40], the solution of the HJBE (12) can be approximated through an online policy iteration (PI) algorithm when the system is in the normal state. Unlike [3], the online PI algorithm here is driven by the SM feedback signal rather than the system tracking error. The online PI algorithm is presented in Algorithm 1.

Algorithm 1 Online PI algorithm
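The structure of policy iteration — alternating policy evaluation and policy improvement analogous to (13) — can be illustrated on a scalar linear-quadratic surrogate, where policy evaluation reduces to solving a Lyapunov equation in closed form. The system and cost parameters below are illustrative stand-ins, not the MRR dynamics:

```python
import numpy as np

# Scalar stand-in for the PI loop of Algorithm 1:
# dynamics s_dot = a*s + b*u, cost integral of q*s^2 + r*u^2.
a, b, q, r = -1.0, 1.0, 1.0, 1.0   # illustrative values

k = 0.0                            # initial admissible gain (a - b*k < 0)
for _ in range(20):
    # Policy evaluation: solve 2(a - b*k)*p + q + r*k^2 = 0 for p.
    p = (q + r * k**2) / (-2.0 * (a - b * k))
    # Policy improvement: u = -k*s with k = b*p/r, the analogue of (13).
    k = b * p / r

# Closed-form Riccati solution for comparison.
p_star = r * (a + np.sqrt(a**2 + b**2 * q / r)) / b**2
```

Under these assumptions the iterates converge quadratically to the Riccati solution, mirroring how the online PI algorithm converges to the solution of the HJBE (12).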

Sliding mode-based critic neural network

The cost function J(s) can be reconstructed by a single-layer NN as

$$\begin{aligned} J(s)=W_{c}^{{\mathsf {T}}}{{\sigma }_{c}}(s)+{{\varepsilon }_{c}}(s), \end{aligned}$$
(15)

where \({{W}_{c}}\in {{{\mathbb {R}}}^{M}}\) and \({{\sigma }_{c}}(s)\) denote the ideal weight vector and the activation function, respectively, M denotes the number of neurons in the hidden layer, and \({{\varepsilon }_{c}}(s)\) denotes the approximation error caused by critic NN (CNN) approximation. Then, from (15), we can obtain

$$\begin{aligned} \nabla J(s)= \nabla \sigma _{c}^{{\mathsf {T}}}(s){{W}_{c}}+\nabla {{\varepsilon }_{c}^{{\mathsf {T}}}}(s). \end{aligned}$$
(16)

According to (16), the Hamiltonian (10) can be rewritten as

$$\begin{aligned} H( s,{{u}_{s}},{{W}_{c}})&=Z(s,{{u}_{s}})+( W_{c}^{{\mathsf {T}}}\nabla {{\sigma }_{c}}(s)){\dot{s}}\nonumber \\&=-\nabla \varepsilon _{c}^{{\mathsf {T}} }{\dot{s}} \nonumber \\&={{s}_{cH}}, \end{aligned}$$
(17)

where \({{s}_{cH}}\) is the residual error caused by the NN approximation.

To estimate \({{W}_{c}}\), the CNN (15) is approximated as

$$\begin{aligned} {\hat{J}}(s)={\hat{W}}_{c}^{{\mathsf {T}}}{{\sigma }_{c}}(s) \end{aligned}$$
(18)

and we can obtain \(\nabla {\hat{J}}(s)\) from (18) that

$$\begin{aligned} \nabla {\hat{J}}(s)={{ \nabla {{\sigma }^{{\mathsf {T}} }_{c}}(s)}}{{{\hat{W}}}_{c}}. \end{aligned}$$
(19)

Inserting (19) into (17), we have the approximate Hamiltonian as

$$\begin{aligned} H( s,{u_s},{{{{\hat{W}}}}_{c}} )=Z(s,{u_s})+ {\hat{W}}_{c}^{{\mathsf {T}}}\nabla {{\sigma }_{c}}(s){\dot{s}}={s_c}. \end{aligned}$$
(20)

Denote \(\varpi =\nabla {{\sigma }_{c}}(s){\dot{s}}\), and assume that there exists a constant \({{\varpi }_{M}}>0\) such that \(\left\| \varpi \right\| \le {{\varpi }_{M}}\). Minimizing the objective function \({{E}_{c}}=\frac{1}{2}s_{c}^{{\mathsf {T}} }{{s}_{c}}\) by the gradient descent algorithm, \({{{\hat{W}}}_{c}}\) is updated by

$$\begin{aligned} {{\dot{{\hat{W}}}}_{c}}=-{{\beta }_{c}}{{s}_{c}}\varpi , \end{aligned}$$
(21)

where \({{\beta }_{c}}>0\) is the learning rate.
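A discretized sketch of the critic update (20)–(21), using a quadratic activation vector and illustrative numerical values (the learning rate, step size, and signals are hypothetical):

```python
import numpy as np

def critic_step(W_hat, s, s_dot, u_s, Q, R, grad_sigma, beta_c, dt):
    """One Euler step of the critic weight law (21).

    grad_sigma is the Jacobian of the activation sigma_c w.r.t. s."""
    varpi = grad_sigma @ s_dot                 # varpi = grad sigma_c(s) * s_dot
    Z = s @ Q @ s + u_s @ R @ u_s              # utility Z(s, u_s)
    s_c = Z + W_hat @ varpi                    # approximate Hamiltonian (20)
    return W_hat - beta_c * s_c * varpi * dt   # gradient-descent step

# Illustrative values with sigma_c = [s1^2, s1*s2, s2^2].
s = np.array([0.5, -0.5])
s_dot = np.array([0.1, 0.2])
grad_sigma = np.array([[2*s[0], 0.0], [s[1], s[0]], [0.0, 2*s[1]]])
W_new = critic_step(np.zeros(3), s, s_dot, np.zeros(2),
                    np.eye(2), np.eye(2), grad_sigma, beta_c=1.0, dt=0.1)
```

The step drives the approximate Hamiltonian residual \(s_c\) toward zero, which is exactly the objective minimized by (21).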

Define the weight approximation error as

$$\begin{aligned} {\tilde{W}}_{c}={{W}_{c}}-{{{\hat{W}}}_{c}}. \end{aligned}$$
(22)

From (17), (20) and (22), one has

$$\begin{aligned} {{s}_{c}}={{s}_{cH}}-\tilde{W}_{c}^{{\mathsf {T}}}\varpi . \end{aligned}$$
(23)

Then, the weight approximation error is updated by

$$\begin{aligned} {{\dot{\tilde{W}}}_{c}}=-{{\dot{{\hat{W}}}}_{c}}={{\beta }_{c}}{{s}_{c}}\frac{\partial {{s}_{c}}}{\partial {{{{\hat{W}}}}_{c}}}={{\beta }_{c}}({{s}_{cH}}-\tilde{W}_{c}^{{\mathsf {T}} }\varpi )\varpi . \end{aligned}$$
(24)

Inserting (16) into (13), the ideal SM-based iterative control strategy is expressed as

$$\begin{aligned} {u_s}=-\frac{1}{2}{{R}^{-1}}{{g}^{{\mathsf {T}}}}(x)( \nabla \sigma _{c}^{{\mathsf {T}}}(s){{W}_{c}}+\nabla {{\varepsilon }_{c}^{{\mathsf {T}}}}(s)) \end{aligned}$$
(25)

and it is approximated as

$$\begin{aligned} {{{\hat{u}}}_{s}}=-\frac{1}{2}{{R}^{-1}}{{g}^{{\mathsf {T}}}}(x){{ \nabla {{\sigma }^{{\mathsf {T}} }_{c}}(s)}}{{{\hat{W}}}_{c}}. \end{aligned}$$
(26)
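Given a trained critic weight vector, the approximate control (26) is a direct matrix computation; a minimal sketch with hypothetical values (identity \(g(x)\) and R, unit weights) is:

```python
import numpy as np

def u_s_hat(W_hat, g, R, grad_sigma):
    """Approximate SM-based iterative control law (26):
    u_s_hat = -1/2 R^{-1} g(x)^T grad sigma_c(s)^T W_hat."""
    return -0.5 * np.linalg.solve(R, g.T @ (grad_sigma.T @ W_hat))

# Illustrative values: sigma_c = [s1^2, s1*s2, s2^2] evaluated at s = [1, 1].
grad_sigma = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
u = u_s_hat(np.ones(3), np.eye(2), np.eye(2), grad_sigma)
```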

Theorem 1

Considering the nominal MRR system (3), if the weight vector of the SM-based CNN is tuned by (21), the weight approximation error is guaranteed to be UUB.

Proof

Choose a Lyapunov function candidate as

$$\begin{aligned} {{L}_{1}}=\frac{1}{2{{\beta }_{c}}}\tilde{W}_{c}^{{\mathsf {T}}}{{\tilde{W}}_{c}}. \end{aligned}$$
(27)

Taking the time derivative of (27), we have

$$\begin{aligned} {{{\dot{L}}}}_{1}&=\frac{1}{{{\beta }_{c}}}\tilde{W}_{c}^{{\mathsf {T}} }{{{\dot{\tilde{W}}}}_{c}} \nonumber \\&=\tilde{W}_{c}^{{\mathsf {T}} }({{s}_{cH}}-\tilde{W}_{c}^{{\mathsf {T}} }\varpi )\varpi \nonumber \\&=\tilde{W}_{c}^{{\mathsf {T}} }{{s}_{cH}}\varpi -{{\big \Vert \tilde{W}_{c}^{{\mathsf {T}} }\varpi \big \Vert }^{2}} \nonumber \\&\le \frac{1}{2}s_{cH}^{2}-\frac{1}{2}{{\big \Vert \tilde{W}_{c}^{{\mathsf {T}} }\varpi \big \Vert }^{2}}. \end{aligned}$$
(28)

Therefore, \({{{\dot{L}}}_{1}}\le 0\) as long as \({{\tilde{W}}_{c}}\) lies outside the compact set \({{\varOmega }_{c}}= \{{{\tilde{W}}_{c}}: \Vert {{{\tilde{W}}}_{c}} \Vert <\frac{|{{s}_{cH}}|}{{{\varpi }_{M}}}\}\). Thus, the CNN weight estimation error \({{\tilde{W}}_{c}}\) is UUB. This completes the proof.

Sliding mode-based approximate optimal control

In light of previous analysis, we can design the SMAOC law as

$$\begin{aligned} {{u}_{0}}={{u}_{s}}-{{g}^{-1}}(x)\big ({\hat{\varphi }}+{{k}_{1}}s+{{k}_{2}}{\mathrm{sgn}} (s)\big ), \end{aligned}$$
(29)

where \({\mathrm{sgn}} (s)= {[{\mathrm{sgn}}({{s}_{1}}),{\mathrm{sgn}} ({{s}_{2}}),\ldots ,{\mathrm{sgn}} ({{s}_{n}})]^{\mathsf {T}}} \in {{\mathbb {R}}^n}\), \({{k}_{1}}\) and \({{k}_{2}}\) are positive definite constant matrices, and \({\hat{\varphi }}\) is a robust term tuned by

$$\begin{aligned} \dot{{\hat{\varphi }}}={{\beta }_{\varphi }}s, \end{aligned}$$
(30)

where \({{\beta }_{\varphi }}>0\) is the adaptation gain. According to SMC theory and the proposed controller (29), the reachability condition \({{s}^{{\mathsf {T}}}} {\dot{s}}\le 0\) ensures that the MRR system states reach and stay on the SM surface.
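The SMAOC law (29) together with the robust-term update (30) can be sketched as follows, with an Euler discretization of (30); all numerical values, and the choice \(g(x)=I\), are illustrative:

```python
import numpy as np

def smaoc(u_s, s, phi_hat, g, k1, k2):
    """SMAOC law (29): u0 = u_s - g(x)^{-1}(phi_hat + k1 s + k2 sgn(s))."""
    return u_s - np.linalg.solve(g, phi_hat + k1 @ s + k2 @ np.sign(s))

def robust_term_step(phi_hat, s, beta_phi, dt):
    """Euler step of the adaptive robust term update (30)."""
    return phi_hat + beta_phi * s * dt

# Illustrative values with g(x) = I.
s = np.array([1.0, -1.0])
u0 = smaoc(np.zeros(2), s, np.zeros(2), np.eye(2),
           k1=2.0 * np.eye(2), k2=0.5 * np.eye(2))
```

The \({k}_{1}s\) term gives a linear reaching rate, while \({k}_{2}\,{\mathrm{sgn}}(s)\) dominates the bounded uncertainty near the surface, consistent with the reachability condition above.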

Theorem 2

Consider the nominal MRR system (3), the SM surface (6), and its time derivative (7). The tracking error of the MRR system reaches the SM surface and stays on it thereafter under the developed SMAOC law (29).

Proof

Choose a Lyapunov function candidate as

$$\begin{aligned} {{L}_{2}}=\frac{1}{2}{{s}^{{\mathsf {T}}}}s+\frac{1}{2{{\beta }_{\varphi }}}{{{\tilde{\varphi }}}^{{\mathsf {T}}}}{\tilde{\varphi }}, \end{aligned}$$
(31)

where \({\tilde{\varphi }}=\varphi -{\hat{\varphi }}\). Introducing the SMAOC law (29) into (31), we have

$$\begin{aligned} {{{{\dot{L}}}}_{2}}&={{s}^{{\mathsf {T}}}}{\dot{s}}-\frac{1}{{{\beta }_{\varphi }}}\dot{{\hat{\varphi }}}{\tilde{\varphi }} \nonumber \\&={{s}^{{\mathsf {T}}}}( f(x)+g(x){{u}_{s}}+\varphi -{\hat{\varphi }}-{{k}_{1}}s-{{k}_{2}}{\mathrm{sgn}} (s))\nonumber \\&\quad -\frac{1}{{{\beta }_{\varphi }}}\dot{{\hat{\varphi }}}{\tilde{\varphi }}. \end{aligned}$$
(32)

From Assumption 1, there exist two unknown positive constants \({{D}_{f}}\) and \({{D}_{g}}\), s.t. \(\left\| f(x) \right\| \le {{D}_{f}}\) and \(\left\| g(x) \right\| \le {{D}_{g}}\). Using Young’s inequality, (32) becomes

$$\begin{aligned} {{{{\dot{L}}}}_{2}}&\le {{D}_{f}}\left\| s \right\| +{{s}^{{\mathsf {T}}}}g(x){{u}_{s}}-{{\lambda }_{\min }}({{k}_{1}}){{\left\| s \right\| }^{2}}\nonumber \\&\quad -{{\lambda }_{\min }}({{k}_{2}})\left\| s \right\| +{\tilde{\varphi }}s-\frac{1}{{{\beta }_{\varphi }}}\dot{{\hat{\varphi }}}{\tilde{\varphi }} \nonumber \\&\le -{{\lambda }_{\min }}({{k}_{1}}){{\left\| s \right\| }^{2}}-{{\lambda }_{\min }}({{k}_{2}})\left\| s \right\| \nonumber \\&\quad +{{D}_{f}}\left\| s \right\| +{{D}_{g}}\left\| s \right\| \left\| {{u}_{s}} \right\| \nonumber \\&\le -\delta \left\| s \right\| , \end{aligned}$$
(33)

where \(\delta ={{\lambda }_{\min }}({{k}_{2}})-{{D}_{f}} -{{D}_{g}}\left\| {{u}_{s}} \right\| \). This implies that the system tracking errors reach the SM surface and remain on it when \(\delta \ge 0\). From \({{{\dot{L}}}_{2}}\le 0\), we see that s and \({\dot{s}}\) are bounded. Moreover, \({{{\dot{L}}}_{2}}\le -\delta \left\| s \right\| \) yields \(\int _{0}^{t}{\left\| s \right\| }{\mathrm {d}}\tau \le (1/\delta )[{{L}_{2}}(0)-{{L}_{2}}(t)]\). Since \({{L}_{2}}(0)\) is bounded and \({{L}_{2}}(t)\) is monotonically decreasing with a lower bound, \({{\lim }_{t\rightarrow \infty }}\int _{0}^{t}{\left\| s \right\| }{\mathrm {d}}\tau \) is also bounded. Then, by the Barbalat lemma, \({{\lim }_{t\rightarrow \infty }}s(t)=0\), i.e., s(t) converges asymptotically, and e(t) converges to zero asymptotically. Therefore, the system states reach the SM surface in finite time. This completes the proof.

Sliding mode-based online fault compensator

Based on the analysis of the fault-free case of MRRs, an online fault compensator is developed to keep the closed-loop system stable when actuator failures occur, i.e., \({f_a}\ne 0\). Introducing the SMAOC law (setting \(u={{u}_{0}}\)), the faulty MRR system (2) becomes

$$\begin{aligned} \left\{ \begin{array}{{l}} {{{{\dot{x}}}}_{1}}={{x}_{2}} \\ {{{{\dot{x}}}}_{2}}=f(x)+g(x)({{u}_{0}}-{{f}_{a}}) \\ y={{x}_{1}}. \end{array} \right. \end{aligned}$$
(34)

According to (8), \({{J}^{*}}(s)\ge 0\) with \({{J}^{*}}(0) = 0\); thus, \({{J}^{*}}(s)\) is a positive definite function. Its time derivative is

$$\begin{aligned} {{{\dot{J}}}^{*}}(s)={{ \nabla {{J}^{*{\mathsf {T}}}}(s) }}{\dot{s}}. \end{aligned}$$
(35)

Combining (7) with (34), one can obtain

$$\begin{aligned} {{{{\dot{J}}}}^{*}}(s)&={{ \nabla {{J}^{*{\mathsf {T}}}}(s) }}\big ( f(x)+g(x)({{u}_{0}}-{{f}_{a}})+\varphi \big ) \nonumber \\&={{ \nabla {{J}^{*{\mathsf {T}}}}(s) }}\big ( f(x)+g(x){{u}_{0}}+\varphi \big )\nonumber \\&\quad -{{ \nabla {{J}^{*{\mathsf {T}}}}(s) }}g(x){{f}_{a}}. \end{aligned}$$
(36)

Considering (9) and (14), and assuming there exists a positive constant \({{\varphi }_{M}}\) such that \(\left\| {{\tilde{\varphi }}} \right\| \le {{\varphi }_{M}}\), we have

$$\begin{aligned} {{{{\dot{J}}}}^{*}}(s)&= -{{\lambda }_{\min }}({{k}_{1}}){{\left\| s \right\| }^{2}}-{{\lambda }_{\min }}({{k}_{2}})\left\| s \right\| \nonumber \\&\quad +{\tilde{\varphi }}s-Z(s,{{u}_{s}})+2u_{s}^{{\mathsf {T}}}(s)R{{f}_{a}} \nonumber \\&\le -{{\lambda }_{\min }}({{k}_{1}}){{\left\| s \right\| }^{2}}-{{\lambda }_{\min }}({{k}_{2}})\left\| s \right\| \nonumber \\&\quad +{{\varphi }_{M}}\left\| s \right\| -Z(s,{{u}_{s}}) + 2u_{s}^{{\mathsf {T}}}(s)R{{f}_{a}} \nonumber \\&\le -({{\lambda }_{\min }}({{k}_{2}})-{{\varphi }_{M}})\left\| s \right\| -{{s}^{{\mathsf {T}}}}Qs-u_{s}^{{\mathsf {T}} }R{{u}_{s}}\nonumber \\&\quad +u_{s}^{{\mathsf {T}}}R{{u}_{s}} + f_{a}^{{\mathsf {T}}}R{{f}_{a}} \nonumber \\&\le -\zeta \left\| s \right\| -{{s}^{{\mathsf {T}}}}Qs+f_{a}^{{\mathsf {T}}}R{{f}_{a}}, \end{aligned}$$
(37)

where \(\zeta ={{\lambda }_{\min }}({{k}_{2}})-{{\varphi }_{M}}\). We conclude that whether \({{{\dot{J}}}^{*}}(s)\) is negative depends on \({{f}_{a}}\). Therefore, an online fault compensator is designed to guarantee the stability of the closed-loop MRR system with actuator failures.

Thus, the SMOFCC law for MRR system (2) is designed as

$$\begin{aligned} u={{u}_{0}}+{{{\hat{f}}}_{a}}, \end{aligned}$$
(38)

where \({{{\hat{f}}}_{a}}\) is the estimate of the unknown actuator failure, adaptively updated by

$$\begin{aligned} {{\dot{{\hat{f}}}}_{a}}={{\beta }_{f}}( 2u_{s}^{{\mathsf {T}}}R-{{s}^{{\mathsf {T}}}}g(x)). \end{aligned}$$
(39)
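Discretized, the compensator (39) and the SMOFCC law (38) amount to two short updates; the values below (identity \(g(x)\) and R, unit gain) are illustrative:

```python
import numpy as np

def fault_comp_step(f_hat, u_s, s, g, R, beta_f, dt):
    """Euler step of the online fault compensator (39):
    f_hat_dot = beta_f (2 R u_s - g(x)^T s)."""
    return f_hat + beta_f * (2.0 * R @ u_s - g.T @ s) * dt

def smofcc(u0, f_hat):
    """SMOFCC law (38): nominal SMAOC law plus the fault estimate."""
    return u0 + f_hat

# Illustrative values with g(x) = I and R = I.
f_hat = fault_comp_step(np.zeros(2), np.array([1.0, 0.0]),
                        np.array([0.0, 1.0]), np.eye(2), np.eye(2),
                        beta_f=1.0, dt=0.1)
```

At each control step, `fault_comp_step` is run once and its output is added to the nominal control via `smofcc`, so no separate FDI stage is required.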

According to the aforementioned design procedure, the block diagram of the designed SMOFCC is shown in Fig. 1.

Remark 2

Note that the SMOFCC scheme (38) is developed based on the online fault compensation control technique [35] rather than the state observer technique [32]. It is worth noticing that the fault compensator (39) not only estimates the failure but also compensates the NN approximation error, whereas the state-based fault observer can only estimate the failure.

Fig. 1
figure 1

Block diagram of the proposed SMOFCC

Stability analysis

Theorem 3

Consider the faulty MRR system (2), the CNN (15) with the updating law (21), the cost function (8), and the online compensator \({{{\hat{f}}}_{a}}\) (39). The closed-loop faulty MRR system is guaranteed to be asymptotically stable under the developed SMOFCC policy (38).

Proof

Choose a Lyapunov function candidate as

$$\begin{aligned} {{L}_{3}}={{L}_{2}}+{{J}^{*}}(s)+\frac{1}{2{{\beta }_{f}}}\tilde{f}_{a}^{{\mathsf {T}}}{{\tilde{f}}_{a}}. \end{aligned}$$
(40)

Taking the time derivative of \({{L}_{3}}\) along the solution of (7), where \({{\tilde{f}}_{a}}={{f}_{a}}-{{{\hat{f}}}_{a}}\), and considering (14), we have

$$\begin{aligned} {{{{\dot{L}}}}_{3}}&= {{{{\dot{L}}}}_{2}}+{{ \nabla {{J}^{*{\mathsf {T}}}}(s) }}{\dot{s}}-\frac{1}{{{\beta }_{f}}}{{{\dot{{\hat{f}}}}}_{a}}{{{\tilde{f}}}_{a}}\nonumber \\&=-{{\lambda }_{\min }}({{k}_{1}}){{\left\| s \right\| }^{2}}-({{\lambda }_{\min }}({{k}_{2}})-{{D}_{f}})\left\| s \right\| \nonumber \\&\quad +{{D}_{g}}\left\| s \right\| \left\| {{u}_{s}} \right\| +{{s}^{{\mathsf {T}}}}g(x)({{{{\hat{f}}}}_{a}}-{{f}_{a}}) \nonumber \\&\quad +{{ \nabla {{J}^{*{\mathsf {T}}}}(s) }} ( f(x)+g(x)({{u}_{0}}+{{{{\hat{f}}}}_{a}}-{{f}_{a}})+\varphi )\nonumber \\&\quad -\frac{1}{{{\beta }_{f}}}{{{\dot{{\hat{f}}}}}_{a}}{{{\tilde{f}}}_{a}} \nonumber \\&\le -{{\lambda }_{\min }}({{k}_{1}}){{\left\| s \right\| }^{2}}-({{\lambda }_{\min }}({{k}_{2}})-{{D}_{f}})\left\| s \right\| \nonumber \\&\quad +{{D}_{g}}\left\| s \right\| \left\| {{u}_{s}} \right\| \nonumber \\&\quad +\left( {{s}^{{\mathsf {T}}}}+{{ \nabla {{J}^{*{\mathsf {T}}}}(s) }} \right) g(x)({{{{\hat{f}}}}_{a}}-{{f}_{a}}) \nonumber \\&\quad -{{\lambda }_{\min }}({{k}_{1}}){{\left\| s \right\| }^{2}}-{{\lambda }_{\min }}({{k}_{2}})\left\| s \right\| \nonumber \\&\quad +{{\varphi }_{M}}\left\| s \right\| -Z(s,{{u}_{s}})-\frac{1}{{{\beta }_{f}}}{{{\dot{{\hat{f}}}}}_{a}}{{{\tilde{f}}}_{a}} \nonumber \\&\le -2{{\lambda }_{\min }}({{k}_{1}}){{\left\| s \right\| }^{2}}-(2{{\lambda }_{\min }}({{k}_{2}})-{{D}_{f}})\left\| s \right\| \nonumber \\&\quad +\frac{1}{2}D_{g}^{2}{{\left\| {{u}_{s}} \right\| }^{2}}+\frac{1}{2}{{\left\| s \right\| }^{2}}\nonumber \\&\quad -{{s}^{{\mathsf {T}}}}Qs-u_{s}^{{\mathsf {T}}}R{{u}_{s}}+\Big ( 2u_{s}^{{\mathsf {T}}}R-{{s}^{{\mathsf {T}}}}g(x)-\frac{1}{{{\beta }_{f}}}{{{\dot{{\hat{f}}}}}_{a}} \Big ){{{\tilde{f}}}_{a}} \nonumber \\&\le -2{{\lambda }_{\min }}({{k}_{1}}){{\left\| s \right\| }^{2}}-\upsilon \left\| s \right\| \nonumber \\&\quad +\frac{1}{2}D_{g}^{2}{{\left\| {{u}_{s}} \right\| }^{2}}+\frac{1}{2}{{\left\| s \right\| }^{2}}-{{\lambda }_{\min }}(Q){{\left\| s \right\| 
}^{2}}\nonumber \\&\quad -{{\lambda }_{\min }}(R){{\left\| {{u}_{s}} \right\| }^{2}} + \left( 2u_{s}^{{\mathsf {T}}}R-{{s}^{{\mathsf {T}}}}g(x)-\frac{1}{{{\beta }_{f}}}{{{\dot{{\hat{f}}}}}_{a}}\right) {{{\tilde{f}}}_{a}}, \end{aligned}$$
(41)

where \(\upsilon =2{{\lambda }_{\min }}({{k}_{2}})-{{D}_{f}}\). Substituting (39) into (41), one can obtain

$$\begin{aligned} {{{\dot{L}}}_{3}}&\le - \left( {{\lambda }_{\min }}(Q)+2{{\lambda }_{\min }}({{k}_{1}})-\frac{1}{2}\right) {{\left\| s \right\| }^{2}} \nonumber \\&\quad -\upsilon \left\| s \right\| -\left( {{\lambda }_{\min }}(R)-\frac{1}{2}D_{g}^{2}\right) {{\left\| {{u}_{s}} \right\| }^{2}}\nonumber \\&\le -{{\varGamma }_{1}}{{\left\| s \right\| }^{2}}-{{\varGamma }_{2}}{{\left\| {{u}_{s}} \right\| }^{2}}-\upsilon \left\| s \right\| , \end{aligned}$$
(42)

where \({{\varGamma }_{1}}={{\lambda }_{\min }}(Q)+2{{\lambda }_{\min }}({{k}_{1}})-\frac{1}{2}\) and \({{\varGamma }_{2}} = {{\lambda }_{\min }}(R)-\frac{1}{2}D_{g}^{2}\). Thus, \({{{\dot{L}}}_{3}}\le 0\) when \({{\lambda }_{\min }}(Q)+2{{\lambda }_{\min }}({{k}_{1}})\ge \frac{1}{2}\), \({{\lambda }_{\min }}(R)\ge \frac{1}{2}D_{g}^{2}\), and \(\upsilon \ge 0\). Hence, asymptotic stability of the MRR tracking error is ensured under the developed SMOFCC policy. This completes the proof.

Remark 3

The difference between model-free control and model-based control lies in whether the dynamic model of the controlled plant is known. In this paper, the SMOFCC scheme is designed based on known system dynamics, but it can be extended to the model-free case even when the system dynamics are unavailable. To this end, one strategy is to employ an observer [15] or an identifier [48] to estimate the system dynamics and then use the estimate in the control design. Alternatively, a purely model-free control method can be developed, i.e., the controller is designed directly from system input-output data [28].

Simulation study and results analysis

In this section, a 2-DOF MRR (see configuration b in [3]) is employed to verify the effectiveness of the theoretical results of the SMOFCC through comparison.

The desired trajectories of two joint modules are defined as

$$\begin{aligned} {{q}_{d}}=\left[ \begin{matrix} {{q}_{1d}} \\ {{q}_{2d}} \\ \end{matrix} \right] =\left[ \begin{matrix} 0.4\sin (0.6t)+0.3\cos (0.5t) \\ 0.2\cos (0.9t)-0.5\sin (0.4t) \\ \end{matrix} \right] \end{aligned}$$
(43)

and an unknown additive actuator failure is assumed to be

$$\begin{aligned} {{f}_{a}}(t)=\left\{ \begin{array}{ll} [0;0], &{} 0\le t\le 30\,{\mathrm{s}} \\ {[0.2+0.1\sin (0.4t);0]}, &{} 30\,{\mathrm{s}}<t\le 60\,{\mathrm{s}}. \end{array} \right. \end{aligned}$$
(44)

To approximate (8), we employ the CNN (18) with \(M=3\), the weight vector \({{{\hat{W}}}_{c}}={{[{{{\hat{W}}}_{c1}},{{{\hat{W}}}_{c2}},{{{\hat{W}}}_{c3}}]}^{{\mathsf {T}}}}\), and the activation function \({{\sigma }_{c}}={{[s_{1}^{2},{{s}_{1}}{{s}_{2}},s_{2}^{2}]}^{{\mathsf {T}}}}\). The initial weight vectors and control parameters are listed in Table 1.
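The reference signals (43) and the injected fault (44) can be reproduced directly:

```python
import numpy as np

def q_desired(t):
    """Desired joint trajectories, Eq. (43)."""
    return np.array([0.4 * np.sin(0.6 * t) + 0.3 * np.cos(0.5 * t),
                     0.2 * np.cos(0.9 * t) - 0.5 * np.sin(0.4 * t)])

def actuator_fault(t):
    """Additive actuator failure of Eq. (44), injected at t = 30 s."""
    if t <= 30.0:
        return np.array([0.0, 0.0])
    return np.array([0.2 + 0.1 * np.sin(0.4 * t), 0.0])
```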

Table 1 Control parameters

To show the superiority of the proposed scheme, we compare the control performance of the developed SMOFCC scheme with the existing optimal control scheme in [3], which is based on tracking error feedback only. Figures 2, 3, 4 and 5 illustrate the simulation results under the proposed control strategy, and Figs. 6, 7, 8 and 9 depict the simulation results under the control method in [3].

Fig. 2
figure 2

Online fault estimation under the developed SMOFCC

Fig. 3
figure 3

Trajectories tracking performance under the developed SMOFCC

Fig. 4
figure 4

Tracking errors under the developed SMOFCC

Fig. 5
figure 5

Control inputs under the developed SMOFCC

Fig. 6
figure 6

Online fault estimation under the optimal control in [3]

Fig. 7
figure 7

Trajectories tracking performance under the optimal control in [3]

Fig. 8
figure 8

Tracking errors under the optimal control in [3]

Fig. 9
figure 9

Control inputs under the optimal control in [3]

The designed compensator (39) is employed to estimate the fault amplitude online. From Figs. 2 and 6, we observe that the estimated failure tracks the actual failure within less than 1 s under the developed SMOFCC scheme, whereas it takes nearly 10 s with the control scheme in [3]. Moreover, the SMOFCC obtains a smaller overshoot; that is, the fault amplitude estimated by the proposed compensator (39) has a smaller bias and a higher accuracy. Figure 3 shows that the actual trajectories track the desired ones within 3 s under the proposed scheme, a faster convergence rate than that obtained in [3]. Meanwhile, compared with Fig. 7, Fig. 3 shows that the SMOFCC provides faster convergence at the beginning of the run in the fault-free scenario. Figure 4 shows that the tracking errors gradually decrease and reach the steady state after 4 s, which confirms the above results more intuitively. Figures 5 and 9 illustrate the control inputs of the two control methods, respectively. The control input of joint 1 under the SMOFCC changes slightly after the actuator failure occurs at \(t=30\) s, owing to the online fault compensation. From the above simulation results, the SMOFCC achieves a faster convergence rate, a smaller overshoot, and a higher accuracy. This is because the SMOFCC scheme introduces the SMC technique, which combines the position tracking error e and the velocity tracking error \({\dot{e}}\), similar to a proportional-derivative controller [49]: the proportional action improves the convergence rate and reduces the system error, while the derivative action reduces the overshoot and settling time. Besides, the FDI unit is removed owing to the online fault estimation; therefore, the fault diagnosis time is greatly reduced, and good fault tolerance is obtained despite the actuator failure.
Furthermore, the closed-loop MRR system is asymptotically stable, rather than UUB, and the system states are recovered by the online fault compensation after the actuator failure occurs. Thus, the tracking performance under the designed SMOFCC is superior to that in [3]. In summary, the proposed control scheme achieves better tracking and fault-tolerant performance for MRRs by introducing the SM surface.

Conclusion

In this paper, we propose an SMOFCC scheme that extends ADP-based control with the SMC technique to solve the FTC problem of MRRs with unknown actuator failures. The SMOFCC consists of an SM-based iterative controller, an adaptive robust term, and an online fault compensator. Owing to the SMC technique, the requirement for a prior nominal controller that relies on an accurate dynamic model is relaxed. Moreover, the closed-loop MRR system is guaranteed to be asymptotically stable, rather than UUB. Based on the online estimation of actuator failures, the proposed SMOFCC scheme removes the FDI unit. Comparative simulation results show that the developed scheme provides faster convergence and smaller overshoot than an existing optimal control method developed based on tracking error feedback only. In future work, the approximate optimal FTC problems for MRRs with other fault scenarios and noises will be further considered.