1 Introduction

Statistical process control (SPC) is used in a wide range of industries (Miletic et al. 2004). Profile monitoring is an SPC application that focuses on monitoring the relationship between a dependent variable and one or more explanatory variables. Put simply, the process engineer collects data by measuring quality characteristics, but what needs to be controlled is not the raw data; it is a characteristic (or a set of characteristics) calculated through a function (the profile) whose explanatory variables are the raw data collected by the process engineer.

Several researchers have addressed practical applications of profiles. For instance, in the semiconductor industry, pressure (the response variable, Y) is a quality characteristic determined by a linear relationship with the amount of flow (the explanatory variable, X) (Kang and Albin 2000). Another application, in which a nonlinear profile is used, is controlling the density of chipboard at various depths of the board (Walker and Wright 2002). Williams et al. (2007) provide another example, which deals with the estimated dose–response curve of a manufactured drug. For further real-world applications of profiles, the reader can refer to Mestek et al. (1994), Stover and Brill (1998), Amiri et al. (2010), and Jin and Shi (2001).

Profiles are classified according to the type of model used to describe the relationship between the variables: simple linear, multivariate simple linear, binary, multiple linear (generalized), multivariate multiple linear, polynomial, and nonlinear profiles.

The proposed methods differ according to the phase of profile monitoring, Phase I or Phase II: assessing process stability is the aim of Phase I, and quickly detecting shifts is the aim of Phase II. Accordingly, many studies have been conducted on monitoring profiles in both phases. For example, in Phase I, Amiri et al. (2015) and Shadman et al. (2015) monitored generalized profiles; in Phase II, Shadman et al. (2017), Maleki et al. (2017), and Keramatpour et al. (2014) proposed approaches for monitoring generalized profiles, autocorrelated binary profiles, and AR(1) autocorrelated polynomial profiles, respectively. Moreover, Woodall (2007) and Noorossana et al. (2011) provide comprehensive reviews of profile monitoring.

In many cases of profile monitoring, the proposed methods detect the out-of-control condition a considerable time after the process has actually changed. In other words, there is a large gap between the control chart's alarm and the time at which the change takes place, which is known as the “change point”. Different types of change may occur and drive the process out of control, including single or multiple step changes, isotonic changes, linear trends, monotonic changes, and sporadic changes (see Ayoubi and Kazemzadeh 2015; Thomas R Samuel and Pignatiello Jr 1998). Many researchers have addressed the change point estimation problem (e.g., see Samuel et al. 1998; Perry and Pignatiello 2006; Marcus B Perry and Pignatiello Jr 2006; Pignatiello and Samuel 2001a, b; Pignatiello Jr and Samuel 2001; Nedumaran et al. 2000; Noorossana et al. 2009).

Change point detection in profile monitoring has been studied by numerous authors. Zou et al. (2006) and Mahmoud et al. (2007) applied the likelihood ratio method to estimate a step change in simple linear profiles in Phase II and Phase I, respectively. Kazemzadeh et al. (2008) studied the step change problem in Phase I for polynomial profiles, also adopting the likelihood ratio approach. Eyvazian et al. (2011) proposed a method based on the likelihood ratio test (LRT) to estimate step changes in multivariate multiple linear regression profiles in Phase II. Zou et al. (2007) studied step shifts in general linear profiles in Phase II. Alireza Sharafi et al. (2013a) developed a maximum likelihood estimation (MLE) method for step change point estimation in Phase II for Poisson regression profiles. Step and linear trend change point detection for binary profiles in Phase II was studied by Sharafi et al. (2012) and Alireza Sharafi et al. (2013b), respectively. Zand et al. (2013) presented two methods for detecting the time of a step change in Phase I monitoring of binary profiles. Step change point estimation in Phase II monitoring of generalized linear model (GLM) profiles, such as gamma and logistic regression profiles, was addressed by Sogandi and Amiri (2014b) and Alireza Sharafi et al. (2014), respectively. Kazemzadeha et al. (2016) studied step shifts in simple linear profiles with AR(1) autocorrelation. For multivariate linear profiles, Reza Baradaran Kazemzadeh et al. (2015) developed an MLE method for both step and linear drift changes. Sogandi and Amiri (2014a) likewise derived an MLE-based method for linear change detection in gamma profiles. The isotonic change point problem was studied by Vakilian et al. (2015) for simple linear profiles with a first-order autoregressive autocorrelation structure within each profile. Monotonic change point detection in Phase II was investigated by Ayoubi et al. (2014) and Sogandi and Amiri (2017) for generalized linear model-based regression profiles and multivariate linear profiles, respectively. An occasional (sporadic) change point in the mean of polynomial profiles in Phase II was detected by Ayoubi et al. (2016).

A wide range of regression models that are not covered by the typical relationships (some of which were discussed earlier) make up a complicated, vast, and varied class known as “nonlinear models”. In SPC, this kind of relationship between the dependent and explanatory variable(s) is called a “nonlinear profile”. Despite the broad and unavoidable applications of nonlinear profiles, only a few researchers have investigated them, because estimating the parameters of nonlinear profiles is difficult and the large variety of these models adds to this difficulty.

However, some researchers have provided real cases of nonlinear profiles and suggested several methods for monitoring this kind of profile (e.g., see Walker and Wright 2002; Williams et al. 2007; Brill 2001; Jin and Shi 1998; Jeong et al. 2006). In addition, some researchers have studied nonlinear profile monitoring (e.g., see McQuarrie 1999; Fan et al. 2011; Jensen and Birch 2009; McGinnity et al. 2015; Keramatpour et al. 2014).

Ghazizadeh et al. (2018) investigated the change point problem in nonlinear profiles. They developed an MLE method for detecting a single step change point in Phase II without any knowledge of the type of nonlinear function. Given how widely nonlinear profiles are used, this area deserves more attention. Hence, in this work we extend the study of Ghazizadeh et al. (2018) and propose a new method, based on an artificial neural network (ANN), to estimate the step change point in nonlinear profiles in Phase II monitoring.

The rest of this paper is organized as follows. In Sect. 2, the proposed ANN method is discussed. The proposed method is evaluated, and its performance is shown, in Sect. 3. Conclusions and some ideas for future work are provided in Sect. 4.

2 The proposed artificial neural network method

Selecting a well-designed network is a significant step, since it can affect the performance of the proposed procedure. There are several types of neural networks, among which the feed-forward multi-layer perceptron (MLP) is the most common (Yazdanparast et al. 2018; Gharoun et al. 2018; Amalnick et al. 2019). This type has been successfully applied to difficult problems using the backpropagation algorithm, which trains a multi-layer network with nonlinear transfer functions. Despite its long training times, backpropagation has gained widespread acceptance as the most popular algorithm for the supervised training of MLPs.

A number of scholars have successfully applied the backpropagation network (BPN) to control chart pattern recognition in multivariate processes (Guh 2007; Niaki and Abbasi 2008; Atashgar and Noorossana 2011). In this paper, however, the BPN is used to detect process shifts after the out-of-control condition has been signalled by an appropriate control chart. For this purpose, the multiple \({T}^{2}\) control chart proposed by Vaghefi et al. (2009) is adopted for signalling the out-of-control condition. Many authors, such as Zorriassatine and Tannock (1998), Hwarng (2008) and Guh (2007), have discussed the effectiveness of BPNs in detecting process shifts.

The input layer of our ANN contains two neurons, determined by the feature vector. Deciding on the right number of hidden layers and the number of neurons in each is still left to the expertise of practitioners. Numerous applications indicate that a maximum of three hidden layers is believed to be sufficient for any network, and the right number of hidden layers and neurons needs to be determined through trial and error (Zorriassatine and Tannock 1998).

The output layer contains one neuron. The output related to the selected profile is zero or one, where zero is indicative of no shift, and one indicates a shift in one or more regression parameters of the selected profile and shows that the process is out of control.

3 Determining the threshold

The ANN generates values that are continuous over the interval [0, 1]. It is therefore necessary to identify a threshold (cut-off value) with which the continuous output can be converted to the discrete values 0 and 1, classifying the output as “in control” or “out of control”. We define the threshold based on the in-control average run length (ARL), \(\frac{1}{\alpha }\), by taking the following steps:

1) Determine the probability of type I error, \(\alpha \).

2) Generate r in-control profiles for the ANN. For this purpose, we use the multiple \({T}^{2}\) control chart developed by Vaghefi et al. (2009).

3) Sort the output values in ascending order.

4) Set the threshold value at the \(100\left(1-\alpha \right)\)th percentile. In this paper we set r = 10,000 and \(\alpha =0.05\). This amounts to a threshold value of 0.23. Any output smaller than 0.23 is interpreted as zero, which means that the profile is in control, while an output larger than 0.23 is taken as one, which means that the corresponding profile is out of control.
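The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `ann_threshold` and `classify` are hypothetical, and the network outputs are replaced by placeholder random draws, so the resulting threshold will not equal the value 0.23 reported in the paper.

```python
import numpy as np

def ann_threshold(in_control_outputs, alpha=0.05):
    """Return the 100(1 - alpha)th percentile of in-control ANN outputs."""
    outputs = np.sort(np.asarray(in_control_outputs, dtype=float))  # step 3
    return float(np.percentile(outputs, 100.0 * (1.0 - alpha)))     # step 4

def classify(outputs, threshold):
    """Map continuous outputs to 0 (in control) or 1 (out of control)."""
    return (np.asarray(outputs, dtype=float) > threshold).astype(int)

# Placeholder data (assumption): Beta(1, 12) draws stand in for the
# outputs of a trained network on r = 10,000 in-control profiles.
rng = np.random.default_rng(0)
fake_outputs = rng.beta(1.0, 12.0, size=10_000)

threshold = ann_threshold(fake_outputs, alpha=0.05)
labels = classify(fake_outputs, threshold)
false_alarm_rate = labels.mean()  # share of in-control profiles flagged
```

By construction, about a fraction \(\alpha \) of the in-control outputs fall above the threshold, which is what ties the cut-off to the chosen type I error.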

If all profiles are passed to the network in order, the thresholded outputs form two successive runs of zeros and ones. The first run consists of zeros and corresponds to the in-control profiles; the point at which the output changes to one indicates that a shift has just occurred in the process. The last profile of the first run is the change point, denoted by \(\tau \). The second run starts with a one and covers profiles \(\tau +1\) to the last profile.
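Reading the change point off the thresholded sequence can be illustrated as follows. The function name `estimate_change_point` is hypothetical, and the sketch assumes that the first label equal to one marks the first out-of-control profile; real network outputs may not form two perfectly clean runs.

```python
def estimate_change_point(labels):
    """Return tau, the 1-based index of the last in-control profile.

    labels: sequence of 0/1 values, one per profile, in time order.
    Returns len(labels) if no profile is classified as out of control.
    """
    for position, label in enumerate(labels, start=1):
        if label == 1:
            return position - 1  # the profile just before the first 1
    return len(labels)

# Illustrative sequence: 50 in-control profiles followed by 7 shifted ones.
labels = [0] * 50 + [1] * 7
tau_hat = estimate_change_point(labels)
```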

4 The feature vector

It is essential to select a feature vector that brings out the difference between patterns. Hence, we employ the \({Z}_{i}\) and \({T}_{i}^{2}\) statistics of the multivariate EWMA and \({T}^{2}\) control charts and modify them to obtain an appropriate feature vector. EWMA and \({T}^{2}\) control charts have been used by some authors to monitor profiles; for example, Vaghefi et al. (2009) proposed these charts for monitoring nonlinear profiles in Phase II. According to Vaghefi et al. (2009), the \({Z}_{i}\) and \({T}_{i}^{2}\) statistics are as follows:

$$ Z_{i} = \lambda e_{i} + \left( {1 - \lambda } \right)Z_{i - 1} $$
(1)
$$ T_{i}^{2} = (\hat{\theta }_{i} - \theta_{0} )^{\prime} \Sigma_{\theta }^{ - 1} (\hat{\theta }_{i} - \theta_{0} ) $$
(2)

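where \(e_{i} = \frac{{\sum\nolimits_{j = 1}^{n} e_{ij} }}{n}\) is the mean of the residuals of the \(i\)th sample, \(0<\lambda <1\) is the smoothing parameter, and \({Z}_{0}\) is assumed to be equal to zero. In Eq. (2), \(\hat{\theta }_{i}\) is the estimated parameter vector of profile \(i\), \(\theta_{0}\) is the in-control parameter vector, and \(\Sigma_{\theta }\) is the variance–covariance matrix of the parameter estimates.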
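A minimal NumPy sketch of Eqs. (1) and (2) is given below. The function names are hypothetical, the numerical inputs are illustrative only, and the quadratic form is assumed to use the inverse of the variance–covariance matrix of the parameter estimates, as in a standard Hotelling-type \({T}^{2}\) statistic.

```python
import numpy as np

def ewma_statistics(residual_means, lam=0.2, z0=0.0):
    """Eq. (1): Z_i = lam * e_i + (1 - lam) * Z_{i-1}, with Z_0 = z0."""
    z = np.empty(len(residual_means))
    previous = z0
    for i, e_i in enumerate(residual_means):
        previous = lam * e_i + (1.0 - lam) * previous
        z[i] = previous
    return z

def t2_statistics(theta_hats, theta0, cov):
    """Eq. (2), assuming the inverse covariance in the quadratic form."""
    diffs = np.asarray(theta_hats, dtype=float) - np.asarray(theta0, dtype=float)
    # Solve cov * v = diff for every profile instead of inverting cov.
    solved = np.linalg.solve(np.asarray(cov, dtype=float), diffs.T).T
    return np.einsum("ij,ij->i", diffs, solved)

# Illustrative inputs (not taken from the paper).
z = ewma_statistics([1.0, 1.0], lam=0.2)
t2 = t2_statistics([[0.5, 2.0], [0.6, 2.0]], [0.5, 2.0],
                   [[0.01, 0.0], [0.0, 0.04]])
```

With these inputs, `z` is approximately `[0.2, 0.36]` and `t2` is approximately `[0.0, 1.0]`, which can be checked by hand from the two equations.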

ANNs are sensitive to severe noise, so the feature vector should carry as little noise as possible for any pattern. We therefore smooth the statistics: \(\overline{Z}_{i} = \frac{1}{k}\mathop \sum \limits_{j = i}^{i + k - 1} Z_{j}\) and \(\overline{T}_{i}^{2} = \frac{1}{k}\mathop \sum \limits_{j = i}^{i + k - 1} T_{j}^{2}\) are taken as the smoothed values of \({Z}_{i}\) and \({T}_{i}^{2}\), respectively. The value of k is chosen based on the number of samples and the severity of the noise. In this paper we set \(k=5\).
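The forward moving average can be sketched as below. The name `forward_moving_average` is hypothetical. The text does not say how the last \(k-1\) samples are handled, where fewer than \(k\) values remain; the sketch assumes the window is simply shortened there.

```python
import numpy as np

def forward_moving_average(values, k=5):
    """Smooth values[i] by averaging values[i], ..., values[i + k - 1]."""
    values = np.asarray(values, dtype=float)
    smoothed = np.empty(len(values))
    for i in range(len(values)):
        window = values[i:i + k]  # shorter near the end (assumption)
        smoothed[i] = window.mean()
    return smoothed

# Illustrative series 1, 2, ..., 10 with k = 5.
smoothed = forward_moving_average(np.arange(1, 11), k=5)
```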

After constructing, training, and testing the BPN, it turned out that the vector consisting of \({\overline{Z} }_{i}\) and \({\overline{T} }_{i}^{2}\) is not an appropriate feature vector. Thus, to better distinguish in-control from out-of-control profiles, we consider the following quantities.

$$ BZ\left( i \right) = \frac{1}{m - i}\mathop \sum \limits_{t = i + 1}^{m} \overline{Z}_{t} $$
(3)
$$ BT^{2} \left( i \right) = \frac{1}{m - i}\mathop \sum \limits_{t = i + 1}^{m} \overline{T}_{t}^{2} $$
(4)

Here, m is the index of the last profile, that is, the out-of-control profile signalled by the control chart, and \(BZ\left(i\right)\) and \(B{T}^{2}\left(i\right)\) are the averages of \({\overline{Z} }_{t}\) and \({\overline{T} }_{t}^{2}\), respectively, over samples \(i+1\) to m.

The differences between \(BZ\left(i\right)\) and \({\overline{Z} }_{i}\) for in-control profiles are not the same as for out-of-control profiles; the same holds for \(B{T}^{2}\left(i\right)\) and \({\overline{T} }_{i}^{2}\). In an increasing nonlinear regression model, this difference is smaller for out-of-control profiles than for in-control profiles, while for a decreasing model the opposite holds. Therefore, the feature vector is the \(2\times 1\) vector \(\left( {DBZ\left( i \right),DBT^{2} \left( i \right)} \right)\), whose elements are as follows:

$$ DBZ\left( i \right) = \left| {BZ\left( i \right) - \overline{Z}_{i} } \right| $$
(5)
$$ DBT^{2} \left( i \right) = \left| {BT^{2} \left( i \right) - \overline{T}_{i}^{2} } \right| $$
(6)
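Equations (3) to (6) can be sketched with NumPy as follows. The function names are hypothetical and the input series is illustrative. Because Eqs. (3) and (4) divide by \(m-i\), the features are computed for \(i = 1, \ldots , m-1\) only.

```python
import numpy as np

def backward_mean(smoothed):
    """Eqs. (3)-(4): mean of smoothed[t] for t = i+1..m, for i = 1..m-1."""
    smoothed = np.asarray(smoothed, dtype=float)
    m = len(smoothed)
    # suffix_sums[j] = smoothed[j] + ... + smoothed[m-1] (0-based j)
    suffix_sums = np.cumsum(smoothed[::-1])[::-1]
    counts = np.arange(m - 1, 0, -1)  # m - i for i = 1..m-1
    return suffix_sums[1:] / counts

def feature_vectors(z_bar, t2_bar):
    """Eqs. (5)-(6): rows are (DBZ(i), DBT2(i)) for i = 1..m-1."""
    z_bar = np.asarray(z_bar, dtype=float)
    t2_bar = np.asarray(t2_bar, dtype=float)
    dbz = np.abs(backward_mean(z_bar) - z_bar[:-1])
    dbt2 = np.abs(backward_mean(t2_bar) - t2_bar[:-1])
    return np.column_stack([dbz, dbt2])

# Illustrative smoothed series with a step after the third sample.
features = feature_vectors([0, 0, 0, 1, 1], [0, 0, 0, 1, 1])
```

In this toy series the first feature is largest at \(i=3\), the last sample before the step, and drops to zero afterwards, which illustrates why the difference separates the two groups of profiles.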

5 Network training

In this research, Monte Carlo simulation is used to generate the training and test data for the ANN. Network training amounts to presenting the training data set to the network repeatedly until a stopping condition is satisfied. The stopping rules for our ANN are a sum of squared errors of \({10}^{-5}\) and a maximum of 300 epochs. Training stops as soon as either rule is met, which avoids overtraining and preserves the network's ability to generalize.

The relationship between the explanatory and dependent variables in a nonlinear profile is written as:

$$ y_{ij} = f\left( {x_{ij} ,{\varvec{\theta}}_{{\varvec{0}}} } \right) + \varepsilon_{ij} $$
(7)

In Eq. (7), f is the nonlinear regression function with parameter vector \({{\varvec{\theta}}}_{\mathbf{i}}\), the \(p\times 1\) vector of parameters of profile i, which equals \({{\varvec{\theta}}}_{0}\) when the process is in control. Here, \({\varepsilon }_{ij}\) is the random error; the errors are independent normal random variables with a mean of zero and unknown variance \({\sigma }^{2}\).

Since nonlinear profiles encompass many nonlinear models, any method proposed for estimating change points should be able to cover all such models. The ANN proposed in this paper meets this requirement with varying degrees of accuracy.

Since testing all types of nonlinear profiles is not practical, we confine our study to profiles of the form in Eq. (8).

$$ y_{ij} = \theta_{01} e^{{\theta_{01} + \theta_{02} x_{j} }} + \varepsilon_{ij} $$
(8)

Estimating parameters in nonlinear regression is more difficult than in other models. The parametric approach can be used for nonlinear profiles when the least-squares estimates converge within a limited number of iterations, that is, when the model's parameters can be estimated accurately. An iterative method commonly used in this situation is Gauss–Newton. The model in Eq. (8), which belongs to the large and well-known exponential family of nonlinear models, has this property. Owing to its desirable characteristics, this model was used by Vaghefi et al. (2009) to evaluate the performance of their proposed control charts for monitoring nonlinear profiles.
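To make the estimation step concrete, the sketch below simulates one in-control profile and fits it with a plain Gauss–Newton iteration. Several things here are assumptions rather than details from the paper: the mean function is transcribed from Eq. (8) as printed, the error standard deviation `sigma = 0.1` is arbitrary, the iteration starts at the in-control parameters, and the covariance estimate is the usual large-sample approximation \(s^{2}(J^{\prime}J)^{-1}\).

```python
import numpy as np

def mean_function(theta, x):
    """Mean response of Eq. (8) as printed: t1 * exp(t1 + t2 * x)."""
    t1, t2 = theta
    return t1 * np.exp(t1 + t2 * x)

def jacobian(theta, x):
    """Partial derivatives of the mean function with respect to t1, t2."""
    t1, t2 = theta
    expo = np.exp(t1 + t2 * x)
    return np.column_stack([(1.0 + t1) * expo, t1 * x * expo])

def gauss_newton(x, y, start, max_iter=50, tol=1e-10):
    """Return (theta_hat, cov_hat) from a basic Gauss-Newton iteration."""
    theta = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        residuals = y - mean_function(theta, x)
        # Least-squares solution of J * step = residuals.
        step, *_ = np.linalg.lstsq(jacobian(theta, x), residuals, rcond=None)
        theta = theta + step
        if np.linalg.norm(step) < tol:
            break
    residuals = y - mean_function(theta, x)
    J = jacobian(theta, x)
    s2 = residuals @ residuals / (len(y) - len(theta))
    return theta, s2 * np.linalg.inv(J.T @ J)

rng = np.random.default_rng(1)
x = np.linspace(0.01, 3.00, 300)  # x_j = 0.01(0.01)3.00
theta0 = np.array([0.5, 2.0])     # in-control parameters
sigma = 0.1                       # assumed error standard deviation
y = mean_function(theta0, x) + rng.normal(0.0, sigma, size=x.size)
theta_hat, cov_hat = gauss_newton(x, y, start=theta0)
```

The estimated covariance matrix is the kind of quantity that a \({T}^{2}\) statistic needs, so a fit of this type would precede the feature calculation for each profile.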

6 Training data set

The training data set consists of in-control and out-of-control profiles generated from the model in Eq. (8). The vector of parameters is denoted by \({\varvec{\theta}}\), and the vectors of in-control and out-of-control process parameters are denoted by \({{\varvec{\theta}}}_{0}\) and \({{\varvec{\theta}}}_{1}\), respectively. The training data are generated through the following steps:

1) Generate q out-of-control profiles for each out-of-control parameter vector \({{\varvec{\theta}}}_{1}\). Use Eq. (9) to obtain the out-of-control parameter vectors. The value of q is chosen arbitrarily.

2) Generate in-control profiles numbering half the total number of out-of-control profiles, that is, 2pq, where p is the number of parameters in the model. Eq. (9) yields 4p different vectors \({{\varvec{\theta}}}_{1}\), so the total number of out-of-control profiles is 4pq, and half as many in-control profiles are generated. This fraction was chosen based on our experience across various experiments.

$$ {\varvec{\theta}}_{{\varvec{1}}} = {\varvec{\theta}}_{{\varvec{0}}} + \left[ {\begin{array}{*{20}c} {k\overline{{\sigma_{{\hat{\theta }_{01} }} }} r_{1} \left( {1 - r_{2} } \right) \ldots \left( {1 - r_{p} } \right)} \\ {k\overline{{\sigma_{{\hat{\theta }_{02} }} }} \left( {1 - r_{1} } \right)r_{2} \ldots \left( {1 - r_{p} } \right)} \\ \vdots \\ {k\overline{{\sigma_{{\hat{\theta }_{0p} }} }} \left( {1 - r_{1} } \right)\left( {1 - r_{2} } \right) \ldots r_{p} } \\ \end{array} } \right]^{\prime} $$
(9)

where \(\overline{{\sigma_{{\hat{\theta }_{01} }} }} = \mathop \sum \limits_{s = 1}^{2pq} \sigma_{{\hat{\theta }_{01} s}} /2pq\), in which \({\sigma }_{{\widehat{\theta }}_{01}s}\) is the estimated standard deviation of \({\widehat{\theta }}_{01}\) in the sth in-control profile; the corresponding terms for the other parameters in Eq. (9), such as \(\overline{{\sigma }_{{\widehat{\theta }}_{02}}}\), are defined analogously. Also, \(k=-2, -1, 1, 2\) and \({r}_{1}, {r}_{2}, \ldots , {r}_{p}\in \{0, 1\}\) with \(\sum_{j=1}^{p}{r}_{j}=1\). The variance–covariance matrix of the parameters of each profile is calculated when the parameters of that in-control profile are estimated. This matrix is used in Eq. (2) to calculate \({T}_{i}^{2}\).

3) Calculate the feature vector corresponding to each generated profile. Note that the control chart used in the training phase of the ANN is the same as the chart used in its operating phase. In this paper, the multiple \({T}^{2}\) control chart is adopted for both training and operating the ANN, with control limit \({\chi }_{\mathrm{2,0.005}}^{2}=10.5966\), determined for a specific in-control ARL.
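The bookkeeping in steps 1) and 2) can be illustrated with a short sketch. The function name `out_of_control_vectors` is hypothetical. The sketch assumes that, because exactly one \({r}_{j}\) equals one, each out-of-control vector shifts a single parameter by k times the averaged standard deviation of that parameter; the in-control vector and the two averaged standard deviations are the values used later in the paper's simulations.

```python
import numpy as np

def out_of_control_vectors(theta0, sigma_bar, multipliers=(-2, -1, 1, 2)):
    """Enumerate the 4p shifted parameter vectors implied by Eq. (9)."""
    theta0 = np.asarray(theta0, dtype=float)
    sigma_bar = np.asarray(sigma_bar, dtype=float)
    vectors = []
    for k in multipliers:
        for j in range(theta0.size):  # r_j = 1, every other r is 0
            shift = np.zeros_like(theta0)
            shift[j] = k * sigma_bar[j]
            vectors.append(theta0 + shift)
    return np.array(vectors)

theta1_set = out_of_control_vectors([0.5, 2.0], [1.968e-4, 1.840e-4])
q = 1000                        # profiles per out-of-control vector
n_out = len(theta1_set) * q     # 4pq out-of-control profiles
n_in = n_out // 2               # 2pq in-control profiles
```

With p = 2 and q = 1,000 this gives 8 shifted vectors, 8,000 out-of-control profiles and 4,000 in-control profiles.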

In this paper, the model in Eq. (8) is adopted for performance evaluation, so the ANN is trained with respect to it. Accordingly, 4,000 in-control profiles were generated by simulation to form the training data set. Since the training data set must cover the whole spectrum of variation of the profiles, the out-of-control profiles were generated with different values of the vector \({{\varvec{\theta}}}_{1}\): 1,000 out-of-control profiles were simulated for each value of \({{\varvec{\theta}}}_{1}\) according to Eq. (10). This equation is derived from Eq. (9), with \(\overline{{\sigma }_{{\widehat{\theta }}_{01}}}\) and \(\overline{{\sigma }_{{\widehat{\theta }}_{02}}}\) calculated as \(1.968\times {10}^{-4}\) and \(1.840\times {10}^{-4}\), respectively. Table 1 shows the selected values of the \({\varvec{\theta}}\) vector as well as the corresponding number of simulated profiles. The sample size is 300, with \({x}_{j}=0.01(0.01) 3.00\). The target output for each in-control and out-of-control profile is zero and one, respectively.

$$ \theta_{1} = \theta_{0} + \left[ {\begin{array}{*{20}c} {i \times j \times 1.968 \times 10^{ - 4} } \\ {i \times \left( {1 - j} \right) \times 1.840 \times 10^{ - 4} } \\ \end{array} } \right]^{^{\prime}} \begin{array}{*{20}c} { i = 1,2,3,4} \\ {j = 0,1 } \\ \end{array} $$
(10)
Table 1 The values of the \({\varvec{\theta}}\) vector and the number of samples

7 Performance evaluation

In this section, the performance of the proposed ANN method is investigated. Numerous factors affect the accuracy of this estimator. The type of nonlinear profile, the shape of the curve, the sample size, and \(\lambda \) (used in the Z statistic of the EWMA control chart) are four key factors, discussed below, that influence the estimator's accuracy.

The first factor is the type of profile, or regression model, because the performance of the proposed ANN depends on the accuracy of the estimated model parameters, and parameter estimation in turn depends strongly on the shape of the regression curve. Hence, an appropriate model is one whose parameters are estimated accurately by the selected estimation method, Gauss–Newton in this article. The second factor is the shape of the curve within a specific type of profile model, characterized by whether the curve is ascending or descending and convex or concave. The shape determines the magnitude of the shift in the response variable produced by a change in the model parameters. The proposed ANN method performs better on curves whose response variable reacts strongly to a change in the model parameters. The third factor affecting the accuracy of the proposed change point estimator is the sample size, n, whose importance stems primarily from its impact on the accuracy of the parameter estimation method. The Z statistic of the EWMA control chart proposed by Vaghefi et al. (2009) is used to generate the ANN feature vectors. Here, \(\lambda \), used in the Z statistic, is treated as a constant between 0 and 1. This parameter is the fourth factor that influences the accuracy of the proposed ANN estimator.

The effects of these factors on the performance of the proposed change point estimator are now investigated using the model in Eq. (8). This model is increasing and concave. To evaluate the effect of the shape of the curve, the vector of in-control parameters, \({{\varvec{\theta}}}_{0}\), is varied. Different sample sizes are obtained by dividing the range \([\mathrm{0.01,3}]\) of the explanatory variable in Eq. (8) into 200, 300, and 400 points. Moreover, the effect of \(\lambda \) is studied using the values 0.2, 0.4, and 0.6.

Each factor was investigated independently. To investigate the shape of the curve, \(\lambda \) was set to 0.2 and 300 samples were produced in all experiments by setting the explanatory variable values to \(0.01(0.01) 3.00\). To evaluate the effect of sample size, \(\lambda \) and the in-control parameter vector, \({{\varvec{\theta}}}_{0}\), were set to 0.2 and \([\mathrm{0.5,2}]\), respectively. To illustrate the role of \(\lambda \), we set \(n=300\) and \({{\varvec{\theta}}}_{0}=[\mathrm{0.5,2}]\).

The performance of the proposed ANN with regard to these factors was studied by Monte Carlo simulation. All experiments were designed with the change point at 50 (\(\tau =50\)). All profiles were randomly generated from Eq. (8) using the known parameter vectors \({{\varvec{\theta}}}_{0}\) and \({{\varvec{\theta}}}_{1}\) for in-control and out-of-control profiles, respectively. The out-of-control parameter vector, \({{\varvec{\theta}}}_{1}\), was generated by applying a step change of \({({10}^{-3}\times {\delta }_{1},{10}^{-3}\times {\delta }_{2})}^{^{\prime}}\), where \({\delta }_{1}\) and \({\delta }_{2}\) are fixed values. The proposed ANN estimator was then employed to find the actual time of change between profile 51, the first out-of-control profile, and the sample at which the multivariate \({T}^{2}\) control chart signalled an out-of-control condition. The selected control chart was introduced by Vaghefi et al. (2009), and its upper control limit is \({\chi }_{\mathrm{2,0.005}}^{2}=10.5966\) here.
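The control limit quoted above can be checked with a short calculation. For a chi-square distribution with 2 degrees of freedom the upper-tail probability is \(e^{-x/2}\), so the upper \(\alpha \) quantile is \(-2\ln \alpha \). The sketch below also evaluates \(1/\alpha \), the in-control ARL formula used earlier for the threshold; treating this as the ARL of the chart assumes independent samples with a constant false alarm probability. The helper `first_signal` is a hypothetical name.

```python
import math

alpha = 0.005
ucl = -2.0 * math.log(alpha)   # upper quantile of chi-square, 2 d.f.
in_control_arl = 1.0 / alpha   # under the independence assumption

def first_signal(t2_values, limit):
    """Return the 1-based index of the first T2 above the limit, or None."""
    for index, value in enumerate(t2_values, start=1):
        if value > limit:
            return index
    return None

# Illustrative T2 values (not from the paper).
signal_at = first_signal([1.2, 3.4, 0.8, 11.0, 15.2], ucl)
```

The computed limit rounds to 10.5966, matching the value in the text, and the illustrative series signals at its fourth sample.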

First, a suitable MLP structure is needed for this study, so different MLP structures were designed and implemented. Each structure is determined by the number of hidden layers, the number of neurons in each hidden layer, and the activation functions in the hidden and output layers. The different MLP structures are presented in Table 2.

Table 2 Parameters of the different neural network structures used in this study

To assess the designed networks, we use a standard case based on the model in Eq. (8) and the considerations mentioned above: the networks listed in Table 2 are trained and tested with \(\tau =50\), \(n=300\), \(\lambda =0.2\), \({{\varvec{\theta}}}_{0}=[\mathrm{0.5,2}]\), and 1,000 simulation trials.

To choose the best configuration, the mean of the change point estimate, \(E(\widehat{\tau })\), is used to evaluate the accuracy of each designed structure. The results of assessing the designed MLP networks are presented in Table 2. According to these results, the best configuration is MLP6, so this network is selected in this study. MLP6 has one hidden layer with 40 neurons, uses tan-sigmoid and log-sigmoid activation functions in the hidden and output layers, respectively, and is trained with the Trainlm (Levenberg–Marquardt backpropagation, LM) method. A schematic of the selected configuration is shown in Fig. 1.

Fig. 1 Schematic of the selected fully connected neural network
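The forward pass of a network with the selected shape (two inputs, one hidden layer of 40 tan-sigmoid neurons, one log-sigmoid output) can be sketched as below. The weights are random placeholders rather than trained values, Levenberg–Marquardt training is not shown, and the function names are hypothetical. The tan-sigmoid function is mathematically the hyperbolic tangent, and the log-sigmoid is \(1/(1+e^{-a})\).

```python
import numpy as np

def log_sigmoid(a):
    """Logistic function, mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(features, w_hidden, b_hidden, w_out, b_out):
    """One hidden tanh layer followed by a single logistic output."""
    hidden = np.tanh(features @ w_hidden + b_hidden)     # shape (N, 40)
    return log_sigmoid(hidden @ w_out + b_out).ravel()   # shape (N,)

rng = np.random.default_rng(2)
w_hidden = rng.normal(scale=0.5, size=(2, 40))  # placeholder weights
b_hidden = np.zeros(40)
w_out = rng.normal(scale=0.5, size=(40, 1))
b_out = np.zeros(1)

# Six illustrative feature vectors (DBZ, DBT2), one per row.
features = rng.random((6, 2))
outputs = mlp_forward(features, w_hidden, b_hidden, w_out, b_out)
```

Because the output unit is logistic, every output lies strictly between 0 and 1, which is why a threshold is needed to turn it into an in-control or out-of-control label.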

To evaluate the performance of the selected MLP network, the factors mentioned above are considered. Each factor was explored through different shifts in the parameter vector by considering different values for \({\delta }_{1}\) and \({\delta }_{2}\); that is, each change in the parameter vector generated an experiment for each value of each factor. Each experiment was repeated 1,000 times, and the results are reported in Tables 3, 4, 5, 6, 7 and 8. They include \(E(T)\), the expected number of samples until the first out-of-control signal generated by the \({T}^{2}\) control chart; \(E(\widehat{\tau })\), the mean of the change point estimates; and \(se(\widehat{\tau })\), the standard deviation of the change point estimator. Empirical probabilities of different shifts are reported in Table 9.
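The summary measures can be computed from the simulated estimates as in the sketch below. The function name `summarize` and the sample estimates are hypothetical, and the precision measure is assumed to take the form \(P(|\widehat{\tau }-\tau |\le c)\), a common convention that the text does not spell out.

```python
import numpy as np

def summarize(tau_hats, tau=50, tolerances=(0, 1, 2, 3, 5)):
    """Mean, standard deviation and precision of change point estimates."""
    tau_hats = np.asarray(tau_hats, dtype=float)
    summary = {
        "mean": float(tau_hats.mean()),
        "sd": float(tau_hats.std(ddof=1)),
    }
    for c in tolerances:
        within = np.abs(tau_hats - tau) <= c
        summary[f"P(|tau_hat - tau| <= {c})"] = float(within.mean())
    return summary

# Five made-up estimates standing in for 1,000 simulation runs.
summary = summarize([48, 50, 50, 51, 53], tau=50)
```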

Table 3 The performance results of proposed estimator for different \({{\varvec{\theta}}}_{0}\), while \({\theta }_{01}\) changes to \({\theta }_{01}+{\delta }_{1}\), \(\tau =50\),\(\lambda =0.2\)
Table 4 The performance results of proposed estimator for different \({{\varvec{\theta}}}_{0}\), while \({\theta }_{02}\) changes to \({\theta }_{02}+{\delta }_{2}\), \(\tau =50\),\(\lambda =0.2\) and \(n=300\)
Table 5 The performance results of proposed estimator for different \(n\), while \({\theta }_{01}\) changes to \({\theta }_{01}+{\delta }_{1}\), \(\tau =50\),\(\lambda =0.2\) and \({{\varvec{\theta}}}_{0}=[\mathrm{0.5,2}].\)

The effect of the curve's shape on the proposed estimator's performance is shown in Tables 3 and 4 and Fig. 2, for changes of size \({\delta }_{1}\) in \({\theta }_{01}\) and of size \({\delta }_{2}\) in \({\theta }_{02}\), respectively. The experiments were carried out with different values of the in-control parameter vector, \({{\varvec{\theta}}}_{0}\), to alter the shape of the curve.

Table 6 The performance results of proposed estimator for different \(n\), while \({\theta }_{02}\) changes to \({\theta }_{02}+{\delta }_{2}\), \(\tau =50\), \(\lambda =0.2\) and \({{\varvec{\theta}}}_{0}=[\mathrm{0.5,2}]\)

Fig. 2 The performance results of proposed estimator for different \({{\varvec{\theta}}}_{0}\), while \({\theta }_{01}\) changes to \({\theta }_{01}+{\delta }_{1}\) and \({\theta }_{02}\) changes to \({\theta }_{02}+{\delta }_{2}\), independently, \(\tau =50\), \(\lambda =0.2\) and \(n=300\)

Clearly, the closer the change point estimate is to 50, the better the performance of the proposed estimator. Tables 3 and 4 demonstrate that the estimator performs better for large shifts than for small shifts in all experiments, for every value of \({{\varvec{\theta}}}_{0}\) and for changes in either \({\theta }_{01}\) or \({\theta }_{02}\). Also, the model's ascending and concave behaviour, which is governed by the \({\theta }_{02}\) parameter, is directly related to the accuracy of the estimator: the estimator performs better for all change sizes as the (positive) slope and concavity increase.

Tables 5 and 6 and Fig. 3 show the results for the sample size factor when \({\theta }_{01}\) and \({\theta }_{02}\) are shifted, respectively. As in Tables 3 and 4, the estimator performs better as the magnitude of the shift increases, regardless of the sample size. The results also show that the estimator's performance improves as the sample size increases, and that the improvement is more noticeable for small and medium shifts.

Table 7 The performance results of proposed estimator for different \(\lambda \), while \({\theta }_{01}\) changes to \({\theta }_{01}+{\delta }_{1}\), \(\tau =50\), \(n=300\) and \({{\varvec{\theta}}}_{0}=[\mathrm{0.5,2}]\)
Table 8 The performance results of proposed estimator for different \(\lambda \), while \({\theta }_{02}\) changes to \({\theta }_{02}+{\delta }_{2}\), \(\tau =50\), \(n=300\) and \({{\varvec{\theta}}}_{0}=[\mathrm{0.5,2}]\)

Fig. 3 The performance results of proposed estimator for different \(n\), while \({\theta }_{01}\) changes to \({\theta }_{01}+{\delta }_{1}\) and \({\theta }_{02}\) changes to \({\theta }_{02}+{\delta }_{2}\), independently, \(\tau =50\), \(\lambda =0.2\) and \({{\varvec{\theta}}}_{0}=[\mathrm{0.5,2}]\)

As shown in Tables 7 and 8 and Fig. 4, the ANN method estimates the change point accurately for large shifts regardless of the value of \(\lambda \). The performance of the ANN method differs for each value of \(\lambda \), and the relation between the value of \(\lambda \) and the performance of the ANN for small shifts remains inconclusive. The value of \(\lambda \) should therefore be selected by trial and error so that the ANN method performs at its best.

Table 9 Estimated precision performances of ANN estimator, when \(\tau =50\), \(n=300\), \(\lambda =0.2\), \({{\varvec{\theta}}}_{0}=[\mathrm{0.5,2}]\) and simulation trials \(=\mathrm{1,000}\)

Fig. 4 The performance results of proposed estimator for different \(\lambda \), while \({\theta }_{01}\) changes to \({\theta }_{01}+{\delta }_{1}\) and \({\theta }_{02}\) changes to \({\theta }_{02}+{\delta }_{2}\), independently, \(\tau =50\), \(n=300\) and \({{\varvec{\theta}}}_{0}=[\mathrm{0.5,2}]\)

According to these results, the proposed ANN method estimates the change point with high accuracy for large shifts. Also, the estimator tends to place the change point before the actual time of change (i.e., 50), a behaviour that occurs in many of the experiments. Finally, the proposed ANN method is compared with the MLE method introduced by Ghazizadeh et al. (2018) under the same conditions. That is, based on Eq. (20), we set \({{\varvec{\theta}}}_{0}=[\mathrm{0.5,2}]\), sample size equal to 300, \(\tau =50\), and \(\lambda =0.2\) for the ANN method. The resulting \(\mathrm{E}(\widehat{\tau })\) of the ANN method is as satisfactory as that of the MLE method. The results for different magnitudes of shift (\({\delta }_{1}\) and \({\delta }_{2}\) for \({\theta }_{01}\) and \({\theta }_{02}\)) are shown in Figs. 5 and 6, respectively, and in Table 10.

Fig. 5 The performance of ANN- and MLE-method when \({\theta }_{01}\) shifts to \({\theta }_{01}+{\delta }_{1}\), \(\tau =50\), \(\lambda =0.2\), \(n=300\) and \({{\varvec{\theta}}}_{0}=[\mathrm{0.5,2}]\)

Fig. 6 The performance of ANN- and MLE-method when \({\theta }_{02}\) shifts to \({\theta }_{02}+{\delta }_{2}\), \(\tau =50\), \(\lambda =0.2\), \(n=300\) and \({{\varvec{\theta}}}_{0}=[\mathrm{0.5,2}]\)

Table 10 The performance of ANN- and MLE-method when \({\theta }_{01}\) shifts to \({\theta }_{01}+{\delta }_{1}\) and \({\theta }_{02}\) shifts to \({\theta }_{02}+{\delta }_{2}\), \(\tau =50\), \(\lambda =0.2\), \(n=300\) and \({{\varvec{\theta}}}_{0}=[\mathrm{0.5,2}]\)

According to these results, both methods estimate the change point with high accuracy for large shifts, but for small shifts the ANN method slightly outperforms the MLE method. Table 10 indicates that, compared with the MLE method, the ANN method estimates the change point (\(\tau =50\)) with a lower standard deviation (4.58 vs. 6.22 for a shift in \({\theta }_{01}\) and 5.71 vs. 6.34 for a shift in \({\theta }_{02}\)) and, for a shift in \({\theta }_{01}\), with a smaller distance from 50 (47.17 vs. 53.16). The results also show that the ANN method tends to estimate the time of the shift before the actual time of change, whereas the MLE method estimates it after this time.

8 Conclusions

This study presented an artificial neural network (ANN) estimator for the step change point problem in Phase II monitoring of nonlinear profiles. In principle, all types of nonlinear profiles can be investigated with this estimator. The kind of model under consideration will, of course, influence the performance of the proposed method, because the estimator relies on estimates of the model parameters, and the accuracy of the estimation method (Gauss–Newton in the present article) is itself affected by the model type. The model in Eq. (8) was selected to evaluate the ANN estimator because it is a well-known exponential model belonging to a large family of nonlinear profiles in the literature.

Model type, the shape of the curve, sample size, and the constant parameter \(\lambda \) used in the ANN method are four significant factors that can affect the accuracy of the estimator. The proposed estimator's performance was evaluated in terms of these factors, except for model type, through simulation experiments. The results indicated that the change point estimate is reliable, especially for large shifts, large sample sizes, and curves with more strongly ascending and concave patterns. Although the proposed ANN estimator can in principle deal with all types of nonlinear profiles, it was evaluated only for the selected model, Eq. (8). The efficacy of this estimator for other types of profiles, and its extension to other kinds of change (e.g., multiple step, linear trend, isotonic, and monotonic changes), will be assessed in future studies.