Dynamic statistical process monitoring based on generalized canonical variate analysis

https://doi.org/10.1016/j.jtice.2020.07.007

Highlights

  • A novel generalized canonical variate analysis (GCVA) algorithm is formulated.

  • The GCVA can explicitly extract dynamic and static latent variables from time-serial data.

  • Comparisons have demonstrated the superiority and effectiveness of the GCVA-based method.

Abstract

A novel generalized canonical variate analysis (GCVA) algorithm is formulated and then applied to data-driven dynamic process monitoring. The proposed GCVA algorithm seeks different projecting bases for the time-serial samples, so that the sum of squared canonical correlation coefficients between all pairs of the projected latent variables is maximized. The corresponding dynamic process monitoring scheme first uses GCVA to explicitly and simultaneously extract dynamic and static latent variables from the time-serial data. Second, a multivariate regression model is employed to describe the time-serial relationship between the dynamic latent variables; the model residual then serves as a good indicator of inconsistency in the defined time-serial mechanism. For online monitoring, two combined monitoring indices are proposed for detecting abnormalities in the time-dependent and time-independent variations, respectively. Reconstruction-based contribution indices are also derived for fault diagnosis. Finally, comparisons on two dynamic industrial processes demonstrate the capability of the GCVA algorithm in exploiting the time-serial correlation inherent in the given data, and validate the effectiveness and superiority of the proposed GCVA-based approach over its counterparts.

Introduction

The importance of ensuring the healthy operation of industrial processes continues to drive the need for efficient monitoring systems that provide trustworthy fault detection and diagnosis. With the growing complexity of modern plants and the wider application of computer-aided devices in them, the availability of massive process data has made data-driven process monitoring approaches popular for decades [1], [2], [3]. Generally, the essence of data-driven process monitoring is the development of a model that characterizes the normal signature of process data sampled under the normal operating condition. Faults are then defined as deviations from this normality above a threshold. As such, many multivariate analytical algorithms, such as principal component analysis (PCA), can be applied in process monitoring [4], [5], [6], [7]. Different analytical algorithms exploit the information in different latent variables for fault detection as well as fault diagnosis.

Given that the measurements in modern industrial plants can be highly time dependent, the time-serial correlation (or auto-correlation) inherent in the given data needs to be taken into account. To tackle this sort of dynamic process monitoring issue, Ku et al. [8] pioneered augmenting each sample with a number of previously measured samples before the PCA algorithm is performed, which resulted in a dynamic PCA (DPCA) model for dynamic process monitoring. Using the same augmenting strategy, different dynamic process monitoring methods involving different analytic algorithms have been proposed in the literature [9, 10]. Canonical variate analysis (CVA), also called canonical correlation analysis elsewhere, provides an alternative for modeling time-serial data as well [11], [12], [13]. The CVA algorithm represents the time-serial correlation by constructing state variables from the past samples to explain the future variability. Moreover, Choi et al. [14] and Kerkhof et al. [15] investigated the feasibility of the multivariate autoregressive (AR) model for modeling the time-serial correlation in batch processes.

Furthermore, Miao et al. [16] proposed a novel dynamic process monitoring approach based on the time neighborhood preserving embedding (TNPE) model. The TNPE reconstructs each sample from its time-serial neighbors instead of its distance neighbors; considering the time-serial relationship of a data manifold during dimensionality reduction can also uncover dynamic latent variables for dynamic process monitoring. Recently, Li et al. [17] developed a dynamic latent variable (DLV) model by maximizing the variance of a weighted sum of lagged latent variables. The resulting DLV model can sequentially extract auto-correlated latent variables and statistically time-independent latent variables. Similarly, Dong and Qin [18] formulated a dynamic-inner PCA (DiPCA) algorithm for extracting dynamic latent variables with maximal auto-covariance.

Other types of dynamic process monitoring approaches are also available in the literature. For example, identifying state-space models has been found to be effective in modeling and monitoring dynamic processes [19,20]. With the use of kernel functions, the aforementioned methods can be extended to handle nonlinearity in the given data [21,22]. Once a fault has been detected, the task of fault diagnosis is activated. An examination of the existing literature on fault diagnosis shows that contribution plots are typically employed to isolate the source variables associated with the fault. An alternative to the classic contribution plots, referred to as reconstruction-based contribution (RBC) plots, has been proposed by Alcala and Qin [23]. The RBC calculates the contribution of each monitored variable in concert with the monitoring statistic, and can thus provide more accurate fault diagnosis results than the classic contribution plots [23].
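As a brief illustration of the RBC idea, the sketch below assumes a monitoring index of the quadratic form x^T M x and uses the general RBC expression of Alcala and Qin, RBC_i = (ξ_i^T M x)^2 / (ξ_i^T M ξ_i), where ξ_i is the i-th unit direction. The matrix `M` and the sample `x` are made-up toy values, and the function name `rbc` is hypothetical; the indices derived in this paper for its combined statistics are not reproduced here.

```python
import numpy as np

def rbc(x, M):
    """Reconstruction-based contribution of each variable to x^T M x.

    Assumes M is symmetric positive definite, so every diagonal entry
    (xi_i^T M xi_i) is strictly positive.
    """
    Mx = M @ x                      # xi_i^T M x is the i-th entry of M x
    return Mx ** 2 / np.diag(M)     # xi_i^T M xi_i is the i-th diagonal entry

# Toy index matrix and a sample whose third variable is abnormal (assumed values).
M = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
x = np.array([0.1, 0.1, 3.0])

contrib = rbc(x, M)
suspect = int(np.argmax(contrib))   # variable with the largest contribution
```

The variable with the largest RBC value is taken as the most likely source of the fault along a single-variable direction.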

Generally, the extraction of latent variables representing the time-serial correlation inherent in the given data should be oriented towards maximal canonical correlation. In comparison with the canonical correlation coefficients that are considered in the CVA algorithm, maximal variance or auto-covariance can only partially reflect the auto-correlated variation, since collinear and/or systematic variation can also dominate a latent variable that satisfies the maximal variance or auto-covariance criterion. From this viewpoint, maximizing the canonical correlation coefficients between every single pair of latent variables is the appropriate way to extract auto-correlated features. Motivated by this recognition, a generalized CVA (GCVA) algorithm is proposed for time-serial data modeling and dynamic process monitoring. The GCVA seeks different projecting bases for the time-serial samples so that the squared canonical correlation coefficients of all possible pairs of latent variables are maximized.
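To make the notion of canonical correlation between time-serial samples concrete, the following sketch computes the canonical correlations between current samples x_t and their immediate predecessors x_{t-1}. It is an illustration of standard two-set canonical correlation only, not the GCVA algorithm; the toy data (one auto-regressive variable with an assumed coefficient of 0.9 plus one white-noise variable) and the helper name `canonical_correlations` are assumptions.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X and Y.

    Uses orthonormal bases of the centered data (QR) followed by an SVD;
    assumes both centered matrices have full column rank.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)     # guard against round-off above 1

# Toy data: column 0 is auto-correlated, column 1 is white noise.
rng = np.random.default_rng(0)
n = 2000
x = np.zeros((n, 2))
for t in range(1, n):
    x[t, 0] = 0.9 * x[t - 1, 0] + rng.standard_normal()
    x[t, 1] = rng.standard_normal()

rho = canonical_correlations(x[1:], x[:-1])   # x_t versus x_{t-1}
objective = float(np.sum(rho ** 2))           # sum of squared coefficients
```

On such data the leading coefficient should be close to the auto-regressive coefficient while the second stays near zero, which is the sense in which canonical correlation isolates the auto-correlated direction regardless of its variance.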

The proposed GCVA-based dynamic process monitoring scheme first extracts dynamic latent variables that dominate the time-serial correlated variation, as well as static latent variables that are time-independent. A multivariate regression model is then employed to describe the time-serial relationship of the dynamic latent variables; the corresponding model residual can serve as a good indicator of inconsistency in the defined time-serial mechanism. Moreover, the static latent variables representing the time-independent variation are monitored as well. Therefore, the proposed GCVA-based approach provides an explicit decomposition of the time-serial process data and uncovers two different types of variation inherent in the given data, i.e., time-dependent and time-independent variations, for process monitoring purposes.
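The regression step can be sketched as follows, under the assumption that the dynamic latent variables are already available as a score matrix `S` (one row per sampling step) and that an ordinary least-squares model on `d` lagged scores is used. The paper's exact regression model and monitoring indices are not shown in this excerpt, so the function names and the toy score sequence below are hypothetical.

```python
import numpy as np

def lagged_past(S, d):
    """Stack [s_{t-1}, ..., s_{t-d}] for every t with a full history."""
    n = S.shape[0]
    return np.hstack([S[d - lag : n - lag] for lag in range(1, d + 1)])

def fit_lagged_regression(S, d):
    """Least-squares coefficients mapping lagged scores to current scores."""
    theta, *_ = np.linalg.lstsq(lagged_past(S, d), S[d:], rcond=None)
    return theta

def regression_residuals(S, theta, d):
    """Residuals e_t = s_t - [s_{t-1}, ..., s_{t-d}] @ theta."""
    return S[d:] - lagged_past(S, d) @ theta

# Toy dynamic latent variable: s_t = 0.9 s_{t-1} + noise (assumed values).
rng = np.random.default_rng(1)
n = 2000
s = np.zeros(n)
for t in range(1, n):
    s[t] = 0.9 * s[t - 1] + 0.1 * rng.standard_normal()
S = s[:, None]

theta = fit_lagged_regression(S, d=1)
residuals = regression_residuals(S, theta, d=1)
```

Under normal operation the residuals behave like the driving noise; a sample whose residual is unusually large suggests that the learned time-serial relationship no longer holds.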


DPCA and CVA

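Each sample in the training dataset $X = [x_1, x_2, \cdots, x_n]^T \in \mathbb{R}^{n \times m}$ is augmented with its $d$ previously measured samples according to

$$\tilde{X} = \begin{bmatrix} x_{d+1} & x_{d+2} & \cdots & x_n \\ x_d & x_{d+1} & \cdots & x_{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ x_1 & x_2 & \cdots & x_{n-d} \end{bmatrix}^T \in \mathbb{R}^{(n-d) \times m(d+1)}$$

so that the auto-correlation in the dataset $X$ is mixed with the cross-correlation, where $x_i \in \mathbb{R}^{m \times 1}$ is the $i$-th sample with $i = 1, 2, \cdots, n$, and $n$ and $m$ are the numbers of samples and measured variables, respectively. The standard PCA algorithm can then be performed on the augmented matrix $\tilde{X}$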
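The augmentation can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the function names `augment_with_lags` and `pca_loadings` are hypothetical, the toy data are arbitrary, and PCA is carried out through an SVD of the column-centered augmented matrix without the scaling or component-selection rules a real DPCA model would need.

```python
import numpy as np

def augment_with_lags(X, d):
    """Stack each sample with its d previous samples.

    X has shape (n, m); the result has shape (n - d, m * (d + 1)),
    with row k equal to [x_{d+1+k}, x_{d+k}, ..., x_{1+k}] (1-based).
    """
    n, m = X.shape
    if not 0 <= d < n:
        raise ValueError("d must satisfy 0 <= d < n")
    blocks = [X[d - lag : n - lag] for lag in range(d + 1)]
    return np.hstack(blocks)

def pca_loadings(X_aug, n_components):
    """Loadings of standard PCA via SVD of the column-centered matrix."""
    Xc = X_aug - X_aug.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_components].T

X = np.arange(12, dtype=float).reshape(6, 2)   # toy data: n = 6, m = 2
X_aug = augment_with_lags(X, d=2)              # shape (4, 6)
P = pca_loadings(X_aug, n_components=1)        # shape (6, 1)
```

Each row of `X_aug` places the current sample first and the oldest lag last, matching the block ordering of the augmented matrix.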

GCVA-based dynamic process monitoring

The GCVA algorithm is proposed to uncover the time-serial correlation by projecting the time-serial samples (i.e., $x_t, x_{t-1}, \cdots, x_{t-d}$) onto the corresponding projecting bases (i.e., $W_{d+1}, W_d, \cdots, W_1$). The formulation of the GCVA algorithm is conceptually displayed in Fig. 1.
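The sketch below only evaluates, for given candidate bases, a sum of squared correlations between all pairs of projected lagged samples; it does not solve for the optimal bases, and it is not the authors' algorithm. It assumes a single projection vector per lag, in which case the correlation between two projected series plays the role of the canonical correlation coefficient. The ordering of `bases` (lag 0 first), the function names, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def lagged_blocks(X, d):
    """Return [x_t, x_{t-1}, ..., x_{t-d}] as aligned (n - d, m) blocks."""
    n = X.shape[0]
    return [X[d - lag : n - lag] for lag in range(d + 1)]

def sum_squared_pairwise_corr(X, bases):
    """Sum of squared correlations over all pairs of projected lags.

    bases[k] is the (assumed) projection vector applied to x_{t-k}.
    """
    d = len(bases) - 1
    scores = [blk @ w for blk, w in zip(lagged_blocks(X, d), bases)]
    R = np.corrcoef(np.column_stack(scores), rowvar=False)
    upper = np.triu_indices(d + 1, k=1)        # each pair counted once
    return float(np.sum(R[upper] ** 2))

# Toy data: column 0 is auto-correlated, column 1 is white noise.
rng = np.random.default_rng(2)
n = 2000
X = np.zeros((n, 2))
for t in range(1, n):
    X[t, 0] = 0.9 * X[t - 1, 0] + rng.standard_normal()
    X[t, 1] = rng.standard_normal()

e0, e1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
J_dynamic = sum_squared_pairwise_corr(X, [e0, e0, e0])   # d = 2
J_static = sum_squared_pairwise_corr(X, [e1, e1, e1])
```

Bases that pick out the auto-correlated direction give a much larger objective value than bases aligned with the white-noise direction, which is the behavior an optimizer of this criterion would exploit.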

A numerical dynamic system

The monitoring task of the following discrete dynamic system is first considered:

$$z(t) = A z(t-1) + B u(t-1) + e(t)$$

$$y(t) = C z(t) + [0.3200.7490.2630.6890.3200.2850.389000.543]\, u(t) + v(t)$$

where $z(t) \in \mathbb{R}^{3 \times 1}$ denotes the state vector at the $t$-th sampling step, and $e(t)$ and $v(t)$ are random noises following Gaussian distributions with zero mean and standard deviations of 0.2 and 0.5, respectively. The input vector $u(t) \in \mathbb{R}^{2 \times 1}$ is generated by:

$$u(t) = D u(t-1) + [0.1930.6890.3200.749]\, w(t)$$

where $w(t) \in \mathbb{R}^{2 \times 1}$ is a

Conclusion

A novel GCVA algorithm with application to fault detection and diagnosis in dynamic processes has been presented. The GCVA is formulated to maximize the sum of squared canonical correlations between every single pair of latent variables that are projected from time-serial samples. The capability of exploiting the time-serial correlation inherent in the given data has been demonstrated, and the superiority and effectiveness of the proposed GCVA-based dynamic process monitoring scheme over other

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was sponsored by the National Natural Science Foundation of China (61773225), the Natural Science Foundation of Zhejiang Province (LY20F030004), K.C.Wong Magna Fund in Ningbo University, and the Fundamental Research Funds for the Central Universities under Grant 222201817006.

References (32)
