Variational Bayesian probabilistic modeling framework for data-driven distributed process monitoring

https://doi.org/10.1016/j.conengprac.2021.104778

Highlights

  • A variational Bayesian probabilistic modeling framework for data-driven distributed monitoring is proposed.

  • Variable relationships within an operation unit and among units are characterized.

  • Both the process status and the type of a detected fault are identified.

  • Monitoring effectiveness is demonstrated through three application examples.

Abstract

Data-driven process monitoring has gained attention because of growing demands for process safety and the rapid advancement of data-gathering techniques. When monitoring a plant-wide multiunit process, building a separate monitor for each unit ignores the correlations among units, whereas building a single global monitor for the entire process ignores local process behavior. A variational Bayesian probabilistic modeling approach is proposed for efficient distributed process monitoring. A novel probabilistic latent variable model is developed to characterize the variable relationships within each local unit and among units. First, variational Bayesian latent variable extraction is performed in each local unit, which characterizes the variable relationships within that unit. Second, a variational Bayesian regression model is established between the latent variables and the neighboring variables, which characterizes the variable relationships among units. Then, modeling residuals and monitoring statistics are generated, from which the process status and the type of a detected fault are identified. The effectiveness of the proposed probabilistic modeling and monitoring method is verified through three case studies: a numerical example, the Tennessee Eastman benchmark process, and a laboratory distillation process.
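The two-step structure described in the abstract (latent extraction within each unit, then regression of the latent variables on neighbouring variables, then residuals) can be sketched as below. This is a deliberately simplified, non-Bayesian stand-in, not the authors' VBPLV algorithm: ordinary PCA via SVD replaces VBPCA, a ridge regression with an assumed penalty `lam` replaces the posterior mean of the variational Bayesian regression, and the function names `fit_unit_pca`, `fit_ridge`, and `unit_residuals` are hypothetical.

```python
import numpy as np


def fit_unit_pca(X, q):
    """Stand-in for VBPCA: sample mean and top-q orthonormal loadings of one unit."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:q].T  # loadings P, shape (d, q)


def fit_ridge(Y, T, lam=1e-2):
    """Stand-in for the VBR posterior mean: ridge map from neighbours Y to latents T."""
    y_mu = Y.mean(axis=0)
    Yc = Y - y_mu
    B = np.linalg.solve(Yc.T @ Yc + lam * np.eye(Y.shape[1]), Yc.T @ T)
    return y_mu, B


def unit_residuals(X, Y, mu, P, y_mu, B):
    """Return (within-unit reconstruction residual, inter-unit latent residual)."""
    T = (X - mu) @ P                # latent variables of the local unit
    local_res = (X - mu) - T @ P.T  # variation not captured by the unit's latent space
    inter_res = T - (Y - y_mu) @ B  # latent variation not explained by the neighbours
    return local_res, inter_res


# Usage on synthetic data: one unit with 4 variables and 3 neighbouring variables
rng = np.random.default_rng(0)
s = rng.standard_normal((300, 3))
X = s @ rng.standard_normal((3, 4)) + 0.1 * rng.standard_normal((300, 4))
Y = s[:, 1:] @ rng.standard_normal((2, 3)) + 0.1 * rng.standard_normal((300, 3))
mu, P = fit_unit_pca(X, q=2)
y_mu, B = fit_ridge(Y, (X - mu) @ P)
local_res, inter_res = unit_residuals(X, Y, mu, P, y_mu, B)
```

The two residuals mirror the two relationships the model characterizes: `local_res` carries within-unit variation, and `inter_res` carries latent variation that the neighbouring units do not explain.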

Introduction

Process monitoring has received considerable attention in recent years (Gao et al., 2015, Hwang et al., 2010, Jiang and Yan, 2018). With the growing volume of industrial process data (Qin, 2014, Shang and You, 2019), data-driven process monitoring methods have attracted increasing interest (Chen and Ge, 2020, Ding et al., 2009, Ge et al., 2017, Jiang et al., 2018, Qin, 2012, Yin et al., 2014). Modern plant-wide processes are typically large in scale and composed of multiple units, so monitoring them requires attention to the overall operating state of the process, to the status of each local unit, and to the associations between units. Accordingly, monitoring methods for large-scale multiunit processes have received considerable attention (Jiang and Huang, 2016, Jiang et al., 2015, Li and Zhao, 2019, Rashidi et al., 2017, Shang et al., 2019). When monitoring a multiunit process, it is important to characterize both the variable relationships within each local unit and the variable relationships among units.

Distributed monitoring reduces complexity by decomposing a process into multiple subblocks and monitoring the state of each separated unit (Ge, 2017, Ge and Song, 2013). Distributed monitoring models based on traditional multivariate statistical process monitoring (MSPM) methods, including principal component analysis (PCA), partial least squares (PLS), and canonical correlation analysis (CCA), have been discussed extensively for various situations (Chen et al., 2016, Liu, Qin et al., 2014, Qin et al., 2001, Westerhuis et al., 2015). These distributed methods provide the basic framework for large-scale process monitoring. In Jiang, Ding, Wang et al. (2017), a CCA-based distributed monitoring method is proposed that examines the status of each unit through its correlation with neighboring units; optimal monitoring residuals are generated, and monitoring performance is improved. CCA-based distributed monitoring has since been extended in many directions to address more complex process behavior. However, these CCA-based methods rely heavily on the data covariance, which is easily corrupted by missing data or outliers.
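A common way to turn the correlation between a unit and its neighbours into a residual is sketched below, assuming zero-mean Gaussian data, `s` retained canonical pairs, and a hypothetical helper `fit_cca_residual`. The unit variables `U` and neighbour variables `Y` are whitened, the SVD of the whitened cross-covariance gives canonical directions `J` and `L` with the canonical correlations on the diagonal of `S`, and the residual r = J_s^T u - S_s L_s^T y then has covariance I - S_s^2 under normal operation. This illustrates the general CCA residual idea and is not necessarily the exact formulation of Jiang, Ding, Wang et al. (2017).

```python
import numpy as np


def inv_sqrt(S):
    """Inverse symmetric square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def fit_cca_residual(U, Y, s):
    """Fit a CCA residual generator for unit variables U given neighbour variables Y.

    Returns a function mapping samples (u, y) to residuals r = J_s^T u - S_s L_s^T y
    and their T^2 values; under normal operation Cov(r) = I - S_s^2.
    """
    n = U.shape[0]
    mu_u, mu_y = U.mean(axis=0), Y.mean(axis=0)
    Uc, Yc = U - mu_u, Y - mu_y
    Suu, Syy = Uc.T @ Uc / (n - 1), Yc.T @ Yc / (n - 1)
    Suy = Uc.T @ Yc / (n - 1)
    Su_is, Sy_is = inv_sqrt(Suu), inv_sqrt(Syy)
    # SVD of the whitened cross-covariance gives canonical directions and correlations
    R, sig, Vt = np.linalg.svd(Su_is @ Suy @ Sy_is)
    J = Su_is @ R[:, :s]
    L = Sy_is @ Vt[:s].T
    S = np.diag(sig[:s])
    cov_r_inv = np.linalg.inv(np.eye(s) - S @ S)

    def residual_t2(U_new, Y_new):
        r = (U_new - mu_u) @ J - ((Y_new - mu_y) @ L) @ S
        t2 = np.einsum("ij,jk,ik->i", r, cov_r_inv, r)
        return r, t2

    return residual_t2


# Usage: a 3-variable unit and 4 neighbouring variables sharing 2 latent sources
rng = np.random.default_rng(1)
z = rng.standard_normal((500, 2))
U = z @ rng.standard_normal((2, 3)) + 0.2 * rng.standard_normal((500, 3))
Y = z @ rng.standard_normal((2, 4)) + 0.2 * rng.standard_normal((500, 4))
residual_t2 = fit_cca_residual(U, Y, s=2)
r, t2 = residual_t2(U, Y)
```

Every quantity in this sketch is computed from sample covariances, which is why outliers or missing entries in `U` or `Y` directly distort the residual generator.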

Probabilistic PCA (PPCA), described in Bishop (2006) and Tipping and Bishop (2010), derives PCA within a density estimation framework. PPCA uses the expectation–maximization (EM) algorithm to obtain maximum likelihood parameter estimates, so that the process is described within a probabilistic framework. PPCA provides more reliable performance and higher computational efficiency in process monitoring than traditional MSPM methods (Ge and Song, 2010a, Kim and Lee, 2003). However, distributed process monitoring within a probabilistic framework remains underexplored. Moreover, maximum likelihood estimation of the PPCA parameters through the EM algorithm is prone to overfitting (Bishop, 2006), which may reduce the accuracy of the process monitoring model. Recently, process monitoring using a generalized probabilistic linear latent variable model has received attention (Raveendran, Kodamana, & Huang, 2018), and such a model can be naturally applied to distributed process monitoring. Variational Bayesian PCA (VBPCA) applies variational Bayesian inference to parameter estimation, which avoids the overfitting problem and yields a more accurate density model of the process.
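The EM procedure for maximum likelihood PPCA mentioned above can be sketched as follows, using the standard E- and M-step updates for the loadings `W` and the isotropic noise variance `sigma2` (as given, for example, in Bishop, 2006, Ch. 12). The function name `ppca_em`, the random initialization, and the fixed iteration count are assumptions of this sketch. No prior is placed on `W`, so this is the plain maximum likelihood estimator whose overfitting VBPCA is meant to avoid.

```python
import numpy as np


def ppca_em(X, q, n_iter=200, seed=0):
    """Maximum likelihood PPCA fitted by EM; returns the mean, loadings W, and sigma2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    W = rng.standard_normal((d, q))   # random initial loadings
    sigma2 = 1.0                      # initial noise variance
    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables
        Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(q))
        Ez = Xc @ W @ Minv                        # rows are E[z_n]^T
        sum_Ezz = n * sigma2 * Minv + Ez.T @ Ez   # sum over n of E[z_n z_n^T]
        # M-step: update the loadings, then the isotropic noise variance
        W = Xc.T @ Ez @ np.linalg.inv(sum_Ezz)
        sigma2 = (np.sum(Xc ** 2)
                  - 2.0 * np.sum(Ez * (Xc @ W))
                  + np.trace(sum_Ezz @ W.T @ W)) / (n * d)
    return mu, W, sigma2


# Usage: 5 variables driven by 2 latent sources plus small isotropic noise
rng = np.random.default_rng(2)
X = (2.0 * rng.standard_normal((500, 2)) @ rng.standard_normal((2, 5))
     + 0.1 * rng.standard_normal((500, 5)))
mu, W, sigma2 = ppca_em(X, q=2)
eig = np.linalg.eigvalsh(np.cov(X.T, bias=True))  # ascending eigenvalues
sigma2_closed_form = eig[:3].mean()
```

At convergence, `sigma2` should agree with the closed-form maximum likelihood value, the mean of the d - q smallest eigenvalues of the sample covariance, which the last lines compute for comparison.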

A process monitoring framework based on VBPCA is constructed in Liu, Pan, Sun et al. (2014) and shows robustness to missing values in process data. More recently, Ma and Huang (2017) applied the variational Bayesian method to soft sensing for dynamic process modeling. Over recent decades, data-driven distributed monitoring models have been widely developed. A method based on multiblock PLS and PCA is developed in Wold, Kettaneh, and Tjessem (1996). Ge and Song (2013) introduce a distributed PCA monitoring model that divides variables into subblocks on the basis of loading vectors, thereby reducing complexity and extracting local process behavior. Jiang, Yan, and Huang (2019) propose a neighborhood VBPCA–CCA-based method for distributed monitoring, which uses VBPCA to handle missing values and CCA to analyze variable correlation. However, the CCA-based distributed monitoring model is deterministic, whereas large-scale industrial process data are usually contaminated by noise or outliers. Probabilistic modeling techniques such as VBPCA are robust to such uncertainty in process data. To the best of our knowledge, distributed monitoring that uses VBPCA as its modeling basis, rather than only for missing-value handling, has not been discussed thus far.

Furthermore, VBPCA alone cannot easily capture the correlation between units. Variational Bayesian linear regression (VBR) is an effective tool for modeling correlations in data (Bishop, 2006). Like VBPCA, VBR avoids overfitting during parameter estimation owing to variational Bayesian inference. Therefore, VBR is adopted in this work as the probabilistic regression component of the proposed model.
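A minimal sketch of variational Bayesian linear regression, following the single-hyperparameter model in Bishop (2006, Sec. 10.3): a Gaussian prior w ~ N(0, alpha^{-1} I) with a Gamma(a0, b0) hyperprior on alpha and a known noise precision `beta`. The function name, default hyperparameters, and fixed iteration count are assumptions, and the noise precision is held fixed for simplicity. To regress several latent variables on the neighbouring variables, one such model per latent variable could be fitted; that mapping is an assumption, not a detail taken from the paper.

```python
import numpy as np


def vb_linear_regression(Phi, t, beta, a0=1e-3, b0=1e-3, n_iter=100):
    """Variational Bayesian linear regression with a single shared prior precision.

    Model (Bishop, 2006, Sec. 10.3): t = Phi @ w + noise with known precision beta,
    w ~ N(0, alpha^{-1} I), alpha ~ Gamma(a0, b0).
    Returns the posterior mean m_N, covariance S_N, and E[alpha].
    """
    m = Phi.shape[1]
    a_N = a0 + m / 2.0        # shape parameter, fixed across iterations
    E_alpha = a0 / b0         # initial guess for E[alpha]
    for _ in range(n_iter):
        # q(w) update: Gaussian with precision E[alpha] I + beta Phi^T Phi
        S_N = np.linalg.inv(E_alpha * np.eye(m) + beta * Phi.T @ Phi)
        m_N = beta * S_N @ Phi.T @ t
        # q(alpha) update: Gamma(a_N, b_N) with E[w^T w] = m_N^T m_N + tr(S_N)
        b_N = b0 + 0.5 * (m_N @ m_N + np.trace(S_N))
        E_alpha = a_N / b_N
    return m_N, S_N, E_alpha


# Usage: Phi plays the role of neighbouring variables, t of one latent variable
rng = np.random.default_rng(3)
Phi = rng.standard_normal((200, 4))
w_true = np.array([1.0, -2.0, 0.5, 0.0])
t = Phi @ w_true + 0.1 * rng.standard_normal(200)
m_N, S_N, E_alpha = vb_linear_regression(Phi, t, beta=1.0 / 0.1 ** 2)
```

Because `E_alpha` is re-estimated from the data rather than tuned by hand, the prior strength adapts to the data, which is the mechanism by which the variational treatment limits overfitting.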

This work proposes a variational Bayesian probabilistic latent variable (VBPLV) model for distributed monitoring of multiunit processes. The VBPLV model extends traditional distributed monitoring to a probabilistic setting. Its probabilistic formulation makes the monitoring model robust to process noise and data outliers, and the variational Bayesian technique enhances its reliability. The status of each unit is examined in three subspaces, which provide additional information for identifying the fault type. The effectiveness of the proposed method is tested and verified on three applications: a numerical model, the well-known Tennessee Eastman (TE) process (Downs & Vogel, 1993), and a laboratory distillation process.

The proposed VBPLV model is limited in the process characteristics it can handle. Ge and Song (2010b) and Jiang and Yan (2018) discussed process monitoring for nonlinear processes. Zhao, Chen, and Jing (2020) discussed data analytics and monitoring when process data exhibit wide-range nonstationary behavior, and Zhao and Sun (2018) developed a dynamic distributed monitoring strategy for large-scale nonstationary processes. However, more complex data characteristics, such as nonlinearity, non-Gaussianity, and nonstationarity, are beyond the scope of the present study. Therefore, in the remainder of this paper, all models and processes are assumed to be linear, Gaussian, and stationary.

The remainder of this article is organized as follows. In Section 2, the basic definition of VBPCA-based monitoring is reviewed briefly. In Section 3, the variational Bayesian probabilistic modeling framework for distributed process monitoring is proposed. Then, the detailed procedures of offline modeling and online monitoring are listed. In Section 4, the performance of VBPLV-based distributed process monitoring model is evaluated through three applications. The conclusions and discussions are presented in Section 5.

Section snippets

Preliminaries

In this section, we provide a brief review of VBPCA-based monitoring method (Liu, Pan et al., 2014) to build a foundation for model construction.
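The exact VBPCA monitoring statistics of Liu, Pan et al. (2014) are not reproduced in this snippet. As an illustration only, the sketch below assumes a fitted PPCA-type model (mean `mu`, loadings `W`, noise variance `sigma2`, taken here from the closed-form maximum likelihood solution rather than from VBPCA posterior means) and computes two common statistics: a Hotelling-type T^2 on the posterior latent mean E[z|x], standardized by its model covariance W^T W M^{-1} with M = W^T W + sigma2 I, and the squared prediction error (SPE) of the reconstruction. Control limits are taken as empirical 99% quantiles of the training statistics, an assumption in place of analytic limits, and the helper names are hypothetical.

```python
import numpy as np


def ppca_statistics(X, mu, W, sigma2):
    """T^2 on posterior latent means and SPE on reconstructions (PPCA-type model)."""
    q = W.shape[1]
    Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(q))
    Ez = (X - mu) @ W @ Minv          # rows are posterior means E[z|x]
    cov_Ez = W.T @ W @ Minv           # model covariance of E[z|x]
    t2 = np.einsum("ij,jk,ik->i", Ez, np.linalg.inv(cov_Ez), Ez)
    spe = np.sum((X - mu - Ez @ W.T) ** 2, axis=1)
    return t2, spe


def empirical_limit(stat, level=0.99):
    """Control limit as the `level` quantile of a statistic on normal training data."""
    return np.quantile(stat, level)


# Usage: closed-form ML PPCA (Tipping & Bishop) as a stand-in for VBPCA estimates
rng = np.random.default_rng(4)
X_train = (rng.standard_normal((400, 2)) @ rng.standard_normal((2, 6))
           + 0.1 * rng.standard_normal((400, 6)))
mu = X_train.mean(axis=0)
vals, vecs = np.linalg.eigh(np.cov(X_train.T, bias=True))  # ascending eigenvalues
sigma2 = vals[:4].mean()                                   # mean of discarded eigenvalues
W = vecs[:, -2:] * np.sqrt(vals[-2:] - sigma2)             # ML loadings, q = 2
t2, spe = ppca_statistics(X_train, mu, W, sigma2)
t2_lim, spe_lim = empirical_limit(t2), empirical_limit(spe)
alarm = (t2 > t2_lim) | (spe > spe_lim)                    # per-sample alarm flags
```

On the training data the average T^2 equals the latent dimension (2 here), and roughly 1% of samples exceed each empirical limit by construction.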

Proposed VBPLV model

For large-scale multiunit process monitoring, the operating status of the entire process, the operating status of critical local units, and the correlation between different units are all of great importance (Ge, 2017, Jiang and Huang, 2016). Each unit has its own structure or function and is related to other units, so the units influence each other and operate in coordination. A local fault may cause a chain reaction across multiple devices or units, causing the deterioration of the entire process and even

Application in a numerical model process

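A numerical model is constructed as follows:

$$\mathbf{x} = \mathbf{C}\mathbf{s} + \boldsymbol{\varepsilon},$$

where $\mathbf{x} = [x_1, \ldots, x_6]^T \in \mathbb{R}^{6\times 1}$,

$$\mathbf{C} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0.8 & 0.6 & 0 & 0 & 0 \\ 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 \\ 0 & 0 & 0 & 0.6 & 0.8 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix},$$

$\mathbf{s} = [s_1, \ldots, s_5]^T \in \mathbb{R}^{5\times 1}$ is a column vector generated from the standard Gaussian distribution, and $\boldsymbol{\varepsilon}$ is a noise vector generated from a Gaussian distribution. The model is divided into two units: unit 1, $\mathbf{u}_1 = [x_1, x_2, x_3]^T$, and unit 2, $\mathbf{u}_2 = [x_4, x_5, x_6]^T$. A total of 300 samples are collected under normal operating conditions to establish the monitoring models.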
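The data-generating model of the numerical example can be simulated directly, with the matrix C entered row by row. The source states only that the noise is Gaussian, so the noise standard deviation `noise_std = 0.1` and the random seed are assumptions of this sketch, and `simulate` is a hypothetical helper.

```python
import numpy as np

# Mixing matrix C of the numerical example: 6 measured variables, 5 Gaussian sources
C = np.array([
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.8, 0.6, 0.0, 0.0, 0.0],
    [0.0, 0.4, 0.6, 0.0, 0.0],
    [0.0, 0.0, 0.4, 0.6, 0.0],
    [0.0, 0.0, 0.0, 0.6, 0.8],
    [0.0, 0.0, 0.0, 0.0, 1.0],
])


def simulate(n, noise_std=0.1, seed=0):
    """Draw n samples of x = C s + eps; noise_std is an assumed value."""
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((n, 5))             # standard Gaussian sources
    E = noise_std * rng.standard_normal((n, 6))
    return S @ C.T + E                          # one sample per row


X = simulate(300)                 # 300 normal-operation samples
u1, u2 = X[:, :3], X[:, 3:]       # unit 1: x1..x3, unit 2: x4..x6
cross_cov = np.cov(X.T)[:3, 3:]   # sample covariance between the two units
```

Because C is banded, the only source shared between the two units is s3 (through x3 and x4), so the cross-unit block of C C^T has a single nonzero entry, 0.6 × 0.4 = 0.24. This is the inter-unit correlation a distributed monitor has to capture.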

To test and analyze the performance

Conclusion and discussion

A VBPLV-based distributed process monitoring model has been constructed to deal with the large-scale multiunit process monitoring problem. First, the proposed model extracts the characteristics of each local unit and the correlation between local and neighboring units in a probabilistic latent space. Second, monitoring statistics are built to determine the status of the multiunit process. Through variational Bayesian inference treatment, the problem of overfitting in probabilistic latent variable

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by National Natural Science Foundation of China under Grant 61973119 and in part by Shanghai Rising-Star Program, China under Grant 20QA1402600.

References (42)

  • Luo, L. et al.

    Robust monitoring of industrial processes using process data with outliers and missing values

    Chemometrics and Intelligent Laboratory Systems

    (2019)
  • Qin, S.J.

    Survey on data-driven industrial process monitoring and diagnosis

    Annual Reviews in Control

    (2012)
  • Raveendran, R. et al.

    Process monitoring using a generalized probabilistic linear latent variable model

    Automatica

    (2018)
  • Shang, C. et al.

    Generalized grouped contributions for hierarchical fault diagnosis with group lasso

    Control Engineering Practice

    (2019)
  • Shang, C. et al.

    Data analytics and machine learning for smart process manufacturing: recent advances and perspectives in the big data era

    Engineering

    (2019)
  • Bishop, C.M.

    Pattern recognition and machine learning (Information science and statistics)

    (2006)
  • Chiang, L.H. et al.

    Fault detection and diagnosis in industrial systems

    (2001)
  • Drugowitsch, J.

    Variational Bayesian inference for linear and logistic regression

    (2013)
  • Gao, Z. et al.

    A survey of fault diagnosis and fault-tolerant techniques—Part II: Fault diagnosis with knowledge-based and hybrid/active approaches

    IEEE Transactions on Industrial Electronics

    (2015)
  • Ge, Z.

    Process data analytics via probabilistic latent variable models: A tutorial review

    Industrial and Engineering Chemistry Research

    (2018)
  • Ge, Z. et al.

    Mixture Bayesian regularization method of PPCA for multimode process monitoring

    AIChE Journal

    (2010)
  • Cited by (17)

    • Decentralized plant-wide monitoring based on mutual information-Louvain decomposition and support vector data description diagnosis

      2023, ISA Transactions
      Citation Excerpt :

      Although the decentralized mode has proven its effectiveness, the monitoring performance may not be ideal if the process variables are not properly decomposed. To effectively decompose the large-scale process, Jiang et al. [17] decomposed the process variables into five sub-blocks based on mechanistic knowledge, and then the distributed variational Bayesian probabilistic model constructed for process monitoring. Xu et al. [18] used the minimum redundancy maximum correlation (mRMR) method for variable decomposition.

    • Fault monitoring for chemical processes using neighborhood embedding discriminative analysis

      2022, Process Safety and Environmental Protection
      Citation Excerpt :

      With the increasing scale of modern chemical processes, acquiring corresponding first-principle models with certain accuracy becomes quite challenging. Fortunately, the wider utilization of advanced sensors as well as computer-aided systems makes abundant samples from chemical processes to be easily accessed, which then provides a solid platform for promoting data-driven fault monitoring methodologies (Jiang and Jiang, 2021; Yuan et al., 2021a; Deng et al., 2022). Particularly, unsupervised learning approaches concerning feature extraction is quite welcomed for online monitoring anomalies or faults in chemical processes (Cao et al., 2021; Li et al., 2021; Xiao et al., 2021).

    • Similarity and sparsity collaborative embedding and its application to robust process monitoring

      2022, Control Engineering Practice
      Citation Excerpt :

      Process monitoring is essential for modern industrial processes due to the high demands of product quality and operation safety (Ding & Li, 2021; Qin, 2003). Early detection of abnormal events can prevent potential harm to manufacturing equipment and reduce economic consequences (Chen & Ge, 2020; Ge, Song, & Gao, 2013; Jiang & Jiang, 2020). In the “Big data era”, the rapid development of advanced sensor technology has enabled the collection and storage of large amounts of measured variables such as temperature, flow rate, and pressure (Ji, He, Shang, & Zhou, 2017; Kano & Nakagawa, 2008; Yin, Li, Gao, & Kaynak, 2014; Zhao & Zhao, 2020).

    • A data-driven distributed process monitoring method for industry manufacturing systems

      2024, Transactions of the Institute of Measurement and Control