Variational Bayesian probabilistic modeling framework for data-driven distributed process monitoring
Introduction
Process monitoring has received considerable attention in recent years (Gao et al., 2015, Hwang et al., 2010, Jiang and Yan, 2018). With the growing volume of industrial process data (Qin, 2014, Shang and You, 2019), data-driven process monitoring methods have attracted increasing interest (Chen and Ge, 2020, Ding et al., 2009, Ge et al., 2017, Jiang et al., 2018, Qin, 2012, Yin et al., 2014). Large scale and multiple operating units are now defining characteristics of plant-wide processes. Monitoring such a process requires attention not only to its overall operating state but also to the status of each local unit and to the associations between units. Therefore, monitoring methods for large-scale multiunit processes have received considerable attention (Jiang and Huang, 2016, Jiang et al., 2015, Li and Zhao, 2019, Rashidi et al., 2017, Shang et al., 2019). When monitoring a multiunit process, it is important to characterize both the variable relationships within each local unit and the relationships among units.
A distributed monitoring method decomposes a process into multiple subblocks, which reduces modeling complexity and allows the state of each unit to be monitored separately (Ge, 2017, Ge and Song, 2013). Distributed monitoring models based on traditional multivariate statistical process monitoring (MSPM) methods, including principal component analysis (PCA), partial least squares (PLS), and canonical correlation analysis (CCA), have been discussed extensively for various situations (Chen et al., 2016, Liu, Qin et al., 2014, Qin et al., 2001, Westerhuis et al., 2015). These methods have provided the basic framework for large-scale process monitoring. Jiang, Ding, Wang et al. (2017) proposed a CCA-based distributed monitoring method that examines the status of each unit through its correlation with neighboring units; the method generates optimal monitoring residuals and thereby improves monitoring performance. CCA-based distributed monitoring has since been extended extensively to address more complex process behavior. However, these CCA-based methods rely heavily on the data covariance matrix, which can easily be corrupted by missing data or outliers.
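To make the residual idea concrete, here is a minimal NumPy sketch of a generic CCA-based residual generator between one unit (variables `u`) and its neighboring units (variables `y`), in the spirit of CCA-based fault detection (Chen et al., 2016). It is not the exact formulation of Jiang, Ding, Wang et al. (2017); the function names, the synthetic data, and the use of a T² statistic on the residual are illustrative assumptions. With the CCA projections J and L, the residual r = Lᵀy - diag(ρ)Jᵀu has covariance I - diag(ρ²), so it can be standardized directly.

```python
import numpy as np

def inv_sqrt(cov):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def fit_cca_residual(U, Y):
    """Fit CCA between unit data U (N x p) and neighbour data Y (N x m)."""
    mu_u, mu_y = U.mean(axis=0), Y.mean(axis=0)
    Uc, Yc = U - mu_u, Y - mu_y
    n = U.shape[0]
    S_uu = Uc.T @ Uc / (n - 1)
    S_yy = Yc.T @ Yc / (n - 1)
    S_uy = Uc.T @ Yc / (n - 1)
    K_u, K_y = inv_sqrt(S_uu), inv_sqrt(S_yy)
    R, rho, Vt = np.linalg.svd(K_u @ S_uy @ K_y, full_matrices=False)
    J = K_u @ R      # maps u to its canonical variates (unit covariance)
    L = K_y @ Vt.T   # maps y to its canonical variates (unit covariance)
    return {"mu_u": mu_u, "mu_y": mu_y, "J": J, "L": L, "rho": rho}

def residual_t2(model, U, Y):
    """T^2 of the residual r = L^T y - diag(rho) J^T u, one value per row."""
    zu = (U - model["mu_u"]) @ model["J"]
    zy = (Y - model["mu_y"]) @ model["L"]
    r = zy - zu * model["rho"]
    # On the training data, cov(r) = I - diag(rho^2) holds exactly.
    return np.sum(r**2 / (1.0 - model["rho"] ** 2), axis=1)

# Synthetic two-unit example: both units are driven by the same latent sources.
rng = np.random.default_rng(0)
n = 500
s = rng.standard_normal((n, 2))
U = s @ rng.standard_normal((2, 3)) + 0.1 * rng.standard_normal((n, 3))
Y = s @ rng.standard_normal((2, 4)) + 0.1 * rng.standard_normal((n, 4))
model = fit_cca_residual(U, Y)
t2_train = residual_t2(model, U, Y)
```

Every quantity here is estimated from sample covariance matrices, so a few outliers or missing entries in `U` or `Y` propagate directly into `J`, `L`, and `rho`. That sensitivity is the weakness of CCA-based monitoring noted above.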
Probabilistic PCA (PPCA) (Bishop, 2006, Tipping and Bishop, 2010) derives PCA within a density estimation framework. PPCA uses the expectation–maximization (EM) algorithm to obtain maximum likelihood estimates of its parameters, so that the process is described within a probabilistic framework. PPCA-based monitoring offers more reliable performance and higher computational efficiency than traditional MSPM methods (Ge and Song, 2010a, Kim and Lee, 2003). However, distributed process monitoring under a probabilistic framework remains underexplored. Moreover, maximum likelihood estimation of PPCA by the EM algorithm is prone to overfitting (Bishop, 2006), which may reduce the accuracy of the process monitoring model. Recently, process monitoring using a generalized probabilistic linear latent variable model has received attention (Raveendran, Kodamana, & Huang, 2018), and such models can naturally be applied to distributed process monitoring. Variational Bayesian PCA (VBPCA) applies variational Bayesian inference to parameter estimation, which avoids the overfitting problem and yields a more accurate density estimate of the process.
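For readers less familiar with PPCA, the following is a minimal sketch of the standard EM updates for maximum likelihood PPCA, following the textbook formulation in Bishop (2006, Ch. 12). It illustrates the estimation step discussed above; it is not the VBPCA or VBPLV procedure of this paper. The function name, random initialization, iteration count, and demo data are assumptions.

```python
import numpy as np

def ppca_em(X, q, n_iter=1000, seed=0):
    """Maximum likelihood PPCA fitted by EM (textbook formulation, Bishop 2006, Ch. 12)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    W = rng.standard_normal((d, q))
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: posterior moments of z_n given x_n and the current W, sigma2.
        M_inv = np.linalg.inv(W.T @ W + sigma2 * np.eye(q))
        Ez = Xc @ W @ M_inv                          # row n is E[z_n]^T
        sum_Ezz = n * sigma2 * M_inv + Ez.T @ Ez     # sum_n E[z_n z_n^T]
        # M-step: closed-form updates of W and sigma2.
        W = (Xc.T @ Ez) @ np.linalg.inv(sum_Ezz)
        sigma2 = (np.sum(Xc**2)
                  - 2.0 * np.sum(Ez * (Xc @ W))
                  + np.trace(sum_Ezz @ W.T @ W)) / (n * d)
    return mu, W, sigma2

# Synthetic data with two latent directions plus isotropic noise (variance 0.25).
rng = np.random.default_rng(1)
A = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
X = rng.standard_normal((1000, 2)) @ A + 0.5 * rng.standard_normal((1000, 6))
mu, W, sigma2 = ppca_em(X, q=2)

# Closed-form ML check: sigma2 equals the mean of the discarded covariance eigenvalues.
lam, U = np.linalg.eigh(np.cov(X, rowvar=False, bias=True))
lam, U = lam[::-1], U[:, ::-1]
sigma2_ml = lam[2:].mean()
```

Note that nothing in these updates penalizes model complexity: the latent dimension `q` is fixed by hand and W is a point estimate. This is the setting in which the overfitting concern arises and which variational Bayesian treatments such as VBPCA address.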
A process monitoring framework based on VBPCA was constructed by Liu, Pan, Sun et al. (2014) and shown to be robust to missing values. More recently, Ma and Huang (2017) applied a variational Bayesian method to soft sensing to cope with dynamic process modeling. Over recent decades, data-driven distributed monitoring models have been widely developed. Wold, Kettaneh, and Tjessem (1996) developed a method based on multiblock PLS and PCA. Ge and Song (2013) introduced a distributed PCA monitoring model that divides variables into subblocks on the basis of loading vectors, thereby reducing complexity and extracting local process behavior. Jiang, Yan, and Huang (2019) proposed a neighborhood VBPCA–CCA-based method for distributed monitoring, which uses VBPCA to handle missing values and CCA to analyze variable correlations. However, CCA-based distributed monitoring is performed in a deterministic manner, whereas large-scale industrial processes are usually contaminated by noise and data outliers. Probabilistic modeling techniques such as VBPCA are robust to this uncertainty in process data. To the best of our knowledge, distributed monitoring based on VBPCA has not been discussed thus far.
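Robustness to missing values follows from the probabilistic form. Under a PPCA-type model, the observed subset of variables is itself Gaussian, so the latent posterior can be computed from whichever entries are available. The sketch below assumes point estimates of `mu`, `W`, and `sigma2` (a VBPCA fit would supply posterior means; full VBPCA also propagates their uncertainty, which this sketch ignores). The function names and toy loadings are hypothetical.

```python
import numpy as np

def latent_mean_observed(x, mu, W, sigma2):
    """E[z | x_obs] for a PPCA-type model, using only the non-NaN entries of x."""
    obs = ~np.isnan(x)
    W_o = W[obs]
    M_o = W_o.T @ W_o + sigma2 * np.eye(W.shape[1])
    return np.linalg.solve(M_o, W_o.T @ (x[obs] - mu[obs]))

def impute_missing(x, mu, W, sigma2):
    """Fill NaN entries with the conditional mean mu_m + W_m E[z | x_obs]."""
    ez = latent_mean_observed(x, mu, W, sigma2)
    x_hat = x.copy()
    miss = np.isnan(x)
    x_hat[miss] = mu[miss] + W[miss] @ ez
    return x_hat

# Hypothetical model parameters, e.g. as returned by a PPCA or VBPCA fit.
W = np.array([[2.0, 0.0], [2.0, 0.0], [2.0, 0.0],
              [0.0, 2.0], [0.0, 2.0], [0.0, 2.0]])
mu = np.zeros(6)
sigma2 = 0.01
z_true = np.array([1.0, -1.0])
x_full = W @ z_true           # noiseless sample for clarity
x = x_full.copy()
x[[0, 3]] = np.nan            # two sensors drop out
x_hat = impute_missing(x, mu, W, sigma2)
```

No row is discarded and no covariance matrix has to be re-estimated. The missing entries are filled from the correlation structure that the latent model already encodes.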
Furthermore, VBPCA alone cannot easily capture the correlations between units. Variational Bayesian linear regression (VBR) is an effective tool for modeling such correlations in data (Bishop, 2006). Like VBPCA, VBR avoids overfitting during parameter estimation owing to variational Bayesian inference. Therefore, VBR is adopted in the proposed model as the probabilistic regression component.
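As an illustration of the regression component, the following sketch implements mean-field variational Bayesian linear regression as presented in Bishop (2006, Sec. 10.3). It uses a Gaussian prior on the weights, a Gamma hyperprior on the prior precision, and a noise precision `beta` that is assumed known. How the paper's VBPLV model wires VBR between units is not shown here; treating the regressors as a neighbouring unit's variables, and all names and data below, are assumptions.

```python
import numpy as np

def vb_linear_regression(Phi, t, beta, a0=1e-3, b0=1e-3, n_iter=100):
    """Mean-field VB for Bayesian linear regression (Bishop, 2006, Sec. 10.3).

    Prior: w ~ N(0, alpha^{-1} I), alpha ~ Gamma(a0, b0); noise precision beta is known.
    Returns the posterior mean m_N and covariance S_N of w, and E[alpha].
    """
    _, m = Phi.shape
    a_n = a0 + 0.5 * m              # shape parameter of q(alpha) does not change
    e_alpha = a0 / b0
    PtP, Ptt = Phi.T @ Phi, Phi.T @ t
    for _ in range(n_iter):
        # Update q(w) = N(m_N, S_N) given the current E[alpha].
        S_n = np.linalg.inv(e_alpha * np.eye(m) + beta * PtP)
        m_n = beta * S_n @ Ptt
        # Update q(alpha) = Gamma(a_N, b_N) given the current q(w).
        b_n = b0 + 0.5 * (m_n @ m_n + np.trace(S_n))
        e_alpha = a_n / b_n
    return m_n, S_n, e_alpha

# Hypothetical regressors (e.g. variables of a neighbouring unit) and target.
rng = np.random.default_rng(2)
Phi = rng.standard_normal((200, 3))
w_true = np.array([1.0, -2.0, 0.5])
t = Phi @ w_true + 0.1 * rng.standard_normal(200)
m_n, S_n, e_alpha = vb_linear_regression(Phi, t, beta=1.0 / 0.1**2)
```

The learned `e_alpha` acts as an automatically tuned ridge penalty, so shrinkage is inferred from the data instead of being set by hand. This is the mechanism behind the overfitting protection mentioned above.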
This work proposes a variational Bayesian probabilistic latent variable (VBPLV) model for distributed monitoring of multiunit processes. The VBPLV model extends traditional distributed monitoring to a probabilistic setting. Its probabilistic form makes the monitoring model robust to process noise and data outliers, and variational Bayesian inference enhances its reliability. The status of each unit is examined in three subspaces, which provide more information for determining the fault type. Three applications, namely a numerical model, the well-known Tennessee Eastman (TE) process (Downs & Vogel, 1993), and a laboratory distillation process, are used to test and verify the effectiveness of the proposed method.
The proposed VBPLV model has limitations with respect to process characteristics. Ge and Song (2010b) and Jiang and Yan (2018) discussed process monitoring for nonlinear processes. Zhao, Chen, and Jing (2020) discussed data analytics and monitoring when process data exhibit wide-range nonstationary behavior, and Zhao and Sun (2018) developed a dynamic distributed monitoring strategy for large-scale nonstationary processes. More complex data characteristics, such as nonlinearity, non-Gaussianity, and nonstationarity, are beyond the scope of the present study. Therefore, all models and processes in the remainder of this paper are assumed to be linear, Gaussian, and stationary.
The remainder of this article is organized as follows. Section 2 briefly reviews VBPCA-based monitoring. Section 3 proposes the variational Bayesian probabilistic modeling framework for distributed process monitoring and lists the detailed procedures for offline modeling and online monitoring. Section 4 evaluates the performance of the VBPLV-based distributed process monitoring model on three applications. Section 5 presents the conclusions and discussion.
Preliminaries
In this section, we briefly review the VBPCA-based monitoring method (Liu, Pan et al., 2014), which serves as the foundation for model construction.
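As background for the review, here is a generic sketch of the two statistics commonly computed from a probabilistic latent variable model. The first is a T² statistic on the latent posterior mean E[z|x] = M⁻¹Wᵀ(x - μ), standardized by its model covariance M⁻¹WᵀCWM⁻¹ with C = WWᵀ + σ²I. The second is an SPE statistic on the reconstruction residual. The exact statistics and control limits of Liu, Pan et al. (2014) may differ. Plugging in point estimates of W and σ² (e.g. VBPCA posterior means), using empirical-quantile control limits, and the toy parameters are all assumptions.

```python
import numpy as np

def monitoring_statistics(X, mu, W, sigma2):
    """T^2 on the latent posterior mean and SPE on the reconstruction residual."""
    d, q = W.shape
    M_inv = np.linalg.inv(W.T @ W + sigma2 * np.eye(q))
    Xc = X - mu
    Z = Xc @ W @ M_inv                          # rows are E[z | x]
    C = W @ W.T + sigma2 * np.eye(d)            # model covariance of x
    cov_z = M_inv @ W.T @ C @ W @ M_inv         # covariance of E[z | x] under the model
    T2 = np.einsum("ij,jk,ik->i", Z, np.linalg.inv(cov_z), Z)
    SPE = np.sum((Xc - Z @ W.T) ** 2, axis=1)   # squared reconstruction error
    return T2, SPE

def empirical_limit(stat, level=0.99):
    """Control limit taken as an empirical quantile of normal-operation statistics."""
    return float(np.quantile(stat, level))

# Hypothetical fitted parameters and simulated normal-operation data.
rng = np.random.default_rng(3)
W = np.array([[2.0, 0.0], [2.0, 0.0], [2.0, 0.0],
              [0.0, 2.0], [0.0, 2.0], [0.0, 2.0]])
mu, sigma2 = np.zeros(6), 0.01
X_train = rng.standard_normal((5000, 2)) @ W.T + np.sqrt(sigma2) * rng.standard_normal((5000, 6))
T2_train, SPE_train = monitoring_statistics(X_train, mu, W, sigma2)
spe_lim = empirical_limit(SPE_train)

# A fault that breaks the correlation inside the first group of variables.
x_fault = X_train[:1] + np.array([3.0, -3.0, 0.0, 0.0, 0.0, 0.0])
_, SPE_fault = monitoring_statistics(x_fault, mu, W, sigma2)
```

The simulated fault lies outside the span of W, so it leaves T² unchanged and shows up in SPE. Under the Gaussian assumption, a χ² limit for T² is a common alternative to the empirical quantile.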
Proposed VBPLV model
For large-scale multiunit process monitoring, the operating status of the entire process and of critical local units, as well as the correlations between units, are of great importance (Ge, 2017, Jiang and Huang, 2016). Each unit has its own structure or function and is related to other units; the units influence each other and operate in coordination. A local fault may cause a chain reaction across multiple devices or units, causing the deterioration of the entire process and even
Application in a numerical model process
A numerical model is constructed in which one column vector is generated from the standard Gaussian distribution and a noise vector is generated from a Gaussian distribution. The model is divided into two units, namely, unit 1 and unit 2. A total of 300 samples are collected under normal operating conditions to establish the monitoring models.
To test and analyze the performance
Conclusion and discussion
A VBPLV-based distributed process monitoring model has been constructed to address the problem of monitoring large-scale multiunit processes. First, the proposed model extracts the characteristics of each local unit and the correlations between local and neighboring units in a probabilistic latent space. Second, monitoring statistics are built to determine the status of the multiunit process. Through variational Bayesian inference, the problem of overfitting in probabilistic latent variable
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grant 61973119 and in part by the Shanghai Rising-Star Program, China, under Grant 20QA1402600.
References (42)
- et al. Canonical correlation analysis-based fault detection methods with application to alumina evaporation process. Control Engineering Practice (2016).
- et al. Robust Bayesian networks for low-quality data modeling and process monitoring applications. Control Engineering Practice (2020).
- et al. Subspace method aided data-driven design of fault detection and isolation systems. Journal of Process Control (2009).
- et al. A plant-wide industrial process control problem. Computers & Chemical Engineering (1993).
- Review on data-driven modeling and monitoring for plant-wide industrial processes. Chemometrics and Intelligent Laboratory Systems (2017).
- et al. Distributed monitoring for large-scale processes based on multivariate statistical analysis and Bayesian method. Journal of Process Control (2016).
- et al. Parallel PCA–KPCA for nonlinear process monitoring. Control Engineering Practice (2018).
- et al. Data-driven individual–joint learning framework for nonlinear process monitoring. Control Engineering Practice (2020).
- et al. Process monitoring based on probabilistic PCA. Chemometrics and Intelligent Laboratory Systems (2003).
- et al. Hybrid fault characteristics decomposition based probabilistic distributed fault diagnosis for large-scale industrial processes. Control Engineering Practice (2019).
- Robust monitoring of industrial processes using process data with outliers and missing values. Chemometrics and Intelligent Laboratory Systems.
- Survey on data-driven industrial process monitoring and diagnosis. Annual Reviews in Control.
- Process monitoring using a generalized probabilistic linear latent variable model. Automatica.
- Generalized grouped contributions for hierarchical fault diagnosis with group lasso. Control Engineering Practice.
- Data analytics and machine learning for smart process manufacturing: recent advances and perspectives in the big data era. Engineering.
- Pattern recognition and machine learning (Information science and statistics).
- Fault detection and diagnosis in industrial systems.
- Variational Bayesian inference for linear and logistic regression.
- A survey of fault diagnosis and fault-tolerant techniques—Part II: Fault diagnosis with knowledge-based and hybrid/active approaches. IEEE Transactions on Industrial Electronics.
- Process data analytics via probabilistic latent variable models: A tutorial review. Industrial and Engineering Chemistry Research.
- Mixture Bayesian regularization method of PPCA for multimode process monitoring. AIChE Journal.