Variational Bayesian probabilistic modeling framework for data-driven distributed process monitoring

doi:10.1016/j.conengprac.2021.104778

Control Engineering Practice

Volume 110, May 2021, 104778

https://doi.org/10.1016/j.conengprac.2021.104778

Highlights

• A variational Bayesian probabilistic modeling framework for data-driven distributed monitoring is proposed.
• Variable relationships within an operation unit and among units are characterized.
• Both the process status and the type of a detected fault are identified.
• Monitoring effectiveness is demonstrated through three application examples.

Abstract

Data-driven process monitoring has gained increasing attention because of the growing demand for process safety and the rapid advancement of data-gathering techniques. When monitoring a plant-wide multiunit process, establishing a monitor for each unit individually ignores the correlations among units, whereas establishing a global monitor for the entire process ignores the local process behavior. A variational Bayesian-based probabilistic modeling approach is proposed for efficient distributed process monitoring. A novel probabilistic latent variable model is developed to characterize the variable relationships within each local unit and among units. First, variational Bayesian-based latent variable extraction is performed in each local unit, through which the variable relationship within a local unit is characterized. Second, a variational Bayesian-based regression model is established between the latent variables and neighboring variables, through which the variable relationship among units is characterized. Then, modeling residuals and monitoring statistics are generated, through which the process status and the type of a detected fault are identified. The effectiveness of the proposed probabilistic modeling and monitoring method is verified by three case studies: a numerical example, the Tennessee Eastman benchmark process, and a laboratory distillation process.

Introduction

Process monitoring has received considerable attention in recent years (Gao et al., 2015, Hwang et al., 2010, Jiang and Yan, 2018). With the increasing amount of industrial process data (Qin, 2014, Shang and You, 2019), data-driven process monitoring methods have attracted growing interest (Chen and Ge, 2020, Ding et al., 2009, Ge et al., 2017, Jiang et al., 2018, Qin, 2012, Yin et al., 2014). Plant-wide processes today are typically large-scale and composed of multiple units. Both the overall operation state of the process and the influence of local unit status and inter-unit association are important. Therefore, monitoring methods for large-scale multiunit processes have been actively studied (Jiang and Huang, 2016, Jiang et al., 2015, Li and Zhao, 2019, Rashidi et al., 2017, Shang et al., 2019). When monitoring a multiunit process, it is important to characterize both the variable relationships within each local unit and the variable relationships among units.

A distributed monitoring method decomposes a process into multiple subblocks to reduce complexity and monitors the state of the separated units (Ge, 2017, Ge and Song, 2013). Distributed monitoring models based on traditional multivariate statistical process monitoring (MSPM) methods, including principal component analysis (PCA), partial least squares (PLS), and canonical correlation analysis (CCA), have been well discussed in various situations (Chen et al., 2016, Liu, Qin et al., 2014, Qin et al., 2001, Westerhuis et al., 2015). These distributed process monitoring methods provide the basic framework for large-scale process monitoring. Jiang, Ding, Wang et al. (2017) proposed a CCA-based distributed monitoring method that examines the status of each unit through its correlation with neighboring units. Optimal process monitoring residuals are generated, and the monitoring performance is improved. CCA-based distributed monitoring has since been extended extensively to address more complex process behavior. However, these CCA-based methods rely heavily on the data covariance, which can be easily corrupted by missing data or outliers.

Probabilistic PCA (PPCA), first proposed by Bishop (2006) and Tipping and Bishop (2010), derives PCA within a density estimation framework. PPCA uses the expectation–maximization (EM) algorithm to obtain maximum likelihood parameter estimates, so that the process is described within a probabilistic framework. PPCA provides more reliable performance and higher computing efficiency in process monitoring than traditional MSPM methods (Ge and Song, 2010a, Kim and Lee, 2003). However, distributed process monitoring under a probabilistic framework remains underexplored. Moreover, PPCA is prone to overfitting during the EM step (Bishop, 2006), which may reduce the accuracy of the process monitoring model. Recently, process monitoring using a generalized probabilistic linear latent variable model has received attention (Raveendran, Kodamana, & Huang, 2018), and such a model can be naturally applied to distributed process monitoring. Variational Bayesian PCA (VBPCA) applies variational Bayesian inference to parameter estimation, which avoids the overfitting problem and yields a more precise density estimate of the process.
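
For reference, the standard PPCA generative model behind the paragraph above can be written as follows. The symbols $W$, $z$, $\mu$, $\sigma^{2}$, $d$, $q$, and $\alpha_{i}$ are notation assumed here for illustration, and the prior on $W$ follows the textbook treatment of Bayesian PCA in Bishop (2006); the authors' exact formulation may differ.

$$x = Wz + \mu + \varepsilon, \quad z \sim \mathcal{N}(0, I_{q}), \quad \varepsilon \sim \mathcal{N}(0, \sigma^{2} I_{d})$$

$$p(x) = \mathcal{N}(x \mid \mu, \; WW^{T} + \sigma^{2} I_{d})$$

$$p(W \mid \alpha) = \prod_{i=1}^{q} \left( \frac{\alpha_{i}}{2\pi} \right)^{d/2} \exp\left( -\tfrac{1}{2} \alpha_{i} w_{i}^{T} w_{i} \right)$$

EM returns point estimates of $W$, $\mu$, and $\sigma^{2}$ that maximize the likelihood implied by the second line. The variational Bayesian treatment instead places priors such as the third line on the parameters and approximates their posterior; a large posterior value of $\alpha_{i}$ shrinks the column $w_{i}$ toward zero, which is how unneeded latent dimensions are suppressed.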

A process monitoring framework based on VBPCA is constructed in Liu, Pan, Sun et al. (2014) and shows robustness to missing values in process data. More recently, a soft sensing application of the variational Bayesian method is introduced in Ma and Huang (2017) to cope with dynamic process modeling. In recent decades, data-driven distributed monitoring models have been widely developed. A method based on multiblock PLS and PCA is developed in Wold, Kettaneh, and Tjessem (1996). A distributed PCA monitoring model is introduced by Ge and Song (2013); it divides variables into subblocks on the basis of loading vectors, thereby reducing complexity and extracting local process behavior. A neighborhood VBPCA–CCA-based method for distributed monitoring has been proposed by Jiang, Yan, and Huang (2019); it uses VBPCA to handle missing values and CCA to analyze variable correlation. However, the CCA-based distributed monitoring model is deterministic, whereas large-scale industrial processes are usually contaminated by noise or data outliers. Probabilistic modeling techniques such as VBPCA are robust to the uncertainty of process data. To the best of our knowledge, distributed monitoring based on VBPCA has not been discussed thus far.

Furthermore, VBPCA alone cannot readily capture the correlation between units. Variational Bayesian linear regression (VBR) is an effective tool for capturing correlations in data (Bishop, 2006). Like VBPCA, VBR avoids overfitting during parameter estimation owing to variational Bayesian inference. Therefore, VBR is adopted in this work as the probabilistic regression model.
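
To make the idea of VBR concrete, the following is a minimal sketch of the textbook variational linear regression updates (Bishop, 2006), not the authors' implementation. It assumes a single output, a known noise precision `beta`, a zero-mean Gaussian prior on the weights with precision `alpha`, and a Gamma hyperprior on `alpha`; the function name `vb_linear_regression`, the hyperparameters `a0` and `b0`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def vb_linear_regression(Phi, t, beta=25.0, a0=1e-3, b0=1e-3, n_iter=50):
    """Variational Bayesian linear regression with known noise precision beta.

    Model (assumed): t = Phi @ w + noise, w ~ N(0, I / alpha), alpha ~ Gamma(a0, b0).
    Returns the posterior mean and covariance of w and the expected alpha.
    """
    n_features = Phi.shape[1]
    a_n = a0 + 0.5 * n_features          # shape of q(alpha); fixed across iterations
    e_alpha = a0 / b0                    # initial guess for E[alpha]
    PtP = Phi.T @ Phi
    Ptt = Phi.T @ t
    for _ in range(n_iter):
        # Update q(w) = N(mean, S) given the current E[alpha].
        S = np.linalg.inv(e_alpha * np.eye(n_features) + beta * PtP)
        mean = beta * S @ Ptt
        # Update q(alpha) = Gamma(a_n, b_n) using E[w^T w] = mean^T mean + tr(S).
        b_n = b0 + 0.5 * (mean @ mean + np.trace(S))
        e_alpha = a_n / b_n
    return mean, S, e_alpha

# Synthetic demonstration (hypothetical data, noise std 0.2 so beta = 1 / 0.2**2).
rng = np.random.default_rng(0)
Phi = rng.standard_normal((200, 3))
w_true = np.array([1.0, -0.5, 0.0])
t = Phi @ w_true + 0.2 * rng.standard_normal(200)

w_mean, w_cov, e_alpha = vb_linear_regression(Phi, t, beta=25.0)
residual = t - Phi @ w_mean              # regression residual of the kind used for monitoring
```

The prior precision `alpha` is inferred from the data rather than tuned by hand, which is the mechanism that limits overfitting in this formulation. In the setting described above, the regressors would be the latent variables of a local unit and the target a neighboring variable.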

A variational Bayesian probabilistic latent variable (VBPLV) model for distributed process monitoring of multiunit processes is proposed. The proposed VBPLV model extends traditional distributed monitoring to a probabilistic setting. The probabilistic form of the monitoring model provides robustness to process noise and data outliers, and the variational Bayesian technique enhances the reliability of the model. The status of each unit is examined in three subspaces, which provide additional information for determining the fault type. Three applications, namely a numerical model, the well-known Tennessee Eastman (TE) process (Downs & Vogel, 1993), and a laboratory distillation process, are used to test and verify the effectiveness of the proposed method.

The proposed VBPLV model has limitations regarding process characteristics. Ge and Song (2010b) and Jiang and Yan (2018) discussed process monitoring for nonlinear processes. Zhao, Chen, and Jing (2020) discussed data analytics and monitoring issues when the process data show wide-range nonstationary properties. Zhao and Sun (2018) developed a dynamic distributed monitoring strategy for large-scale nonstationary processes. However, more complex data characteristics, such as nonlinear, non-Gaussian, and nonstationary behavior, are outside the scope of the present study. Therefore, in the remainder of this paper, all models and processes are assumed to be linear, Gaussian, and stationary.

The remainder of this article is organized as follows. Section 2 briefly reviews the basic definition of VBPCA-based monitoring. Section 3 proposes the variational Bayesian probabilistic modeling framework for distributed process monitoring and lists the detailed procedures of offline modeling and online monitoring. Section 4 evaluates the performance of the VBPLV-based distributed process monitoring model through three applications. Section 5 presents the conclusions and discussions.

Section snippets

Preliminaries

In this section, we provide a brief review of the VBPCA-based monitoring method (Liu, Pan et al., 2014) to build a foundation for model construction.

Proposed VBPLV model

For large-scale multiunit process monitoring, the operation status of the entire process, the operation status of critical local units, and the correlation between different units are all of great importance (Ge, 2017, Jiang and Huang, 2016). Each unit has its own structure or function and is related to other units, influencing each other and operating in coordination. A local fault may cause a chain reaction of multiple devices or units, causing the deterioration of the entire process and even

Application in a numerical model process

A numerical model is constructed as follows:

$$x = Cs + \varepsilon$$

where

$$x = [x_{1}, \dots, x_{6}]^{T} \in \mathbb{R}^{6 \times 1}$$

$$C = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0.8 & 0.6 & 0 & 0 & 0 \\ 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 \\ 0 & 0 & 0 & 0.6 & 0.8 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$

$$s = [s_{1}, \dots, s_{5}]^{T} \in \mathbb{R}^{5 \times 1}$$

Here $s$ is a column vector generated by the standard Gaussian distribution, and $\varepsilon$ is a noise vector generated by a Gaussian distribution. The model is divided into two units, namely, unit 1 $u_{1} = [x_{1}, x_{2}, x_{3}]^{T}$ and unit 2 $u_{2} = [x_{4}, x_{5}, x_{6}]^{T}$. A total of 300 samples are collected under normal operating conditions to establish the monitoring models.
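
The data generation for this numerical model can be sketched in a few lines of NumPy. The matrix `C`, the sample count of 300, and the split into two units come from the description above; the noise standard deviation of 0.1, the random seed, and the function name `simulate_normal_data` are assumptions made here, since the noise variance is not stated in this excerpt.

```python
import numpy as np

# Mixing matrix C (6 x 5) as given in the numerical model.
C = np.array([
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.8, 0.6, 0.0, 0.0, 0.0],
    [0.0, 0.4, 0.6, 0.0, 0.0],
    [0.0, 0.0, 0.4, 0.6, 0.0],
    [0.0, 0.0, 0.0, 0.6, 0.8],
    [0.0, 0.0, 0.0, 0.0, 1.0],
])

def simulate_normal_data(n_samples=300, noise_std=0.1, seed=0):
    """Draw samples of x = C s + eps, one sample per row.

    noise_std is an assumed value; the source only says the noise is Gaussian.
    """
    rng = np.random.default_rng(seed)
    s = rng.standard_normal((n_samples, 5))                 # latent sources, standard Gaussian
    eps = noise_std * rng.standard_normal((n_samples, 6))   # measurement noise
    return s @ C.T + eps

X = simulate_normal_data()
u1 = X[:, :3]   # unit 1: x1, x2, x3
u2 = X[:, 3:]   # unit 2: x4, x5, x6
```

Because row 3 and row 4 of `C` both load on $s_{3}$, the variables $x_{3}$ and $x_{4}$ are correlated, so the two units are coupled even though each is modeled locally.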

To test and analyze the performance

Conclusion and discussion

A VBPLV-based distributed process monitoring model has been constructed to deal with the large-scale multiunit process monitoring problem. First, the proposed model extracts the characteristics of the local unit and correlation between local and neighboring units in probabilistic latent space. Second, monitoring statistics are built to determine the status of the multiunit process. Through variational Bayesian inference treatment, the problem of overfitting in probabilistic latent variable

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 61973119 and in part by the Shanghai Rising-Star Program, China under Grant 20QA1402600.

References (42)

Chen Z. et al. Canonical correlation analysis-based fault detection methods with application to alumina evaporation process. Control Engineering Practice (2016)

Chen G. et al. Robust Bayesian networks for low-quality data modeling and process monitoring applications. Control Engineering Practice (2020)

Ding S.X. et al. Subspace method aided data-driven design of fault detection and isolation systems. Journal of Process Control (2009)

Downs J.J. et al. A plant-wide industrial process control problem. Computers & Chemical Engineering (1993)

Ge Z. Review on data-driven modeling and monitoring for plant-wide industrial processes. Chemometrics and Intelligent Laboratory Systems (2017)

Jiang Q. et al. Distributed monitoring for large-scale processes based on multivariate statistical analysis and Bayesian method. Journal of Process Control (2016)

Jiang Q. et al. Parallel PCA–KPCA for nonlinear process monitoring. Control Engineering Practice (2018)

Jiang Q. et al. Data-driven individual–joint learning framework for nonlinear process monitoring. Control Engineering Practice (2020)

Kim D. et al. Process monitoring based on probabilistic PCA. Chemometrics and Intelligent Laboratory Systems (2003)

Li W. et al. Hybrid fault characteristics decomposition based probabilistic distributed fault diagnosis for large-scale industrial processes. Control Engineering Practice (2019)

Luo L. et al. Robust monitoring of industrial processes using process data with outliers and missing values. Chemometrics and Intelligent Laboratory Systems (2019)

Qin S.J. Survey on data-driven industrial process monitoring and diagnosis. Annual Reviews in Control (2012)

Raveendran R. et al. Process monitoring using a generalized probabilistic linear latent variable model. Automatica (2018)

Shang C. et al. Generalized grouped contributions for hierarchical fault diagnosis with group lasso. Control Engineering Practice (2019)

Shang C. et al. Data analytics and machine learning for smart process manufacturing: recent advances and perspectives in the big data era. Engineering (2019)

Bishop C.M. Pattern recognition and machine learning (Information science and statistics) (2006)

Chiang L.H. et al. Fault detection and diagnosis in industrial systems (2001)

Drugowitsch J. Variational Bayesian inference for linear and logistic regression (2013)

Gao Z. et al. A survey of fault diagnosis and fault-tolerant techniques—Part II: Fault diagnosis with knowledge-based and hybrid/active approaches. IEEE Transactions on Industrial Electronics (2015)

Ge Z. Process data analytics via probabilistic latent variable models: A tutorial review. Industrial and Engineering Chemistry Research (2018)

Ge Z. et al. Mixture Bayesian regularization method of PPCA for multimode process monitoring. AIChE Journal (2010)
