A batch-wise LSTM-encoder decoder network for batch process monitoring

doi:10.1016/j.cherd.2020.09.019

Chemical Engineering Research and Design

Volume 164, December 2020, Pages 102-112


Highlights

• Proposed a multi-layer recurrent neural network in the encoder–decoder structure and the corresponding monitoring method for batch process monitoring.
• The proposed network can simultaneously capture the between and within batch dynamic features in the nonlinear batch process.
• The proposed model outperforms the AutoEncoder and Sub-PCA model in fault detection.

Abstract

Process monitoring is essential for maintaining consistent quality and safe operation in batch processes. However, the multiphase, nonlinear and dynamic characteristics of batch processes make monitoring them a complicated task. In this work, a multi-layer recurrent neural network with an encoder–decoder structure, called the batch-wise LSTM-encoder decoder network, is proposed to address these difficulties. The LSTM-encoder extracts the nonlinear dynamic features in both the between-batch and within-batch directions, then projects the high-dimensional input space onto a low-dimensional hidden state space. The decoder regenerates the samples from the hidden states. The control statistics H² and SPE are designed for process monitoring, and the corresponding control limits are estimated by kernel density estimation. A case study on an extensive reference penicillin fermentation dataset suggests that the proposed method detects fault samples more effectively than previous methods while remaining equally robust under normal conditions.

Introduction

Batch processes play a significant role in modern industries such as pharmaceuticals, semiconductors and biotechnology (Wang et al., 2019). A batch process is defined as a process that repeatedly produces a finite quantity of product at a time by following a sequence of process steps (Mehta and Reddy, 2015). Because of this flexibility in production, it is important to maintain consistent quality and safe operation in the batch process. Process monitoring, in other words online fault detection, is vital for meeting the quality and safety demands of batch processes, and it has become an active research field over the past decades (Qin, 2012).

Owing to the abundant networked sensors in modern batch processes, statistical process control (SPC) methods have been widely applied to monitor and control the process (Yao and Gao, 2009). The univariate control chart is a feasible and popular monitoring method. However, it suffers from “blind spots” in a multivariate setting, which would loosen the control limits (Wang et al., 2017). To overcome these difficulties, many multivariate process monitoring methods have been developed. Compared with continuous processes, the dynamic features within and between batches make monitoring a batch process more complex and challenging. Among the methods suitable for batch processes, multiway principal component analysis (MPCA) is the most widely used (Nomikos and MacGregor, 1994). The multiway methods first unfold the three-dimensional (Batch × Time × Variable) historical batch process data cube into a two-dimensional matrix, for example by batch-wise unfolding, Batch × (Time × Variable), or variable-wise unfolding, Variable × (Time × Batch). PCA is then applied to the two-dimensional matrix to project the high-dimensional data onto a low-dimensional space, and statistics such as Hotelling's T² and the squared prediction error (SPE) are adopted to measure the information in the principal subspace and the residual subspace, respectively.
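The two unfoldings named above can be illustrated with a minimal sketch. It assumes a small hypothetical data cube stored as nested Python lists indexed as cube[i][k][j] (batch, time, variable) with equal batch lengths; the function names are illustrative and do not come from the paper.

```python
def unfold_batch_wise(cube):
    """Batch x (Time x Variable): one row per batch."""
    return [[value for sample in batch for value in sample] for batch in cube]


def unfold_variable_wise(cube):
    """Variable x (Time x Batch): one row per variable, in the orientation given in the text."""
    n_batches = len(cube)
    n_times = len(cube[0])
    n_vars = len(cube[0][0])
    return [
        [cube[i][k][j] for k in range(n_times) for i in range(n_batches)]
        for j in range(n_vars)
    ]


# Hypothetical cube: I = 2 batches, K = 3 time points, J = 2 variables.
cube = [
    [[1, 10], [2, 20], [3, 30]],
    [[4, 40], [5, 50], [6, 60]],
]
batch_wise = unfold_batch_wise(cube)        # 2 rows, 3 * 2 = 6 columns
variable_wise = unfold_variable_wise(cube)  # 2 rows, 3 * 2 = 6 columns
```

In an MPCA workflow, PCA would then be fitted to one of these matrices after centering and scaling; that step is omitted here.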

However, MPCA methods are linear and may fail to extract the nonlinear information in a batch process. To improve the monitoring of nonlinear processes, many modified methods based on MPCA have been proposed in the past decades. Among them, multiphase methods (Lu et al., 2004, Zhao and Sun, 2013) and kernel methods (Lee et al., 2004, Jia et al., 2010) are the two mainstream approaches. Multiphase methods are similar to piecewise linearization: they first divide the batch into several phases with relatively similar dynamic characteristics, then build a local MPCA model for each phase separately. The ability to extract nonlinear information is improved because the nonlinearity within a single phase is reduced (Lu et al., 2004). Kernel PCA uses kernel functions to map data from the low-dimensional input space to a high-dimensional feature space, then applies PCA in that feature space (Lee et al., 2004).

Artificial neural networks (ANNs) are now receiving more attention for their ability to extract nonlinear features effectively (Adamowski et al., 2012). In particular, autoencoders (AEs) and recurrent neural networks (RNNs) are two widely studied networks in the field of process monitoring (Yan et al., 2016, Yu and Zhao, 2019, Shahnazari, 2020, Zhang et al., 2019, Loy-Benitez et al., 2020). An AE is a multi-layer neural network with a low-dimensional central layer, and it is effective for dimension reduction and variance capture in nonlinear continuous processes (Yan et al., 2016, Yu and Zhao, 2019). In Yan et al. (2016), variant autoencoders, including denoising autoencoders and contractive autoencoders, are applied to the Tennessee Eastman process (TE process), which is monitored with the H² statistic. In Yu and Zhao (2019), denoising autoencoders are combined with the elastic net to produce a sparse and robust network; the statistics A², SPE and C are used to monitor the main hidden space, the residual information, and the overall measurement in continuous processes. Given the sequential nature of process data, RNNs are extensively studied for their ability to capture sequential and dynamic information in continuous processes (Shahnazari, 2020). In Shahnazari (2020), a bank of RNNs is used to build a predictive model from the input sequence, and process monitoring is done by comparing the predictions with the actual samples. Moreover, networks combining AEs and RNNs have been applied to continuous process monitoring to exploit both dimension reduction and sequential information capture (Zhang et al., 2019, Loy-Benitez et al., 2020). In Zhang et al. (2019), a UNet-based model, a variant of AEs in which LSTMs link the corresponding encoder and decoder layers, is proposed to extract nonlinear features together with time-sequence information and is applied to monitor a power plant. In Loy-Benitez et al. (2020), a multi-input multi-output RNN encoder called MG-RNN-AE is designed to reduce the dimension and reconstruct the original input sequence, and process monitoring is realized by monitoring the statistics H² and SPE.

However, all of the neural networks mentioned above can only extract the nonlinear features between samples. In a batch process, dynamic features exist not only between samples within a batch but also between batches. This suggests that designs specific to batch data could improve monitoring performance in nonlinear batch processes. In Jiang et al. (2020), the three-way batch process data cube is first unfolded into a two-dimensional matrix, and an AE model is constructed to extract the relations between variables. The dynamic features between batches are then captured by canonical correlation analysis (CCA) of 2-D matrices that include samples from different batches. In this way, the combined hidden state of the AE and CCA is better suited to monitoring batch processes.

To address the above concerns, a batch-wise LSTM-encoder decoder network, along with a monitoring scheme for nonlinear batch processes, is proposed. The main contribution of the proposed method is that the LSTM-encoder captures the between-batch and within-batch dynamic features of the nonlinear batch process simultaneously by sliding through the batch-wise input sequence. Specifically, the batch-wise LSTM-encoder is a multi-layer recurrent neural network with a shrinking layer dimension, and the decoder is a symmetric multi-layer neural network. The input sequence of the LSTM-encoder is a series of kth samples from different batches. The encoder slides through these input sequences to obtain the between-batch features and shrinks layer by layer to obtain the within-batch features. After encoding, the last hidden state of the input sequence is passed to the decoder to reconstruct the input sample. Batch process monitoring is achieved by monitoring the proposed H² and SPE statistics. A case study on an extensive reference dataset proposed by Van Impe and Gins (2015) and associated with the PenSim benchmark model shows outstanding process-monitoring performance compared with the AE and MPCA monitoring models, especially for barely detectable faults.
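The batch-wise input sequence described above, a series of kth samples taken from different batches, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the window length, the function name `batch_wise_sequences`, and the choice to skip batches that are too short to contain a kth sample are all hypothetical.

```python
def batch_wise_sequences(cube, k, window):
    """Collect sequences of the k-th sample across `window` consecutive batches.

    cube[i][k] is the sample (a list of J variables) of batch i at time index k.
    Batches with fewer than k + 1 samples are skipped (an assumption made here
    because batch lengths may differ).
    """
    samples = [batch[k] for batch in cube if len(batch) > k]
    return [samples[end - window:end] for end in range(window, len(samples) + 1)]


# Hypothetical data: 4 batches, 2 variables, unequal batch lengths.
cube = [
    [[0.1, 1.0], [0.2, 1.1]],
    [[0.3, 1.2], [0.4, 1.3], [0.5, 1.4]],
    [[0.6, 1.5], [0.7, 1.6]],
    [[0.8, 1.7], [0.9, 1.8], [1.0, 1.9]],
]
# Sequences of the sample at time index 1, sliding over 3 batches at a time.
sequences = batch_wise_sequences(cube, k=1, window=3)
```

Each resulting sequence would be fed to the encoder in batch order, so that the recurrent state carries information from earlier batches to the most recent one.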

Section snippets

Multi-layer RNN

Recurrent neural networks (RNNs) are a kind of neural network used for modeling variable sequences (X = (x₁, x₂, …, x_t)). Unlike other neural networks, an RNN has a hidden state h and a feedback loop over hidden states that updates the current hidden state, which carries information from previous time stamps into the current state. As shown in Fig. 1, an RNN can have either a single-layer or a multi-layer structure. For a J layers RNN, the current hidden state $h_{t}^{j}$ of tth time stamp and j
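The recurrence described above can be sketched with a plain tanh (Elman-style) stacked RNN. This is an assumption made for illustration only: the excerpt is cut off before the update equation, the paper itself uses LSTM cells, and all weights and names below are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rnn_forward(sequence, layers):
    """Run a stacked tanh RNN over `sequence`; return the final hidden state of each layer.

    Each layer is a dict with input weights "W", recurrent weights "U" and bias "b".
    Layer j takes the hidden state of layer j - 1 at the same time stamp as its input,
    plus its own hidden state from the previous time stamp.
    """
    hidden = [[0.0] * len(layer["b"]) for layer in layers]
    for x_t in sequence:
        layer_input = x_t
        for j, layer in enumerate(layers):
            from_input = matvec(layer["W"], layer_input)
            from_past = matvec(layer["U"], hidden[j])
            hidden[j] = [
                math.tanh(a + r + b)
                for a, r, b in zip(from_input, from_past, layer["b"])
            ]
            layer_input = hidden[j]
    return hidden


# Hypothetical two-layer network with shrinking width: 3 inputs -> 2 units -> 1 unit.
layers = [
    {"W": [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], "U": [[0.1, 0.0], [0.0, 0.1]], "b": [0.0, 0.0]},
    {"W": [[0.7, -0.4]], "U": [[0.2]], "b": [0.1]},
]
sequence = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
final_hidden = rnn_forward(sequence, layers)
```

The shrinking layer widths mirror the dimension-reducing encoder idea; an LSTM would replace the single tanh update with gated cell and hidden state updates.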

Batch data description

Measured process data in a multiphase batch process are usually stored in the form of a three-dimensional cube, $X \in ℝ^{I \times J \times K}$ , recording K measured points with J process variables for each of the I batches, as shown in Fig. 2. It should be noted that batch lengths are not fixed, which is not represented in Fig. 2. Nevertheless, the problem of unequal batch lengths is considered in this study, and the sampling rates of all variables are assumed to be equal. Time slice, $x_{k} \in ℝ^{I_{k} \times J}$ is
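A time slice over batches of unequal length can be sketched as below. The sketch assumes that $I_{k}$ counts the batches long enough to contain a kth sample, which is an interpretation of the notation rather than something stated in this excerpt; the data and the function name `time_slice` are hypothetical, and indices are 0-based.

```python
def time_slice(batches, k):
    """Return the k-th time slice: one row per batch that has a sample at index k."""
    return [batch[k] for batch in batches if len(batch) > k]


# Hypothetical ragged data: I = 3 batches, J = 2 variables, lengths 2, 3 and 1.
batches = [
    [[1.0, 2.0], [1.1, 2.1]],
    [[1.2, 2.2], [1.3, 2.3], [1.4, 2.4]],
    [[1.5, 2.5]],
]
slice_0 = time_slice(batches, 0)  # 3 rows: every batch has a first sample
slice_1 = time_slice(batches, 1)  # 2 rows: the third batch is too short
slice_2 = time_slice(batches, 2)  # 1 row: only the second batch is long enough
```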

Case study

The PenSim benchmark model (Birol et al., 2002) is chosen as the test case in this study because of its popularity in batch process monitoring studies (Wang et al., 2019, Wang et al., 2018). It describes the production of penicillin in a fed-batch reactor, as shown in Fig. 5. Instead of a simulation dataset generated by ourselves, a public extensive reference dataset (Van Impe and Gins, 2015) based on the PenSim benchmark model is used in this study. In this reference dataset, the

Conclusion

In this paper, a batch-wise LSTM-encoder decoder network suited to extracting the nonlinear dynamic features in the batch process, and the corresponding process monitoring method, are proposed. The LSTM-encoder is a multi-layer, dimension-reducing recurrent neural network. The input sequences are series of samples from different batches. Based on the extracted hidden states and regenerated samples, the statistics H² and SPE are designed to measure the deviations in hidden state space and
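The monitoring statistics and the kernel density estimation of control limits mentioned in the abstract can be sketched as follows. The exact definitions are not given in this excerpt, so the sketch assumes H² is the squared norm of the hidden state and SPE is the squared reconstruction error; the Gaussian kernel, the normal-reference bandwidth rule, the 99% confidence level and all data values are likewise assumptions.

```python
from statistics import NormalDist, stdev


def h2_statistic(hidden_state):
    """Squared Euclidean norm of the hidden state (assumed definition)."""
    return sum(h * h for h in hidden_state)


def spe_statistic(sample, reconstruction):
    """Squared prediction error between a sample and its reconstruction."""
    return sum((x - r) ** 2 for x, r in zip(sample, reconstruction))


def kde_control_limit(values, confidence=0.99):
    """Upper control limit: the `confidence` quantile of a Gaussian KDE fitted to `values`.

    Requires at least two values that are not all identical.
    """
    n = len(values)
    bandwidth = 1.06 * stdev(values) * n ** (-1 / 5)  # normal-reference rule of thumb
    kernels = [NormalDist(mu=v, sigma=bandwidth) for v in values]

    def cdf(x):
        return sum(kernel.cdf(x) for kernel in kernels) / n

    low, high = min(values) - 10 * bandwidth, max(values) + 10 * bandwidth
    for _ in range(100):  # bisection on the monotone KDE cumulative distribution
        mid = (low + high) / 2
        if cdf(mid) < confidence:
            low = mid
        else:
            high = mid
    return high


# Hypothetical SPE values computed on normal operating data.
normal_spe = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2, 1.05, 0.95, 1.15]
limit = kde_control_limit(normal_spe, confidence=0.99)

# A new sample is flagged when its statistic exceeds the control limit.
new_spe = spe_statistic([1.0, 2.0, 3.0], [1.0, 2.5, 1.0])
is_fault = new_spe > limit
```

The same limit estimation would be applied separately to the H² values of the normal data, giving one control limit per statistic.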

Declaration of Competing Interest

The authors report no declarations of interest.

Acknowledgments

The authors would like to acknowledge the financial support from National Science Foundation China grant no. U1609213.

References (30)

G. Birol et al.
A modular simulation package for fed-batch fermentation: penicillin production
Comput. Chem. Eng.
(2002)
M. Jia et al.
On-line batch process monitoring using batch dynamic kernel principal component analysis
Chemometr. Intell. Lab. Syst.
(2010)
J.-M. Lee et al.
Fault detection of batch processes using multiway kernel principal component analysis
Comput. Chem. Eng.
(2004)
J. Loy-Benitez et al.
Soft sensor validation for monitoring and resilient control of sequential subway indoor air quality through memory-gated recurrent neural networks-based autoencoders
Control Eng. Pract.
(2020)
B.R. Mehta et al.
OPC Communications
(2015)
R.F. Nielsen et al.
Hybrid machine learning assisted modelling framework for particle processes
Comput. Chem. Eng.
(2020)
S.J. Qin
Survey on data-driven industrial process monitoring and diagnosis
Annu. Rev. Control
(2012)
H. Shahnazari
Fault diagnosis of nonlinear systems using recurrent neural networks
Chem. Eng. Res. Des.
(2020)
J. Van Impe et al.
An extensive reference dataset for fault detection and identification in batch processes
Chemometr. Intell. Lab. Syst.
(2015)
R. Wang et al.
A geometric method for batch data visualization, process monitoring and fault detection
J. Process Control
(2018)
W. Yan et al.
Nonlinear and robust statistical process monitoring based on variant autoencoders
Chemometr. Intell. Lab. Syst.
(2016)
Y. Yao et al.
A survey on multistage/multiphase statistical modeling methods for batch processes
Annu. Rev. Control
(2009)
S. Yin et al.
A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process
J. Process Control
(2012)
C. Zhao et al.
Step-wise sequential phase partition (sspp) algorithm based statistical modeling and online process monitoring
Chemometr. Intell. Lab. Syst.
(2013)
J. Adamowski et al.
Comparison of multiple linear and nonlinear regression, autoregressive integrated moving average, artificial neural network, and wavelet artificial neural network methods for urban water demand forecasting in Montreal, Canada
Water Resour. Res.
(2012)

