Journal of Process Control

Volume 99, March 2021, Pages 107-119

Unsupervised isolation of abnormal process variables using sparse autoencoders

https://doi.org/10.1016/j.jprocont.2021.01.005

Highlights

  • Sparse autoencoders reveal relations between process variables.

  • Abnormal deviations in process variables cause shifts in autoencoder’s residual space.

  • Sparsity enhances residual movements of abnormal process variables.

  • Variables are isolated by backpropagating residual movements through the autoencoder.

Abstract

Isolation of abnormal changes in process variables is an integral component of fault diagnosis, as it provides evidential information for determining the root cause of a detected abnormal event. This task is challenging when the approach to diagnosis does not incorporate knowledge of the process' nominal behavior, but is instead established solely on historical process data. Though isolation of abnormal changes in variables may be facilitated by including historical process data for faults that have been previously diagnosed, results remain inconclusive for unfamiliar faults. This paper presents a method for isolating abnormal changes in process variables with an autoencoder (AE), a type of neural network configured for latent projection, without prior knowledge of nominal process behavior or faults. The AE is optimized with nominal process data as well as a sparsity constraint to produce a sparse network. Probing into the sparse AE allows one to gain insight into the correlations that exist among the process variables during normal process operation. Movements in the AE's reconstruction space are interrogated alongside the acquired knowledge to isolate the abnormal changes in process variables. The method is demonstrated with a simulation of a nonlinear triple tank process, and is shown to isolate abnormal changes in variables for both simple and complex faults.

Introduction

Process operators are regularly confronted with the task of proposing an appropriate cause for an abnormal event. Operators diagnose a fault by isolating abnormal changes in the signals of process variables; a probable cause is then assigned given the identified aggregation of signal changes. Complete reliance on operators for signal evaluation becomes difficult as process plants become larger and more complex. An increasing number of observable process variables leads to information overload, slowing down analysis and risking incorrect diagnosis. Recent advances in data storage technologies have prompted industry to archive historical process data [1]. The growing availability of historical process data, coupled with increasing process complexity, has led to an increase in research on methods for multivariate statistical process monitoring that rely solely on data and not on process knowledge.

Methods for multivariate statistical process monitoring can be grouped into two different approaches, namely, statistical fault classification and feature extraction. Fault classification is the problem of identifying to which of a set of faults a new observation (sample) belongs. However, developing an effective classifier requires an abundant number of training observations for every possible fault. Obtaining a sufficient amount of training data proves difficult when faults, regardless of their severity, rarely occur. Feature extraction is the process of deriving numerical quantities intended to be informative about a data set. It is applied to process monitoring by comparing the features of new observations with the features of a training data set that describes the nominal behavior of a process. A fault is detected when the disparity, usually represented by a monitoring statistic, exceeds a certain threshold. Since historical process data typically contains disproportionately more training samples describing nominal process behavior than faults, monitoring based on feature extraction is generally favored over fault classification.

The focus of this paper is on latent projection (LP), a numerical method for feature extraction. LP reduces the dimension of the process variable space to a set of features that retain information in the original variables. LP uncovers the nominal correlation structure among variables by summarizing correlated variables with a smaller set of principal variables. A model given by LP is used for fault detection by identifying abnormal changes in the correlation structure among process variables.
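As a minimal sketch of linear LP, the idea can be illustrated with PCA on synthetic data; the data and the retained dimension below are hypothetical and serve only to show how correlated variables are summarized by fewer principal variables:

```python
# Minimal latent-projection (LP) sketch via PCA. Illustrative only; the
# paper's nonlinear LP uses an autoencoder instead.
import numpy as np

rng = np.random.default_rng(0)
# 500 samples of 3 correlated process variables (x3 is nearly x1 + x2)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
X = np.column_stack([x1, x2, x1 + x2 + 0.01 * rng.normal(size=500)])
Xc = X - X.mean(axis=0)                 # center the data

# Principal directions come from the SVD of the centered data matrix
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2                                   # retain two principal variables
T = Xc @ Vt[:k].T                       # scores (latent variables)
Xhat = T @ Vt[:k]                       # reconstruction from latent space

# Because the third variable is (almost) a linear combination of the
# first two, two principal variables suffice and the residuals are small.
print(np.abs(Xc - Xhat).max())
```

The residual space `Xc - Xhat` is where abnormal correlation changes later show up as detectable movements.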

Venkatasubramanian et al. [2] propose that a successful diagnostic system is a hybrid of three diagnostic components: (a) a data-driven method for quick detection; (b) a trend-based method for assessing abnormal changes (shifts) in process variables; and (c) an expert system that proposes a root cause given the result from trend analysis. In the context of LP, much of the available literature addresses the first diagnostic component. Within the class of linear methods, principal component analysis (PCA) and partial least squares have been successfully applied to linear systems where process data follows the assumption of normality [3], [4], [5]. For nonlinear systems where the normality assumption is not met, independent component analysis, kernel PCA, and neural networks demonstrate superior performance [6], [7], [8]. Ku et al. [9] propose dynamic LP, where the process variable vector is extended with past samples to include dynamic behavior in the LP model. An overview of these methods is provided in [10].

Component-wise residual analysis in the form of contribution plots is a well-established approach for assessing abnormal changes in process variables [11], [12]. Contribution plots indicate the contribution of each process variable to the monitoring statistic. If the statistic exceeds its control limit, then the variables exhibiting the largest contributions are investigated. However, LP produces a fault smearing effect such that the signal characteristics of abnormal variables smear onto nominal variables [13]. Identifying a probable cause becomes a challenge since the results from the analysis are ambiguous. Yoon and MacGregor [14] propose a workaround that addresses the fault smearing effect by comparing the normalized contributions with the diagnosed contributions of previous abnormal events. However, the method only applies to abnormal events that have occurred before, and thus fault smearing remains an issue for unfamiliar faults.
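Contribution analysis of this kind can be sketched in a few lines; the residual values below are hypothetical, and the squared-residual contribution used here is the standard SPE decomposition, not necessarily the exact variant used in [11], [12]:

```python
# Sketch of SPE contribution analysis for one abnormal sample.
# The residual vector e = x - x_hat is assumed given (values hypothetical).
import numpy as np

e = np.array([0.1, -2.3, 0.2, 1.1, -0.05])

contributions = e ** 2          # per-variable contribution to the SPE
spe = contributions.sum()       # the monitoring statistic itself

# Variables are ranked by contribution; the top entries are the ones an
# operator would investigate first.
ranked = np.argsort(contributions)[::-1]
print(ranked[:2])               # indices of the two largest contributors
```

Fault smearing means these top-ranked indices are not guaranteed to be the truly faulty variables, which is the ambiguity the paper sets out to resolve.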

Qin and Alcala [5], [13] propose a fault reconstruction-based approach to abnormal trend analysis. Upon detecting an abnormal sample with an LP model, abnormal changes in process variables are isolated by correcting the effect of a fault on the abnormal sample such that the nominal (non-faulty) values are estimated. Since there is no prior knowledge of the detected fault and its fault directions, reconstruction-based methods must carry out a search for the abnormal variables, which becomes computationally expensive for large processes. Combinatorial optimization methods such as the branch and bound algorithm may be integrated into the reconstruction-based method to improve the search efficiency [15].

Results from contribution analysis and reconstruction-based analysis do not provide a root cause to an abnormal event, but rather a list of process variables that are influenced by the abnormal event. A root cause is determined by assessing the abnormal variables with respect to a qualitative understanding of the relationship between process components, process functions, and control architecture [2]; the success of this final diagnostic component is thus dependent on the success of trend analysis at isolating the abnormal shifts in process variables correctly.

Neural networks are known to be potential universal approximators for any nonlinear function [16], [17], [18]. A neural network is configured for nonlinear LP by including a bottleneck network layer that reduces the original variable space to a lower dimension. The network is optimized to (a) learn a nonlinear transformation to the bottleneck layer; and (b) learn a nonlinear transformation that reconstructs the original variables [19]. Such networks, termed autoencoders (AEs), have been proposed for abnormal event diagnosis of nonlinear processes. Fault detection with AEs is shown to outperform other models given by LP methods such as PCA, independent component analysis, and kernel PCA [6], [20], [21], [22], [23], [24], [25]. This result is attributed to the superior nonlinear modeling capacity of AEs. Several methods have been proposed to improve the evaluation of abnormal trends in process variables with an AE. Hallgrímsson et al. [26] augment the optimization of an AE with a sparsity constraint to produce a sparse AE, resulting in a reduction of the fault smearing effect as the contributions from process variables uncorrelated with the faulty variables are eliminated. The effect of sparsity on fault diagnosis is also explored in the works of Yu et al. [27], [28] where process variables affected by a fault are isolated with a sparse discriminant analysis of a sparse AE to offer superior diagnosis over contribution analysis. Sparse AEs have also been used for the detection and localization of anomalies in images; Sabokrou et al. [29] and Touati et al. [30] promote sparsity with a penalty term that encourages network neurons to follow a Bernoulli distribution. Ren et al. [31] propose a reconstruction-based approach with a multilayered AE that learns to estimate a hypothetical faulty variable set with a gradient descent approach.
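The bottleneck structure described above can be sketched as follows. This is a minimal structural illustration with random placeholder weights; in practice the weights are fit to nominal process data, and the paper's network additionally carries a sparsity constraint:

```python
# Structural sketch of an autoencoder (AE) for nonlinear latent
# projection: m input variables are squeezed through an n-dimensional
# bottleneck (n < m) and reconstructed. Weights are random placeholders.
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 2                                 # variables, latent dimension

W_enc = rng.normal(scale=0.5, size=(n, m))  # encoder weights
b_enc = np.zeros(n)
W_dec = rng.normal(scale=0.5, size=(m, n))  # decoder weights
b_dec = np.zeros(m)

def encode(x):
    return np.tanh(W_enc @ x + b_enc)       # nonlinear map to bottleneck

def decode(z):
    return W_dec @ z + b_dec                # map back to variable space

x = rng.normal(size=m)
x_hat = decode(encode(x))
residual = x - x_hat                        # basis of the SPE statistic
print(residual.shape)
```

Training amounts to minimizing the reconstruction error over nominal data, so that large residuals later signal departures from the learned correlation structure.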

This paper proposes a contribution analysis-based method for detecting and isolating abnormal changes in process variables, whilst simultaneously remedying the complications induced by the fault smearing effect. The proposed method does not require prior knowledge of faults. Given a historical process data set sampled from when the process was consistent with normal operating conditions, an AE is optimized with a sparsity constraint to produce a sparse network, permitting one to probe into it to gain insight on the correlation structure among the process variables. The condition of the process is evaluated with a monitoring statistic that is sensitive to an abnormal event that changes the correlation among the process variables. Upon detecting an abnormal event, the contributions to the monitoring statistic are interrogated with the sparse AE to determine the direction of the abnormal shifts in process variables that ultimately explain the contributions. Unlike reconstruction-based approaches, the proposed method does not require an optimization problem to be solved. The proposed method is demonstrated with faults occurring in a simulated nonlinear process. The key result of this study was that interrogating the results from contribution analysis with the sparse AE allows for the isolation of abnormal shifts in process variables.

The organization of this paper is as follows. Section 2 reviews the method of LP in the context of PCA and AEs. Section 3 describes how AEs optimized with a sparsity constraint can expose process variable structure. Process monitoring with AEs and the isolation of abnormal process variables is discussed in Section 4. Section 5 presents the results from diagnosing two different faults occurring in a nonlinear process. The last two sections provide a discussion and conclusion, respectively, of the results.

Section snippets

Latent Projection (LP)

LP is a numerical method that transforms a high-dimensional variable space to a smaller set of latent, principal variables that retain essential information about the original variables. The method sets a compromise between the degree of dimensionality reduction and loss of information. LP has seen increased application in process monitoring as large processes consisting of many process variables can be monitored with a smaller number of principal variables. Let x ∈ ℝ^(m×1) represent a vector of

Discovery of process knowledge

The performance of a neural network model is largely determined by its model complexity. Its prediction accuracy is generally improved by increasing the number and size of network layers. However, proceeding in such a direction has a tendency to overfit the network to training data such that it performs poorly on validation and test data. This is undesirable in multivariate statistical fault diagnosis due to the disparity between the training and test sets; both are sampled from the same
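The sparsity constraint referred to throughout the paper is, per the conclusion, a naive elastic net penalty on the network weights. A sketch of such a penalty term is below; the λ values are hypothetical, and how the term is weighted against the reconstruction loss is not shown here:

```python
# Sketch of a naive elastic net penalty on AE weights: the L1 term
# drives weights to exactly zero (sparsity, exposing variable structure),
# while the L2 term keeps surviving weights small.
import numpy as np

def elastic_net_penalty(W, lam1=1e-3, lam2=1e-3):
    return lam1 * np.abs(W).sum() + lam2 * (W ** 2).sum()

W = np.array([[1.0, -2.0], [0.0, 3.0]])    # toy weight matrix
print(elastic_net_penalty(W))               # 1e-3*6 + 1e-3*14 = 0.02
```

During training the penalty is added to the reconstruction loss, so weights connecting uncorrelated variables are pruned to zero and the surviving nonzero pattern reveals which variables the AE actually relates.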

Online process monitoring and fault contribution analysis

Online process monitoring consists of referring new variable samples against an “in-control” AE trained with historical data collected when only common cause variation was present in the process. New observations are reconstructed by propagating them through the AE to obtain the residuals e_new = x_new − x̂_new. Previously unseen changes in signal characteristics caused by an abnormal event are detected by computing the SPE (otherwise known as the Q monitoring statistic) of the residuals [4]: SPE = ∑_{i=1}^{m} e²_{new,i}
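A minimal sketch of this monitoring loop is below. The residuals are assumed already available from an AE, and the control limit is set empirically as a high percentile of nominal SPE values, which is one common choice but not necessarily the limit derivation used in the paper:

```python
# Online SPE monitoring sketch: compare the SPE of a new sample's
# residuals against a control limit estimated from nominal operation.
import numpy as np

rng = np.random.default_rng(2)

def spe(e):
    return float((e ** 2).sum())

# Nominal residuals (small, common-cause variation only; synthetic here)
nominal_residuals = rng.normal(scale=0.1, size=(1000, 5))
limit = np.percentile([spe(e) for e in nominal_residuals], 99)

# A new sample whose residual on variable 2 has shifted abnormally
e_new = np.array([0.05, 1.5, -0.02, 0.1, 0.0])
print(spe(e_new) > limit)       # fault detected if the SPE exceeds the limit
```

Once the limit is exceeded, the per-variable terms e²_{new,i} are the contributions that the proposed method interrogates with the sparse AE.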

Case study: The triple tank process

The proposed method for diagnosis is demonstrated with a simulated triple tank process (TTP) in this section. The TTP, a multivariate, nonlinear process, is a variant of the quadruple tank process [42]. A schematic drawing of the TTP is given in Fig. 10. The liquid supplying the upper tanks is transported from a large sump by means of two gear pumps. Liquid flows out from the bottom of each tank, with the liquid from the upper right tank first supplying the lower tank before returning to
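As a rough illustration of the nonlinear dynamics in such tank processes, a single tank with Torricelli outflow can be simulated with a forward Euler step. All parameter values below are hypothetical and are not the paper's TTP parameters:

```python
# Single-tank dynamics with Torricelli outflow (source of the TTP's
# nonlinearity): A * dh/dt = q_in - a * sqrt(2 * g * h).
# Parameters are illustrative, not the paper's.
import math

A = 0.03        # tank cross-section [m^2]
a = 2e-4        # outlet cross-section [m^2]
g = 9.81        # gravitational acceleration [m/s^2]
q_in = 1e-4     # inflow from the pump [m^3/s]

h, dt = 0.10, 0.1                       # initial level [m], Euler step [s]
for _ in range(5000):                   # simulate 500 s
    h += dt * (q_in - a * math.sqrt(2 * g * h)) / A
    h = max(h, 0.0)                     # level cannot go negative

# At steady state the inflow balances the outflow: q_in = a*sqrt(2*g*h)
h_ss = (q_in / a) ** 2 / (2 * g)
print(abs(h - h_ss) < 1e-3)             # level has settled near h_ss
```

The square-root outflow is what makes the level dynamics nonlinear, which is why the linear-normality assumptions behind PCA-style monitoring are violated and a nonlinear LP model such as an AE is used.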

Discussion

Since the LP model in this paper is a static sparse AE, diagnosis performs poorly when process variable samples contain temporal information. Reference changes (which induce dynamic process behavior) cause the SPE in Fig. 14 to exceed its control limit, generating false alarms and hampering diagnosis. The process must reach steady-state to confirm that a false alarm has occurred, visualized by the SPE returning to below the control limit. This behavior is explained by reference to Fig. 20.

Conclusion

This paper proposes a method based on sparse autoencoders (AEs) for detecting and isolating abnormal shifts in process variables. The proposed method does not require prior knowledge of faults. In the proposed method, an AE is optimized to reduce the dimensions of a process variable space with historical process data sampled from a process that was consistent with normal operating conditions. The AE's optimization is augmented with naïve elastic net regularization to shrink

CRediT authorship contribution statement

Ásgeir Daniel Hallgrímsson: Conceptualization, Methodology, Software, Validation, Formal analysis, Resources, Data curation, Writing - original draft, Visualization. Hans Henrik Niemann: Conceptualization, Writing - review & editing, Supervision. Morten Lind: Conceptualization, Writing - review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors would like to acknowledge the support of the Danish Hydrocarbon Research and Technology Center (DHRTC) at the Technical University of Denmark.
