Detection and detectability of intermittent faults based on moving average $T^2$ control charts with multiple window lengths☆
Introduction
Data-driven fault detection (FD) for large-scale industrial processes has received considerable attention over the past decades [1]. Because it can handle high-dimensional and correlated process variables, the multivariate statistical process monitoring (MSPM) methodology is one of the most effective data-driven techniques for FD and process monitoring [2]. MSPM uses multivariate control charts such as those based on Hotelling’s $T^2$ statistic, principal component analysis (PCA), partial least squares (PLS), independent component analysis (ICA) or hidden Markov models (HMMs) [3]. According to how a fault progresses in time, Isermann [4] classified faults into three types: abrupt, incipient and intermittent. Abrupt and incipient faults both belong to the category of permanent faults (PFs).
With the rapid development of highly complex technologies, intermittent faults (IFs) have become a serious threat to system reliability. An IF is a non-permanent fault that often recurs due to the same cause and lasts for a limited period of time [5], [6], [7]. IFs are common in a variety of fields [8], [9] and have imposed an enormous financial burden on electronics, satellites and many other industries [10]. Moreover, IFs tend to get worse over time and may eventually become permanent, resulting in the disruption or breakdown of industrial processes. Detecting IFs can effectively reduce the occurrence of catastrophic faults and is an important means of improving system reliability and security. In recent years, IFs have therefore received noticeable interest from both academia and industry [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], and a review paper providing an overall picture of historical, current, and future developments in this area has been published [5]. The problems of detecting IFs and of their detectability in discrete event systems have been addressed in [11], [12], [13]. In addition, detection of IFs has been studied for linear stochastic systems [14], [15] with parameter uncertainties [16], [17]. Note that these methods require the system model to be known. As for data-driven methods, the wavelet transform has been used to detect intermittent interturn faults in a synchronous motor [18], [19]. In [20], the short-time Fourier transform and the undecimated discrete wavelet transform have been used to detect intermittent electrical/mechanical faults in motors. In [21], the decision forest method has been employed to investigate IFs via feature selection and classification. A dynamic-Bayesian-network-based method has been presented to detect IFs in electronic systems [22]. These methods require historical data of various faults.
So far, the IF detection (IFD) and detectability problems have not been fully investigated in the MSPM framework, where historical fault data are not necessary. IFs are characterized by small magnitudes and short durations. The magnitude of an IF can be as small as that of an incipient fault, while its duration is shorter. Thus, IFs are even more difficult to detect than incipient faults. It has been indicated [23] that traditional MSPM methods using only a single observation, such as PCA, PLS and ICA, are not sensitive to incipient faults, let alone to IFs. Fortunately, several studies [24], [25], [26] have shown that faults with small magnitudes can be efficiently detected by employing a time window, i.e., the moving average (MA) or moving window (MW) techniques, which gave rise to MA-PCA [27], MA-PLS [23], MW-PCA [28], MW-HMM [29] and so on. This has paved the way for our investigation of the IFD problem.
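To make the role of the time window concrete, the following worked relation is a simplified sketch rather than a result taken from the paper. It assumes independent Gaussian samples with known covariance, a constant additive fault, and the notation listed in the comment, all of which are illustrative assumptions.

```latex
% Assumed notation (not from the excerpt): w = window length, \tau = fault duration,
% f = constant additive fault vector, \Sigma = known covariance, p = number of variables.
\bar{x}_{w} - \mu \sim N_p\!\left(\tfrac{k}{w} f,\ \tfrac{1}{w}\Sigma\right),
\qquad k = \text{number of faulty samples in the window} \le \min(w,\tau)

w\,(\bar{x}_{w}-\mu)^{T}\Sigma^{-1}(\bar{x}_{w}-\mu) \sim \chi^2_p(\lambda),
\qquad
\lambda = \frac{k^{2}}{w}\, f^{T}\Sigma^{-1} f \;\le\; \frac{\min(w,\tau)^{2}}{w}\, f^{T}\Sigma^{-1} f
```

Under these assumptions the attainable noncentrality grows linearly in $w$ while $w \le \tau$ and decays as $\tau^2 / w$ once the window is longer than the fault, so a window helps with small magnitudes only as long as it is not much longer than the fault duration.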
However, the selection of window lengths in these works has not considered the characteristics of fault duration. Moreover, existing methods have only considered using a single window length. The detection and detectability of IFs using multiple window lengths simultaneously have not been fully investigated in the available literature, owing to the complexity of integrating the varied detection results given by different window lengths. These issues constitute the main motivations of the present study. Some other important FD methods that also employ a time window are the dynamic MSPM methods, such as dynamic PCA (DPCA), canonical variate analysis (CVA) and stationary/nonstationary-hybrid-characteristics-based dissimilarity analysis [30]. Note that in this paper, process data are assumed to be independent, and thus these methods reduce to traditional single-observation-based MSPM methods, which are not sensitive to IFs, or to the dissimilarity analysis method [31], [32]. Dissimilarity analysis is an advanced MSPM method that also employs a time window and has shown favorable performance for incipient fault detection and isolation [33]. It usually needs a large window length to calculate the covariance matrix of the online data set [34]. Considering that the durations of IFs are always limited, the use of dissimilarity-based methods for IFD still requires further justification.
Hotelling’s $T^2$ statistic is a well-known function of the likelihood ratio criterion, which consequently makes it admissible and uniformly powerful in certain classes of hypothesis tests [35]. Thus in this paper, the $T^2$ statistic is combined with the MA technique to form a bank of MA $T^2$ control charts (MA-TCCs) with different window lengths. The main contributions of the present paper are summarized as follows: (1) MA-TCCs with multiple window lengths are proposed, including methods to exclude false alarms, compensate missing alarms, and infer the appearing and disappearing time instances of IFs, based on the detectability of each single MA-TCC. (2) The concept of IF detectability is defined for the first time in the MSPM framework, as an extension and generalization of the original guaranteed fault detectability concept, which focuses on PFs. (3) A theoretical framework for the analysis of IF detectability is established. Necessary and sufficient conditions are given for the detectability of IFs that may appear and disappear several times with different magnitudes and durations. Extended discussions on how the theoretical results can help detect IFs in practical applications are also presented.
The remainder of this paper is organized as follows. In Section 2, the MA-TCC is introduced for the IFD problem. Then the detectability of IFs is analyzed in Section 3. MA-TCCs with multiple window lengths are utilized to reduce false/missing alarms and infer IFs’ appearing and disappearing time instances in Section 4. Simulation results are presented in Section 5, and conclusions are given in Section 6.
Notation: Bold-face notations in lowercase and uppercase stand for vectors and matrices respectively, so as to distinguish them from scalars. A bold-face notation in [] such as , is used to highlight the scalar in []. and stand for the transpose and the inverse of a matrix , respectively. represents a -dimensional normal distribution with expectation and covariance matrix . represents a -dimensional Wishart distribution with degrees of freedom. is a central distribution with and degrees of freedom. is the percentile of the central distribution with and degrees of freedom. and are the sets of positive integers and positive real numbers, respectively. is the minimum integer no less than , and is the maximum integer no more than . means if , then , otherwise . is the empty set, and . is to give a definition.
Hotelling’s $T^2$ distribution
The following lemma is the key result regarding Hotelling’s $T^2$ distribution, see [36].
Lemma 1 Let , where and are independently distributed with and , where . Then where the noncentrality parameter .
Moving average $T^2$ control chart (MA-TCC)
Suppose we have collected independent samples from under a certain sampling rate as training data, which represent the statistical characteristics of the system under normal conditions. We also collect current process data
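Because the excerpt breaks off before the chart itself is defined, the following block records the textbook form of a moving-average $T^2$ statistic as a point of reference. It is a sketch under assumptions, not necessarily the paper's exact definition: $N$ training samples with sample mean $\bar{x}$ and unbiased sample covariance $S$, $p$ variables, window length $w$, and significance level $\alpha$ are notation introduced here.

```latex
% Window mean over the w most recent samples ending at time k
\bar{x}_{k,w} = \frac{1}{w}\sum_{i=k-w+1}^{k} x_i

% Since Cov(\bar{x}_{k,w} - \bar{x}) = (1/w + 1/N)\,\Sigma under fault-free, independent data:
T^2_{k,w} = \frac{N w}{N + w}\,(\bar{x}_{k,w}-\bar{x})^{T} S^{-1} (\bar{x}_{k,w}-\bar{x})
\;\sim\; \frac{p\,(N-1)}{N-p}\, F(p,\, N-p)

% An alarm is raised when T^2_{k,w} exceeds the upper-\alpha quantile of that scaled F distribution.
```

With $w = 1$ this reduces to the usual single-observation $T^2$ chart, which is the sense in which the moving-average chart generalizes it.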
Definitions of guaranteed detectability
From both an analytical and a practical point of view, it is important to know whether a fault is detectable by the proposed methods. Consider the following widely used fault model in the MSPM framework [37], [38] where represents the process fluctuation under normal conditions, is the direction of the fault in time instance , and is its magnitude. The fault-free part usually represents a normal steady-state condition. In this way, the above fault model represents
IFD based on MA-TCCs with multiple window lengths
The advantage and disadvantage of introducing the time window for IFD are apparent, i.e., the improved sensitivity and the introduced alarm delay. Thus, it is natural to consider MA-TCCs with multiple window lengths, denoted as MA-TCCs(M), for the IFD problem. However, detection results given by different window lengths are often inconsistent due to false or missing alarms in real-world applications. Thus, methods to exclude false alarms and compensate missing alarms are proposed first on the
A numerical example
A simulated process model with two correlated variables is employed first. The process model under normal conditions follows a multivariate Gaussian distribution as follows Both 5000 training samples and 500 test samples are generated according to (37), and intermittent process faults are subsequently introduced in the test dataset. The significance level is 0.01. The introduced IFs have an additive form as modeled by (10) with the fault direction
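The experiment described above can be approximated with the short sketch below. Only the sample sizes (5000 and 500), the two-variable Gaussian model, the additive fault form, and the significance level 0.01 come from the text. The covariance matrix, fault direction, fault magnitude, fault intervals, window lengths, and all function names are hypothetical choices made for illustration, since the excerpt does not show the paper's values. The control limit uses the standard scaled F distribution, whose quantile has a closed form when there are two variables. The sketch only runs the bank of charts side by side and does not reproduce the paper's rules for combining their results.

```python
import numpy as np

ALPHA = 0.01                      # significance level stated in the text
N_TRAIN, N_TEST = 5000, 500       # sample sizes stated in the text

# Everything below is a hypothetical choice, not taken from the paper.
COV = np.array([[1.0, 0.6], [0.6, 1.0]])
FAULT_DIR = np.array([1.0, 0.0])
FAULT_MAG = 2.0
FAULT_SPANS = [(100, 120), (250, 256), (400, 402)]   # [start, end) indices
WINDOWS = (1, 4, 8, 16)


def t2_limit_p2(n_train: int, alpha: float) -> float:
    """Limit p(N-1)/(N-p) * F_alpha(p, N-p) for p = 2 variables.

    With 2 numerator degrees of freedom, P(F > x) = (1 + 2x/m)^(-m/2),
    so the upper-alpha quantile can be written in closed form.
    """
    m = n_train - 2
    f_quantile = (m / 2.0) * (alpha ** (-2.0 / m) - 1.0)
    return 2.0 * (n_train - 1) / m * f_quantile


def ma_t2_alarms(train, test, w, alpha):
    """Boolean alarm per test index; alarms[t] uses test[t-w+1 : t+1]."""
    n_train, p = train.shape
    if p != 2:
        raise ValueError("the closed-form limit used here assumes p = 2")
    mean = train.mean(axis=0)
    cov = np.cov(train, rowvar=False)
    # Moving average of every full window via cumulative sums.
    csum = np.vstack([np.zeros(p), np.cumsum(test, axis=0)])
    dev = (csum[w:] - csum[:-w]) / w - mean
    quad = np.einsum("ij,ij->i", dev, np.linalg.solve(cov, dev.T).T)
    stat = (n_train * w / (n_train + w)) * quad
    alarms = np.zeros(len(test), dtype=bool)
    alarms[w - 1:] = stat > t2_limit_p2(n_train, alpha)  # first w-1 have no full window
    return alarms


rng = np.random.default_rng(0)
train = rng.multivariate_normal(np.zeros(2), COV, size=N_TRAIN)
test = rng.multivariate_normal(np.zeros(2), COV, size=N_TEST)
for start, end in FAULT_SPANS:
    test[start:end] += FAULT_MAG * FAULT_DIR   # additive intermittent fault

bank = {w: ma_t2_alarms(train, test, w, ALPHA) for w in WINDOWS}
for w, alarms in bank.items():
    # Windows ending in [start, end + w - 1) contain at least one faulty sample.
    hits = [int(alarms[s:e + w - 1].sum()) for s, e in FAULT_SPANS]
    print(f"w={w:2d}: alarms near each fault {hits}, total alarms {int(alarms.sum())}")
```

Comparing the rows printed for different window lengths shows the trade-off discussed earlier: longer windows tend to respond more strongly to the long fault but later, and they dilute the shortest faults.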
Conclusion
In this paper, moving average $T^2$ control charts with multiple window lengths (MA-TCCs(M)) have been developed for intermittent fault (IF) detection. The MA-TCCs(M) incorporate historical information through a bank of time windows and thus can improve the IFD performance. The detectability of IFs has been investigated theoretically, and choices of window lengths in different practical conditions have been discussed. The advantage of using a time window for permanent fault (PF) detection is
CRediT authorship contribution statement
Yinghong Zhao: Conceptualization, Methodology, Writing - original draft, Writing - review & editing, Formal analysis, Investigation, Software, Validation. Xiao He: Supervision, Conceptualization, Funding acquisition, Writing - review & editing, Resources. Michael G. Pecht: Supervision, Writing - review & editing, Resources. Junfeng Zhang: Writing - review & editing, Validation. Donghua Zhou: Supervision, Project administration, Conceptualization, Funding acquisition, Writing - review & editing,
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (43)
Survey on data-driven industrial process monitoring and diagnosis. Annu. Rev. Control (2012)
Model-based fault-detection and diagnosis – status and applications. Annu. Rev. Control (2005)
et al., Intermittent fault detection with control chart. IFAC-PapersOnLine (2018)
et al., Diagnosability of intermittent sensor faults in discrete event systems. Automatica (2017)
et al., Robust detection of intermittent sensor faults in stochastic LTV systems. Neurocomputing (2020)
et al., Incipient fault detection with smoothing techniques in statistical process monitoring. Control Eng. Pract. (2017)
Exponentially weighted moving principal components analysis and projections to latent structures. Chemometr. Intell. Lab. Syst. (1994)
et al., Nonlinear process monitoring based on kernel dissimilarity analysis. Control Eng. Pract. (2009)
et al., A sparse dissimilarity analysis algorithm for incipient fault isolation with no priori fault information. Control Eng. Pract. (2017)
et al., Reconstruction-based contribution for process monitoring. Automatica (2009)
Generalization and analysis of sufficient conditions for PCA-based fault detectability and isolability. Annu. Rev. Control
A unified geometric approach to process and sensor fault identification and reconstruction: the unidimensional fault case. Comput. Chem. Eng.
Reconstruction based fault prognosis for continuous processes. Control Eng. Pract.
Data-driven Methods for Fault Detection and Diagnosis in Chemical Processes
A review on basic data-driven approaches for industrial process monitoring. IEEE Trans. Ind. Electron.
Review on diagnosis techniques for intermittent faults in dynamic systems. IEEE Trans. Ind. Electron.
Detecting intermittent faults with moving average techniques
Intermittent failure dynamics characterization. IEEE Trans. Reliab.
A novel lifetime estimation method for two-phase degrading systems. IEEE Trans. Reliab.
Intermittent failures in hardware and software. J. Electron. Packag.
Diagnosis of repeated/intermittent failures in discrete event systems. IEEE Trans. Robot. Autom.
☆ This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 61751307, 61733009, the Research Fund for the Taishan Scholar Project of Shandong Province of China (LZB2015-162), the Key Project from Natural Sciences Foundation of Guangdong Province, China under Grant 2018B030311054, and the BNRist Program, China under Grant BNR2019TD01009.