Elsevier

Journal of Process Control

Volume 92, August 2020, Pages 296-309

Detection and detectability of intermittent faults based on moving average T² control charts with multiple window lengths

https://doi.org/10.1016/j.jprocont.2020.07.002

Highlights

  • Moving average T² control charts with multiple window lengths are developed to detect intermittent faults (IFs).

  • IF detectability is systematically studied in the MSPM framework for the first time.

  • Methods to exclude/compensate false/missing alarms and infer the appearing/disappearing time instances of IFs are presented.

Abstract

So far, the problems of intermittent fault (IF) detection and detectability have not been fully investigated in the multivariate statistics framework. IFs are characterized by small magnitudes and short durations; consequently, traditional multivariate statistical methods that use only a single observation are no longer effective. In this paper, therefore, moving average T² control charts (MA-TCCs) with multiple window lengths, which simultaneously employ a bank of MA-TCCs with different window lengths, are proposed to address the IF detection problem. Methods to reduce false/missing alarms and to infer the IFs’ appearing and disappearing time instances are presented. To analyze the detection capability for IFs, definitions of guaranteed detectability are introduced, which extend and generalize the original fault detectability concept focused on permanent faults (PFs). Necessary and sufficient conditions are then derived for the detectability of IFs, which may appear and disappear several times with different magnitudes and durations. Based on these conditions, some optimal properties of two important window lengths are further discussed. In this way, a theoretical framework for analyzing IF detectability is established, together with extended discussions of how the theoretical results can be adapted to real-world applications. Finally, simulation studies on a numerical example and the continuous stirred tank reactor (CSTR) process are carried out to show the effectiveness of the developed methods.

Introduction

Data-driven fault detection (FD) for large-scale industrial processes has received considerable attention over the past decades [1]. Owing to its ability to handle high-dimensional and correlated process variables, multivariate statistical process monitoring (MSPM) is one of the most effective data-driven techniques for FD and process monitoring [2]. MSPM uses multivariate control charts such as Hotelling’s T² statistic and control charts based on principal component analysis (PCA), partial least squares (PLS), independent component analysis (ICA), or hidden Markov models (HMMs) [3]. According to how a fault evolves over time, Isermann [4] classified faults into three types: abrupt, incipient, and intermittent. Both abrupt and incipient faults belong to the category of permanent faults (PFs).

With the rapid development of highly complex technologies, intermittent faults (IFs) have become a serious threat to system reliability. An IF is a non-permanent fault that often recurs from the same cause and lasts only for a limited period of time [5], [6], [7]. IFs are common in a variety of fields [8], [9] and have imposed an enormous financial burden on the electronics, satellite, and many other industries [10]. Moreover, IFs tend to worsen over time and may eventually become permanent, disrupting or breaking down industrial processes. Detecting IFs can effectively reduce the occurrence of catastrophic faults and is an important means of improving system reliability and security. Thus, in recent years, IFs have received growing interest from both academia and industry [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], and a review paper providing an overall picture of historical, current, and future developments in this area has been published [5]. The detection of IFs and their detectability in discrete event systems have been addressed in [11], [12], [13]. In addition, IF detection has been studied for linear stochastic systems [14], [15], including those with parameter uncertainties [16], [17]. Note that these methods require known system models. Among data-driven methods, the wavelet transform has been used to detect intermittent interturn faults in a synchronous motor [18], [19]. In [20], the short-time Fourier transform and the undecimated discrete wavelet transform have been used to detect intermittent electrical/mechanical faults in motors. In [21], the decision forest method has been employed to investigate IFs via feature selection and classification. A dynamic-Bayesian-network-based method has been presented to detect IFs in electronic systems [22]. These data-driven methods, however, require historical data on various faults.

So far, the IF detection (IFD) and detectability problems have not been fully investigated in the MSPM framework, in which historical fault data are not necessary. IFs are characterized by small magnitudes and short durations: the magnitude of an IF can be as small as that of an incipient fault, while its duration is shorter. IFs are therefore even more difficult to detect than incipient faults. It has been pointed out [23] that traditional MSPM methods using only a single observation, such as PCA, PLS, and ICA, are not sensitive to incipient faults, let alone IFs. Fortunately, several studies [24], [25], [26] have shown that faults with small magnitudes can be detected efficiently by employing a time window, i.e., the moving average (MA) or moving window (MW) techniques, giving rise to MA-PCA [27], MA-PLS [23], MW-PCA [28], MW-HMM [29], and so on. This has paved the way for our investigation of the IFD problem.

However, the window lengths in these works were selected without considering the fault duration. Moreover, existing methods use only a single window length. Detection and detectability of IFs using multiple window lengths simultaneously have not been fully investigated in the available literature, owing to the complexity of integrating the different detection results given by different window lengths. These issues constitute the main motivations of the present study. Other important FD methods that also employ a time window are the dynamic MSPM methods, such as dynamic PCA (DPCA), canonical variate analysis (CVA), and dissimilarity analysis based on stationary/nonstationary hybrid characteristics [30]. Since process data are assumed to be independent in this paper, these methods reduce either to traditional single-observation-based MSPM methods, which are not sensitive to IFs, or to the dissimilarity analysis method [31], [32]. Dissimilarity analysis is an advanced MSPM method that also employs a time window and has shown favorable performance for incipient fault detection and isolation [33]. However, it usually needs a large window length to calculate the covariance matrix of the online data set [34]. Because the durations of IFs are always limited, the use of dissimilarity-based methods for IFD still requires further justification.

Hotelling’s T² statistic is a well-known function of the likelihood ratio criterion, which makes it admissible and uniformly powerful in certain classes of hypothesis tests [35]. Thus, in this paper, the T² statistic is combined with the MA technique to form a bank of MA T² control charts (MA-TCCs) with different window lengths. The main contributions of this paper are as follows. (1) MA-TCCs with multiple window lengths are proposed based on the detectability of each single MA-TCC, including methods to exclude false alarms, compensate missing alarms, and infer the appearing and disappearing time instances of IFs. (2) The concept of IF detectability is defined for the first time in the MSPM framework, extending and generalizing the original concept of guaranteed fault detectability focused on PFs. (3) A theoretical framework for analyzing IF detectability is established. Necessary and sufficient conditions are given for the detectability of IFs that may appear and disappear several times with different magnitudes and durations, and extended discussions on how the theoretical results can help detect IFs in practical applications are also presented.

The remainder of this paper is organized as follows. In Section 2, the MA-TCC is introduced for the IFD problem. Then the detectability of IFs is analyzed in Section 3. MA-TCCs with multiple window lengths are utilized to reduce false/missing alarms and infer IFs’ appearing and disappearing time instances in Section 4. Simulation results are presented in Section 5, and conclusions are given in Section 6.

Notation: Bold-face lowercase and uppercase symbols denote vectors and matrices, respectively, to distinguish them from scalars. A bold-face bracket $[\cdot]$, such as $[k]$, is used to highlight the scalar it encloses. $\mathbf{A}^T$ and $\mathbf{A}^{-1}$ denote the transpose and the inverse of a matrix $\mathbf{A}$, respectively. $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ denotes a $p$-dimensional normal distribution with expectation $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$. $W_p(N, \boldsymbol{\Sigma})$ denotes a $p$-dimensional Wishart distribution with $N$ degrees of freedom and covariance matrix $\boldsymbol{\Sigma}$. $F(p, N-p)$ is a central F distribution with $p$ and $N-p$ degrees of freedom, and $F_\alpha(p, N-p)$ is its upper $\alpha$ point, i.e., its $(1-\alpha)$ quantile. $\mathbb{N}^+$ and $\mathbb{R}^+$ are the sets of positive integers and positive real numbers, respectively. $[x]^+$ is the smallest integer not less than $x$, and $[x]^-$ is the largest integer not greater than $x$. $x >(\ge) \max\{y, z\}$ means that $x > y$ if $y \ge z$, and $x \ge z$ otherwise. $\emptyset$ is the empty set, and $[a, b) = \{x \in \mathbb{R} : a \le x < b\}$. $\triangleq$ denotes a definition.

Section snippets

Hotelling’s T² distribution

The following lemma is the key result regarding Hotelling’s T² distribution; see [36].

Lemma 1

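Let $T^2 = \mathbf{x}^T \mathbf{S}^{-1} \mathbf{x}$, where $\mathbf{x}$ and $\mathbf{S}$ are independently distributed with $\mathbf{x} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and $N\mathbf{S} \sim W_p(N, \boldsymbol{\Sigma})$, where $N \ge p$. Then
$$T^2 \sim \frac{Np}{N-p+1} F(p, N-p+1; \epsilon^2),$$
where the noncentrality parameter is $\epsilon^2 = \boldsymbol{\mu}^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}$.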
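As a worked illustration, Lemma 1 can be turned into a control limit for an MA-TCC. The derivation below assumes a construction that this excerpt does not spell out: the training data give the sample mean $\bar{\mathbf{x}}$ and the unbiased sample covariance $\mathbf{S}$ from $N$ samples, the chart at instant $k$ uses the average $\bar{\mathbf{x}}_k^{(w)}$ of the last $w$ current samples, and the current mean may be shifted by $\boldsymbol{\delta}$ (zero under normal conditions). The symbols $\bar{\mathbf{x}}_k^{(w)}$, $\boldsymbol{\delta}$ and $T_k^2(w)$ are introduced here for illustration only.

```latex
% Window mean and training mean are independent Gaussians; S is independent of both:
\bar{\mathbf{x}}_k^{(w)} - \bar{\mathbf{x}} \sim N_p\!\left(\boldsymbol{\delta},\ \left(\tfrac{1}{w} + \tfrac{1}{N}\right)\boldsymbol{\Sigma}\right),
\qquad (N-1)\,\mathbf{S} \sim W_p(N-1, \boldsymbol{\Sigma}).
% Rescale so that the covariance is Sigma, then apply Lemma 1 with N replaced by N-1 (needs N-1 >= p):
T_k^2(w) \triangleq \left(\tfrac{1}{w} + \tfrac{1}{N}\right)^{-1}
  \left(\bar{\mathbf{x}}_k^{(w)} - \bar{\mathbf{x}}\right)^T \mathbf{S}^{-1}
  \left(\bar{\mathbf{x}}_k^{(w)} - \bar{\mathbf{x}}\right)
  \sim \frac{p(N-1)}{N-p}\, F\!\left(p, N-p;\ \epsilon^2\right),
\qquad
\epsilon^2 = \frac{wN}{w+N}\,\boldsymbol{\delta}^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\delta}.
% With delta = 0, a level-alpha chart therefore alarms when
T_k^2(w) > \frac{p(N-1)}{N-p}\, F_\alpha(p, N-p).
```

Under these assumptions the noncentrality grows with $w$ for a sustained shift, which is why a time window raises sensitivity to small faults; for an IF, the shift only lasts for a limited number of samples, so this gain cannot be taken for granted.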

Moving average T² control chart (MA-TCC)

Suppose we have collected $N$ independent samples $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N$ from $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ at a given sampling rate as training data, which represent the statistical characteristics of the system under normal conditions. We also collect current process data
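A minimal NumPy/SciPy sketch of a single MA-TCC follows. It assumes, since the excerpt breaks off before the chart is defined, that the chart compares the average of the last $w$ current samples with the training mean, rescales by $(1/w + 1/N)^{-1}$, and uses the limit $p(N-1)/(N-p)\,F_\alpha(p, N-p)$, which is what Lemma 1 gives when the unbiased training covariance is used. The helper names `fit_training`, `ma_t2` and `ma_t2_limit` are hypothetical.

```python
import numpy as np
from scipy.stats import f


def fit_training(X_train):
    """Hypothetical helper: mean and unbiased covariance of N x p normal-condition data."""
    X_train = np.asarray(X_train, dtype=float)
    return X_train.mean(axis=0), np.cov(X_train, rowvar=False)  # np.cov divides by N - 1


def ma_t2(X_test, mean, cov, N, w):
    """MA T^2 statistic for every window of w consecutive test samples.

    Entry j uses samples j, ..., j + w - 1, i.e. the window ending at instant k = j + w - 1.
    """
    X_test = np.asarray(X_test, dtype=float)
    S_inv = np.linalg.inv(cov)
    # Moving averages via cumulative sums: csum[i] is the sum of the first i samples.
    csum = np.vstack([np.zeros(X_test.shape[1]), np.cumsum(X_test, axis=0)])
    d = (csum[w:] - csum[:-w]) / w - mean
    # (1/w + 1/N)^{-1} rescales Cov(window mean - training mean) back to Sigma.
    return np.einsum("ij,jk,ik->i", d, S_inv, d) / (1.0 / w + 1.0 / N)


def ma_t2_limit(N, p, alpha):
    """Control limit p(N-1)/(N-p) * F_alpha(p, N-p), i.e. Lemma 1 with N replaced by N - 1."""
    return p * (N - 1) / (N - p) * f.ppf(1 - alpha, p, N - p)


# Usage on synthetic data with a short, small mean shift (illustrative values only).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 2))
X_test = rng.normal(size=(200, 2))
X_test[100:120] += [0.6, 0.0]
mean, cov = fit_training(X_train)
stats = ma_t2(X_test, mean, cov, N=1000, w=10)
alarms = stats > ma_t2_limit(N=1000, p=2, alpha=0.01)
```

The cumulative-sum trick keeps the cost linear in the number of samples for any window length, which matters once several window lengths are evaluated side by side.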

Definitions of guaranteed detectability

From both analytical and practical points of view, it is important to know whether a fault is detectable by the proposed methods. Consider the following fault model, widely used in the MSPM framework [37], [38]: $\mathbf{x}_k^f = \mathbf{x}_k + \boldsymbol{\Xi}_k \mathbf{f}_k$, where $\mathbf{x}_k$ represents the process fluctuation under normal conditions, $\boldsymbol{\Xi}_k$ is the fault direction at time instance $k$, and $\mathbf{f}_k$ is the fault magnitude. The fault-free part $\mathbf{x}_k$ usually represents a normal steady-state condition. In this way, the above fault model represents
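To make the fault model concrete for IFs, the sketch below injects intermittent episodes into fault-free data. It is a minimal illustration assuming that the $q$-th episode is active on a half-open interval of samples with a fixed unit direction $\boldsymbol{\xi}_q$ and a constant scalar magnitude $f_q$, so that $\boldsymbol{\Xi}_k \mathbf{f}_k = \boldsymbol{\xi}_q f_q$ inside the episode and zero outside; the helper name and the episode format are hypothetical.

```python
import numpy as np


def inject_intermittent_fault(X, episodes):
    """Hypothetical helper: add IF episodes to fault-free samples X (n x p).

    episodes: iterable of (start, end, direction, magnitude); the q-th IF is active on
    samples start, ..., end - 1 and adds magnitude * xi_q, with xi_q the unit-normalized
    direction. Outside every episode the samples keep their fault-free values x_k.
    Returns the faulty data and a boolean mask of instants at which an IF is active.
    """
    Xf = np.array(X, dtype=float)          # copy, so X itself is not modified
    active = np.zeros(len(Xf), dtype=bool)
    for start, end, direction, magnitude in episodes:
        xi = np.asarray(direction, dtype=float)
        xi = xi / np.linalg.norm(xi)       # unit fault direction xi_q
        Xf[start:end] += magnitude * xi    # Xi_k f_k = xi_q f_q inside the episode
        active[start:end] = True
    return Xf, active


# Example: two episodes with different durations, magnitudes and directions.
X = np.zeros((30, 2))
Xf, active = inject_intermittent_fault(X, [(5, 9, [1.0, 4.0], 2.0), (20, 22, [1.0, 0.0], -1.5)])
```

Keeping the activity mask alongside the data makes it straightforward to score alarms against the true appearing and disappearing instants later.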

IFD based on MA-TCCs with multiple window lengths

The advantage and disadvantage of introducing a time window for IFD are clear: improved sensitivity on the one hand and an additional alarm delay on the other. It is therefore natural to consider MA-TCCs with multiple window lengths, denoted as MA-TCCs(M), for the IFD problem. However, in real-world applications the detection results given by different window lengths are often inconsistent because of false or missing alarms. Thus, methods to exclude false alarms and compensate missing alarms are proposed first on the
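Why no single window length suits all IFs can be quantified with a simplified calculation. The sketch assumes the MA-TCC statistic is distributed as $p(N-1)/(N-p)\,F(p, N-p; \epsilon^2)$, as for an MA chart built from the training mean and unbiased covariance, and that one IF episode shifts the mean by a constant $\boldsymbol{\delta}$ for $d$ consecutive samples. The function names and numbers are hypothetical, and this is an illustration, not the paper's own detectability analysis. Under these assumptions $\epsilon^2$ grows with $w$ while $w \le d$ and then falls, because fault-free samples in the window dilute the shift.

```python
import numpy as np
from scipy.stats import f, ncf


def noncentrality(w, d, N, delta_norm2):
    """eps^2 of an MA-TCC whose window of length w is best aligned with one IF episode.

    Simplifying assumptions: the IF shifts the mean by a constant delta for d consecutive
    samples, delta_norm2 = delta^T Sigma^{-1} delta, and the window overlaps the episode
    in m = min(w, d) samples, so the window mean is shifted by (m / w) * delta.
    """
    m = min(w, d)
    return (m / w) ** 2 * delta_norm2 / (1.0 / w + 1.0 / N)


def detection_probability(w, d, N, p, delta_norm2, alpha=0.01):
    """P(T^2 > limit) for T^2 ~ p(N-1)/(N-p) F(p, N-p; eps^2), limit at level alpha.

    The common scale factor p(N-1)/(N-p) cancels, leaving a noncentral F tail probability.
    """
    eps2 = noncentrality(w, d, N, delta_norm2)
    return ncf.sf(f.ppf(1 - alpha, p, N - p), p, N - p, eps2)


# Illustrative numbers only: N = 5000 training samples, p = 2, an IF lasting d = 20 samples.
N, p, d, delta_norm2 = 5000, 2, 20, 0.5
for w in (1, 5, 10, 20, 40, 80):
    eps2 = noncentrality(w, d, N, delta_norm2)
    prob = detection_probability(w, d, N, p, delta_norm2)
    print(f"w={w:3d}  eps^2={eps2:6.2f}  P(alarm)={prob:.3f}")
```

A longer window also needs more faulty samples before the shift dominates the window mean, so it delays the alarm; running several window lengths side by side trades these effects off without knowing $d$ in advance.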

A numerical example

A simulated process model with two correlated variables is employed first. Under normal conditions, the process follows the multivariate Gaussian distribution
$$\mathbf{x} \sim N_2(\boldsymbol{\mu}, \boldsymbol{\Sigma}), \quad \boldsymbol{\mu} = \begin{bmatrix} 6 \\ 4 \end{bmatrix}, \quad \boldsymbol{\Sigma} = \begin{bmatrix} 3 & 2.6 \\ 2.6 & 4 \end{bmatrix}.$$
A total of 5000 training samples and 500 test samples are generated according to (37), and intermittent process faults are then introduced into the test dataset. The significance level $\alpha$ is 0.01. The introduced IFs have the additive form modeled by (10), with the fault direction $\boldsymbol{\xi}_q = [0.2425, 0.9701]^T$
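The data-generation step of the numerical example can be sketched as follows, assuming the extraction-damaged parameters read $\boldsymbol{\mu} = [6, 4]^T$ and $\boldsymbol{\Sigma} = [[3, 2.6], [2.6, 4]]$ (a reading of the garbled source, not confirmed by the excerpt). The seed is arbitrary. The quantity $\boldsymbol{\xi}_q^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\xi}_q$ is added only as an illustration: for a single observation with $\boldsymbol{\Sigma}$ treated as known, an additive fault $f\boldsymbol{\xi}_q$ yields the noncentrality $f^2 \boldsymbol{\xi}_q^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\xi}_q$, which indicates how large $f$ must be before a single-observation chart can respond.

```python
import numpy as np

# Assumed reading of the garbled parameters of the numerical example.
mu = np.array([6.0, 4.0])
Sigma = np.array([[3.0, 2.6],
                  [2.6, 4.0]])
xi = np.array([0.2425, 0.9701])      # fault direction xi_q as printed, close to (1, 4) / sqrt(17)
alpha = 0.01                         # significance level

rng = np.random.default_rng(2020)    # arbitrary seed
X_train = rng.multivariate_normal(mu, Sigma, size=5000)   # 5000 training samples
X_test = rng.multivariate_normal(mu, Sigma, size=500)     # 500 test samples (IFs added later)

# The assumed Sigma must be a valid covariance matrix: symmetric positive definite.
assert np.all(np.linalg.eigvalsh(Sigma) > 0)

# Noncentrality per unit squared magnitude for one observation along xi (Sigma known).
kappa = xi @ np.linalg.solve(Sigma, xi)
print(f"|xi| = {np.linalg.norm(xi):.4f}, xi^T Sigma^-1 xi = {kappa:.4f}")
```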

Conclusion

In this paper, moving average T² control charts with multiple window lengths (MA-TCCs(M)) have been developed for intermittent fault (IF) detection. The MA-TCCs(M) incorporate historical information through a bank of time windows and can thus improve IFD performance. The detectability of IFs has been investigated theoretically, and the choice of window lengths under different practical conditions has been discussed. The advantage of using a time window for permanent fault (PF) detection is

CRediT authorship contribution statement

Yinghong Zhao: Conceptualization, Methodology, Writing - original draft, Writing - review & editing, Formal analysis, Investigation, Software, Validation. Xiao He: Supervision, Conceptualization, Funding acquisition, Writing - review & editing, Resources. Michael G. Pecht: Supervision, Writing - review & editing, Resources. Junfeng Zhang: Writing - review & editing, Validation. Donghua Zhou: Supervision, Project administration, Conceptualization, Funding acquisition, Writing - review & editing,

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (43)

  • B. Mnassri et al., Generalization and analysis of sufficient conditions for PCA-based fault detectability and isolability, Annu. Rev. Control (2013).
  • R. Dunia et al., A unified geometric approach to process and sensor fault identification and reconstruction: the unidimensional fault case, Comput. Chem. Eng. (1998).
  • G. Li et al., Reconstruction based fault prognosis for continuous processes, Control Eng. Pract. (2010).
  • E.L. Russell et al., Data-driven Methods for Fault Detection and Diagnosis in Chemical Processes (2000).
  • S. Yin et al., A review on basic data-driven approaches for industrial process monitoring, IEEE Trans. Ind. Electron. (2014).
  • D.H. Zhou et al., Review on diagnosis techniques for intermittent faults in dynamic systems, IEEE Trans. Ind. Electron. (2020).
  • Y.H. Zhao et al., Detecting intermittent faults with moving average techniques.
  • A. Correcher et al., Intermittent failure dynamics characterization, IEEE Trans. Reliab. (2012).
  • J.X. Zhang et al., A novel lifetime estimation method for two-phase degrading systems, IEEE Trans. Reliab. (2019).
  • R. Bakhshi et al., Intermittent failures in hardware and software, J. Electron. Packag. (2014).
  • S.B. Jiang et al., Diagnosis of repeated/intermittent failures in discrete event systems, IEEE Trans. Robot. Autom. (2003).
This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 61751307, 61733009, the Research Fund for the Taishan Scholar Project of Shandong Province of China (LZB2015-162), the Key Project from Natural Sciences Foundation of Guangdong Province, China under Grant 2018B030311054, and the BNRist Program, China under Grant BNR2019TD01009.