Elsevier

Neurocomputing

Volume 466, 27 November 2021, Pages 58-68

Attention-based sequence to sequence model for machine remaining useful life prediction

https://doi.org/10.1016/j.neucom.2021.09.022

Abstract

Accurate estimation of the remaining useful life (RUL) of industrial equipment can enable advanced maintenance scheduling, increase equipment availability and reduce operational costs. However, existing deep learning methods for RUL prediction are not fully successful, for two reasons. First, relying on a single objective function to estimate the RUL limits the learned representations and thus affects prediction accuracy. Second, while longer sequences are more informative for modelling the sensor dynamics of equipment, existing methods are less effective at handling very long sequences, as they mainly focus on the latest information. To address these two problems, we develop a novel attention-based sequence to sequence with auxiliary task (ATS2S) model. In particular, our model jointly optimizes a reconstruction loss, which empowers the model with predictive capabilities (by predicting the next input sequence given the current input sequence), and an RUL prediction loss, which minimizes the difference between the predicted and actual RUL. Furthermore, to better handle longer sequences, we employ an attention mechanism to focus on all the important input information during training. Finally, we propose a new dual-latent feature representation that integrates the encoder features and decoder hidden states to capture rich semantic information in the data. We conduct extensive experiments on four real datasets to evaluate the efficacy of the proposed method. Experimental results show that our proposed method consistently achieves superior performance over 13 state-of-the-art methods.

Introduction

Prognostics and Health Management (PHM) is receiving much attention in many industrial applications, as it can potentially reduce equipment downtime and increase system reliability. Typically, PHM systems are leveraged to monitor the condition of mechanical or electrical equipment based on environmental information and domain knowledge. One key task in PHM is the reliable prediction of the remaining useful life (RUL) of a piece of equipment. RUL is defined as the time interval between the current state and the end-of-life state. With accurate RUL estimation, industries can plan predictive maintenance and thus prevent catastrophic failures or faults [1]. Yet, given sophisticated machine designs and dynamic operating environments, precise estimation of RUL can be very challenging. Various approaches have been proposed to estimate the RUL of machines. These approaches can be classified into three major categories, namely model-based approaches, data-driven approaches, and hybrid approaches. Specifically, model-based approaches require a strong theoretical understanding to model the behaviour of equipment and its detailed degradation process [2]. As equipment complexity continues to grow, it becomes extremely challenging to apply model-based methods in real applications [3]. With increasing data availability in smart manufacturing, data-driven approaches have emerged as a more promising way to predict the RUL of equipment. These methods, such as hidden Markov models, artificial neural networks [4], extreme learning machines [5], and support vector machines [6], aim to explore the underlying relationship between the sensor readings and the degradation trend. However, these approaches require manual feature engineering to extract the corresponding degradation patterns, which can be a very laborious task. Hybrid approaches aim to improve the physical model by leveraging available data to better detect the deterioration trend [7]. They, however, suffer from the difficulty of building accurate physical models and of effectively combining the two techniques.

In recent years, with the surge in computational power and data volume, deep learning, with its hierarchical multi-layer representational power, can automatically extract salient features without handcrafted feature engineering. As a result, the research paradigm of RUL prediction is shifting from conventional machine learning to deep learning based architectures. Various deep learning methods, including convolutional neural networks (CNN) and recurrent neural networks (RNN), have been developed for RUL prediction. In particular, CNN-based methods use 1-dimensional convolutional kernels to extract sequential information from time series data [8], [9]. However, CNN-based approaches still have limited capability for RUL prediction, as they are not able to capture long-range sequential dependencies in sensory data.
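To make the receptive-field argument concrete, here is a minimal NumPy sketch of a single 1-D convolutional layer sliding along the time axis of a window of multichannel sensor readings. The shapes, the "valid" padding and the ReLU activation are illustrative assumptions, not the configuration used in [8], [9].

```python
import numpy as np


def conv1d_valid(x: np.ndarray, kernels: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """1-D 'valid' convolution along time (cross-correlation, as in DL libraries).

    x:       (T, C_in)         window of T time steps from C_in sensors
    kernels: (K, C_in, C_out)  filters spanning K consecutive time steps
    bias:    (C_out,)
    returns: (T - K + 1, C_out) ReLU-activated feature maps
    """
    T, c_in = x.shape
    K, k_in, c_out = kernels.shape
    assert k_in == c_in, "kernel input channels must match the sensor count"
    out = np.empty((T - K + 1, c_out))
    for t in range(T - K + 1):
        # Each output step sees only K consecutive time steps (its receptive field).
        patch = x[t:t + K]  # (K, C_in)
        out[t] = np.tensordot(patch, kernels, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)  # ReLU


# Usage: 5 time steps, 2 sensors, one all-ones filter spanning 3 steps.
x = np.arange(10, dtype=float).reshape(5, 2)
features = conv1d_valid(x, np.ones((3, 2, 1)), np.zeros(1))
```

Each output value depends on only `K` time steps, so a shallow CNN captures short-range context unless layers are stacked or kernels widened.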

RNN-based approaches were developed to capture the temporal dependencies in time series data [10]. However, conventional RNN architectures still suffer from the vanishing gradient problem over longer time dependencies. To tackle this issue, long short-term memory (LSTM), a gated RNN with both long- and short-term memory, was developed and achieved state-of-the-art performance for RUL prediction [11], [12], [13], [14], [15]. Yet, LSTM-based methods tend to lose relevant historical information when dealing with very long sequences [16], as they focus on the latest sequence information when mapping the whole input sequence into a fixed-length vector representation. In addition, all the aforementioned methods use a single objective, i.e., minimizing the mean squared error (MSE) between the predicted and true values, for model training. We argue that using a single objective can limit the generalization performance of the model on unseen test data [17], [18].
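A minimal NumPy sketch of a single-layer LSTM encoder illustrates both points above: the additive cell update that eases vanishing gradients, and the way a standard LSTM predictor that reads only the final hidden state squeezes the whole window into one fixed-length vector, typically trained with a single MSE objective. The stacked gate layout and the random weights are assumptions for illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_encode(x, W, U, b):
    """Run a single-layer LSTM over x of shape (T, D).

    W: (D, 4H), U: (H, 4H), b: (4H,) hold the input, forget, candidate and
    output gate parameters stacked along the last axis (an assumed layout).
    Returns all hidden states (T, H) and the final hidden state (H,).
    """
    T = x.shape[0]
    H = U.shape[0]
    h = np.zeros(H)
    c = np.zeros(H)
    states = []
    for t in range(T):
        z = x[t] @ W + h @ U + b
        i = sigmoid(z[:H])             # input gate
        f = sigmoid(z[H:2 * H])        # forget gate
        g = np.tanh(z[2 * H:3 * H])    # candidate cell update
        o = sigmoid(z[3 * H:])         # output gate
        c = f * c + i * g              # additive cell path eases vanishing gradients
        h = o * np.tanh(c)
        states.append(h)
    return np.stack(states), h


def mse(y_pred, y_true):
    """Single-objective loss used by the methods discussed above."""
    return float(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2))


# Usage: a plain LSTM regressor would feed only h_T to its RUL head.
rng = np.random.default_rng(0)
D, H, T = 3, 4, 5
states, h_T = lstm_encode(rng.normal(size=(T, D)), rng.normal(size=(D, 4 * H)),
                          rng.normal(size=(H, 4 * H)), np.zeros(4 * H))
```

However long the window is, `h_T` has a fixed size `H`, which is the bottleneck the attention mechanism is meant to relieve.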

To address the above two problems, we propose a dual-objective sequence to sequence approach named ATS2S for accurate RUL prediction. First, to address the shortcoming of LSTM on long sequences, we propose attention-based decoding that focuses on the important parts of the input sequence (instead of only the latest information, as in LSTM), which can maximize decoding performance without losing relevant information. Additionally, we integrate the last hidden state of the decoder with the encoder hidden features into a comprehensive dual-latent feature representation for the RUL predictor. Second, inspired by the success of auxiliary tasks in improving generalization performance in computer vision applications [17], [18], we design a novel auxiliary task to further improve prediction capability on unseen test data. In particular, given the current input sequence, we train the model to reconstruct the future input sequence in an unsupervised manner. Concurrently, we train the model with a supervised MSE loss between the true and predicted RUL labels.
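As a rough illustration of attention-based decoding and the dual-latent representation, the sketch below uses Luong-style dot-product attention, with the decoder's last hidden state as the query over all encoder hidden states, and concatenates the resulting context vector with that decoder state. The scoring function and the concatenation are assumptions; this section does not specify the exact attention form or fusion used in ATS2S, and `dual_latent` is a hypothetical helper.

```python
import numpy as np


def softmax(z):
    z = z - np.max(z)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def dot_attention(enc_states, query):
    """enc_states: (T, H) encoder hidden states; query: (H,) decoder state.

    Returns attention weights over all T input steps and the context vector.
    """
    scores = enc_states @ query     # (T,) one relevance score per time step
    weights = softmax(scores)
    context = weights @ enc_states  # (H,) weighted sum over ALL encoder states
    return weights, context


def dual_latent(enc_states, dec_last_hidden):
    """Hypothetical fusion: [attention context ; decoder last hidden state]."""
    _, context = dot_attention(enc_states, dec_last_hidden)
    return np.concatenate([context, dec_last_hidden])


# Usage: the relevant pattern sits at the FIRST time step, not the latest one.
enc = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
query = np.array([10.0, 0.0])
weights, context = dot_attention(enc, query)
features = dual_latent(enc, query)  # shape (4,), input to an RUL regressor
```

Because the context vector is a weighted sum over every encoder state, an early time step can dominate it when its score is highest, unlike a predictor that reads only the final LSTM state.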

Overall, our main contributions can be summarized as follows.

  • Our model jointly optimizes a reconstruction loss on the future sequence, which empowers the model with predictive capabilities (by predicting the next input sequence given the current input sequence), and an RUL prediction loss, which minimizes the difference between the predicted and actual RUL.

  • We design an attention mechanism in the encoder-decoder network to handle the long sequences. As such, our model can focus on the most relevant information of the input sequences for RUL prediction.

  • We propose a new dual-latent feature representation that integrates the encoder features and decoder hidden states to capture rich semantic information in the data for RUL prediction.

  • We conduct extensive experiments on four benchmark datasets to evaluate our proposed approach. The results show that the proposed approach can significantly improve RUL prediction over 13 state-of-the-art methods.
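The joint objective in the first contribution can be sketched as a weighted sum of two mean squared errors: a supervised one on the RUL labels and an unsupervised one between the reconstructed and actual next input window. The weighting factor `lam` and the use of MSE for the reconstruction term are assumptions for illustration; the exact formulation is not given in this section.

```python
import numpy as np


def joint_loss(rul_pred, rul_true, next_seq_pred, next_seq_true, lam=1.0):
    """Supervised RUL loss plus lam * unsupervised next-window reconstruction loss.

    lam is a hypothetical balancing weight, not a value from the paper.
    Returns (total, rul_loss, recon_loss).
    """
    rul_loss = np.mean((np.asarray(rul_pred) - np.asarray(rul_true)) ** 2)
    recon_loss = np.mean((np.asarray(next_seq_pred) - np.asarray(next_seq_true)) ** 2)
    return float(rul_loss + lam * recon_loss), float(rul_loss), float(recon_loss)


# Usage: batch of 2 samples, next windows of 3 steps x 4 sensors.
total, rul_l, rec_l = joint_loss([100.0, 50.0], [90.0, 50.0],
                                 np.zeros((2, 3, 4)), np.ones((2, 3, 4)), lam=0.5)
```

Setting `lam=0` recovers the single-objective MSE training that the introduction argues against.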


Related work

Deep learning, with its ability to extract features automatically, has achieved wide success in many fields, including computer vision, natural language processing, and speech recognition [19]. Very recently, various deep learning methods, e.g., CNN and RNN, have also been explored for RUL prediction [20], [21]. For instance, Li et al. proposed a CNN with 1-D filters to extract features from input sensor data for RUL prediction and also used a time window approach to prepare data samples for
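A minimal sketch of time-window sample preparation: a run-to-failure sequence is cut into fixed-length, overlapping windows, and each window is labeled with the RUL at its last cycle. The labeling convention, the `stride` parameter and the function name `sliding_windows` are illustrative assumptions rather than details taken from Li et al.

```python
import numpy as np


def sliding_windows(series, rul, window, stride=1):
    """Cut one unit's sensor sequence into overlapping windows.

    series: (N, C) sensor readings over N cycles; rul: (N,) RUL per cycle.
    Returns windows (M, window, C) and the RUL at each window's last cycle (M,).
    """
    series = np.asarray(series)
    rul = np.asarray(rul)
    N = len(series)
    if N < window:
        raise ValueError("sequence is shorter than the window length")
    starts = range(0, N - window + 1, stride)
    X = np.stack([series[s:s + window] for s in starts])
    y = np.array([rul[s + window - 1] for s in starts])  # label = RUL at window end
    return X, y


# Usage: 6 cycles, 2 sensors, RUL counting down from 5 to 0, windows of 3 cycles.
series = np.arange(12, dtype=float).reshape(6, 2)
X, y = sliding_windows(series, np.arange(5, -1, -1), window=3)
```

A smaller stride yields more (and more correlated) training samples from each unit, while the window length bounds how much history the model sees per prediction.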

Methodology

In this section, we will introduce our proposed attention-based sequence to sequence with auxiliary task (ATS2S) model for RUL prediction.

Experiments and results

We have conducted extensive experiments on benchmark data to evaluate the performance of our proposed model.

Conclusion

In this work, we presented ATS2S, a novel attention-based sequence to sequence model that accurately predicts equipment RUL, a task with significant impact on many real-world applications. In particular, we designed a novel framework that learns to reconstruct the next sequence and predict the RUL labels concurrently. In addition, we showed that our attention mechanism can better capture the relevant historical information in long sensor sequences than the standard LSTM approach, which focuses on the latest

CRediT authorship contribution statement

Mohamed Ragab: Methodology, Writing - original draft. Zhenghua Chen: Supervision. Min Wu: Supervision, Writing - review & editing. Chee-Keong Kwoh: Supervision. Ruqiang Yan: Writing - review & editing. Xiaoli Li: Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


References (34)

  • J.B. Ali et al., Accurate bearing remaining useful life prediction based on Weibull distribution and artificial neural network, Mechanical Systems and Signal Processing (2015).
  • X. Peng, C. Zhang, Y. Yu, Y. Zhou, Battery remaining useful life prediction algorithm based on support vector...
  • B. Yang et al., Remaining useful life prediction based on a double-convolutional neural network architecture, IEEE Transactions on Industrial Electronics (2019).
  • A. Malhi et al., Prognosis of defect propagation based on recurrent neural networks, IEEE Transactions on Instrumentation and Measurement (2011).
  • S. Zheng et al., Long short-term memory network for remaining useful life estimation.
  • C.-G. Huang et al., A bidirectional LSTM prognostics method under multiple operational conditions, IEEE Transactions on Industrial Electronics (2019).
  • Z. Chen et al., Machine remaining useful life prediction via an attention based deep learning approach, IEEE Transactions on Industrial Electronics (2020).

    Mohamed Ragab received the B.Sc. degree (First Class Hons.) and M.Sc. degree from the Department of Electrical Engineering, Aswan University, in 2014 and 2017, respectively. He is currently pursuing his Ph.D. degree at the School of Computer Science and Engineering, Nanyang Technological University (NTU), Singapore. Concurrently, he is with the Machine Intellection (MI) department at the Institute for Infocomm Research (I2R), A*STAR. His research interests include deep learning, transfer learning, and intelligent fault diagnosis and prognosis.

    Zhenghua Chen received the B.Eng. degree in mechatronics engineering from the University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2011, and the Ph.D. degree in electrical and electronic engineering from Nanyang Technological University (NTU), Singapore, in 2017. He worked at NTU as a research fellow. Currently, he is a scientist at the Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore. He has won several competitive awards, including First Place Winner for the CVPR 2021 UG2+ Challenge, the A*STAR Career Development Award, the First Runner-Up Award for the Grand Challenge at IEEE VCIP 2020, and Finalist for the Academic Paper Award at IEEE ICPHM 2020. He serves as an Associate Editor for Elsevier Neurocomputing and a Guest Editor for IEEE Transactions on Emerging Topics in Computational Intelligence. He is currently the Vice Chair of the IEEE Sensors Council Singapore Chapter and an IEEE Senior Member. His research interests include smart sensing, data analytics, machine learning, transfer learning and related applications.

    Min Wu is currently a senior scientist in Data Analytics Department, Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore. He received his Ph.D. degree in Computer Science from Nanyang Technological University (NTU), Singapore, in 2011 and B.S. degree in Computer Science from University of Science and Technology of China (USTC) in 2006. He received the best paper awards in InCoB 2016 and DASFAA 2015. He also won the IJCAI competition on repeated buyers prediction in 2015. His current research interests include machine learning, data mining and bioinformatics.

    Chee-Keong Kwoh received the bachelor’s degree in electrical engineering (first class) and the master’s degree in industrial system engineering from the National University of Singapore, Singapore, in 1987 and 1991, respectively. He received the Ph.D. degree from the Imperial College of Science, Technology, and Medicine, University of London, in 1995. He has been with the School of Computer Engineering, Nanyang Technological University (NTU), since 1993. He is the Deputy Executive Director of PaCE at NTU. His research interests include data mining, soft computing and graph-based inference; application areas include bioinformatics and engineering. He has done significant research work in these areas and has published many high-quality papers in international conferences and journals. He is a member of the Association for Medical and Bio-Informatics, Imperial College Alumni Association of Singapore. He has provided many services to professional bodies in Singapore and was conferred the Public Service Medal by the President of Singapore in 2008.

    Ruqiang Yan (M’07-SM’11) received the M.S. degree in precision instrument and machinery from the University of Science and Technology of China, Hefei, China, in 2002, and the Ph.D. degree in mechanical engineering from the University of Massachusetts Amherst, MA, USA, in 2007.

    From 2009 to 2018, he was a Professor in the School of Instrument Science and Engineering at Southeast University, Nanjing, China. He joined the School of Mechanical Engineering, Xi’an Jiaotong University, Xi’an, China, in 2018. His research interests include data analytics, machine learning, and energy-efficient sensing and sensor networks for the condition monitoring and health diagnosis of large-scale, complex, dynamical systems. He holds 28 patents and has published two books and over 200 papers in technical journals and conference proceedings.

    Dr. Yan is a Fellow of ASME (2019). His honors and awards include IEEE Instrumentation and Measurement Society Technical Award (2019), the New Century Excellent Talents in University Award from the Ministry of Education in China (2009), and multiple Best Paper Awards. He is an Associate Editor-in-Chief for the IEEE Transactions on Instrumentation and Measurement and Associate Editor for the IEEE Systems Journal and IEEE Sensors Journal.

    Xiaoli Li is currently a principal scientist at the Institute for Infocomm Research, A*STAR, Singapore. He also holds adjunct professor positions at Nanyang Technological University. His research interests include data mining, machine learning, AI, and bioinformatics. He has served as a (senior) PC member, workshop chair and session chair at leading data mining and AI conferences (including KDD, ICDM, SDM, PKDD/ECML, WWW, IJCAI, AAAI, ACL and CIKM). Xiaoli has published more than 200 high-quality papers and won numerous best paper and benchmark competition awards.
