Neurocomputing

Volume 422, 21 January 2021, Pages 85-94

Fast training of deep LSTM networks with guaranteed stability for nonlinear system modeling

https://doi.org/10.1016/j.neucom.2020.09.030

Abstract

Deep recurrent neural networks (RNNs), such as LSTM, have many advantages over feedforward networks for nonlinear system modeling. However, the most widely used training method, backpropagation through time (BPTT), is very slow.

In this paper, by separating the LSTM cell into a forward model and a recurrent model, we give a faster training method than BPTT. The deep LSTM is modified by combining the deep RNN with multilayer perceptrons (MLP). Backpropagation-like training methods are proposed for training the deep RNN and the MLP. The stability of these algorithms is demonstrated. The simulation results show that our fast training methods for LSTM are better than the conventional approaches.

Introduction

Recurrent neural networks (RNNs) share a key property with many time series, such as sentences and sound waves, and with dynamic systems: the current data values depend on past data. This makes RNNs good models for them. Backpropagation through time (BPTT) is an effective training method for RNNs. However, BPTT training has many problems, such as vanishing gradients and slow convergence [4], [15], [16]. The long short-term memory (LSTM) network [8] has become a popular RNN in recent years, and the deep LSTM is one of the most important deep learning models. By using gate units, LSTM avoids gradient degradation and can learn long-term patterns. However, training an LSTM still requires a lot of computing time.

Deep learning models increase the number of hidden layers instead of increasing the number of neurons in each layer, see [7]. This idea can successfully avoid local minima [10] and the problem of determining the structure [2]. Deep LSTM has been widely applied in many areas, especially in time series modeling, including speech recognition, natural language processing and sequence prediction [6]. Simplified LSTMs, such as GRU [5] and SRU [11], are also very effective for modeling time series.

In addition to time series modeling, LSTM can also be applied to modeling nonlinear systems. In [14], LSTM is the basic cell of multiple models. In [17], the behaviors of deep LSTM are similar to those of dynamic systems. These LSTMs use BPTT to train the neural models, so the learning processes are slow [18].

In order to avoid using the slow BPTT training method for the deep LSTM model, in this paper we make the following contributions:

  • 1.

    We propose a fast training method for LSTM. Each LSTM cell is separated into one RNN and one feedforward NN. We use backpropagation-like training methods for these two networks, where the training error is backpropagated to all LSTM cells.

  • 2.

    Stability of the proposed training methods is proven via Lyapunov theory and input-to-state stability.

The modeling results on the unsteady transonic aerodynamics problem and two benchmark problems show that the new LSTM training method is much better than BPTT training and MLP methods.

Section snippets

LSTM as a dynamic system

Theoretically, an RNN can model any time series, regardless of how far back the current state depends on previous information. In practice, it cannot, because the influence of the relevant information on the place where it is needed becomes smaller and smaller as the time series grows long. LSTM uses the "gate" technique to let useful information pass, so it is capable of handling "long-term dependencies".
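For reference, the standard LSTM cell can be written with its gate equations as below. The symbols ($f_k$, $i_k$, $o_k$, $c_k$, $h_k$ and the weight matrices) follow the common textbook convention and are not necessarily the notation used later in the paper; this is only a reminder of the cell being transformed.

```latex
% Standard LSTM cell in common notation (not necessarily the paper's symbols)
\begin{aligned}
f_k &= \sigma\!\left(W_f x_k + U_f h_{k-1} + b_f\right) && \text{forget gate}\\
i_k &= \sigma\!\left(W_i x_k + U_i h_{k-1} + b_i\right) && \text{input gate}\\
o_k &= \sigma\!\left(W_o x_k + U_o h_{k-1} + b_o\right) && \text{output gate}\\
\tilde{c}_k &= \tanh\!\left(W_c x_k + U_c h_{k-1} + b_c\right) && \text{candidate cell state}\\
c_k &= f_k \odot c_{k-1} + i_k \odot \tilde{c}_k && \text{cell state (recurrent part)}\\
h_k &= o_k \odot \tanh(c_k) && \text{hidden output (static part)}
\end{aligned}
```

The last two equations already hint at the separation used in this paper: the cell-state update is the recurrent part, while the gated output is a static (feedforward) map of the new state.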

In this section, we transform the conventional LSTM cell into the form of a dynamic system, because 1) with this

Fast training of deep LSTM

The unknown dynamic nonlinear system has the following form: $y(k)=\Psi\left[y(k-1),y(k-2),u(k),\ldots,u(k-n_u)\right]$ (9), where $\Psi(\cdot)$ is an unknown nonlinear difference equation representing the plant dynamics, and $u(k)$ and $y(k)$ are the input and output.

To model the plant (9), we use the following neural network: $\hat{y}(k)=NN\left[\hat{y}(k-1),\hat{y}(k-2),\ldots,u(k),u(k-1),\ldots\right]$ (10), where $\hat{y}(k)$ is the output of the neural network. We use the deep LSTM shown in Fig. 3 to model the nonlinear plant (9). The novel deep LSTM for nonlinear system modeling has q hidden layers,
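To make the structure concrete, the following is a minimal NumPy sketch of a deep LSTM in which each cell is separated into a recurrent model (the cell-state update) and a feedforward model (the gated output), with q stacked layers and a linear output layer. All names (`LSTMCell`, `recurrent_part`, `feedforward_part`, `q`, `n_hid`, `V`) are our own illustrative choices, not the authors' code or exact equations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """One LSTM cell viewed as a recurrent model (cell-state update)
    followed by a feedforward model (gated output)."""

    def __init__(self, n_in, n_hid, seed=0):
        rng = np.random.default_rng(seed)
        # Rows 0..3: forget gate, input gate, output gate, candidate state.
        self.W = 0.1 * rng.standard_normal((4, n_hid, n_in))   # input weights
        self.U = 0.1 * rng.standard_normal((4, n_hid, n_hid))  # recurrent weights
        self.b = np.zeros((4, n_hid))

    def recurrent_part(self, x, h_prev, c_prev):
        """Recurrent model: update the internal cell state c(k)."""
        f = sigmoid(self.W[0] @ x + self.U[0] @ h_prev + self.b[0])
        i = sigmoid(self.W[1] @ x + self.U[1] @ h_prev + self.b[1])
        c_tilde = np.tanh(self.W[3] @ x + self.U[3] @ h_prev + self.b[3])
        return f * c_prev + i * c_tilde

    def feedforward_part(self, x, h_prev, c):
        """Feedforward model: map the new cell state to the hidden output h(k)."""
        o = sigmoid(self.W[2] @ x + self.U[2] @ h_prev + self.b[2])
        return o * np.tanh(c)

    def step(self, x, h_prev, c_prev):
        c = self.recurrent_part(x, h_prev, c_prev)
        h = self.feedforward_part(x, h_prev, c)
        return h, c

# Deep LSTM with q hidden layers (cf. Fig. 3) and a linear output layer.
q, n_in, n_hid = 3, 1, 8
layers = [LSTMCell(n_in if l == 0 else n_hid, n_hid, seed=l) for l in range(q)]
V = 0.1 * np.random.default_rng(q).standard_normal(n_hid)      # output weights

def deep_lstm_output(u_k, states):
    """One time step u(k) -> yhat(k); `states` holds (h, c) of every layer."""
    x = np.atleast_1d(float(u_k))
    for l, cell in enumerate(layers):
        h, c = cell.step(x, *states[l])
        states[l] = (h, c)
        x = h
    return float(V @ x)

# Usage: run the model driven only by the input sequence u(k).
states = [(np.zeros(n_hid), np.zeros(n_hid)) for _ in range(q)]
y_hat = [deep_lstm_output(u_k, states) for u_k in np.sin(0.1 * np.arange(50))]
```

With this separation, the recurrent part and the feedforward part can be trained by separate backpropagation-like updates, instead of unrolling the whole sequence as BPTT does.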

Stability of the fast training method

We first analyze the stability of the training method for the feedforward neural network (FNN). The input of the FNN in (15) is $x(k)$, the output is $z_5(k)$, and the training error is $e_F(k)$ defined in (16). This identification error can be represented as $e_F(k)=\phi\left[W_{2,k}x(k)\right]\sigma\left[W_{3,k}x(k)\right]-\phi\left[W_2 x(k)\right]\sigma\left[W_3 x(k)\right]+\mu_1(k)$, where $W_2$ and $W_3$ are the unknown weights which minimize the modeling error $\mu_1(k)$.

Using a Taylor series around the point $W_{3,k}x(k)$ and $W_{2,k}$, the identification error can be represented as $e_F(k)=\phi\left[W_{2,k}x(k)\right]\sigma\left[W_{3,k}x(k)\right]-\phi\left[W_2 x(k)\right]\sigma\left[W_3 x(k)\right]+\mu$
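To indicate the flavor of the argument, a generic Lyapunov/ISS sketch for a normalized gradient-like update is shown below; the symbols ($\eta_k$, $\widetilde{W}_k = W_k - W^{*}$, $\alpha$, $\beta$) and the exact bounds are illustrative assumptions, not the paper's theorem.

```latex
% Generic sketch of a Lyapunov/ISS argument for a normalized gradient-like update
% (illustrative constants and bounds, not the paper's exact result)
W_{k+1} = W_k - \eta_k \, e_F(k)\, \frac{\partial e_F(k)}{\partial W_k},
\qquad
0 < \eta_k \le \frac{\eta}{1 + \left\| \partial e_F(k) / \partial W_k \right\|^2}

V_k = \operatorname{tr}\!\left(\widetilde{W}_k^{\top}\widetilde{W}_k\right),
\qquad
\Delta V_k = V_{k+1} - V_k \le -\alpha\, e_F^{2}(k) + \beta\, \mu_1^{2}(k),
\quad \alpha,\beta > 0
```

When such a bound holds, $\Delta V_k < 0$ whenever $e_F^{2}(k) > (\beta/\alpha)\,\mu_1^{2}(k)$, so the identification error stays bounded by the modeling error, which is the input-to-state stability property referred to in this paper.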

Simulations

One way to model and understand a system is to predict its behavior over a very short horizon, taking into account what has happened during the last step. This is known as one-step prediction. Most neural models work well for one-step prediction with the model (10). In this paper we address the multi-step prediction problem, where only the input u(k) is used. In this section, we use three examples to compare our approach with conventional neural modeling methods.
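The difference between the two settings can be sketched as follows; here `model` is an assumed placeholder for the map in (10), reduced to two output lags and the current input for brevity, and `u`, `y` are the measured input and output sequences.

```python
import numpy as np

def one_step_prediction(model, u, y):
    """Series-parallel: past *measured* outputs y(k-1), y(k-2) feed the model."""
    y_hat = np.zeros(len(u))
    y_hat[:2] = y[:2]                      # copy initial conditions
    for k in range(2, len(u)):
        y_hat[k] = model(y[k - 1], y[k - 2], u[k])
    return y_hat

def multi_step_prediction(model, u, y0, y1):
    """Parallel: only the input u(k) and the model's *own* past predictions are used."""
    y_hat = np.zeros(len(u))
    y_hat[0], y_hat[1] = y0, y1            # initial conditions only
    for k in range(2, len(u)):
        y_hat[k] = model(y_hat[k - 1], y_hat[k - 2], u[k])
    return y_hat
```

Multi-step prediction is much harder because prediction errors are fed back and can accumulate, which is why it is used here as the test for the proposed deep LSTM.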

The simulations are

Conclusion

The modified deep LSTM proposed in this paper takes advantage of both multilayer perceptrons and LSTM. To avoid using the slow BPTT training approach, we give stable and fast learning algorithms for the LSTM cell. The stability of the proposed training method is demonstrated by the Lyapunov method and input-to-state stability theory. Three examples are used to compare our algorithms with MLP and conventional LSTM. The results show the proposed deep LSTM is better than the other existing neural models for

CRediT authorship contribution statement

Wen Yu: Conceptualization, Methodology, Writing - original draft. Jesus Gonzalez: Software, Methodology. Xiaoou Li: Supervision, Investigation, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported in part by CONACYT under Grant CONACyT-A1-S-8216, by CINVESTAV under Grant SEP-CINVESTAV-62 and Grant CNR-CINVESTAV.

References (18)

  • O. Bendiksen

    Review of unsteady transonic aerodynamics: Theory and applications

    Progress in Aerospace Sciences

    (2011)
  • W. Cao et al.

    A review on neural networks with random weights

    Neurocomputing

    (2018)
  • Y. Bengio et al.

    Greedy layer-wise training of deep networks

    Advances in Neural Information Processing Systems

    (2007)
  • J. Bergstra et al.

    Random search for hyper-parameter optimization

    Journal of Machine Learning Research

    (2011)
  • J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling....
  • A. Graves, A. Mohamed, G. Hinton, Speech recognition with deep recurrent neural networks. arXiv:1303.5778,...
  • G. Hinton et al.

    A fast learning algorithm for deep belief nets

    Neural Computation

    (2006)
  • S. Hochreiter et al.

    Long short-term memory

    Neural Computation

    (1997)
  • K. Narendra et al.

    Gradient methods for optimization of dynamical systems containing neural networks

    IEEE Transactions on Neural Networks

    (1991)
There are more references available in the full text version of this article.

Cited by (22)

  • An automatic selection of optimal recurrent neural network architecture for processes dynamics modelling purposes

    2022, Applied Soft Computing
    Citation Excerpt :

    In this paper, a methodology based on artificial neural networks (ANNs) and their recurrent implementation, which are well-known to be universal approximators of non-linear dynamic systems, will be further considered [6–10]. In recent years, many interesting applications of ANN have been found in the literature, include classification tasks [11–13], image (pattern) recognition [14–16], smell recognition [17,18], speech recognition [19], text generation [20], prediction purposes [21–23], modelling and control of dynamic systems [6,8,24,25], state estimation, generating control signals and operating as diagnostic systems [26–28], fractional order operators approximations [29], and many others, e.g., [30,31]. A common feature of practically all ANNs applications is the need to select their architecture (optimal structure/topology) so that their performance, in terms of accuracy and time complexity, will be as high as is possible and satisfactory for the user.

  • Deeppipe: Theory-guided LSTM method for monitoring pressure after multi-product pipeline shutdown

    2021, Process Safety and Environmental Protection
    Citation Excerpt :

    Consequently, in this paper, a data-driven model is selected for the shutdown pressure prediction which is based on the time-series data of pipeline shutdown in SCADA. Noticed that LSTM is a kind of recurrent neural network (RNN), which not only shows excellent performance in time-series prediction as RNN but also overcomes gradient exploding and gradient disappearance problems found in traditional RNN (Yu et al., 2021). As such, LSTM is selected as the prediction model to fit the data relation between different time steps.


Wen Yu received the B.S. degree from Tsinghua University, Beijing, China in 1990 and the M.S. and Ph.D. degrees, both in Electrical Engineering, from Northeastern University, Shenyang, China, in 1992 and 1995, respectively. Since 1996, he has been with the National Polytechnic Institute (CINVESTAV-IPN), Mexico City, Mexico, where he is currently a professor and department chair of the Automatic Control Department. From 2002 to 2003, he held research positions with the Mexican Institute of Petroleum. He was a Senior Visiting Research Fellow with Queen’s University Belfast, Belfast, U.K., from 2006 to 2007, and a Visiting Associate Professor with the University of California, Santa Cruz, from 2009 to 2010. He has published more than 100 research papers in reputed journals. His Google Scholar h-index is 33, with 4100 citations. He serves as associate editor of IEEE Transactions on Cybernetics, Neurocomputing, and Journal of Intelligent and Fuzzy Systems. He is a member of the Mexican Academy of Sciences.

Jesus Gonzalez received the B.S. degree from ESIME, Instituto Politecnico Nacional, Mexico City, Mexico, in 2003, and the M.S. degree in automatic control in 2011 from CINVESTAV, Instituto Politecnico Nacional, where he is currently working toward the Ph.D. degree. His current research interests include robot control, artificial neural networks, and fuzzy systems.

Xiaoou Li received the B.S. and Ph.D. degrees in applied mathematics and electrical engineering from Northeastern University, China, in 1991 and 1995. From 1995 to 1997, she was a lecturer of electrical engineering at the Department of Automatic Control of Northeastern University, China. From 1998 to 1999, she was an associate professor of computer science at Centro de Instrumentos-UNAM. Since 2000, she has been a professor of computer science at Sección de Computación, Departamento de Ingeniería Eléctrica, CINVESTAV-IPN, México. Her research interests include Petri net theory and application, neural networks, advanced database systems, computer integrated manufacturing, and discrete event systems.
