Evolutionary echo state network for long-term time series prediction: on the edge of chaos

Abstract

Quantitative analysis of neural networks is a critical step toward improving their performance. In this paper, we investigate long-term time series prediction with an echo state network operating at the edge of chaos. We assess the eigenfunctions of the echo state network and its criticality through Hermite polynomials. A Hermite polynomial-based activation function design with fast convergence is proposed, and the relation between long-term temporal dependence and edge-of-chaos criticality is established. A new hybrid particle swarm optimization-gravitational search algorithm is put forward to improve parameter estimation and drive the reservoir toward the edge of chaos. The method is verified on a chaotic Lorenz system and a real health index data set. The experimental results indicate that evolution gives the reservoir great potential to operate at the edge of chaos with rich expressivity.

References

  1. Quan H, Srinivasan D, Khosravi A (2017) Short-term load and wind power forecasting using neural network-based prediction intervals. IEEE Trans Neural Netw Learn Syst 25(2):303–315

  2. Qin M, Du Z, Du Z (2017) Red tide time series forecasting by combining arima and deep belief network. Knowl-Based Syst 125:39–52

  3. Abaszade M, Effati S (2018) Stochastic support vector regression with probabilistic constraints. Appl Intell 48(1):243–256

  4. Williams RJ, Zipser D (1989) A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1(2):270–280

  5. Siegelmann HT, Sontag ED (1995) On the computational power of neural nets. J Comput Syst Sci 50(1):440–449

  6. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

  7. Cho K, van Merrienboer B, Bahdanau D, Bengio Y (2014) On the properties of neural machine translation: Encoder-decoder approaches. In: Proceedings of the Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pp 103–111

  8. Jaeger H, Haas H (2004) Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science 304(5667):78–80

  9. Lai G, Chang W, Yang Y, Liu H (2018) Modeling long- and short-term temporal patterns with deep neural networks. In: The 41st international ACM SIGIR conference on research & development in information retrieval, SIGIR 2018, pp 95–104

  10. Langton CG (1990) Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenom 42(1–3):12–37

  11. Trillos NG, Murray R (2016) A new analytical approach to consistency and overfitting in regularized empirical risk minimization. Eur J Appl Math 28(6):36

  12. Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: 3rd International Conference on Learning Representations, ICLR

  13. Cinar YG, Mirisaee H, Goswami P, Gaussier E, Aït-Bachir A, Strijov V (2017) Position-based content attention for time series forecasting with sequence-to-sequence rnns. In: International Conference on Neural Information Processing. Springer, pp 533–544

  14. Liang Y, Ke S, Zhang J, Yi X, Zheng Y (2018) Geoman: Multi-level attention networks for geo-sensory time series prediction. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, pp 3428–3434

  15. Liu T, Yu S, Xu B, Yin H (2018) Recurrent networks with attention and convolutional networks for sentence representation and classification. Appl Intell 48(10):3797–3806

  16. Yi S, Guo J, Xin L, Kong Q, Guo L, Wang L (2018) Long-term prediction of polar motion using a combined ssa and arma model. J Geodesy 92(3):333–343

  17. Dai C, Pi D (2017) Parameter auto-selection for hemispherical resonator gyroscope’s long-term prediction model based on cooperative game theory. Knowl-Based Syst 134:105–115

  18. Cannon DM, Goldberg SR (2015) Simple rules for thriving in a complex world, and irrational things like missing socks, pickup lines, and other essential puzzles. J Corporate Account Finance 26(6):97–99

  19. Benmessahel I, Xie K, Chellal M (2018) A new evolutionary neural networks based on intrusion detection systems using multiverse optimization. Appl Intell 48(8):2315–2327

  20. Poole B, Lahiri S, Raghu M, Sohl-Dickstein J, Ganguli S (2016) Exponential expressivity in deep neural networks through transient chaos. In: Advances in neural information processing systems 29: Annual conference on neural information processing systems, neural information processing systems foundation, Barcelona, Spain, pp 3368–3376

  21. Valdez MA, Jaschke D, Vargas DL, Carr LD (2017) Quantifying complexity in quantum phase transitions via mutual information complex networks. Phys Rev Lett 119(22):225301

  22. Raghu M, Poole B, Kleinberg JM, Ganguli S, Sohl-Dickstein J (2017) On the expressive power of deep neural networks. In: Proceedings of the 34th International Conference on Machine Learning, pp 2847–2854

  23. Mafahim JU, Lambert D, Zare M, Grigolini P (2015) Complexity matching in neural networks. New J Phys 17(1):1–18

  24. Azizipour M, Afshar MH (2018) Reliability-based operation of reservoirs: a hybrid genetic algorithm and cellular automata method. Soft Comput 22(19):6461–6471

  25. Erkaymaz O, Ozer M, Perc M (2017) Performance of small-world feedforward neural networks for the diagnosis of diabetes. Appl Math Comput 311:22–28

  26. Wang SX, Li M, Zhao L, Jin C (2019) Short-term wind power prediction based on improved small-world neural network. Neural Computing and Applications 31(7):3173–3185

  27. Semwal VB, Gaud N, Nandi G (2019) Human gait state prediction using cellular automata and classification using elm. In: Machine Intelligence and Signal Analysis. Springer, pp 135–145

  28. Kossio FYK, Goedeke S, van den Akker B, Ibarz B, Memmesheimer RM (2018) Growing critical: Self-organized criticality in a developing neural system. Phys Rev Lett 121(5):058301

  29. Hazan H, Saunders DJ, Sanghavi DT, Siegelmann HT, Kozma R (2018) Unsupervised learning with self-organizing spiking neural networks. In: 2018 International Joint Conference on Neural Networks, IJCNN, pp 1–6

  30. Choromanska A, Henaff M, Mathieu M, Arous GB, LeCun Y (2015) The loss surfaces of multilayer networks. In: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics

  31. Li SH, Wang L (2018) Neural network renormalization group. Phys Rev Lett 121(26):260601

  32. Deng DL, Li X, Sarma SD (2017) Quantum entanglement in neural network states. Phys Rev X 7(2):021021

  33. Iso S, Shiba S, Yokoo S (2018) Scale-invariant feature extraction of neural network and renormalization group flow. Phys Rev E 97(5-1)

  34. Yang G, Schoenholz S (2017a) Mean field residual networks: on the edge of chaos. In: Advances in Neural Information Processing Systems, pp 7103–7114

  35. Yang G, Schoenholz SS (2017b) Mean field residual networks: On the edge of chaos. In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, pp 2865–2873

  36. Kawamoto T, Tsubaki M, Obuchi T (2018) Mean-field theory of graph neural networks in graph partitioning. In: Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS, pp 4366–4376

  37. Carleo G, Troyer M (2016) Solving the quantum many-body problem with artificial neural networks. Science 355(6325):602–606

  38. Koch-Janusz M, Ringel Z (2017) Mutual information, neural networks and the renormalization group. Nat Phys 14(6):578–582

  39. Efthymiou S, Beach MJS, Melko RG (2019) Super-resolving the ising model with convolutional neural networks. Phys Rev B 99:075113

  40. Zhang H, Wang Z, Liu D (2014) A comprehensive review of stability analysis of continuous-time recurrent neural networks. IEEE Trans Neural Netw Learn Syst 25(7):1229–1262

  41. Elfwing S, Uchibe E, Doya K (2018) Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw 107:3–11

  42. Njikam ANS, Zhao H (2016) A novel activation function for multilayer feed-forward neural networks. Appl Intell 45(1):75–82

  43. Halmos PR (2012) A Hilbert Space Problem Book, vol 19. Springer Science & Business Media

  44. Petersen A, Muller HG (2016) Functional data analysis for density functions by transformation to a hilbert space. Ann Stat 44(1):183–218

  45. Chen M, Pennington J, Schoenholz SS (2018) Dynamical isometry and a mean field theory of rnns: Gating enables signal propagation in recurrent neural networks. In: Proceedings of the 35th International Conference on Machine Learning, ICML, pp 872–881

  46. Gupta C, Jain A, Tayal DK, Castillo O (2018) Clusfude: Forecasting low dimensional numerical data using an improved method based on automatic clustering, fuzzy relationships and differential evolution. Eng Appl of AI 71:175–189

  47. Bianchi FM, Livi L, Alippi C (2018) Investigating echo-state networks dynamics by means of recurrence analysis. IEEE Trans Neural Netw Learn Syst 29(2):427–439

  48. Mocanu DC, Mocanu E, Stone P, Nguyen PH, Gibescu M, Liotta A (2018) Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science. Nat Commun 9(1):2383

  49. Stanley KO, Clune J, Lehman J, Miikkulainen R (2019) Designing neural networks through neuroevolution. Nat Mach Intell 1(1):24–35

  50. Valdez F, Vázquez JC, Melin P, Castillo O (2017) Comparative study of the use of fuzzy logic in improving particle swarm optimization variants for mathematical functions using co-evolution. Appl Soft Comput 52:1070–1083

  51. Soto J, Melin P, Castillo O (2018) A new approach for time series prediction using ensembles of IT2FNN models with optimization of fuzzy integrators. Int J Fuzzy Syst 20(3):701–728

  52. Radosavljević J (2016) A solution to the combined economic and emission dispatch using hybrid psogsa algorithm. Appl Artif Intell 30(5):445–474

  53. Olivas F, Valdez F, Melin P, Sombra A, Castillo O (2019) Interval type-2 fuzzy logic for dynamic parameter adaptation in a modified gravitational search algorithm. Inf Sci 476:159–175

  54. Beilock SL, DeCaro MS (2007) From poor performance to success under stress: Working memory, strategy selection, and mathematical problem solving under pressure. J Exper Psychol Learn Memory Cogn 33(6):983

  55. Mantegna RN, Stanley HE (1994) Stochastic process with ultraslow convergence to a gaussian: The truncated lévy flight. Phys Rev Lett 73(22):2946

  56. Yang G, Pennington J, Rao V, Sohl-Dickstein J, Schoenholz SS (2019) A mean field theory of batch normalization. In: International Conference on Learning Representations

  57. Kreyszig E (1978) Introductory Functional Analysis with Applications. Wiley, New York

  58. O'Donnell R (2013) Analysis of Boolean Functions. Cambridge University Press, Cambridge

  59. Nazemi A, Mortezaee M (2019) A new gradient-based neural dynamic framework for solving constrained min-max optimization problems with an application in portfolio selection models. Applied Intelligence 49(2):396–419

Acknowledgment

The work was supported by the National Science Foundation of China (61473183, 61521063, 61627810), the National Key R&D Program of China (SQ2017YFGH001005), the scientific and technological project of Henan Province (172102210255), and the CERNET Innovation Project (No. NGII20160517).

Author information

Corresponding author

Correspondence to WeiDong Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Approaching equilibrium points by Hermite polynomials

The proofs of all the theorems we quote can be found in Chapters 2 and 5 of [57] or similar textbooks. A complete normed vector space is called a Banach space. A Hilbert space is a Banach space whose norm is induced by an inner product. Let H be the Hilbert space of functions from S to \(\mathbb {R}\). The following Contraction-Mapping Theorem, also known as Banach’s Fixed-Point Theorem, describes the existence and uniqueness of solutions of differential equations.

Theorem 1

Let S be complete and let \(T:S\rightarrow S\) be a contraction mapping. Then the equation Tx = x has exactly one solution in S, and the unique solution x can be obtained as the limit of the sequence x(n) defined by \(x(n) = Tx(n-1),n= 1,2,\dots \), expressed as:

$$ x = \lim_{n\rightarrow \infty} T^{n} x_{0}, $$
(16)

where x0 is an arbitrary initial element in S.

The theorem not only establishes the existence and uniqueness of solutions, but also provides a way to find them by an iterative process, as sketched below. The following is a constructive extension of the theorem.
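
As a minimal illustration of this iterative process (a sketch with our own function names and an example contraction, not taken from the paper), the following Python snippet iterates x(n) = Tx(n−1) until successive iterates agree to a tolerance.

```python
import numpy as np

def fixed_point(T, x0, tol=1e-12, max_iter=1000):
    """Banach fixed-point iteration: x(n) = T(x(n-1)) until convergence."""
    x = x0
    for _ in range(max_iter):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# cos is a contraction near its fixed point, so the iteration converges
# to the unique solution of cos(x) = x (about 0.739085).
print(fixed_point(np.cos, 1.0))
```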

Lemma 1

If L is a self-adjoint operator and there exist constants B ≥ A > 0 satisfying

$$ \forall f\in \textbf{\textit{H}}, A\lVert f\rVert^{2} \leq \langle {L}f, f\rangle \leq B\lVert f\rVert^{2}, $$
(17)

then L is invertible and

$$ \forall f\in \textbf{\textit{H}}, \frac{1}{B}\lVert f\rVert^{2} \leq \langle {{L}}^{-1}f, f\rangle \leq \frac{1}{A}\lVert f\rVert^{2}. $$
(18)

The inequality (17) shows that the eigenvalues of L lie between A and B. In finite dimension, L is diagonalizable in an orthonormal basis since it is self-adjoint. It is therefore invertible, with eigenvalues between \(B^{-1}\) and \(A^{-1}\), which proves (18).
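
A quick finite-dimensional sanity check of Lemma 1 (a sketch using an arbitrary symmetric positive-definite matrix of our own choosing, not from the paper): the quadratic form of L is bounded by its smallest and largest eigenvalues A and B as in (17), and the quadratic form of L^{-1} is then bounded by 1/B and 1/A as in (18).

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
L = M @ M.T + 0.5 * np.eye(5)          # self-adjoint (symmetric) and positive definite

eigs = np.linalg.eigvalsh(L)
A, B = eigs.min(), eigs.max()          # bounds appearing in (17)

f = rng.standard_normal(5)
nf2 = f @ f
assert A * nf2 - 1e-9 <= f @ L @ f <= B * nf2 + 1e-9                  # inequality (17)
assert nf2 / B - 1e-9 <= f @ np.linalg.inv(L) @ f <= nf2 / A + 1e-9   # inequality (18)
print("eigenvalue bounds A, B:", A, B)
```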

Supposing the probabilists’ weight function \(p(x) = e^{-x^{2}/2}\) and applying Lemma 1, it follows that the Hermite polynomials are orthogonal on the interval \((-\infty , \infty )\) with respect to this weight function, and we obtain the following important result,

$$ \frac{1}{\sqrt{2\pi}}{\int}_{-\infty}^{\infty} H_{m}(x) H_{n}(x) e^{-x^{2}/2}\text{d}x= \left\{\begin{array}{ll} 1,& \text{ for } m = n\\ 0 , &\text{otherwise}. \end{array}\right. $$
(19)

We use the following facts about the Hermite polynomials (see Chapter 11 in [58]):

$$ H_{n+1}(x)= \frac{x}{\sqrt{n+1}}H_{n}(x)- \sqrt{\frac{n}{n+1}}H_{n-1}(x),\\ $$
(20)
$$ H_{n}^{\prime}(x)=\sqrt{n}H_{n-1}(x), $$
(21)
$$ H_{n}(0)= \left\{\begin{array}{ll} 0, & \text{if } n \text{ is odd};\\ \frac{1}{\sqrt{n!}}(-1)^{\frac{n}{2}}(n-1)!! & \text{if } n \text{ is even}. \end{array}\right. $$
(22)
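
The recurrences (20)–(22) can be checked numerically. Below is a minimal Python sketch (the helper name hermite_basis and the use of NumPy's Gauss–HermiteE quadrature are our own choices, not from the paper) that builds the normalized probabilists' Hermite polynomials from recurrence (20) and verifies their orthonormality with respect to the standard Gaussian measure, i.e. the normalization assumed in (19).

```python
import numpy as np

def hermite_basis(x, n_max):
    """Normalized probabilists' Hermite polynomials H_0..H_{n_max} via recurrence (20)."""
    H = [np.ones_like(x), x.copy()]
    for n in range(1, n_max):
        H.append(x / np.sqrt(n + 1) * H[n] - np.sqrt(n / (n + 1)) * H[n - 1])
    return np.array(H)                    # shape (n_max + 1, len(x))

# Gauss-HermiteE nodes/weights integrate against exp(-x^2/2);
# dividing by sqrt(2*pi) turns the quadrature sum into an expectation over N(0, 1).
x, w = np.polynomial.hermite_e.hermegauss(80)
H = hermite_basis(x, 6)
gram = (H * w) @ H.T / np.sqrt(2 * np.pi)
print(np.round(gram, 6))                  # close to the identity matrix, as in (19)
```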

Appendix B: Examples

Next, we analyze the eigenvalues of several popular activation functions under Hermite polynomials and their effects on the convergence and criticality of neural networks.

1.1 Sigmoid activation

The Sigmoid function is \( \sigma (x)= \frac {1} {e^{-x} + 1} \). Since \(H_{0}(x) = 1, H_{1}(x) = x, \text { and } H_{2}(x) = \frac {x^{2}-1}{\sqrt {2}}\), substituting the Sigmoid activation into (19) according to the orthogonality of the Hermite polynomials gives the first two Hermite coefficients:

$$ \begin{array}{@{}rcl@{}} a_{0} & =\mathbb{E}_{x\sim N(0,1)} [\sigma(x)]=\frac{1}{2} \end{array} $$
(23)
$$ \begin{array}{@{}rcl@{}} a_{1} &= \mathbb{E}_{x\sim N(0,1)} [\sigma(x)x]=\frac{1}{2\sqrt{2\pi}} \end{array} $$
(24)

For n ≥ 3, we write \(g_{n}(x) = \frac {e^{-x^{2}/2}}{(1+e^{-x})}H_{n}(x)\); according to (21), the derivative of \(g_{n}(x)\) is:

$$ \begin{array}{@{}rcl@{}} g_{n}^{\prime} (x) &=& \frac{{{e^{- {x^{2}}{\text{/2}}}}}}{{{e^{- x}} + 1}}(\sqrt n {H_{n - 1}}(x) - x{H_{n}}(x))\\ &&+ \frac{{{e^{- {x^{2}}/2 - x}}}}{{{{({e^{- x}} + 1)}^{2}}}}{H_{n}}(x). \end{array} $$
(25)

Since (25) vanishes as \(x\to \infty \), we get:

$$ \begin{array}{@{}rcl@{}} a_{n} & = & \frac{1}{\sqrt{2\pi}}{\int}_{0}^{\infty} \frac{e^{-x^{2}/2}}{(1 + e^{-x})}H_{n}(x) dx\\ & = &\frac{1}{\sqrt{2\pi}}\left( \frac{{{e^{- {x^{2}}{\text{/2}}}}}}{{{e^{- x}} + 1}}\sqrt n {H_{n - 1}}(x) + \frac{{{e^{- {x^{2}}/2 - x}}}}{{{{({e^{- x}} + 1)}^{2}}}}{H_{n}}(x)\right)\left|{~}_{0}^{\infty}\right.\\ & = & \frac{1}{{4\sqrt {2\pi } }}(2\sqrt n {H_{n{\text{ - }}1}}(0) + {H_{n}}\left( 0 \right)). \end{array} $$
(26)

Therefore, the Hermite coefficients of the Sigmoid activation are expressed as:

$$ a_{n} = \left\{\begin{array}{ll} \frac{1}{4\sqrt{2\pi}}(2\sqrt{n} H_{n-1}(0) + H_{n}(0)) &\text{if } n \geq 3;\\ \frac{1}{2} & \text{if } n = 0;\\ \frac{1}{2\sqrt{2\pi}} &\text{if } n = 1. \end{array}\right. $$
(27)

We find that the Sigmoid activation strongly attenuates the higher-order components; when such factors are multiplied across many layers, the overall gradient becomes quite small.
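
As a numerical cross-check of this attenuation, the sketch below (our own construction under the same normalized Hermite convention as above, not from the paper) computes \(a_{n} = \mathbb{E}_{x\sim N(0,1)}[\sigma(x)H_{n}(x)]\) by Gauss–HermiteE quadrature; the higher-order coefficients come out small, consistent with the discussion.

```python
import numpy as np

def hermite_basis(x, n_max):
    # Normalized probabilists' Hermite polynomials via recurrence (20).
    H = [np.ones_like(x), x.copy()]
    for n in range(1, n_max):
        H.append(x / np.sqrt(n + 1) * H[n] - np.sqrt(n / (n + 1)) * H[n - 1])
    return np.array(H)

x, w = np.polynomial.hermite_e.hermegauss(120)   # quadrature for weight exp(-x^2/2)
sigma = 1.0 / (1.0 + np.exp(-x))
a = (hermite_basis(x, 8) * w) @ sigma / np.sqrt(2 * np.pi)
print(np.round(a, 4))   # a_0 = 0.5; the magnitudes fall off quickly with n
```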

1.2 Normalized ReLU activation

Consider the unit activation \({f}(x) = \sqrt {2}\max \limits (0, x)\). Substituting the Hermite polynomials into (19) gives the first two coefficients:

$$ \begin{array}{@{}rcl@{}} a_{0} & =\mathbb{E}_{x\sim N(0,1)} [{f}(x)]=\frac{1}{\sqrt{\pi}}, \end{array} $$
(28)
$$ \begin{array}{@{}rcl@{}} a_{1} &= \mathbb{E}_{x\sim N(0,1)} [{f}(x)x]=\frac{1}{\sqrt{2}}. \end{array} $$
(29)

For n ≥ 3, we write \(g_{n}(x) = x H_{n}(x) e^{-\frac {x^{2}}{2}}\), and its derivative is:

$$ g_{n}^{\prime} (x) = e^{-x^{2}/2} (\sqrt{n} x H_{n - 1}(x) - (x^{2} - 1) H_{n}(x)). $$
(30)

Since (30) vanishes as \(x\to \infty \), we get:

$$ \begin{array}{@{}rcl@{}} a_{n} &=& \frac{1}{\sqrt{\pi}}{\int}_{0}^{\infty} x H_{n}(x)e^{-\frac{x^{2} }{2}} dx\\ &=& \frac{1}{\sqrt{\pi}} e^{-x^{2}/2} (\sqrt{n} x H_{n - 1}(x) - (x^{2} - 1) H_{n}(x))\left|{~}_{0}^{\infty}\right.\\ & =& \frac{1}{\sqrt{\pi}}H_{n}(0). \end{array} $$
(31)

Therefore, the Hermite coefficients of the ReLU activation are expressed as:

$$ a_{n} = \left\{\begin{array}{ll} \frac{(n-3)!!}{\sqrt{\pi n!}} & \text{if } n \text{ is even and } n \geq 2;\\ \frac{1}{\sqrt{\pi}} & \text{if } n = 0;\\ \frac{1}{\sqrt{2}} & \text{if } n= 1;\\ 0 &\text{if } n \text{ is odd and } n \geq 3. \end{array}\right. $$
(32)

The maximum coefficient is \(\frac {1}{\sqrt {2}}\), and the coefficients then gradually decay toward the critical point 0. In particular, \((a_{0}, a_{1}, a_{2}, a_{3}, a_{4}, a_{5}, a_{6}) = (\frac {1}{\sqrt {\pi }}, \frac {1}{\sqrt {2}}, \frac {1}{\sqrt {2\pi }}, 0, \frac {1}{\sqrt {24\pi }}, 0, \frac {1}{\sqrt {80\pi }})\). We also see that the coefficients vanish at every odd index n ≥ 3, which may cause the network, especially when trained by gradient descent, to become trapped at saddle points where no information is passed on, rather than reaching the expected global optimum [59].
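
The listed values can likewise be reproduced numerically; the sketch below (our own construction under the same normalized Hermite convention, not from the paper) compares quadrature estimates of \(a_{0},\dots,a_{6}\) with the closed-form values above. Because of the ReLU kink at 0, the quadrature matches only to a few decimal places.

```python
import numpy as np

def hermite_basis(x, n_max):
    # Normalized probabilists' Hermite polynomials via recurrence (20).
    H = [np.ones_like(x), x.copy()]
    for n in range(1, n_max):
        H.append(x / np.sqrt(n + 1) * H[n] - np.sqrt(n / (n + 1)) * H[n - 1])
    return np.array(H)

x, w = np.polynomial.hermite_e.hermegauss(200)
relu = np.sqrt(2.0) * np.maximum(0.0, x)
a = (hermite_basis(x, 6) * w) @ relu / np.sqrt(2 * np.pi)

expected = np.array([1/np.sqrt(np.pi), 1/np.sqrt(2), 1/np.sqrt(2*np.pi),
                     0.0, 1/np.sqrt(24*np.pi), 0.0, 1/np.sqrt(80*np.pi)])
print(np.round(a, 3))         # quadrature estimates of a_0..a_6
print(np.round(expected, 3))  # closed-form values listed above
```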

Cite this article

Zhang, G., Zhang, C. & Zhang, W. Evolutionary echo state network for long-term time series prediction: on the edge of chaos. Appl Intell 50, 893–904 (2020). https://doi.org/10.1007/s10489-019-01546-w
