Evolutionary echo state network for long-term time series prediction: on the edge of chaos

Abstract

Quantitative analysis of neural networks is a critical step toward improving their performance. In this paper, we investigate long-term time series prediction with an echo state network operating at the edge of chaos. We assess the eigenfunctions of the echo state network and its criticality through Hermite polynomials. A Hermite polynomial-based activation function design with fast convergence is proposed, and the relation between long-term temporal dependence and edge-of-chaos criticality is established. A new hybrid particle swarm optimization-gravitational search algorithm is put forward to improve parameter estimation and drive the reservoir toward the edge of chaos. The method is verified on a chaotic Lorenz system and a real health index data set. The experimental results indicate that evolution gives the reservoir great potential to operate at the edge of chaos with rich expressivity.

References

  1. Quan H, Srinivasan D, Khosravi A (2017) Short-term load and wind power forecasting using neural network-based prediction intervals. IEEE Trans Neural Netw Learn Syst 25(2):303–315

  2. Qin M, Du Z, Du Z (2017) Red tide time series forecasting by combining arima and deep belief network. Knowl-Based Syst 125:39–52

  3. Abaszade M, Effati S (2018) Stochastic support vector regression with probabilistic constraints. Appl Intell 48(1):243–256

  4. Williams RJ, Zipser D (1989) A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1(2):270–280

  5. Siegelmann HT, Sontag ED (1995) On the computational power of neural nets. J Comput Syst Sci 50(1):440–449

  6. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

  7. Cho K, van Merrienboer B, Bahdanau D, Bengio Y (2014) On the properties of neural machine translation: Encoder-decoder approaches. In: Proceedings of the Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pp 103–111

  8. Jaeger H, Haas H (2004) Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science 304(5667):78–80

  9. Lai G, Chang W, Yang Y, Liu H (2018) Modeling long- and short-term temporal patterns with deep neural networks. In: The 41st international ACM SIGIR conference on research & development in information retrieval, SIGIR 2018, pp 95–104

  10. Langton CG (1990) Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenom 42(1–3):12–37

  11. Trillos NG, Murray R (2016) A new analytical approach to consistency and overfitting in regularized empirical risk minimization. Eur J Appl Math 28(6):36

  12. Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: 3rd International Conference on Learning Representations, ICLR

  13. Cinar YG, Mirisaee H, Goswami P, Gaussier E, Aït-Bachir A, Strijov V (2017) Position-based content attention for time series forecasting with sequence-to-sequence rnns. In: International Conference on Neural Information Processing. Springer, pp 533–544

  14. Liang Y, Ke S, Zhang J, Yi X, Zheng Y (2018) Geoman: Multi-level attention networks for geo-sensory time series prediction. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, pp 3428–3434

  15. Liu T, Yu S, Xu B, Yin H (2018) Recurrent networks with attention and convolutional networks for sentence representation and classification. Appl Intell 48(10):3797–3806

  16. Yi S, Guo J, Xin L, Kong Q, Guo L, Wang L (2018) Long-term prediction of polar motion using a combined ssa and arma model. J Geodesy 92(3):333–343

  17. Dai C, Pi D (2017) Parameter auto-selection for hemispherical resonator gyroscope’s long-term prediction model based on cooperative game theory. Knowl-Based Syst 134:105–115

  18. Cannon DM, Goldberg SR (2015) Simple rules for thriving in a complex world, and irrational things like missing socks, pickup lines, and other essential puzzles. J Corporate Account Finance 26(6):97–99

  19. Benmessahel I, Xie K, Chellal M (2018) A new evolutionary neural networks based on intrusion detection systems using multiverse optimization. Appl Intell 48(8):2315–2327

  20. Poole B, Lahiri S, Raghu M, Sohl-Dickstein J, Ganguli S (2016) Exponential expressivity in deep neural networks through transient chaos. In: Advances in neural information processing systems 29: Annual conference on neural information processing systems, neural information processing systems foundation, Barcelona, Spain, pp 3368–3376

  21. Valdez MA, Jaschke D, Vargas DL, Carr LD (2017) Quantifying complexity in quantum phase transitions via mutual information complex networks. Phys Rev Lett 119(22):225301

  22. Raghu M, Poole B, Kleinberg JM, Ganguli S, Sohl-Dickstein J (2017) On the expressive power of deep neural networks. In: Proceedings of the 34th International Conference on Machine Learning, pp 2847–2854

  23. Mafahim JU, Lambert D, Zare M, Grigolini P (2015) Complexity matching in neural networks. New J Phys 17(1):1–18

  24. Azizipour M, Afshar MH (2018) Reliability-based operation of reservoirs: a hybrid genetic algorithm and cellular automata method. Soft Comput 22(19):6461–6471

  25. Erkaymaz O, Ozer M, Perc M (2017) Performance of small-world feedforward neural networks for the diagnosis of diabetes. Appl Math Comput 311:22–28

  26. Wang SX, Li M, Zhao L, Jin C (2019) Short-term wind power prediction based on improved small-world neural network. Neural Computing and Applications 31(7):3173–3185

  27. Semwal VB, Gaud N, Nandi G (2019) Human gait state prediction using cellular automata and classification using elm. In: Machine Intelligence and Signal Analysis. Springer, pp 135–145

  28. Kossio FYK, Goedeke S, van den Akker B, Ibarz B, Memmesheimer RM (2018) Growing critical: Self-organized criticality in a developing neural system. Phys Rev Lett 121(5):058301

  29. Hazan H, Saunders DJ, Sanghavi DT, Siegelmann HT, Kozma R (2018) Unsupervised learning with self-organizing spiking neural networks. In: 2018 International Joint Conference on Neural Networks, IJCNN, pp 1–6

  30. Choromanska A, Henaff M, Mathieu M, Arous GB, LeCun Y (2015) The loss surfaces of multilayer networks. In: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics

  31. Li SH, Wang L (2018) Neural network renormalization group. Phys Rev Lett 121(26):260601

  32. Deng DL, Li X, Sarma SD (2017) Quantum entanglement in neural network states. Phys Rev X 7(2):021021

  33. Iso S, Shiba S, Yokoo S (2018) Scale-invariant feature extraction of neural network and renormalization group flow. Phys Rev E 97(5-1)

  34. Yang G, Schoenholz S (2017a) Mean field residual networks: on the edge of chaos. In: Advances in Neural Information Processing Systems, pp 7103–7114

  35. Yang G, Schoenholz SS (2017b) Mean field residual networks: On the edge of chaos. In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, pp 2865–2873

  36. Kawamoto T, Tsubaki M, Obuchi T (2018) Mean-field theory of graph neural networks in graph partitioning. In: Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS, pp 4366–4376

  37. Carleo G, Troyer M (2016) Solving the quantum many-body problem with artificial neural networks. Science 355(6325):602–606

  38. Koch-Janusz M, Ringel Z (2017) Mutual information, neural networks and the renormalization group. Nat Phys 14(6):578–582

  39. Efthymiou S, Beach MJS, Melko RG (2019) Super-resolving the ising model with convolutional neural networks. Phys Rev B 99:075113

  40. Zhang H, Wang Z, Liu D (2014) A comprehensive review of stability analysis of continuous-time recurrent neural networks. IEEE Trans Neural Netw Learn Syst 25(7):1229–1262

  41. Elfwing S, Uchibe E, Doya K (2018) Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw 107:3–11

  42. Njikam ANS, Zhao H (2016) A novel activation function for multilayer feed-forward neural networks. Appl Intell 45(1):75–82

  43. Halmos PR (2012) A Hilbert Space Problem Book, vol 19. Springer Science & Business Media

  44. Petersen A, Muller HG (2016) Functional data analysis for density functions by transformation to a hilbert space. Ann Stat 44(1):183–218

  45. Chen M, Pennington J, Schoenholz SS (2018) Dynamical isometry and a mean field theory of rnns: Gating enables signal propagation in recurrent neural networks. In: Proceedings of the 35th International Conference on Machine Learning, ICML, pp 872–881

  46. Gupta C, Jain A, Tayal DK, Castillo O (2018) Clusfude: Forecasting low dimensional numerical data using an improved method based on automatic clustering, fuzzy relationships and differential evolution. Eng Appl of AI 71:175–189

  47. Bianchi FM, Livi L, Alippi C (2018) Investigating echo-state networks dynamics by means of recurrence analysis. IEEE Trans Neural Netw Learn Syst 29(2):427–439

  48. Mocanu DC, Mocanu E, Stone P, Nguyen PH, Gibescu M, Liotta A (2018) Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science. Nat Commun 9(1):2383

  49. Stanley KO, Clune J, Lehman J, Miikkulainen R (2019) Designing neural networks through neuroevolution. Nat Mach Intell 1(1):24–35

  50. Valdez F, Vázquez JC, Melin P, Castillo O (2017) Comparative study of the use of fuzzy logic in improving particle swarm optimization variants for mathematical functions using co-evolution. Appl Soft Comput 52:1070–1083

  51. Soto J, Melin P, Castillo O (2018) A new approach for time series prediction using ensembles of IT2FNN models with optimization of fuzzy integrators. Int J Fuzzy Syst 20(3):701–728

  52. Radosavljević J (2016) A solution to the combined economic and emission dispatch using hybrid psogsa algorithm. Appl Artif Intell 30(5):445–474

  53. Olivas F, Valdez F, Melin P, Sombra A, Castillo O (2019) Interval type-2 fuzzy logic for dynamic parameter adaptation in a modified gravitational search algorithm. Inf Sci 476:159–175

  54. Beilock SL, DeCaro MS (2007) From poor performance to success under stress: Working memory, strategy selection, and mathematical problem solving under pressure. J Exper Psychol Learn Memory Cogn 33(6):983

  55. Mantegna RN, Stanley HE (1994) Stochastic process with ultraslow convergence to a gaussian: The truncated lévy flight. Phys Rev Lett 73(22):2946

  56. Yang G, Pennington J, Rao V, Sohl-Dickstein J, Schoenholz SS (2019) A mean field theory of batch normalization. In: International Conference on Learning Representations

  57. Kreyszig E (1978) Introductory Functional Analysis with Applications. Wiley, New York

  58. O'Donnell R (2013) Analysis of Boolean Functions. Cambridge University Press, Cambridge

  59. Nazemi A, Mortezaee M (2019) A new gradient-based neural dynamic framework for solving constrained min-max optimization problems with an application in portfolio selection models. Applied Intelligence 49(2):396–419

Acknowledgment

The work was supported by the National Science Foundation of China (61473183, 61521063, 61627810), the National Key R&D Program of China (SQ2017YFGH001005), the scientific and technological project of Henan Province (172102210255), and the CERNET Innovation Project (No. NGII20160517).

Author information

Corresponding author

Correspondence to WeiDong Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Approaching equilibrium points by Hermite polynomials

The proofs of all the theorems we quote can be found in Chapters 2 and 5 of [57] or similar textbooks. A complete normed vector space is called a Banach space. A Hilbert space is a Banach space whose norm is induced by an inner product. Let H be the Hilbert space of functions from S to \(\mathbb {R}\). The following Contraction-Mapping Theorem, also known as Banach’s Fixed-Point Theorem, describes the existence and uniqueness of solutions of differential equations.

Theorem 1

Let S be complete and let \(T:S\rightarrow S\) be a contraction mapping. Then the equation Tx = x has exactly one solution in S, and the unique solution x can be obtained as the limit of the sequence x(n) defined by \(x(n) = Tx(n-1),n= 1,2,\dots \), expressed as:

$$ x = \lim_{n\rightarrow \infty} T^{n} x_{0}, $$
(16)

where x0 is an arbitrary initial element in S.

The theorem not only establishes the existence and uniqueness of solutions, but also provides a way to find them by an iterative process, as sketched below. The following is a constructive extension of the theorem.
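
As a minimal illustration of this iterative process (a sketch with our own function names and an example contraction, not taken from the paper), the following Python snippet iterates x(n) = Tx(n−1) until successive iterates agree to a tolerance.

```python
import numpy as np

def fixed_point(T, x0, tol=1e-12, max_iter=1000):
    """Banach fixed-point iteration: x(n) = T(x(n-1)) until convergence."""
    x = x0
    for _ in range(max_iter):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# cos is a contraction near its fixed point, so the iteration converges
# to the unique solution of cos(x) = x (about 0.739085).
print(fixed_point(np.cos, 1.0))
```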

Lemma 1

If L is a self-adjoint operator and there exist constants B ≥ A > 0 satisfying

$$ \forall f\in \textbf{\textit{H}}, A\lVert f\rVert^{2} \leq \langle {L}f, f\rangle \leq B\lVert f\rVert^{2}, $$
(17)

then L is invertible and

$$ \forall f\in \textbf{\textit{H}}, \frac{1}{B}\lVert f\rVert^{2} \leq \langle {{L}}^{-1}f, f\rangle \leq \frac{1}{A}\lVert f\rVert^{2}. $$
(18)

The inequality (17) shows that the eigenvalues of L lie between A and B. In finite dimension, L is diagonalizable in an orthonormal basis since it is self-adjoint. It is therefore invertible, with eigenvalues between \(B^{-1}\) and \(A^{-1}\), which proves (18).
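
A quick finite-dimensional sanity check of Lemma 1 (a sketch using an arbitrary symmetric positive-definite matrix of our own choosing, not from the paper): the quadratic form of L is bounded by its smallest and largest eigenvalues A and B as in (17), and the quadratic form of L^{-1} is then bounded by 1/B and 1/A as in (18).

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
L = M @ M.T + 0.5 * np.eye(5)          # self-adjoint (symmetric) and positive definite

eigs = np.linalg.eigvalsh(L)
A, B = eigs.min(), eigs.max()          # bounds appearing in (17)

f = rng.standard_normal(5)
nf2 = f @ f
assert A * nf2 - 1e-9 <= f @ L @ f <= B * nf2 + 1e-9                  # inequality (17)
assert nf2 / B - 1e-9 <= f @ np.linalg.inv(L) @ f <= nf2 / A + 1e-9   # inequality (18)
print("eigenvalue bounds A, B:", A, B)
```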

Supposing the probabilists’ weight function \(p(x) = e^{-x^{2}/2}\) and applying Lemma 1, it follows that the Hermite polynomials are orthogonal on the interval \((-\infty , \infty )\) with respect to this weight function, and we obtain the following important result,

$$ \frac{1}{\sqrt{2\pi}}{\int}_{-\infty}^{\infty} H_{m}(x) H_{n}(x) e^{-x^{2}/2}\text{d}x= \left\{\begin{array}{ll} 1,& \text{ for } m = n\\ 0 , &\text{otherwise}. \end{array}\right. $$
(19)

We use the following facts about the Hermite polynomials (see Chapter 11 in [58]):

$$ H_{n+1}(x)= \frac{x}{\sqrt{n+1}}H_{n}(x)- \sqrt{\frac{n}{n+1}}H_{n-1}(x),\\ $$
(20)
$$ H_{n}^{\prime}(x)=\sqrt{n}H_{n-1}(x), $$
(21)
$$ H_{n}(0)= \left\{\begin{array}{ll} 0, & \text{if } n \text{ is odd};\\ \frac{1}{\sqrt{n!}}(-1)^{\frac{n}{2}}(n-1)!! & \text{if } n \text{ is even}. \end{array}\right. $$
(22)
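
The recurrences (20)–(22) can be checked numerically. Below is a minimal Python sketch (the helper name hermite_basis and the use of NumPy's Gauss–HermiteE quadrature are our own choices, not from the paper) that builds the normalized probabilists' Hermite polynomials from recurrence (20) and verifies their orthonormality with respect to the standard Gaussian measure, i.e. the normalization assumed in (19).

```python
import numpy as np

def hermite_basis(x, n_max):
    """Normalized probabilists' Hermite polynomials H_0..H_{n_max} via recurrence (20)."""
    H = [np.ones_like(x), x.copy()]
    for n in range(1, n_max):
        H.append(x / np.sqrt(n + 1) * H[n] - np.sqrt(n / (n + 1)) * H[n - 1])
    return np.array(H)                    # shape (n_max + 1, len(x))

# Gauss-HermiteE nodes/weights integrate against exp(-x^2/2);
# dividing by sqrt(2*pi) turns the quadrature sum into an expectation over N(0, 1).
x, w = np.polynomial.hermite_e.hermegauss(80)
H = hermite_basis(x, 6)
gram = (H * w) @ H.T / np.sqrt(2 * np.pi)
print(np.round(gram, 6))                  # close to the identity matrix, as in (19)
```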

Appendix B: Examples

Next, we analyze the eigenvalues of several popular activation functions under Hermite polynomials and their effects on the convergence and criticality of neural networks.

1.1 Sigmoid activation

The Sigmoid function is \( \sigma (x)= \frac {1} {e^{-x} + 1} \). Since \(H_{0}(x) = 1, H_{1}(x) = x, \text { and } H_{2}(x) = \frac {x^{2}-1}{\sqrt {2}}\), substituting the Sigmoid activation into (19) according to the orthogonality of the Hermite polynomials gives the first two Hermite coefficients:

$$ \begin{array}{@{}rcl@{}} a_{0} & =\mathbb{E}_{x\sim N(0,1)} [\sigma(x)]=\frac{1}{2} \end{array} $$
(23)
$$ \begin{array}{@{}rcl@{}} a_{1} &= \mathbb{E}_{x\sim N(0,1)} [\sigma(x)x]=\frac{1}{2\sqrt{2\pi}} \end{array} $$
(24)

For n ≥ 3, we write \(g_{n}(x) = \frac {e^{-x^{2}/2}}{(1+e^{-x})}H_{n}(x)\); according to (21), the derivative of \(g_{n}(x)\) is:

$$ \begin{array}{@{}rcl@{}} g_{n}^{\prime} (x) &=& \frac{{{e^{- {x^{2}}{\text{/2}}}}}}{{{e^{- x}} + 1}}(\sqrt n {H_{n - 1}}(x) - x{H_{n}}(x))\\ &&+ \frac{{{e^{- {x^{2}}/2 - x}}}}{{{{({e^{- x}} + 1)}^{2}}}}{H_{n}}(x). \end{array} $$
(25)

Since (25) vanishes as \(x\to \infty \), we get:

$$ \begin{array}{@{}rcl@{}} a_{n} & = & \frac{1}{\sqrt{2\pi}}{\int}_{0}^{\infty} \frac{e^{-x^{2}/2}}{(1 + e^{-x})}H_{n}(x) dx\\ & = &\frac{1}{\sqrt{2\pi}}\left( \frac{{{e^{- {x^{2}}{\text{/2}}}}}}{{{e^{- x}} + 1}}\sqrt n {H_{n - 1}}(x) + \frac{{{e^{- {x^{2}}/2 - x}}}}{{{{({e^{- x}} + 1)}^{2}}}}{H_{n}}(x)\right)\left|{~}_{0}^{\infty}\right.\\ & = & \frac{1}{{4\sqrt {2\pi } }}(2\sqrt n {H_{n{\text{ - }}1}}(0) + {H_{n}}\left( 0 \right)). \end{array} $$
(26)

Therefore, the Hermite coefficients of the Sigmoid activation are expressed as:

$$ a_{n} = \left\{\begin{array}{ll} \frac{1}{4\sqrt{2\pi}}(2\sqrt{n} H_{n-1}(0) + H_{n}(0)) &\text{if } n \geq 3;\\ \frac{1}{2} & \text{if } n = 0;\\ \frac{1}{2\sqrt{2\pi}} &\text{if } n = 1. \end{array}\right. $$
(27)

We find that the Sigmoid activation strongly attenuates the higher-order components; when such factors are multiplied across many layers, the overall gradient becomes quite small.
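
As a numerical cross-check of this attenuation, the sketch below (our own construction under the same normalized Hermite convention as above, not from the paper) computes \(a_{n} = \mathbb{E}_{x\sim N(0,1)}[\sigma(x)H_{n}(x)]\) by Gauss–HermiteE quadrature; the higher-order coefficients come out small, consistent with the discussion.

```python
import numpy as np

def hermite_basis(x, n_max):
    # Normalized probabilists' Hermite polynomials via recurrence (20).
    H = [np.ones_like(x), x.copy()]
    for n in range(1, n_max):
        H.append(x / np.sqrt(n + 1) * H[n] - np.sqrt(n / (n + 1)) * H[n - 1])
    return np.array(H)

x, w = np.polynomial.hermite_e.hermegauss(120)   # quadrature for weight exp(-x^2/2)
sigma = 1.0 / (1.0 + np.exp(-x))
a = (hermite_basis(x, 8) * w) @ sigma / np.sqrt(2 * np.pi)
print(np.round(a, 4))   # a_0 = 0.5; the magnitudes fall off quickly with n
```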

1.2 Normalized ReLU activation

Consider the unit activation \({f}(x) = \sqrt {2}\max \limits (0, x)\). Substituting the Hermite polynomials into (19) gives the first two coefficients:

$$ \begin{array}{@{}rcl@{}} a_{0} & =\mathbb{E}_{x\sim N(0,1)} [{f}(x)]=\frac{1}{\sqrt{\pi}}, \end{array} $$
(28)
$$ \begin{array}{@{}rcl@{}} a_{1} &= \mathbb{E}_{x\sim N(0,1)} [{f}(x)x]=\frac{1}{\sqrt{2}}. \end{array} $$
(29)

For n ≥ 3, we write \(g_{n}(x) = x H_{n}(x) e^{-\frac {x^{2}}{2}}\), and its derivative is:

$$ g_{n}^{\prime} (x) = e^{-x^{2}/2} (\sqrt{n} x H_{n - 1}(x) - (x^{2} - 1) H_{n}(x)). $$
(30)

Since (30) vanishes as \(x\to \infty \), we get:

$$ \begin{array}{@{}rcl@{}} a_{n} &=& \frac{1}{\sqrt{\pi}}{\int}_{0}^{\infty} x H_{n}(x)e^{-\frac{x^{2} }{2}} dx\\ &=& \frac{1}{\sqrt{\pi}} e^{-x^{2}/2} (\sqrt{n} x H_{n - 1}(x) - (x^{2} - 1) H_{n}(x))\left|{~}_{0}^{\infty}\right.\\ & =& \frac{1}{\sqrt{\pi}}H_{n}(0). \end{array} $$
(31)

Therefore, the Hermite coefficients of the ReLU activation are expressed as:

$$ a_{n} = \left\{\begin{array}{ll} \frac{(n-3)!!}{\sqrt{\pi n!}} & \text{if } n \text{ is even and } n \geq 2;\\ \frac{1}{\sqrt{\pi}} & \text{if } n = 0;\\ \frac{1}{\sqrt{2}} & \text{if } n= 1;\\ 0 &\text{if } n \text{ is odd and } n \geq 3. \end{array}\right. $$
(32)

The maximum coefficient is \(\frac {1}{\sqrt {2}}\), and the coefficients then gradually decay toward the critical point 0. In particular, \((a_{0}, a_{1}, a_{2}, a_{3}, a_{4}, a_{5}, a_{6}) = (\frac {1}{\sqrt {\pi }}, \frac {1}{\sqrt {2}}, \frac {1}{\sqrt {2\pi }}, 0, \frac {1}{\sqrt {24\pi }}, 0, \frac {1}{\sqrt {80\pi }})\). We also see that the coefficients vanish at every odd index n ≥ 3, which may cause the network, especially when trained by gradient descent, to become trapped at saddle points where no information is passed on, rather than reaching the expected global optimum [59].
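
The listed values can likewise be reproduced numerically; the sketch below (our own construction under the same normalized Hermite convention, not from the paper) compares quadrature estimates of \(a_{0},\dots,a_{6}\) with the closed-form values above. Because of the ReLU kink at 0, the quadrature matches only to a few decimal places.

```python
import numpy as np

def hermite_basis(x, n_max):
    # Normalized probabilists' Hermite polynomials via recurrence (20).
    H = [np.ones_like(x), x.copy()]
    for n in range(1, n_max):
        H.append(x / np.sqrt(n + 1) * H[n] - np.sqrt(n / (n + 1)) * H[n - 1])
    return np.array(H)

x, w = np.polynomial.hermite_e.hermegauss(200)
relu = np.sqrt(2.0) * np.maximum(0.0, x)
a = (hermite_basis(x, 6) * w) @ relu / np.sqrt(2 * np.pi)

expected = np.array([1/np.sqrt(np.pi), 1/np.sqrt(2), 1/np.sqrt(2*np.pi),
                     0.0, 1/np.sqrt(24*np.pi), 0.0, 1/np.sqrt(80*np.pi)])
print(np.round(a, 3))         # quadrature estimates of a_0..a_6
print(np.round(expected, 3))  # closed-form values listed above
```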

Cite this article

Zhang, G., Zhang, C. & Zhang, W. Evolutionary echo state network for long-term time series prediction: on the edge of chaos. Appl Intell 50, 893–904 (2020). https://doi.org/10.1007/s10489-019-01546-w
