
Reinforcement learning with convolutional reservoir computing


Abstract

Recently, reinforcement learning models have achieved great success, mastering complex tasks such as Go and other games with scores surpassing those of human players. Many of these models store large amounts of task data and achieve high performance by extracting visual and time-series features with convolutional neural networks (CNNs) and recurrent neural networks (RNNs), respectively. However, these networks incur very high computational costs because they must be trained repeatedly on the stored data. In this study, we propose a novel practical approach called the reinforcement learning with convolutional reservoir computing (RCRC) model. The RCRC model uses a fixed random-weight CNN and a reservoir computing model to extract visual and time-series features, respectively, and decides actions from these features with an evolution strategy method. As a result, the RCRC model has several desirable properties: (1) the feature extractor requires no training, (2) no training data need to be stored, (3) it can take a wide range of actions, and (4) only a single task-dependent weight matrix has to be trained. Furthermore, we show that the RCRC model can solve multiple reinforcement learning tasks with a completely identical feature extractor.
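To make the architecture concrete, below is a minimal sketch of the RCRC pipeline in NumPy: a fixed random-weight convolution standing in for the CNN, a standard leaky echo state network reservoir, and a single readout matrix, the only trained parameter, mapping the extracted features to actions. All sizes, the leak rate, the spectral-radius scaling, and the helper names (visual_features, reservoir_step, act) are illustrative assumptions rather than the authors' exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed random-weight CNN (never trained, shared across tasks).
FRAME, K, STRIDE, N_FILTERS = 64, 8, 8, 8        # illustrative sizes
conv_w = rng.normal(0.0, 0.1, (N_FILTERS, K, K))
steps = range(0, FRAME - K + 1, STRIDE)          # valid patch offsets

def visual_features(frame):
    """One random convolution layer + ReLU + flatten (stand-in for the CNN)."""
    return np.array([
        max(0.0, float(np.sum(frame[i:i + K, j:j + K] * f)))
        for f in conv_w for i in steps for j in steps
    ])

# Fixed random reservoir (echo state network, never trained).
N_IN = N_FILTERS * len(steps) ** 2               # visual feature dimension
N_RES, ALPHA = 512, 0.8                          # reservoir size, leak rate
W_in = rng.uniform(-0.1, 0.1, (N_RES, N_IN))
W_res = rng.normal(0.0, 1.0, (N_RES, N_RES))
W_res *= 0.95 / np.max(np.abs(np.linalg.eigvals(W_res)))  # spectral radius < 1

def reservoir_step(x, u):
    """Leaky tanh update: x is the reservoir state, u the visual features."""
    return (1 - ALPHA) * x + ALPHA * np.tanh(W_in @ u + W_res @ x)

# Readout: the single task-dependent weight matrix to be trained.
N_ACT = 3                                        # e.g. steering, gas, brake

def act(w_out, u, x):
    """Map [visual features; reservoir state; bias] to continuous actions."""
    return np.tanh(w_out @ np.concatenate([u, x, [1.0]]))

# One environment step with a randomly initialized readout.
w_out = rng.normal(0.0, 0.1, (N_ACT, N_IN + N_RES + 1))
x = np.zeros(N_RES)
frame = rng.random((FRAME, FRAME))               # placeholder grayscale frame
u = visual_features(frame)
x = reservoir_step(x, u)
action = act(w_out, u, x)                        # values in (-1, 1)
```

Because conv_w, W_in, and W_res are fixed at initialization, the same extractor can be shared across tasks; an evolution strategy such as CMA-ES then only has to search the N_ACT × (N_IN + N_RES + 1) entries of w_out, scoring each candidate by its total episode reward.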




Acknowledgements

The authors are grateful to Takuya Yaguchi for discussions on reinforcement learning. We also thank Hiroyasu Ando for helping us improve the manuscript.

Author information


Contributions

Both authors contributed to the study conception, design, coding, analysis, and trial experiments. The experiments reported in the submitted manuscript were performed by Hanten Chang. The first draft of the manuscript was written by both authors, and both authors read and approved the final manuscript.

Corresponding author

Correspondence to Hanten Chang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

(TEX 3.84 KB)


About this article


Cite this article

Chang, H., Futagami, K. Reinforcement learning with convolutional reservoir computing. Appl Intell 50, 2400–2410 (2020). https://doi.org/10.1007/s10489-020-01679-3

