Neuroscience Research
Volume 156, July 2020, Pages 225-233

Effect of recurrent infomax on the information processing capability of input-driven recurrent neural networks

https://doi.org/10.1016/j.neures.2020.02.001

Abstract

Reservoir computing is a framework for exploiting the inherent transient dynamics of recurrent neural networks (RNNs) as a computational resource. On the basis of this framework, much research has been conducted to evaluate the relationship between the dynamics of RNNs and their information processing capability. In this study, we present a detailed analysis of the information processing capability of an RNN optimized by recurrent infomax (RI), an unsupervised learning method that maximizes the mutual information of RNNs by adjusting the connection weights of the network. The results indicate that RI leads to the emergence of a delay-line structure and that the network optimized by RI possesses superior short-term memory, i.e., the ability to store the temporal information of the input stream in its transient dynamics.

Introduction

To elucidate the nature of the central nervous system (CNS), we need to investigate its information processing both experimentally and theoretically. Information processing in the CNS is supported by the dynamics of recurrent networks, whose malfunction leads to pathological states such as epilepsy. Because the extremely rich dynamics exhibited by recurrent networks make their description complicated, theoretical frameworks capable of giving a clear picture of information processing in the CNS are needed. A framework called reservoir computing (RC) has been proposed as a brain-inspired information processing model (Maass et al., 2002, Jaeger and Haas, 2004, Lukoševičius and Jaeger, 2009). One of the most notable features of RC is that it exploits the inherent transient dynamics of input-driven recurrent neural networks (RNNs) as a computational resource. Owing to this feature, RC has been used as a theoretical framework for examining the information processing mechanisms of the CNS (Maass et al., 2002, Buonomano and Maass, 2009, Rabinovich et al., 2008). In addition, it has been used to emulate human behavior, such as motor activity (Sussillo and Abbott, 2009, Laje and Buonomano, 2013).

As illustrated in Fig. 1(A), the architecture of RC generally consists of three layers: an input layer, a reservoir layer implemented by an RNN, and an output layer. The framework has three important properties. The first is the ability to embed previous input information into the transient dynamics of the RNN, which enables real-time information processing of input streams; this is called the short-term memory property. The second property is nonlinearity, the ability to emulate nonlinear information processing. The third property is that supervised learning of the reservoir layer is not required if its dynamics are sufficiently rich; only the connection weights from the RNN to the output, called the readout weights, must be adjusted to learn the required dynamical system. Although RC is a simple framework, it has been demonstrated to perform well for a large number of machine learning tasks (Antonelo et al., 2008, Jalalvand et al., 2015, Salmen and Ploger, 2005, Skowronski and Harris, 2007, Jaeger, 2003).
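To make the "only the readout is trained" property concrete, the following is a minimal sketch of a generic continuous-valued echo state network, in which a fixed random reservoir is driven by the input and only the readout weights are fit by ridge regression. This is an illustration of the RC pipeline, not the stochastic binary model used in this paper; all sizes, constants, and the delay-recall target are assumptions.

```python
# Minimal echo-state-network sketch of the RC pipeline: input layer ->
# fixed random reservoir -> trained linear readout. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
N, T_wash, T_train = 50, 200, 1000

W_in = rng.uniform(-0.5, 0.5, size=N)            # input weights (fixed)
W = rng.normal(0.0, 1.0, size=(N, N))            # recurrent weights (fixed)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # commonly used spectral radius < 1

u = rng.uniform(-1.0, 1.0, size=T_wash + T_train)  # input stream
target = np.roll(u, 3)                              # e.g. recall u(t - 3)

x = np.zeros(N)
states = []
for t in range(T_wash + T_train):
    x = np.tanh(W @ x + W_in * u[t])             # reservoir update
    states.append(x.copy())
X = np.array(states[T_wash:])                    # discard washout
y = target[T_wash:]

# Only the readout is trained (ridge regression); the reservoir is untouched.
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(N), X.T @ y)
print("training MSE:", np.mean((X @ W_out - y) ** 2))
```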

The information processing capability of an RC system depends on the inherent dynamics of the reservoir layer. Consequently, much research has been conducted on the information processing capabilities of various dynamical systems using the systems as the reservoir layer. Several studies have investigated the information processing capabilities of physical systems, such as the surface of water (Fernando and Sojakka, 2003), electronic, optoelectronic, and photonic systems (Larger et al., 2012, Appeltant et al., 2011, Woods and Naughton, 2012), ensemble quantum systems (Fujii and Nakajima, 2017), neuromorphic chips and devices (Stieg et al., 2012, Torrejon et al., 2017, Furuta et al., 2018, Tsunegi et al., 2018), and the mechanical bodies of soft robots (Nakajima et al., 2013, Nakajima et al., 2014, Nakajima et al., 2015, Nakajima et al., 2018). These studies exploit the dynamics of physical systems as reservoirs.

Although the inherent dynamics of such systems can be exploited as reservoirs, the reservoir dynamics are empirically optimized via their control parameters prior to task-dependent optimization (i.e., optimization of the readout weights). This means that RC requires both task-independent and task-dependent optimization. One of the best-known task-independent optimization approaches is to adjust a dynamical system to the edge of chaos (or edge of stability), the transition point from ordered to chaotic dynamics; such systems are reported to exhibit superior information processing capabilities (Bertschinger and Natschläger, 2004, Toyoizumi and Abbott, 2011). However, the edge of chaos is not optimal for every task (Yildiz et al., 2012, Manjunath and Jaeger, 2013). A variety of methods for task-independent optimization is therefore desirable.
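As a concrete illustration of edge-of-stability tuning, the sketch below rescales a random recurrent weight matrix to a target spectral radius just below one, a common heuristic for echo state networks. The target value 0.95 is an arbitrary assumption, and for the stochastic binary networks studied in this paper the spectral radius is only a rough proxy for the order-chaos transition.

```python
# Hedged sketch of the "tune toward the edge of stability" heuristic:
# rescale the recurrent weight matrix so its spectral radius sits just below 1.
import numpy as np

def scale_to_spectral_radius(W, rho_target=0.95):
    """Return W rescaled so that its largest eigenvalue magnitude is rho_target."""
    rho = np.max(np.abs(np.linalg.eigvals(W)))
    return W * (rho_target / rho)

rng = np.random.default_rng(1)
W = rng.normal(0.0, 1.0, size=(50, 50))
W_edge = scale_to_spectral_radius(W)
print(np.max(np.abs(np.linalg.eigvals(W_edge))))   # ~0.95
```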

In this study, we examine whether the recurrent infomax (RI) method (Tanaka et al., 2008) can be used for task-independent optimization of a reservoir layer. RI was developed as an extension of feedforward infomax, originally proposed by Linsker (1988). It is an unsupervised learning technique for maximizing information retention in an RNN, quantified by mutual information. RNNs optimized by RI exhibit dynamical characteristics of neural activity observed in the CNS, such as cell-assembly-like and synfire-chain-like spontaneous activities, as well as critical neuronal avalanches, which are a manifestation of the edge of chaos (Tanaka et al., 2008). Because an RNN optimized by RI replicates these properties of the CNS, it is expected to exhibit improved information processing capabilities. However, whether RI actually improves the information processing capability of a reservoir layer has not yet been investigated, and any improvement should be examined and quantified. In this study, we first optimize RNNs using RI and then evaluate their information processing capabilities using benchmark tasks.

Section snippets

Model

We consider an input-driven network consisting of N neurons in which the state of each neuron $x_i(t) \in \{0, 1\}$ $(i = 1, \dots, N)$ is updated synchronously and stochastically at discrete time steps. Simulations were performed with N = 50 neurons unless otherwise stated. The firing probability of neuron i is determined by its interaction with neuron j $(j = 1, \dots, N)$, connected with weight $W_{ij}$, and by its interaction with the input $u(t) \in \{0, 1\}$, connected with weight $W_i^{\mathrm{in}}$, as follows: $p(x_i(t+1) = 1) = \dfrac{p_{\max}}{1 + \exp(\cdots)}$
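A minimal sketch of one synchronous stochastic update of this model is given below. Because the equation above is truncated in this snippet, the exact argument of the exponential (in particular the sign conventions and the handling of the bias $h_i$) is an assumption; only the overall form, a logistic firing probability saturating at $p_{\max}$, follows the text.

```python
# Sketch of one synchronous, stochastic update of the binary network: each
# neuron fires with a probability given by a logistic function of its recurrent
# drive, its input drive, and a bias h_i, saturating at p_max. The exact
# argument of the exponential is an assumption (the equation is truncated).
import numpy as np

rng = np.random.default_rng(2)
N = 50
p_max = 0.98                                   # assumed ceiling on firing probability
W = rng.normal(0.0, 0.1, size=(N, N))          # recurrent weights W_ij
W_in = rng.normal(0.0, 0.1, size=N)            # input weights W_i^in
h = np.zeros(N)                                # per-neuron bias h_i

def step(x, u):
    """One synchronous stochastic update; x is the binary state vector, u in {0, 1}."""
    drive = W @ x + W_in * u + h
    p = p_max / (1.0 + np.exp(-drive))         # firing probability of each neuron
    return (rng.random(N) < p).astype(int)

x = rng.integers(0, 2, size=N)
for t in range(5):
    x = step(x, u=rng.integers(0, 2))
```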

Recurrent infomax

In this section, we briefly describe the RI algorithm (Tanaka et al., 2008). In this study, RI is applied to the connection weights of the RNN and the connection weights from the input to the RNN; these connection weights are represented by solid lines in Fig. 1(A). As illustrated in Fig. 1(B), the connection weights are updated at the end of each block, which consists of 100 000 time steps. The first 50 000 time steps are the washout phase, in which $h_i$ converges to a steady state. The final 50 000
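The block structure can be summarized in pseudocode as below. The RI weight update itself is a gradient-based step that (approximately) maximizes the mutual information retained across successive network states, following Tanaka et al. (2008); it is not reproduced here, and `ri_weight_update` is only a placeholder name introduced for this sketch.

```python
# Schematic of the RI block structure: run the network for 100 000 steps,
# treat the first 50 000 as washout while the biases h_i settle, record the
# firing statistics over the remainder, and update the weights at block end.
import numpy as np

BLOCK, WASHOUT = 100_000, 50_000

def run_block(step_fn, x0, inputs):
    """Run one block, returning the final state and the post-washout states."""
    x, recorded = x0, []
    for t in range(BLOCK):
        x = step_fn(x, inputs[t])
        if t >= WASHOUT:
            recorded.append(x.copy())
    return x, np.array(recorded)

def ri_weight_update(W, W_in, states):
    """Placeholder for the mutual-information gradient step of Tanaka et al. (2008)."""
    raise NotImplementedError

# usage sketch (step() and the weights are defined as in the previous snippet):
# for block in range(n_blocks):
#     x, states = run_block(step, x, rng.integers(0, 2, size=BLOCK))
#     W, W_in = ri_weight_update(W, W_in, states)
```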

Information processing capability

Two benchmark tasks are used to evaluate the information processing capability of a network optimized by RI. Each benchmark task comprises a block consisting of a washout phase (50 000 time steps), a learning phase (1500 time steps), and a testing phase (1500 time steps) (Table 1, benchmark task block). The washout phase is used to eliminate the influence of the initial state of the network and to let the bias $h_i(t)$ converge to a steady-state value. The learning phase is used to train readout
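A sketch of how the short-term memory benchmark can be scored is given below: one linear readout per delay is trained to reconstruct the past input from the current network state, and the memory capacity is the sum over delays of the squared correlation between readout and target on held-out steps, following the common definition of Jaeger (2002). The phase lengths follow the text; the maximum delay, the regularization, and the function name `memory_capacity` are assumptions.

```python
# Short-term memory benchmark sketch: train a ridge-regression readout for each
# delay k to predict u(t - k) from the network state x(t), then accumulate the
# squared test-phase correlation over delays.
import numpy as np

def memory_capacity(states, u, n_learn=1500, n_test=1500, max_delay=20):
    """states: (T, N) post-washout states; u: length-T input aligned with states."""
    mc = 0.0
    for k in range(1, max_delay + 1):
        X, y = states[k:], u[:-k]                       # predict u(t - k) from x(t)
        X_tr, y_tr = X[:n_learn], y[:n_learn]
        X_te, y_te = X[n_learn:n_learn + n_test], y[n_learn:n_learn + n_test]
        w = np.linalg.solve(X_tr.T @ X_tr + 1e-6 * np.eye(X.shape[1]), X_tr.T @ y_tr)
        r = np.corrcoef(X_te @ w, y_te)[0, 1]
        mc += r ** 2 if np.isfinite(r) else 0.0
    return mc
```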

Recurrent infomax and information processing capability

In Section 4, we demonstrate that the information processing capability of a network increases at the beginning of RI optimization but peaks at the 1000th block, because the connection weights within the RNN become stronger than the input connection weights; thus, the input information may not be preserved in the network. To address this problem, we attempted to increase the number of input neurons carrying common input information. However, using K input neurons is virtually the same as using one input neuron and
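The equivalence noted above can be made explicit under the drive form assumed in the model sketch earlier (an assumption, since the paper's own argument is truncated here): if K input channels carry the same signal u(t), they collapse into a single effective input weight.

```latex
% Assumed drive of neuron i with K input channels carrying the same u(t):
%   \sum_j W_{ij} x_j(t) + \sum_{k=1}^{K} W_i^{\mathrm{in},k}\, u(t) + h_i
% The input term factorizes, so K common-input channels act like one channel
% with the summed weight \widetilde{W}_i^{\mathrm{in}}:
\sum_{k=1}^{K} W_i^{\mathrm{in},k}\, u(t)
  = \Bigl(\sum_{k=1}^{K} W_i^{\mathrm{in},k}\Bigr) u(t)
  \equiv \widetilde{W}_i^{\mathrm{in}}\, u(t).
```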

Discussion

In this study, we used RI to optimize an input-driven RNN and evaluated the information processing capability of the network using short-term memory and Boolean emulation tasks. Although naive RI did not lead to improvements in information processing capability, RI with input multiplicity improved the memory capacity (MC) and the n-bit Boolean emulation capability (BC) (n = 2, 3), because the input connection weights increase preferentially and the input information is stored in the network. An appropriate input multiplicity optimizes a network for

Acknowledgments

We thank Mr. Hisashi Iwade for his assistance in the numerical simulations of the model. This work was supported by MEXT KAKENHI Grant Number 15H05877 and JSPS KAKENHI Grant Numbers 16KT0019, 18H05472, and 19K12184. This work is partially based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).

References

  • L.O. Chua et al., A nonlinear dynamics perspective of Wolfram's new kind of science. Part I: threshold of complexity. Int. J. Bifurc. Chaos (2002)
  • J. Dambre et al., Information processing capacity of dynamical systems. Sci. Rep. (2012)
  • M.R. Dranias et al., Short-term memory in networks of dissociated cortical neurons. J. Neurosci. (2013)
  • C. Fernando et al., Pattern recognition in a bucket. European Conference on Artificial Life (2003)
  • K. Fujii et al., Harnessing disordered-ensemble quantum dynamics for machine learning. Phys. Rev. Appl. (2017)
  • T. Furuta et al., Macromagnetic simulation for reservoir computing utilizing spin dynamics in magnetic tunnel junctions. Phys. Rev. Appl. (2018)
  • S. Ganguli et al., Memory traces in dynamical systems. Proc. Natl. Acad. Sci. U.S.A. (2008)
  • H. Jaeger, Short term memory in echo state networks. GMD Rep. (2002)
  • H. Jaeger, Adaptive nonlinear system identification with echo state networks. Adv. Neural Inform. Process. Syst. (2003)
  • H. Jaeger et al., Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science (2004)