Article

On Optimization Techniques for the Construction of an Exponential Estimate for Delayed Recurrent Neural Networks

by Vasyl Martsenyuk *,†,‡, Stanislaw Rajba and Mikolaj Karpinski
Department of Computer Science and Automatics, Faculty of Mechanical Engineering and Computer Science, University of Bielsko-Biala, 43-309 Bielsko-Biala, Poland
* Author to whom correspondence should be addressed.
† Current address: 2 Willowa, 43-309 Bielsko-Biala, Poland.
‡ These authors contributed equally to this work.
Symmetry 2020, 12(10), 1731; https://doi.org/10.3390/sym12101731
Submission received: 31 August 2020 / Revised: 23 September 2020 / Accepted: 15 October 2020 / Published: 20 October 2020
(This article belongs to the Special Issue Ordinary and Partial Differential Equations: Theory and Applications)

Abstract:
This work is devoted to the modeling and investigation of the architecture design of a delayed recurrent neural network based on delayed differential equations. The use of discrete and distributed delays makes it possible to model the calculation of the next states using internal memory, which corresponds to the artificial recurrent neural network architectures used in the field of deep learning. The problem of exponential stability of models of recurrent neural networks with multiple discrete and distributed delays is considered. For this purpose, the direct method of stability investigation and the gradient descent method are used, applied consecutively. First, we use the direct method to construct stability conditions (resulting in an exponential estimate), which include a tuple of positive definite matrices. Then we optimize these stability conditions (and hence the exponential estimate) with the help of a generalized gradient method with respect to this tuple of matrices. The exponential estimates are constructed on the basis of a Lyapunov–Krasovskii functional. An optimization method for improving the estimates is offered, based on the notion of the generalized gradient of a convex function of a tuple of positive definite matrices. The search for the optimal exponential estimate is reduced to finding a saddle point of the Lagrange function.

1. Introduction

Breakthrough results in the field of deep machine learning are nowadays obtained using recurrent neural networks (RNNs). In particular, the construction of machine learning models for image recognition with captioning, natural language processing, and translation was made possible by recurrent neural networks with Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU). The paper [1] offers a description of such models using ordinary differential equations. Further research has to be related to systems with time delays, as they model the memory within the network units. In [2], drawing from concepts in signal processing, the canonical RNN formulation was formally derived from differential equations.
Here our study of the RNN model is based on the system with multiple discrete and distributed time-varying delays
$$\dot{x}(t) = -A x(t) + \sum_{k=1}^{r_1} W_{1,k}\, g(x(t - h_k(t))) + \sum_{m=1}^{r_2} W_{2,m} \int_{t - \tau_m(t)}^{t} g(x(\theta))\, d\theta, \qquad (1)$$
where $x(t) \in \mathbb{R}^n$ is the state vector and $A = \mathrm{diag}(a_1, a_2, \ldots, a_n)$ is a diagonal matrix with positive entries $a_i > 0$. For the $i$-th neuron, $1/a_i$ can be interpreted as the activity decay constant (or time constant). $W_{1,k} = (w^{1,k}_{ij})_{n \times n}$, $k = \overline{1, r_1}$, and $W_{2,m} = (w^{2,m}_{ij})_{n \times n}$, $m = \overline{1, r_2}$, are the synaptic connection weight matrices. The entries of $W_{1,k}$ and $W_{2,m}$ may be positive (excitatory synapses) or negative (inhibitory synapses). $g(x(t)) = [g_1(x(t)), g_2(x(t)), \ldots, g_n(x(t))]^\top \in \mathbb{R}^n$ is the non-decreasing activation function, which belongs to the class of sector non-linear functions defined by
$$g_j(0) = 0 \quad \text{and} \quad 0 \le \frac{g_j(\xi_1) - g_j(\xi_2)}{\xi_1 - \xi_2} \le l_j, \qquad l_j > 0, \qquad (2)$$
for all $\xi_1, \xi_2 \in \mathbb{R}$, $\xi_1 \ne \xi_2$, $j = \overline{1, n}$; moreover, $x = 0$ is a fixed point of Equation (1). We let $L = \mathrm{diag}(l_1, l_2, \ldots, l_n)$ denote the diagonal matrix with positive entries $l_j > 0$.
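For instance, the hyperbolic tangent, a common RNN activation, satisfies the sector condition (2) with $l_j = 1$. The following minimal numerical check is only an illustration added here; it is not part of the original paper:

```python
import numpy as np

# Illustrative check (not from the paper): tanh is a sector non-linearity in the
# sense of (2), with g_j(0) = 0 and difference quotients bounded by l_j = 1.
rng = np.random.default_rng(0)
xi1, xi2 = rng.normal(size=10_000), rng.normal(size=10_000)
mask = xi1 != xi2
slopes = (np.tanh(xi1[mask]) - np.tanh(xi2[mask])) / (xi1[mask] - xi2[mask])
assert np.tanh(0.0) == 0.0
assert slopes.min() >= 0.0 and slopes.max() <= 1.0 + 1e-9   # 0 <= slope <= l_j = 1
```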
The system (1) includes discrete and distributed time-varying delays, which are described by the second and the third terms, respectively.
The bounded differentiable functions $h_k(t)$ represent the discrete delays of the system, satisfying
$$0 \le h_k(t) \le h_{M,k}$$
and
$$\dot{h}_k(t) \le h_{D,k} < 1, \qquad (3)$$
$k = \overline{1, r_1}$, $t > 0$. The delays $h_k(t)$ and $\tau_m(t)$ have the physical meaning of a "controllable memory": previous states of the neurons affect the output only during certain time intervals. Here $h_{M,k}$ and $h_{D,k}$ are the bounds on the discrete delays and their derivatives.
The bounded functions $\tau_m(t)$ represent the distributed delays of the system, with $0 \le \tau_m(t) \le \tau_{M,m}$, $m = \overline{1, r_2}$.
The bounded functions $h_k(t)$ and $\tau_m(t)$ represent axonal signal transmission delays. The condition (3) on the derivative $\dot{h}_k(t)$ will be applied when estimating the upper right derivative of the Lyapunov–Krasovskii functional (see, for example, [3]).
The initial conditions associated with system (1) are assumed to be
$$x(s) = \phi(s), \quad s \in [-\tau_M, 0], \qquad \tau_M := \max\big\{ h_{M,k},\ k = \overline{1, r_1};\ \tau_{M,m},\ m = \overline{1, r_2} \big\}, \qquad (4)$$
where $\phi(s) \in C[-\tau_M, 0]$.
Given any $\phi(s) \in C[-\tau_M, 0]$, under the assumption (2) there exists a unique trajectory of (1) starting from $\phi$ [4].
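To make the role of the two delay types concrete, the sketch below integrates (1) with a fixed-step explicit Euler scheme and a history buffer. All concrete values (dimension, weights, constant delays, the initial function) are assumptions chosen for illustration, not data from the paper.

```python
import numpy as np

# Illustrative forward-Euler simulation of model (1) with one discrete and one
# distributed delay (r1 = r2 = 1) and constant delays; every numeric value here
# is an assumption made for the sketch.
n, dt, T = 2, 0.01, 10.0
A = np.diag([1.0, 1.2])                      # positive decay rates a_i
W1 = np.array([[0.3, -0.2], [0.1, 0.25]])    # discrete-delay weights W_{1,1}
W2 = np.array([[0.05, 0.1], [-0.1, 0.05]])   # distributed-delay weights W_{2,1}
h, tau = 0.5, 0.8                            # h_1(t) = h, tau_1(t) = tau (constant)
g = np.tanh                                  # sector activation with l_j = 1

tau_M = max(h, tau)
hist_len = int(round(tau_M / dt))
steps = int(round(T / dt))
# rows 0..hist_len hold the initial function phi(s) on [-tau_M, 0]
x = np.tile(np.array([0.5, -0.3]), (hist_len + steps + 1, 1))

for i in range(hist_len, hist_len + steps):
    x_t = x[i]
    x_delayed = x[i - int(round(h / dt))]              # x(t - h_1(t))
    window = x[i - int(round(tau / dt)): i + 1]        # x(theta), theta in [t - tau_1(t), t]
    distributed = g(window).sum(axis=0) * dt           # Riemann sum of g(x(theta)) d(theta)
    x[i + 1] = x_t + dt * (-A @ x_t + W1 @ g(x_delayed) + W2 @ distributed)

print("state at t = T:", x[-1])  # expected to decay toward the trivial solution for these small weights
```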
Here we use the Hopfield neural network model, which includes a diagonal matrix A with positive entries representing the self-connection of each neuron. That is, the next state of a neuron depends on its current state and on the outputs of possibly all neurons. Such a diagonal matrix is traditionally applied in stability research of continuous-time RNNs [5]. On the other hand, if we used an arbitrary matrix A, we would assume that the next state of a neuron depends on its current state as well as on the states of all other neurons, which means that the internal states of all neurons are accessible from outside. This contradicts the fact that, for example, in the case of an LSTM unit only the hidden state vector (also known as the output vector) is exposed.
Following the work [2], we may interpret the model (1) from the viewpoint of signal processing, leading us to "canonical" and "non-canonical" RNNs. Namely, $x(t)$ is the state signal vector and $g(x(t))$ is the readout signal vector, a warped version of the state signal vector; the bias parameters are omitted without loss of generality since they can be absorbed by the transformation resulting in the homogeneous system (1). The initial state $\phi(s)$, $s \in [-\tau_M, 0]$, is considered as an input signal, thus modeling a one-to-many RNN architecture. In the more general many-to-many case, an input signal vector $u(t)$, $t > 0$, can be fed to the RNN during operation as an "input sequence".
Although RNNs can be described using difference equations, it makes sense to consider continuous-time equations that describe their operation. This is because differential equations make it possible to describe and understand the underlying dynamic processes better. In addition, with the help of differential equations it is possible to obtain explicit conditions for the stabilization of recurrent neural networks, which is of great importance in their design. Ref. [5] provides a comprehensive review of the research on continuous-time recurrent neural networks, focusing on the stability of Hopfield and Cohen–Grossberg neural networks.
Note also that the corresponding recurrent neural networks can be obtained by discretizing the models based on differential equations. Thus, the work [2] shows how to construct an RNN of the LSTM type starting from the corresponding model based on delay differential equations and then discretizing the so-called canonical RNN.
Recurrent neural network models have been considered since the 1980s, after the pioneering work of Hopfield [6], which modeled each neuron as a linear circuit consisting of a resistor and a capacitor. Two approaches can be distinguished when investigating models of recurrent neural networks in the class of delayed differential equations. The first one studies local stability by comparison with the linearized system [7,8,9,10]. The conditions for a Hopf bifurcation were obtained in [10,11]. The second approach (called the direct Lyapunov method) uses Lyapunov–Krasovskii functionals [3]. It allows us to obtain stability conditions constructively, formulated in the form of linear matrix inequalities (LMIs). These stability conditions can be improved by optimizing the parameters of the Lyapunov–Krasovskii functionals.
Exponential estimates of the solutions are very important when investigating RNN models because they show the rate of convergence of the computations when recognizing input data. In previous works [12,13], an indirect method was developed that yields exponential estimates in some general cases of RNN models. It leads to the numerical solution of a quasipolynomial equation and gives an explicit value of the exponential decay rate, which, unfortunately, does not admit optimization and therefore cannot be improved. In order to overcome this shortcoming, here we develop an optimization technique based on the direct method for Lyapunov–Krasovskii functionals of a special kind.

2. Exponential Estimate

Let $\Omega_n \subset \mathbb{R}^{n \times n}$ be the set of all symmetric positive definite matrices. It is an open convex cone because:
(a)
convexity: for any $P_1 \in \Omega_n$, $P_2 \in \Omega_n$, $x \in \mathbb{R}^n$, $x \ne 0$, and $\xi \in [0, 1]$ we have $x^\top(\xi P_1 + (1 - \xi) P_2)x = \xi x^\top P_1 x + (1 - \xi) x^\top P_2 x > 0$;
(b)
cone: for any $P \in \Omega_n$, $x \in \mathbb{R}^n$, $x \ne 0$, and $\eta > 0$ we have $\eta\, x^\top P x > 0$.
$\bar{\Omega}_n^1$ is the part of the cone $\Omega_n$ contained inside the unit sphere, i.e., $\bar{\Omega}_n^1 := \{P \in \Omega_n : \|P\| \le 1\}$. Here $\|P\|$ is the Frobenius norm of the matrix $P \in \Omega_n$.
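As a small numerical illustration of these definitions (an addition for this text, not code from the paper), membership in $\Omega_n$ and $\bar{\Omega}_n^1$ can be tested via the eigenvalues and the Frobenius norm:

```python
import numpy as np

def in_Omega(P: np.ndarray, tol: float = 1e-12) -> bool:
    """P in Omega_n: symmetric and positive definite."""
    return np.allclose(P, P.T) and np.linalg.eigvalsh(P).min() > tol

def in_Omega_bar_1(P: np.ndarray) -> bool:
    """P in Omega_bar_n^1: additionally, Frobenius norm at most 1."""
    return in_Omega(P) and np.linalg.norm(P, "fro") <= 1.0

P1 = 0.4 * np.eye(2)
P2 = np.array([[0.5, 0.1], [0.1, 0.3]])
xi = 0.7
assert in_Omega(xi * P1 + (1 - xi) * P2)   # convexity: the combination stays in the cone
assert in_Omega(3.0 * P1)                  # cone: positive scaling stays in the cone
print(in_Omega_bar_1(P1), in_Omega_bar_1(P2))
```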
Lemma 1.
Reference [14]. For any constant matrix $U \in \Omega_n$, scalar $\beta > 0$, and vector function $u : [0, \beta] \to \mathbb{R}^n$ such that the integrations concerned are well defined, the following inequality holds:
$$\left( \int_0^\beta u(s)\, ds \right)^{\!\top} U \left( \int_0^\beta u(s)\, ds \right) \le \beta \int_0^\beta u^\top(s)\, U\, u(s)\, ds.$$
Lemma 2.
Reference [15]. Given any real matrices $W_1$, $W_2$, $W_3$ with appropriate dimensions and a scalar $\beta > 0$ such that $W_3 \in \Omega_n$, the following inequality holds:
$$W_1^\top W_2 + W_2^\top W_1 \le \beta\, W_1^\top W_3 W_1 + \beta^{-1}\, W_2^\top W_3^{-1} W_2.$$
In the following definitions, we assume that the trivial solution of (1) is the unique equilibrium point of the model (1).
Definition 1.
The trivial solution of (1) is globally asymptotically stable if for every solution $x(t)$ of the initial value problem (1)–(4) we have $\|x(t)\| \to 0$ as $t \to \infty$.
Definition 2.
If there exist constants $\alpha > 0$, $K > 0$, and $T > 0$ such that every solution $x(t)$ of the initial value problem (1)–(4) satisfies $\|x(t)\| \le K e^{-\alpha t}$ for all $t > T$, then the trivial solution of (1) is said to be globally exponentially stable.
Our research is based on the following Lyapunov–Krasovskii functional, which extends the one proposed in [3] to the case of multiple delays:
$$V[x_t(\cdot)] = e^{2\alpha t} x^\top(t) P x(t) + \sum_{k=1}^{r_1} \int_{t - h_k(t)}^{t} e^{2\alpha s} g^\top(x(s)) Q_k\, g(x(s))\, ds + \sum_{m=1}^{r_2} \tau_{M,m} \int_{-\tau_{M,m}}^{0} \int_{t + \theta}^{t} e^{2\alpha s} g^\top(x(s)) S_m\, g(x(s))\, ds\, d\theta,$$
where the unknown constant $\alpha > 0$ and the matrices $P$, $Q_k$, $k = \overline{1, r_1}$, $S_m$, $m = \overline{1, r_2}$, belong to $\Omega_n$. Here we use the traditional notation for the element of the solution of (1) as the vector interval $x_t(\cdot) := \{x(t + \theta) \mid \theta \in [-\tau_M, 0]\} \in C[-\tau_M, 0]$.
Theorem 1.
We assume that system (1) satisfies the following condition.
H1. Let there exist a constant $\alpha > 0$ and matrices $P$, $Q_k$, $k = \overline{1, r_1}$, $S_m$, $m = \overline{1, r_2}$, belonging to $\mathrm{relint}(\bar{\Omega}_n^1)$, such that the symmetric matrix
$$\Gamma := \begin{pmatrix} \Gamma_{11} & \Gamma_{12} \\ \Gamma_{21} & \Gamma_{22} \end{pmatrix}, \qquad \Gamma_{11} := -2\alpha P + A^\top P + P A - L \left( \sum_{k=1}^{r_1} Q_k + \sum_{m=1}^{r_2} \tau_{M,m}^2 S_m \right) L \in \Omega_n,$$
$$\Gamma_{12} := \left( \frac{e^{\alpha h_{M,1}}}{\sqrt{1 - h_{D,1}}}\, P W_{1,1} \ \ \cdots \ \ \frac{e^{\alpha h_{M,r_1}}}{\sqrt{1 - h_{D,r_1}}}\, P W_{1,r_1} \ \ e^{\alpha \tau_{M,1}}\, P W_{2,1} \ \ \cdots \ \ e^{\alpha \tau_{M,r_2}}\, P W_{2,r_2} \right) \in \mathbb{R}^{n \times n(r_1 + r_2)},$$
$$\Gamma_{21} := \Gamma_{12}^\top \in \mathbb{R}^{n(r_1 + r_2) \times n}, \qquad \Gamma_{22} := \mathrm{diag}\big( Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2} \big) \in \Omega_{n(r_1 + r_2)},$$
where the off-diagonal blocks of $\Gamma_{22}$ are equal to $\Theta \in \mathbb{R}^{n \times n}$, the zero matrix,
belongs to $\Omega_{n(1 + r_1 + r_2)}$.
Then the trivial solution of (1) is globally asymptotically stable.
Proof. 
Estimating the right upper derivative of the functional $V[x_t(\cdot)]$ along the solutions of system (1), we get
$$\begin{aligned}
\frac{d V^+[x_t(\cdot)]}{dt} \le\; & e^{2\alpha t} \Big\{ x^\top(t) \big[ 2\alpha P - A^\top P - P A \big] x(t) + \sum_{k=1}^{r_1} g^\top(x(t - h_k(t))) W_{1,k}^\top P x(t) + x^\top(t) P \sum_{k=1}^{r_1} W_{1,k}\, g(x(t - h_k(t))) \\
& + \sum_{m=1}^{r_2} \Big( \int_{t - \tau_m(t)}^{t} g(x(\theta))\, d\theta \Big)^{\!\top} W_{2,m}^\top P x(t) + x^\top(t) P \sum_{m=1}^{r_2} W_{2,m} \int_{t - \tau_m(t)}^{t} g(x(\theta))\, d\theta \Big\} \\
& + \sum_{k=1}^{r_1} e^{2\alpha t} \Big\{ g^\top(x(t)) Q_k\, g(x(t)) - e^{-2\alpha h_k(t)} (1 - h_{D,k})\, g^\top(x(t - h_k(t))) Q_k\, g(x(t - h_k(t))) \Big\} \\
& + \sum_{m=1}^{r_2} \tau_{M,m} \Big\{ \tau_{M,m}\, e^{2\alpha t} g^\top(x(t)) S_m\, g(x(t)) - e^{2\alpha(t - \tau_{M,m})} \int_{t - \tau_m(t)}^{t} g^\top(x(s)) S_m\, g(x(s))\, ds \Big\}. \qquad (7)
\end{aligned}$$
Applying Lemmas 1 and 2 to estimate the corresponding terms of (7), we have
$$\begin{aligned}
& \sum_{k=1}^{r_1} \Big\{ g^\top(x(t - h_k(t))) W_{1,k}^\top P x(t) + x^\top(t) P W_{1,k}\, g(x(t - h_k(t))) \Big\} \\
&\quad = \sum_{k=1}^{r_1} \Big[ e^{-\alpha h_{M,k}} (1 - h_{D,k})^{1/2}\, g^\top(x(t - h_k(t))) \Big] \Big[ e^{\alpha h_{M,k}} (1 - h_{D,k})^{-1/2}\, W_{1,k}^\top P x(t) \Big] \\
&\qquad + \Big[ e^{\alpha h_{M,k}} (1 - h_{D,k})^{-1/2}\, x^\top(t) P W_{1,k} \Big] \Big[ e^{-\alpha h_{M,k}} (1 - h_{D,k})^{1/2}\, g(x(t - h_k(t))) \Big] \\
&\quad \le \sum_{k=1}^{r_1} \Big\{ e^{2\alpha h_{M,k}} (1 - h_{D,k})^{-1}\, x^\top(t) P W_{1,k} Q_k^{-1} W_{1,k}^\top P x(t) + e^{-2\alpha h_{M,k}} (1 - h_{D,k})\, g^\top(x(t - h_k(t))) Q_k\, g(x(t - h_k(t))) \Big\},
\end{aligned}$$
and
$$\begin{aligned}
& \sum_{m=1}^{r_2} \Big\{ \Big( \int_{t - \tau_m(t)}^{t} g(x(\theta))\, d\theta \Big)^{\!\top} W_{2,m}^\top P x(t) + x^\top(t) P W_{2,m} \int_{t - \tau_m(t)}^{t} g(x(\theta))\, d\theta \Big\} \\
&\quad = \sum_{m=1}^{r_2} \Big\{ \Big[ e^{-\alpha \tau_{M,m}} \int_{t - \tau_m(t)}^{t} g^\top(x(\theta))\, d\theta \Big] \Big[ e^{\alpha \tau_{M,m}}\, W_{2,m}^\top P x(t) \Big] + \Big[ e^{\alpha \tau_{M,m}}\, x^\top(t) P W_{2,m} \Big] \Big[ e^{-\alpha \tau_{M,m}} \int_{t - \tau_m(t)}^{t} g(x(\theta))\, d\theta \Big] \Big\} \\
&\quad \le \sum_{m=1}^{r_2} \Big\{ e^{-2\alpha \tau_{M,m}} \Big( \int_{t - \tau_m(t)}^{t} g(x(\theta))\, d\theta \Big)^{\!\top} S_m \Big( \int_{t - \tau_m(t)}^{t} g(x(\theta))\, d\theta \Big) + e^{2\alpha \tau_{M,m}}\, x^\top(t) P W_{2,m} S_m^{-1} W_{2,m}^\top P x(t) \Big\} \\
&\quad \le \sum_{m=1}^{r_2} \Big\{ \tau_{M,m}\, e^{-2\alpha \tau_{M,m}} \int_{t - \tau_m(t)}^{t} g^\top(x(\theta)) S_m\, g(x(\theta))\, d\theta + e^{2\alpha \tau_{M,m}}\, x^\top(t) P W_{2,m} S_m^{-1} W_{2,m}^\top P x(t) \Big\}.
\end{aligned}$$
Finally we get
$$\begin{aligned}
\frac{d V^+[x_t(\cdot)]}{dt} \le\; & e^{2\alpha t} x^\top(t) \Big\{ 2\alpha P - A^\top P - P A + \sum_{k=1}^{r_1} \Big[ e^{2\alpha h_{M,k}} (1 - h_{D,k})^{-1}\, P W_{1,k} Q_k^{-1} W_{1,k}^\top P + L Q_k L \Big] \\
& + \sum_{m=1}^{r_2} \Big[ e^{2\alpha \tau_{M,m}}\, P W_{2,m} S_m^{-1} W_{2,m}^\top P + \tau_{M,m}^2\, L S_m L \Big] \Big\} x(t) \\
\le\; & -e^{2\alpha t} x^\top(t) \big\{ \Gamma_{11} - \Gamma_{12} \Gamma_{22}^{-1} \Gamma_{21} \big\} x(t) = -e^{2\alpha t} x^\top(t)\, (\Gamma / \Gamma_{22})\, x(t), \qquad (8)
\end{aligned}$$
where $\Gamma / \Gamma_{22} := \Gamma_{11} - \Gamma_{12} \Gamma_{22}^{-1} \Gamma_{21}$ is the Schur complement of $\Gamma_{22}$ in $\Gamma$. From the Schur complement condition it follows that the right-hand side of (8) is negative definite if and only if $\Gamma \in \Omega_{n(1 + r_1 + r_2)}$ [16]. ☐
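Condition H1 is a finite-dimensional test: given $\alpha$ and a candidate tuple $(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2})$, assemble the block matrix $\Gamma$ and check that its smallest eigenvalue is positive. The sketch below is one possible implementation of this check based on the reconstruction of $\Gamma$ given above; the sample data at the end are assumptions for illustration, not values from the paper.

```python
import numpy as np

def build_gamma(alpha, P, Q, S, A, W1, W2, L, hM, hD, tauM):
    """Assemble the block matrix Gamma of condition H1.

    Q, W1, hM, hD are lists of length r1; S, W2, tauM are lists of length r2.
    """
    n = P.shape[0]
    r1, r2 = len(Q), len(S)
    G11 = (-2.0 * alpha * P + A.T @ P + P @ A
           - L @ (sum(Q) + sum(t ** 2 * Sm for t, Sm in zip(tauM, S))) @ L)
    blocks = [np.exp(alpha * hM[k]) / np.sqrt(1.0 - hD[k]) * P @ W1[k] for k in range(r1)]
    blocks += [np.exp(alpha * tauM[m]) * P @ W2[m] for m in range(r2)]
    G12 = np.hstack(blocks)                                  # n x n(r1 + r2)
    G22 = np.zeros((n * (r1 + r2), n * (r1 + r2)))
    for v, B in enumerate(Q + S):                            # block diagonal Q_1..Q_r1, S_1..S_r2
        G22[v * n:(v + 1) * n, v * n:(v + 1) * n] = B
    return np.block([[G11, G12], [G12.T, G22]])

def condition_H1(alpha, P, Q, S, A, W1, W2, L, hM, hD, tauM, tol=1e-10):
    """True if Gamma is positive definite, i.e., the LMI-type condition of Theorem 1 holds."""
    return np.linalg.eigvalsh(build_gamma(alpha, P, Q, S, A, W1, W2, L, hM, hD, tauM)).min() > tol

# Illustrative data (assumed): n = 2, r1 = r2 = 1.
n = 2
A, L = np.diag([1.0, 1.2]), np.eye(n)
W1 = [np.array([[0.3, -0.2], [0.1, 0.25]])]
W2 = [np.array([[0.05, 0.1], [-0.1, 0.05]])]
hM, hD, tauM, alpha = [0.5], [0.2], [0.8], 0.05
P, Q, S = 0.5 * np.eye(n), [0.4 * np.eye(n)], [0.4 * np.eye(n)]
print("H1 satisfied:", condition_H1(alpha, P, Q, S, A, W1, W2, L, hM, hD, tauM))
```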
Corollary 1.
Provided that condition H1 holds, the trivial solution of (1) is globally exponentially stable, namely
$$\|x(t)\| \le \gamma(\alpha)\, |\phi|_{\tau_M}\, e^{-\alpha t}, \qquad t > 0, \qquad (9)$$
where
$$\gamma(\alpha) := \lambda_{\min}^{-1/2}(P) \left( \lambda_{\max}(P) + \sum_{k=1}^{r_1} \lambda_{\max}(Q_k)\, l_{\max}^2\, \frac{1 - e^{-2\alpha h_{M,k}}}{2\alpha} + \sum_{m=1}^{r_2} \tau_{M,m}\, \lambda_{\max}(L S_m L)\, \frac{2\alpha \tau_{M,m} - 1 + e^{-2\alpha \tau_{M,m}}}{4\alpha^2} \right)^{1/2}, \qquad l_{\max} := \max\{l_1, \ldots, l_n\},$$
$\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ denote the minimal and maximal eigenvalues of a matrix. Here $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^n$ and $|x(\cdot)|_{\tau_M} := \sup_{s \in [-\tau_M, 0]} \|x(s)\|$ the uniform convergence norm in $C[-\tau_M, 0]$.
Proof. 
First, note that the inequality
$$2\alpha \tau_{M,m} + e^{-2\alpha \tau_{M,m}} \ge 1$$
ensures that the expression under the square root in $\gamma(\alpha)$ is nonnegative for $\alpha > 0$. From Theorem 1 it follows that $V[x_t(\cdot)] \le V[\phi(\cdot)]$. Hence, we get
$$\begin{aligned}
\lambda_{\min}(P)\, \|x(t)\|^2 \le\; & e^{-2\alpha t}\, V[x_t(\cdot)] \le e^{-2\alpha t}\, V[\phi(\cdot)] \\
\le\; & e^{-2\alpha t} \left( \phi^\top(0) P \phi(0) + \sum_{k=1}^{r_1} \int_{-h_{M,k}}^{0} e^{2\alpha s} g^\top(\phi(s)) Q_k\, g(\phi(s))\, ds + \sum_{m=1}^{r_2} \tau_{M,m} \int_{-\tau_{M,m}}^{0} \int_{\theta}^{0} e^{2\alpha s} g^\top(\phi(s)) S_m\, g(\phi(s))\, ds\, d\theta \right) \\
\le\; & e^{-2\alpha t} \left( \lambda_{\max}(P) + \sum_{k=1}^{r_1} \lambda_{\max}(Q_k)\, l_{\max}^2 \int_{-h_{M,k}}^{0} e^{2\alpha s}\, ds + \sum_{m=1}^{r_2} \tau_{M,m}\, \lambda_{\max}(L S_m L) \int_{-\tau_{M,m}}^{0} \int_{\theta}^{0} e^{2\alpha s}\, ds\, d\theta \right) |\phi|_{\tau_M}^2 \\
=\; & e^{-2\alpha t} \left( \lambda_{\max}(P) + \sum_{k=1}^{r_1} \lambda_{\max}(Q_k)\, l_{\max}^2\, \frac{1 - e^{-2\alpha h_{M,k}}}{2\alpha} + \sum_{m=1}^{r_2} \tau_{M,m}\, \lambda_{\max}(L S_m L)\, \frac{2\alpha \tau_{M,m} - 1 + e^{-2\alpha \tau_{M,m}}}{4\alpha^2} \right) |\phi|_{\tau_M}^2.
\end{aligned}$$
Finally, it yields
$$\lambda_{\min}(P)\, \|x(t)\|^2 \le e^{-2\alpha t}\, \lambda_{\min}(P)\, \gamma^2(\alpha)\, |\phi|_{\tau_M}^2,$$
which is exactly the estimate (9). ☐
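Once H1 holds for a given $\alpha$, the constant $\gamma(\alpha)$ of the estimate (9) is directly computable from the tuple of matrices. A sketch of that computation, following the reconstructed formula for $\gamma(\alpha)$ above (the sample values repeat the assumptions of the previous sketch):

```python
import numpy as np

def gamma_estimate(alpha, P, Q, S, L, hM, tauM):
    """gamma(alpha) of Corollary 1, so that ||x(t)|| <= gamma(alpha) |phi| e^{-alpha t}."""
    l_max = np.diag(L).max()
    lam_max = lambda M: np.linalg.eigvalsh(M).max()
    total = lam_max(P)
    total += sum(lam_max(Qk) * l_max ** 2 * (1.0 - np.exp(-2.0 * alpha * h)) / (2.0 * alpha)
                 for Qk, h in zip(Q, hM))
    total += sum(t * lam_max(L @ Sm @ L)
                 * (2.0 * alpha * t - 1.0 + np.exp(-2.0 * alpha * t)) / (4.0 * alpha ** 2)
                 for Sm, t in zip(S, tauM))
    return np.sqrt(total / np.linalg.eigvalsh(P).min())

# Illustrative data, matching the previous sketch (assumed values).
P, Q, S, L = 0.5 * np.eye(2), [0.4 * np.eye(2)], [0.4 * np.eye(2)], np.eye(2)
print("gamma(0.05) =", gamma_estimate(0.05, P, Q, S, L, hM=[0.5], tauM=[0.8]))
```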

3. Optimization Method

The condition H1 describes the main stability result. The matrix $\Gamma$ defines an operator that is linear with respect to the tuple of matrices $(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2})$. At the same time, the dependence of $\Gamma$ on $\alpha$ is nonlinear; $\alpha$ is the parameter determining the exponential decay rate. Since the positive definiteness of $\Gamma$ with respect to $(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2})$ can be stated as a linear matrix inequality, the construction of an exponential estimate of the form (9) reduces to a convex programming problem. It is natural to assume that "the more positive definite" the matrix $\Gamma$ is, the "more asymptotically stable" the trivial solution is and, in turn, the "more exponentially" stable it is. The positive definiteness of the matrix $\Gamma$ is quantified by its minimum eigenvalue. Thus, we arrive at the following optimization problem.
Further, we apply the optimization technique developed earlier for linear systems in [17]. Given $\alpha > 0$, we search for a tuple of matrices $(P^*, Q_1^*, \ldots, Q_{r_1}^*, S_1^*, \ldots, S_{r_2}^*)$ as a solution of the optimization problem
$$(P^*, Q_1^*, \ldots, Q_{r_1}^*, S_1^*, \ldots, S_{r_2}^*) = \arg \inf_{(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2}) \in \prod_{i=1}^{1 + r_1 + r_2} \bar{\Omega}_n^1} \psi_0\big(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2}\big). \qquad (10)$$
Here $\psi_0(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2}) = -\lambda_{\min}\big(\Gamma(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2})\big)$.
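Computationally, the objective of problem (10) is just the negative of the smallest eigenvalue of $\Gamma$, evaluated on tuples drawn from the compact set $\prod_{i=1}^{1+r_1+r_2} \bar{\Omega}_n^1$. A minimal sketch of the objective and of a projection onto the Frobenius unit ball is given below; it reuses the hypothetical build_gamma helper from the sketch after Theorem 1 and is an illustration rather than code from the paper.

```python
import numpy as np

def psi0(alpha, P, Q, S, A, W1, W2, L, hM, hD, tauM):
    """Objective of (10): psi_0 = -lambda_min(Gamma(P, Q_1..Q_r1, S_1..S_r2))."""
    Gamma = build_gamma(alpha, P, Q, S, A, W1, W2, L, hM, hD, tauM)  # helper from the earlier sketch
    return -np.linalg.eigvalsh(Gamma).min()

def project_to_unit_ball(B):
    """Keep a candidate matrix symmetric and inside the closed Frobenius unit ball (||B|| <= 1)."""
    B = 0.5 * (B + B.T)
    nrm = np.linalg.norm(B, "fro")
    return B if nrm <= 1.0 else B / nrm
```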
We give some general conditions for the existence of a solution of problem (10).
Definition 3.
The inner product of the tuples of matrices $(P_1, Q_{1,1}, \ldots, Q_{r_1,1}, S_{1,1}, \ldots, S_{r_2,1})$, $(P_2, Q_{1,2}, \ldots, Q_{r_1,2}, S_{1,2}, \ldots, S_{r_2,2}) \in \prod_{i=1}^{1 + r_1 + r_2} \bar{\Omega}_n^1$ is
$$\big\langle (P_1, Q_{1,1}, \ldots, Q_{r_1,1}, S_{1,1}, \ldots, S_{r_2,1}),\ (P_2, Q_{1,2}, \ldots, Q_{r_1,2}, S_{1,2}, \ldots, S_{r_2,2}) \big\rangle := \sum_{i,j=1}^{n} \left( p^1_{ij} p^2_{ij} + \sum_{k=1}^{r_1} q^1_{ij,k} q^2_{ij,k} + \sum_{m=1}^{r_2} s^1_{ij,m} s^2_{ij,m} \right),$$
where $P_\delta = \{p^\delta_{ij}\}$, $Q_{k,\delta} = \{q^\delta_{ij,k}\}$, $S_{m,\delta} = \{s^\delta_{ij,m}\}$, $i, j = \overline{1, n}$, $k = \overline{1, r_1}$, $m = \overline{1, r_2}$, $\delta = 1, 2$.
Definition 4.
The generalized gradient of the convex function $\psi_0(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2})$ at the interior point $(P_0, Q_{1,0}, \ldots, Q_{r_1,0}, S_{1,0}, \ldots, S_{r_2,0}) \in \prod_{i=1}^{1 + r_1 + r_2} \bar{\Omega}_n^1$ is the tuple of matrices $(D_0, E_{1,0}, \ldots, E_{r_1,0}, F_{1,0}, \ldots, F_{r_2,0}) \in \prod_{i=1}^{1 + r_1 + r_2} \mathbb{R}^{n \times n}$ such that for all $(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2}) \in \prod_{i=1}^{1 + r_1 + r_2} \bar{\Omega}_n^1$ we have
$$\psi_0(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2}) - \psi_0(P_0, Q_{1,0}, \ldots, Q_{r_1,0}, S_{1,0}, \ldots, S_{r_2,0}) \ge \big\langle (D_0, E_{1,0}, \ldots, E_{r_1,0}, F_{1,0}, \ldots, F_{r_2,0}),\ (P - P_0, Q_1 - Q_{1,0}, \ldots, Q_{r_1} - Q_{r_1,0}, S_1 - S_{1,0}, \ldots, S_{r_2} - S_{r_2,0}) \big\rangle.$$
Let $\Gamma(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2})$ be the linear matrix-valued operator that maps the tuple of matrices $(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2}) \in \prod_{i=1}^{1 + r_1 + r_2} \bar{\Omega}_n^1$ into the $n(1 + r_1 + r_2) \times n(1 + r_1 + r_2)$ symmetric matrix $\Gamma$.
Denote by $\Delta_{ij} \in \mathbb{R}^{n \times n}$ the matrix in which the entries at positions $(i, j)$ and $(j, i)$ are equal to one and all the other entries are zero.
Lemma 3.
The generalized gradient of the function $\psi_0(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2}) = -\lambda_{\min}(\Gamma(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2}))$ at the interior point $(P_0, Q_{1,0}, \ldots, Q_{r_1,0}, S_{1,0}, \ldots, S_{r_2,0}) \in \prod_{i=1}^{1 + r_1 + r_2} \bar{\Omega}_n^1$ is the tuple of matrices $(D_0, E_{1,0}, \ldots, E_{r_1,0}, F_{1,0}, \ldots, F_{r_2,0})$, where $D_0 = \{d^0_{ij}\}$, $E_{k,0} = \{e^0_{ij,k}\}$, $F_{m,0} = \{f^0_{ij,m}\}$, $i, j = \overline{1, n}$, $k = \overline{1, r_1}$, $m = \overline{1, r_2}$, such that
$$\begin{aligned}
d^0_{ij} &= \begin{cases} -z_0^\top\, \Gamma(\Delta_{ij}, \Theta_1, \ldots, \Theta_{r_1 + r_2})\, z_0, & \text{if } i = j, \\[2pt] -\tfrac{1}{2}\, z_0^\top\, \Gamma(\Delta_{ij}, \Theta_1, \ldots, \Theta_{r_1 + r_2})\, z_0, & \text{if } i \ne j, \end{cases} \\
e^0_{ij,k} &= \begin{cases} -z_0^\top\, \Gamma(\Theta, \Theta_1, \ldots, \Theta_{k-1}, \Delta_{ij}, \Theta_{k+1}, \ldots, \Theta_{r_1}, \Theta_{r_1 + 1}, \ldots, \Theta_{r_1 + r_2})\, z_0, & \text{if } i = j, \\[2pt] -\tfrac{1}{2}\, z_0^\top\, \Gamma(\Theta, \Theta_1, \ldots, \Theta_{k-1}, \Delta_{ij}, \Theta_{k+1}, \ldots, \Theta_{r_1}, \Theta_{r_1 + 1}, \ldots, \Theta_{r_1 + r_2})\, z_0, & \text{if } i \ne j, \end{cases} \\
f^0_{ij,m} &= \begin{cases} -z_0^\top\, \Gamma(\Theta, \Theta_1, \ldots, \Theta_{r_1}, \Theta_{r_1 + 1}, \ldots, \Theta_{m-1}, \Delta_{ij}, \Theta_{m+1}, \ldots, \Theta_{r_1 + r_2})\, z_0, & \text{if } i = j, \\[2pt] -\tfrac{1}{2}\, z_0^\top\, \Gamma(\Theta, \Theta_1, \ldots, \Theta_{r_1}, \Theta_{r_1 + 1}, \ldots, \Theta_{m-1}, \Delta_{ij}, \Theta_{m+1}, \ldots, \Theta_{r_1 + r_2})\, z_0, & \text{if } i \ne j, \end{cases}
\end{aligned} \qquad (11)$$
where $\Theta_v$ is the matrix $\Theta$ placed at position $v$, and
$z_0$ is the unit eigenvector corresponding to $\lambda_{\min}\big(\Gamma(P_0, Q_{1,0}, \ldots, Q_{r_1,0}, S_{1,0}, \ldots, S_{r_2,0})\big)$.
Proof. 
Consider, for $z \in \mathbb{R}^{n(1 + r_1 + r_2)}$,
$$\psi_0(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2}) - \psi_0(P_0, Q_{1,0}, \ldots, Q_{r_1,0}, S_{1,0}, \ldots, S_{r_2,0}) = -\min_{\|z\| = 1} \big\{ z^\top\, \Gamma(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2})\, z \big\} + \min_{\|z\| = 1} \big\{ z^\top\, \Gamma(P_0, Q_{1,0}, \ldots, Q_{r_1,0}, S_{1,0}, \ldots, S_{r_2,0})\, z \big\}.$$
Assume that the first quadratic form attains its minimum at $z = z_1$ and the second one at $z = z_0$. Then, subtracting and adding the expression $z_0^\top\, \Gamma(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2})\, z_0$, we get
$$\psi_0(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2}) - \psi_0(P_0, Q_{1,0}, \ldots, Q_{r_1,0}, S_{1,0}, \ldots, S_{r_2,0}) = -z_0^\top \big[ \Gamma(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2}) - \Gamma(P_0, Q_{1,0}, \ldots, Q_{r_1,0}, S_{1,0}, \ldots, S_{r_2,0}) \big] z_0 + z_0^\top\, \Gamma(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2})\, z_0 - z_1^\top\, \Gamma(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2})\, z_1.$$
Since $z_0^\top\, \Gamma(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2})\, z_0 \ge z_1^\top\, \Gamma(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2})\, z_1$, we have
$$\psi_0(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2}) - \psi_0(P_0, Q_{1,0}, \ldots, Q_{r_1,0}, S_{1,0}, \ldots, S_{r_2,0}) \ge -z_0^\top \big[ \Gamma(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2}) - \Gamma(P_0, Q_{1,0}, \ldots, Q_{r_1,0}, S_{1,0}, \ldots, S_{r_2,0}) \big] z_0.$$
Finally, in the last inequality we use the representation of the matrices in the form
$$P = \sum_{1 \le i \le j \le n} p_{ij} \Delta_{ij}, \qquad Q_k = \sum_{1 \le i \le j \le n} q_{ij,k} \Delta_{ij}, \qquad S_m = \sum_{1 \le i \le j \le n} s_{ij,m} \Delta_{ij},$$
together with the linearity of $\Gamma$ with respect to $(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2})$, which yields the representation (11) of the generalized gradient. ☐
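Formula (11) can be implemented directly: take the unit eigenvector $z_0$ associated with $\lambda_{\min}(\Gamma)$ at the current tuple and evaluate the quadratic form of $\Gamma$ on the basis matrices $\Delta_{ij}$ placed in each argument slot. The sketch below follows that recipe under the same assumptions as the earlier sketches; it again relies on the hypothetical build_gamma helper, and the minus signs match the reconstruction $\psi_0 = -\lambda_{\min}(\Gamma)$ used above.

```python
import numpy as np

def generalized_gradient(alpha, P, Q, S, A, W1, W2, L, hM, hD, tauM):
    """Tuple (D_0, E_{1,0},..,E_{r1,0}, F_{1,0},..,F_{r2,0}) of formula (11)."""
    n = P.shape[0]
    r1, r2 = len(Q), len(S)
    Gamma = build_gamma(alpha, P, Q, S, A, W1, W2, L, hM, hD, tauM)  # helper from the earlier sketch
    _, V = np.linalg.eigh(Gamma)
    z0 = V[:, 0]                         # unit eigenvector of the smallest eigenvalue
    Theta = np.zeros((n, n))

    def slot_gradient(slot):
        """Matrix of partial subderivatives w.r.t. the matrix placed in the given argument slot."""
        D = np.zeros((n, n))
        for i in range(n):
            for j in range(i, n):
                Delta = np.zeros((n, n))
                Delta[i, j] = Delta[j, i] = 1.0
                args = [Theta] * (1 + r1 + r2)
                args[slot] = Delta
                G = build_gamma(alpha, args[0], args[1:1 + r1], args[1 + r1:],
                                A, W1, W2, L, hM, hD, tauM)
                val = -z0 @ G @ z0                      # minus sign: psi_0 = -lambda_min(Gamma)
                D[i, j] = D[j, i] = val if i == j else 0.5 * val
        return D

    D0 = slot_gradient(0)
    E0 = [slot_gradient(1 + k) for k in range(r1)]
    F0 = [slot_gradient(1 + r1 + m) for m in range(r2)]
    return D0, E0, F0
```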
When solving (10), we pass from a constrained problem to an unconstrained one. We define the penalty functions
$$\psi_1(B) = \lambda_{\max}(B) - 1, \qquad \psi_2(B) = -\lambda_{\min}(B), \qquad B \in \Omega_n,$$
and the corresponding Lagrange function
$$\mathcal{L}(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2}, u) := \psi_0(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2}) + \sum_{\delta = 1, 2} \left( u_{P,\delta}\, \psi_\delta(P) + \sum_{k=1}^{r_1} u_{Q_k,\delta}\, \psi_\delta(Q_k) + \sum_{m=1}^{r_2} u_{S_m,\delta}\, \psi_\delta(S_m) \right), \qquad (12)$$
where $u = (u_{P,1}, u_{Q_1,1}, \ldots, u_{Q_{r_1},1}, u_{S_1,1}, \ldots, u_{S_{r_2},1}, u_{P,2}, u_{Q_1,2}, \ldots, u_{Q_{r_1},2}, u_{S_1,2}, \ldots, u_{S_{r_2},2}) \in \mathbb{R}^{2(1 + r_1 + r_2)}$ is the vector of non-negative Lagrange multipliers.
Theorem 2.
Provided that condition H1 holds, the function $\psi_0(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2})$ attains its minimum at the point $(P_0, Q_{1,0}, \ldots, Q_{r_1,0}, S_{1,0}, \ldots, S_{r_2,0}) \in \prod_{i=1}^{1 + r_1 + r_2} \bar{\Omega}_n^1$ if and only if the point $(P_0, Q_{1,0}, \ldots, Q_{r_1,0}, S_{1,0}, \ldots, S_{r_2,0}, u_0)$, where
$$u_0 = \big( u^0_{P,1}, u^0_{Q_1,1}, \ldots, u^0_{Q_{r_1},1}, u^0_{S_1,1}, \ldots, u^0_{S_{r_2},1}, u^0_{P,2}, u^0_{Q_1,2}, \ldots, u^0_{Q_{r_1},2}, u^0_{S_1,2}, \ldots, u^0_{S_{r_2},2} \big),$$
is a saddle point of the Lagrange function (12).
Proof. 
The objective function $\psi_0(\cdot)$ and the constraint functions $\psi_1(\cdot)$, $\psi_2(\cdot)$ are convex on the set of matrices $P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2} \in \Omega_n$. This follows from the convexity of the maximum eigenvalue and the concavity of the minimum eigenvalue of a symmetric matrix (see Example 3.10 in [18]). When proving the convexity of the function $\psi_0$ with respect to $(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2}) \in \prod_{i=1}^{1 + r_1 + r_2} \Omega_n$, we also use the linear dependence of the operator $\Gamma$ on $(P, Q_1, \ldots, Q_{r_1}, S_1, \ldots, S_{r_2})$.
Due to the Karush–Kuhn–Tucker conditions for convex problems, it remains to show that the Slater condition holds [18] (p. 244). By virtue of assumption H1, there exists a tuple of matrices $(\bar{P}, \bar{Q}_1, \ldots, \bar{Q}_{r_1}, \bar{S}_1, \ldots, \bar{S}_{r_2}) \in \prod_{i=1}^{1 + r_1 + r_2} \Omega_n \cap \mathrm{dom}(\psi_0)$ such that
$$\psi_\delta(\bar{P}) < 0, \quad \psi_\delta(\bar{Q}_1) < 0, \ \ldots, \ \psi_\delta(\bar{Q}_{r_1}) < 0, \quad \psi_\delta(\bar{S}_1) < 0, \ \ldots, \ \psi_\delta(\bar{S}_{r_2}) < 0, \qquad \delta = 1, 2,$$
so the Slater condition is satisfied, which concludes the proof. ☐
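One natural way to search numerically for the saddle point characterized by Theorem 2 is an Arrow–Hurwicz-type iteration: a subgradient descent step in the tuple of matrices combined with a projected ascent step in the multipliers $u$. The sketch below is only a schematic illustration of such a step under the assumptions of the previous sketches (the paper itself stops at the saddle-point characterization and does not prescribe this particular iteration); the generalized-gradient matrices grads would come from the generalized_gradient routine sketched above.

```python
import numpy as np

def psi1(B):                       # penalty for the constraint lambda_max(B) - 1 <= 0
    return np.linalg.eigvalsh(B).max() - 1.0

def psi2(B):                       # penalty for the constraint -lambda_min(B) <= 0 (B in Omega_n)
    return -np.linalg.eigvalsh(B).min()

def penalty_subgradients(B):
    """Subgradients of psi1 and psi2 at B: outer products of the extreme eigenvectors."""
    _, V = np.linalg.eigh(B)
    v_min, v_max = V[:, 0], V[:, -1]
    return np.outer(v_max, v_max), -np.outer(v_min, v_min)

def saddle_point_step(mats, u, grads, step=1e-2):
    """One Arrow-Hurwicz-type step for the Lagrange function (12).

    mats  -- list [P, Q_1, .., Q_r1, S_1, .., S_r2] of current iterates
    u     -- array of shape (len(mats), 2) of non-negative multipliers
    grads -- generalized-gradient matrices of psi_0 with respect to each slot
    """
    new_mats, new_u = [], u.copy()
    for idx, (B, G) in enumerate(zip(mats, grads)):
        g1, g2 = penalty_subgradients(B)
        direction = G + u[idx, 0] * g1 + u[idx, 1] * g2   # subgradient of the Lagrangian in B
        Bn = B - step * direction
        Bn = 0.5 * (Bn + Bn.T)                            # keep the iterate symmetric
        nrm = np.linalg.norm(Bn, "fro")
        new_mats.append(Bn if nrm <= 1.0 else Bn / nrm)   # project back onto the unit ball
        new_u[idx, 0] = max(0.0, u[idx, 0] + step * psi1(B))   # ascent in u, projected onto u >= 0
        new_u[idx, 1] = max(0.0, u[idx, 1] + step * psi2(B))
    return new_mats, new_u
```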

4. Conclusions

This work is devoted to the modeling and investigation of the architecture design of a delayed recurrent neural network based on delayed differential equations. The use of discrete and distributed delays makes it possible to model the calculation of the next states using internal memory, which corresponds to the artificial recurrent neural network architectures used in the field of deep learning.
The paper proposes a method for constructing an exponential estimate of the solutions of a recurrent neural network model using a Lyapunov–Krasovskii functional. The estimate reduces to solving the corresponding linear matrix inequality. This is the most costly operation in terms of computational complexity; however, it is also where the optimization approach can be applied to find the tuple of matrices that is optimal from the viewpoint of the exponential estimate.
In contrast to the indirect method of constructing exponential estimates proposed in previous works [12,13], the method developed in this study, based on the Lyapunov–Krasovskii functional, allows the estimate to be optimized within a compact set of positive definite matrices.
To this end, the concept of a generalized gradient of a convex function on a set of positive definite matrices is introduced. A constructive form of the generalized gradient for the minimal eigenvalue of the matrix Γ is presented. The Lagrange function for the unconstrained optimization problem is constructed. In this case, the search for the optimal exponential estimate is reduced to finding a saddle point of the Lagrange function.

Author Contributions

Conceptualization, V.M. and M.K.; methodology, S.R.; software, V.M.; validation, V.M., S.R. and M.K.; formal analysis, V.M.; investigation, S.R.; resources, M.K.; data curation, V.M.; writing—original draft preparation, S.R.; writing—review and editing, M.K.; visualization, V.M.; supervision, S.R.; project administration, M.K.; funding acquisition, M.K. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by the University of Bielsko-Biala.

Acknowledgments

The work is supported by the University of Bielsko-Biala, and the program number is K18/1b/UPBJ/2019-2020.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
LSTM  Long Short-Term Memory
GRU   Gated Recurrent Units
RNN   Recurrent Neural Network
DDE   Delayed Differential Equation

References

  1. Habiba, M.; Pearlmutter, B.A. Neural Ordinary Differential Equation based Recurrent Neural Network Model. arXiv 2020, arXiv:2005.09807. [Google Scholar]
  2. Sherstinsky, A. Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network. Phys. D Nonlinear Phenom. 2020, 404, 132306. [Google Scholar] [CrossRef] [Green Version]
  3. Fang, S.; Jiang, M.; Wang, X. Exponential convergence estimates for neural networks with discrete and distributed delays. Nonlinear Anal. Real World Appl. 2009, 702–714. [Google Scholar] [CrossRef]
  4. Hale, J.K.; Lunel, S.M.V. Introduction to Functional Differential Equations; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013; Volume 99. [Google Scholar]
  5. Zhang, H.; Wang, Z.; Liu, D. A Comprehensive Review of Stability Analysis of Continuous-Time Recurrent Neural Networks. IEEE Trans. Neural Networks Learn. Syst. 2014, 25, 1229–1262. [Google Scholar] [CrossRef]
  6. Hopfield, J.J. Neurons with graded response have collective computational properties like those of two-state neurons. Proc. Natl. Acad. Sci. USA 1984, 81, 3088–3092. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  7. Wei, J.; Ruan, S. Stability and bifurcation in a neural network model with two delays. Phys. D Nonlinear Phenom. 1999, 130, 255–272. [Google Scholar] [CrossRef]
  8. Ruan, S.; Wei, J. On the zeros of a third degree exponential polynomial with applications to a delayed model for the control of testosterone secretion. Math. Med. Biol. 2001, 18, 41–52. [Google Scholar] [CrossRef]
  9. Ruan, S.; Wei, J. On the zeros of transcendental functions with applications to stability of delay differential equations with two delays. Dyn. Contin. Discret. Impuls. Syst. Ser. A 2003, 10, 863–874. [Google Scholar]
  10. Yan, X.P.; Li, W.T. Stability and bifurcation in a simplified four-neuron BAM neural network with multiple delays. Discret. Dyn. Nat. Soc. 2006, 2006. [Google Scholar] [CrossRef]
  11. Huang, C.; Huang, L.; Feng, J.; Nai, M.; He, Y. Hopf bifurcation analysis for a two-neuron network with four delays. Chaos Solitons Fractals 2007, 795–812. [Google Scholar] [CrossRef]
  12. Martsenyuk, V. On an indirect method of exponential estimation for a neural network model with discretely distributed delays. Electron. J. Qual. Theory Differ. Equ. 2017, 2017, 1–16. [Google Scholar] [CrossRef]
  13. Martsenyuk, V. Indirect method of exponential convergence estimation for neural network with discrete and distributed delays. Electron. J. Differ. Equ. 2017, 2017, 1–16. [Google Scholar]
  14. Gu, K. An integral inequality in the stability problem of time-delay systems. In Proceedings of the 39th IEEE Conference on Decision and Control (Cat. No.00CH37187), Sydney, Australia, 12–15 December 2000. [Google Scholar] [CrossRef]
  15. Sanchez, E.; Perez, J. Input-to-state stability (ISS) analysis for dynamic neural networks. IEEE Trans. Circuits Syst. Fundam. Theory Appl. 1999, 46, 1395–1398. [Google Scholar] [CrossRef]
  16. Zhang, F. (Ed.) The Schur Complement and Its Applications; Springer: Berlin/Heidelberg, Germany, 2005. [Google Scholar] [CrossRef]
  17. Khusainov, D.Y.; Martsenyuk, V.P. Optimization method for stability analysis of delayed linear systems. Cybern. Syst. Anal. 1996, 32, 534–538. [Google Scholar] [CrossRef]
  18. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
