Abstract
A method for estimating the unknown parameters of dynamic models described by differential-algebraic equations is considered. The parameters are estimated from observations of the mathematical model. The parameter values are found by minimizing a criterion written as the sum of squared deviations of the state vector's coordinates from their measured values at different time instants. Parallelepiped-type constraints are imposed on the parameter values. To solve the optimization problem, a mini-batch adaptive random search method is proposed, which further develops the ideas of optimization methods used in machine learning. The method is applied to three model problems, and the results are compared with those obtained by gradient optimization methods from machine learning and by metaheuristic algorithms.
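Below is a minimal Python sketch of the identification criterion described above, assuming a toy one-parameter exponential model in place of the paper's differential-algebraic test problems; the names simulate, y_meas, and the box bounds are illustrative.

```python
import numpy as np

def simulate(theta, t_grid):
    # Stand-in for integrating the model equations with parameters theta;
    # here a toy scalar model x(t) = exp(-theta[0] * t).
    return np.exp(-theta[0] * t_grid)

t_grid = np.linspace(0.0, 1.0, 11)               # measurement time instants
y_meas = np.exp(-0.5 * t_grid)                   # "measured" state values (noise-free here)
lower, upper = np.array([0.0]), np.array([2.0])  # parallelepiped (box) constraints

def criterion(theta):
    # Sum of squared deviations of the simulated state from the measurements.
    theta = np.clip(theta, lower, upper)         # keep the parameters inside the box
    return np.sum((simulate(theta, t_grid) - y_meas) ** 2)

print(criterion(np.array([0.3])), criterion(np.array([0.5])))  # the second value is 0
```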
Appendix
A. Stochastic Gradient Descent (SGD):

\({\theta }^{k+1}={\theta }^{k}-{\alpha }_{k}{\nabla }_{\theta }{f}_{{t}_{j}}({\theta }^{k}),\)

where αk > 0, k = 0, 1, …, is the step value; \({f}_{{t}_{j}}\) is the term of the minimized criterion corresponding to the measurement at the time instant tj; tj is a random time instant from the set T, selected anew at each kth iteration; ∇θ denotes the gradient with respect to the parameter vector.
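A minimal NumPy sketch of this update on a toy one-parameter least-squares problem (the model, grad_single, and the step value are illustrative, not the paper's test problems):

```python
import numpy as np

rng = np.random.default_rng(0)
t_grid = np.linspace(0.0, 1.0, 20)          # set T of measurement time instants
y_meas = np.exp(-0.7 * t_grid)              # "measurements" of x(t) = exp(-theta * t)

def grad_single(theta, j):
    # Gradient of the squared deviation at the single time instant t_j.
    r = np.exp(-theta[0] * t_grid[j]) - y_meas[j]
    return np.array([-2.0 * r * t_grid[j] * np.exp(-theta[0] * t_grid[j])])

theta, alpha = np.array([0.2]), 0.5
for k in range(500):
    j = rng.integers(len(t_grid))           # random time instant chosen anew at each iteration
    theta = theta - alpha * grad_single(theta, j)
print(theta)                                # close to the true value 0.7
```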
B. Classical Momentum (ClassMom):

\({v}^{k+1}=\beta {v}^{k}+{\alpha }_{k}{\nabla }_{\theta }f({\theta }^{k}),\qquad {\theta }^{k+1}={\theta }^{k}-{v}^{k+1},\)

where v0 = o is a zero column vector and β = 0.9.
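A minimal NumPy sketch of the momentum update, using a toy quadratic f(θ) = ||θ − c||² in place of the identification criterion (the function and the step value are illustrative):

```python
import numpy as np

c = np.array([1.0, -2.0])
grad = lambda theta: 2.0 * (theta - c)   # gradient of the toy quadratic

theta, v = np.zeros(2), np.zeros(2)      # v^0 = o
beta, alpha = 0.9, 0.05
for k in range(300):
    v = beta * v + alpha * grad(theta)   # accumulate the velocity
    theta = theta - v                    # move against the accumulated direction
print(theta)                             # close to c
```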
C. Nesterov Accelerated Gradient (NAG) for solving the problem \(f({x}^{* })={\rm{min}}_{x\in {{\rm{R}}}^{n}}\ f(x)\):
Step 1. Specify the following parameters: the previous weight update γ, where γ ∈ (0, 1), (e.g., γ = 0.9); the learning rate η; an initial point x0 ∈ Rn; v0 = o; ε1 > 0.
Set k = 0.
Step 2. Set k = k + 1, and execute \({v}^{k}=\gamma {v}^{k-1}+\eta \nabla f({x}^{k-1}-\gamma {v}^{k-1})\).
Step 3. Calculate xk = xk−1 − vk.
Step 4. Check the condition \(\left\Vert {x}^{k}-{x}^{k-1}\right\Vert <{\varepsilon }_{1}\).
If this condition is satisfied, then x* = xk. Otherwise, return to Step 2.
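A minimal NumPy sketch of Steps 1-4 on a toy quadratic (the test function and parameter values are illustrative):

```python
import numpy as np

c = np.array([3.0, -1.0])
grad = lambda x: 2.0 * (x - c)                  # gradient of the toy quadratic

gamma, eta, eps1 = 0.9, 0.05, 1e-8
x, v = np.zeros(2), np.zeros(2)                 # Step 1: x^0, v^0 = o
for k in range(1, 10000):                       # Step 2: k = k + 1
    v = gamma * v + eta * grad(x - gamma * v)   # gradient at the look-ahead point
    x_new = x - v                               # Step 3
    if np.linalg.norm(x_new - x) < eps1:        # Step 4: stopping condition
        x = x_new
        break
    x = x_new
print(x)                                        # close to c
```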
D. Adaptive Gradient Method (AdaGrad) for solving the problem \(f({x}^{* })={\rm{min}}_{x\in {{\rm{R}}}^{n}}\ f(x)\).
Step 1. Specify the following parameters: the previous weight update γ, where γ ∈ (0, 1) (e.g., γ = 0.9); the learning rate η (as a rule η = 0.01); an initial point x0 ∈ Rn; the smoothing parameter ε = 10−6 ÷ 10−8 (also called the fuzz factor); ε1 > 0; G−1 = o.
Set k = 0.
Step 2. Set \({g}^{k}=\nabla f({x}^{k}),\ {G}^{k}={G}^{k-1}+{g}^{k}\odot {g}^{k}\),

where ⊙ is the Hadamard product of matrices.

Step 3. Calculate \({x}^{k+1}={x}^{k}-\eta \,{g}^{k}\oslash \sqrt{{G}^{k}+\varepsilon }\),

where ⊘ is the operation of element-wise matrix division.
Step 4. Check the condition \(\left\Vert {x}^{k+1}-{x}^{k}\right\Vert <{\varepsilon }_{1}\).
If this condition is satisfied, then x* = xk+1. Otherwise set k = k + 1, and return to Step 2.
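A minimal NumPy sketch of Steps 2-4 on a toy quadratic; a larger learning rate than the textbook default η = 0.01 is taken here only to keep the run short (all values are illustrative):

```python
import numpy as np

c = np.array([2.0, -3.0])
grad = lambda x: 2.0 * (x - c)

eta, eps, eps1 = 0.5, 1e-8, 1e-8
x, G = np.zeros(2), np.zeros(2)                 # x^0, G^{-1} = o
for k in range(10000):
    g = grad(x)
    G = G + g * g                               # Step 2: accumulate squared gradients (Hadamard product)
    x_new = x - eta * g / np.sqrt(G + eps)      # Step 3: element-wise division
    if np.linalg.norm(x_new - x) < eps1:        # Step 4
        break
    x = x_new
print(x_new)                                    # close to c
```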
E. Root Mean Square Propagation (RMSProp) for solving the problem \(f({x}^{* })={\rm{min}}_{x\in {{\rm{R}}}^{n}}\ f(x)\).
Step 1. Specify the following parameters: the previous weight update γ, where γ ∈ (0, 1) (e.g., γ = 0.9); an initial point x0 ∈ Rn; the smoothing parameter ε = 10−6 ÷ 10−8; ε1 > 0; the step value η (as a rule, η = 0.001); M−1 = o.
Set k = 0.
Step 2. Set gk = ∇ fk(xk), Gk = gk ⊙ gk, and Mk = γMk−1 + (1 − γ)Gk.
Step 3. Calculate \({x}^{k+1}={x}^{k}-\eta {g}^{k}\oslash \sqrt{{M}^{k}+\varepsilon }.\)
Step 4. Check the condition \(\left\Vert {x}^{k+1}-{x}^{k}\right\Vert <{\varepsilon }_{1}.\)
If this condition is satisfied, then x* = xk+1. Otherwise set k = k + 1, and return to Step 2.
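A minimal NumPy sketch of Steps 2-3 on a toy quadratic; with a constant step the iterates settle into a small neighborhood of the minimum rather than converging exactly, so a fixed iteration budget is used instead of the stopping test (all values are illustrative):

```python
import numpy as np

c = np.array([2.0, -3.0])
grad = lambda x: 2.0 * (x - c)

gamma, eta, eps = 0.9, 0.01, 1e-8               # eta larger than the default 0.001 to shorten the run
x, M = np.zeros(2), np.zeros(2)                 # x^0, M^{-1} = o
for k in range(3000):
    g = grad(x)
    M = gamma * M + (1.0 - gamma) * g * g       # Step 2: running average of squared gradients
    x = x - eta * g / np.sqrt(M + eps)          # Step 3: element-wise division
print(x)                                        # within a small neighborhood of c
```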
F. Adaptive Moment Estimation (Adam) for solving the problem M[f(x)] → min, given the random samples f1(x), f2(x), …, fK(x).
Step 1. Specify the following parameters: the step value α = 0.001; the moment estimation parameters β1 = 0.9 and β2 = 0.999; an initial point x0 ∈ Rn; the smoothing parameter ε = 10−8; ε1 > 0; the initial value m0 = o of the first vector of moments M[∇f(x)]; the initial value v0 = o of the second vector of moments M[∇f(x) ⊙ ∇f(x)].
Set k = 0.
Step 2. Set k = k + 1, \({g}^{k}=\nabla {f}_{k}({x}^{k-1})\), \({m}^{k}={\beta }_{1}{m}^{k-1}+(1-{\beta }_{1}){g}^{k}\), \({v}^{k}={\beta }_{2}{v}^{k-1}+(1-{\beta }_{2}){g}^{k}\odot {g}^{k}\), \({\hat{m}}^{k}={m}^{k}/(1-{\beta }_{1}^{k})\), \({\hat{v}}^{k}={v}^{k}/(1-{\beta }_{2}^{k})\).
Step 3. Calculate \({x}^{k}={x}^{k-1}-\alpha {\hat{m}}^{k}\oslash \sqrt{{\hat{v}}^{k}+\varepsilon }\).
Step 4. Check the condition \(\left\Vert {x}^{k}-{x}^{k-1}\right\Vert <{\varepsilon }_{1}\).
If this condition is satisfied, then x* = xk. Otherwise, return to Step 2.
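A minimal NumPy sketch of the Adam iteration; a deterministic toy quadratic stands in for the random samples f_k, and the target point is illustrative:

```python
import numpy as np

c = np.array([1.0, -2.0])
grad = lambda x: 2.0 * (x - c)

alpha, beta1, beta2, eps = 0.001, 0.9, 0.999, 1e-8
x, m, v = np.zeros(2), np.zeros(2), np.zeros(2)   # x^0, m^0 = o, v^0 = o
for k in range(1, 5001):                          # Step 2: k = k + 1
    g = grad(x)
    m = beta1 * m + (1.0 - beta1) * g             # first moment estimate
    v = beta2 * v + (1.0 - beta2) * g * g         # second moment estimate
    m_hat = m / (1.0 - beta1 ** k)                # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** k)                # bias-corrected second moment
    x = x - alpha * m_hat / np.sqrt(v_hat + eps)  # Step 3
print(x)                                          # approximately c
```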
G. Adam Method Modification (Adamax) for solving the problem M[f(x)] → min, where f(x) ∈ C1, given the random samples f1(x), f2(x), …, fK(x).
Step 1. Specify the following parameters: the step value α = 0.002; the moment estimation parameters β1 = 0.9 and β2 = 0.999, β2 ∈ [0, 1); an initial point x0 ∈ Rn; the smoothing parameter ε = 10−8; ε1 > 0; the initial value m0 = o of the first vector of moments M[∇f(x)]; u0 = o.
Set k = 0.
Step 2. Set k = k + 1, \({g}^{k}=\nabla {f}_{k}({x}^{k-1})\), \({m}^{k}={\beta }_{1}{m}^{k-1}+(1-{\beta }_{1}){g}^{k}\), \({u}^{k}=\max ({\beta }_{2}{u}^{k-1},| {g}^{k}| )\).
Step 3. Calculate \({x}^{k}={x}^{k-1}-\frac{\alpha }{1-{{\beta }_{1}}^{k}}{m}^{k}\oslash {u}^{k}\).
Step 4. Check the condition \(\left\Vert {x}^{k}-{x}^{k-1}\right\Vert <{\varepsilon }_{1}\).
If this condition is satisfied, then x* = xk. Otherwise, return to Step 2.
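A minimal NumPy sketch of the Adamax iteration; a deterministic toy quadratic stands in for the random samples f_k (all values are illustrative):

```python
import numpy as np

c = np.array([1.0, -2.0])
grad = lambda x: 2.0 * (x - c)

alpha, beta1, beta2 = 0.002, 0.9, 0.999
x, m, u = np.zeros(2), np.zeros(2), np.zeros(2)   # x^0, m^0 = o, u^0 = o
for k in range(1, 5001):                          # Step 2: k = k + 1
    g = grad(x)
    m = beta1 * m + (1.0 - beta1) * g             # first moment estimate
    u = np.maximum(beta2 * u, np.abs(g))          # exponentially weighted infinity norm
    x = x - (alpha / (1.0 - beta1 ** k)) * m / u  # Step 3
print(x)                                          # approximately c
```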
H. Nesterov-accelerated Adaptive Moment Estimation (Nadam).
Step 1. Specify the following parameters: the step value α = 0.002; the moment estimation parameters β1 = 0.975 and β2 = 0.999; an initial point x0 ∈ Rn; the smoothing parameter ε = 10−8; the initial value m0 = o of the first vector of moments M[ ∇ f(x)]; the initial value v0 = o of the second vector of moments M[ ∇ f(x) ⊙ ∇ f(x)].
Set k = 0.
Step 2. Set k = k + 1, \({g}^{k}=\nabla f({x}^{k-1})\), \({m}^{k}={\beta }_{1}{m}^{k-1}+(1-{\beta }_{1}){g}^{k}\), \({v}^{k}={\beta }_{2}{v}^{k-1}+(1-{\beta }_{2}){g}^{k}\odot {g}^{k}\), \({\hat{m}}^{k}={\beta }_{1}{m}^{k}/(1-{\beta }_{1}^{k+1})+(1-{\beta }_{1}){g}^{k}/(1-{\beta }_{1}^{k})\), \({\hat{v}}^{k}={v}^{k}/(1-{\beta }_{2}^{k})\).
Step 3. Calculate \({x}^{k}={x}^{k-1}-\alpha {\hat{m}}^{k}\oslash \sqrt{{\hat{v}}^{k}+\varepsilon }\).
Step 4. Check the condition \(\left\Vert {x}^{k}-{x}^{k-1}\right\Vert <{\varepsilon }_{1}\).
If this condition is satisfied, then x* = xk. Otherwise, return to Step 2.
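A minimal NumPy sketch of a standard Nadam iteration consistent with Step 3 above, with the Nesterov correction folded into the bias-corrected first moment; the exact bias-correction form and the toy quadratic are assumptions, not taken from the paper:

```python
import numpy as np

c = np.array([3.0, -2.0])
grad = lambda x: 2.0 * (x - c)

alpha, beta1, beta2, eps = 0.002, 0.975, 0.999, 1e-8
x, m, v = np.zeros(2), np.zeros(2), np.zeros(2)   # x^0, m^0 = o, v^0 = o
for k in range(1, 10001):                         # Step 2: k = k + 1
    g = grad(x)
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g
    # Nesterov-style bias-corrected first moment, Adam-style second moment
    m_hat = beta1 * m / (1.0 - beta1 ** (k + 1)) + (1.0 - beta1) * g / (1.0 - beta1 ** k)
    v_hat = v / (1.0 - beta2 ** k)
    x = x - alpha * m_hat / np.sqrt(v_hat + eps)  # Step 3
print(x)                                          # oscillates in a small neighborhood of c
```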
I. Mini-batch Gradient Descent:

\({\theta }^{k+1}={\theta }^{k}-{\alpha }_{k}\frac{1}{m}\sum _{i\in {J}_{m}}{\nabla }_{\theta }{f}_{i}({\theta }^{k}),\)

where αk is the step value (learning rate); fi is the loss on the ith component of the learning sample; Jm is the set of m indices of arbitrary components (xi, yi) ∈ Xl of the learning sample (e.g., m consecutive elements). A single parameter update thus requires only a small part of the dataset rather than the whole of it (as a rule, from 50 to 256 components in applications).
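A minimal NumPy sketch of mini-batch gradient descent on a toy linear least-squares problem; the learning sample, batch size, and learning rate are illustrative, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))                    # learning sample of l = 1000 components
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                                    # noise-free targets

w = np.zeros(3)
alpha, m = 0.1, 64                                # learning rate and mini-batch size
for k in range(500):
    J = rng.choice(len(X), size=m, replace=False) # indices J_m of an arbitrary mini-batch
    Xb, yb = X[J], y[J]
    grad = 2.0 / m * Xb.T @ (Xb @ w - yb)         # gradient of the mean squared deviation on the batch
    w = w - alpha * grad
print(w)                                          # approaches w_true
```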