
Mini-Batch Adaptive Random Search Method for the Parametric Identification of Dynamic Systems


Abstract

A method is considered for estimating the unknown parameters of dynamic models described by differential-algebraic equations. The parameters are estimated from observations (measurements) of the system described by the mathematical model. The parameter values are found by minimizing a criterion written as the sum of the squared deviations of the state vector coordinates from their measured values at different time instants. Parallelepiped-type (box) constraints are imposed on the parameter values. To solve the optimization problem, a mini-batch adaptive random search method is proposed that further develops the ideas of optimization methods used in machine learning. The method is applied to three model problems, and the results are compared with those obtained by gradient optimization methods from machine learning and by metaheuristic algorithms.




Appendix

A. Stochastic Gradient Descent (SGD):

$${\theta }^{k+1}={\theta }^{k}-{\alpha }_{k}{\nabla }_{\theta }\ L({\theta }^{k},\hat{x}({t}_{j}),{t}_{j})={\theta }^{k}-{\alpha }_{k}{\nabla }_{\theta }{\underbrace{\left[\mathop{\sum }\limits_{i = 1}^{n}{({\hat{x}}_{i}({t}_{j})-{x}_{i}(\theta ,{t}_{j}))}^{2}\right]}_{L({\theta }^{k},\hat{x}({t}_{j}),{t}_{j})}},$$

where αk > 0, k = 0, 1, …, is the step value; tj is a random time instant on the set T, drawn anew at each iteration k; ∇θ denotes the gradient with respect to the parameter vector.
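As an illustration, a minimal NumPy sketch of this update is given below; the gradient routine grad_L(θ, tj), the constant step α, and the iteration budget are illustrative assumptions, not the implementation used in the paper.

```python
import numpy as np

def sgd_identify(grad_L, theta0, times, alpha=0.01, iters=1000, seed=None):
    """Stochastic gradient descent over randomly drawn time instants t_j.

    grad_L(theta, t_j) must return the gradient with respect to theta of
    L(theta, x_hat(t_j), t_j) = sum_i (x_hat_i(t_j) - x_i(theta, t_j))**2."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for k in range(iters):
        t_j = times[rng.integers(len(times))]       # random time instant from T, drawn anew
        theta = theta - alpha * grad_L(theta, t_j)  # theta^{k+1} = theta^k - alpha_k * grad
    return theta
```

Here times is the array of measurement instants and theta0 the initial parameter guess; a constant step is used instead of a schedule for simplicity.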

B. Classical Momentum (ClassMom):

$$\begin{array}{cc}{\theta }^{k+1}={\theta }^{k}-{\alpha }_{k}{v}^{k},\\ {v}^{k+1}=\beta {v}^{k}+(1-\beta )\ {\nabla }_{\theta }L({\theta }^{k},\hat{x}({t}_{j}),{t}_{j}),\end{array}$$

where v0 = o is a zero column vector and β = 0.9. 
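A corresponding sketch of the momentum recurrences, under the same assumptions as above (a user-supplied gradient routine grad_L(θ, tj) and a constant step):

```python
import numpy as np

def classical_momentum(grad_L, theta0, times, alpha=0.01, beta=0.9, iters=1000, seed=None):
    """Classical momentum: theta^{k+1} = theta^k - alpha*v^k,
    v^{k+1} = beta*v^k + (1 - beta)*grad_L(theta^k, t_j)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    v = np.zeros_like(theta)                      # v^0 = o, the zero vector
    for k in range(iters):
        t_j = times[rng.integers(len(times))]     # random time instant, as in SGD
        g = grad_L(theta, t_j)                    # gradient evaluated at theta^k
        theta, v = theta - alpha * v, beta * v + (1.0 - beta) * g
    return theta
```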

C. Nesterov Accelerated Gradient (NAG) for solving the problem \(f({x}^{* })={\rm{min}}_{x\in {{\rm{R}}}^{n}}\ f(x)\):

Step 1. Specify the following parameters: the previous weight update coefficient γ, where γ ∈ (0, 1) (e.g., γ = 0.9); the learning rate η; an initial point x0 ∈ Rn; v0 = o; ε1 > 0.

Set k = 0.

Step 2. Set k = k + 1, and execute:

$${y}^{k}={x}^{k-1}-\gamma {v}^{k-1},\quad {g}^{k}=\nabla {f}^{k}({y}^{k}),\quad {v}^{k}=\gamma {v}^{k-1}+\eta {g}^{k}.$$

Step 3. Calculate xk = xk−1 − vk.

Step 4. Check the condition \(\left\Vert {x}^{k}-{x}^{k-1}\right\Vert <{\varepsilon }_{1}\).

If this condition is satisfied, then x* = xk. Otherwise, return to Step 2.
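Steps 1–4 translate into the following sketch; the gradient routine grad_f, the default tolerances, and the iteration cap are assumptions added for illustration.

```python
import numpy as np

def nag(grad_f, x0, gamma=0.9, eta=0.01, eps1=1e-6, max_iter=10_000):
    """Nesterov accelerated gradient, following Steps 1-4 above."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)                        # v^0 = o
    for k in range(1, max_iter + 1):
        y = x - gamma * v                       # look-ahead point y^k
        g = grad_f(y)                           # g^k = grad f(y^k)
        v = gamma * v + eta * g                 # v^k = gamma*v^{k-1} + eta*g^k
        x_new = x - v                           # x^k = x^{k-1} - v^k
        if np.linalg.norm(x_new - x) < eps1:    # Step 4 stopping test
            return x_new
        x = x_new
    return x
```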

D. Adaptive Gradient Method (AdaGrad) for solving the problem \(f({x}^{* })={\rm{min}}_{x\in {{\rm{R}}}^{n}}\ f(x)\).

Step 1. Specify the following parameters: the previous weight update coefficient γ, where γ ∈ (0, 1) (e.g., γ = 0.9); the learning rate η (as a rule, η = 0.01); an initial point x0 ∈ Rn; the smoothing parameter ε, typically from 10−8 to 10−6 (also called the fuzz factor); ε1 > 0; G−1 = o.

Set k = 0. 

Step 2. Set

$${g}^{k}=\nabla {f}^{k}({x}^{k});\quad {G}^{k}={G}^{k-1}+{g}^{k}\odot {g}^{k},$$

where ⊙ denotes the element-wise (Hadamard) product.

Step 3. Calculate

$${x}^{k+1}={x}^{k}-\eta {g}^{k}\oslash \sqrt{{G}^{k}+\varepsilon },$$

where ⊘ denotes element-wise division.

Step 4. Check the condition \(\left\Vert {x}^{k+1}-{x}^{k}\right\Vert <{\varepsilon }_{1}\).

If this condition is satisfied, then x* = xk+1. Otherwise set k = k + 1, and return to Step 2.
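A sketch of the AdaGrad iteration above, with the squared-gradient accumulator kept as a vector; grad_f and the iteration cap are assumptions.

```python
import numpy as np

def adagrad(grad_f, x0, eta=0.01, eps=1e-8, eps1=1e-6, max_iter=10_000):
    """AdaGrad, following Steps 1-4 above; G accumulates squared gradients."""
    x = np.asarray(x0, dtype=float)
    G = np.zeros_like(x)                        # G^{-1} = o
    for k in range(max_iter):
        g = grad_f(x)                           # g^k
        G = G + g * g                           # G^k = G^{k-1} + g^k (.) g^k
        x_new = x - eta * g / np.sqrt(G + eps)  # element-wise division
        if np.linalg.norm(x_new - x) < eps1:    # Step 4 stopping test
            return x_new
        x = x_new
    return x
```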

E. Root Mean Square Propagation (RMSProp) for solving the problem \(f({x}^{* })={\rm{min}}_{x\in {{\rm{R}}}^{n}}\ f(x)\).

Step 1. Specify the following parameters: the previous weight update coefficient γ, where γ ∈ (0, 1) (e.g., γ = 0.9); an initial point x0 ∈ Rn; the smoothing parameter ε, typically from 10−8 to 10−6; ε1 > 0; the step value η (as a rule, η = 0.001); M−1 = o.

Set k = 0. 

Step 2. Set gk = ∇fk(xk), Gk = gk ⊙ gk, and Mk = γMk−1 + (1 − γ)Gk.

Step 3. Calculate \({x}^{k+1}={x}^{k}-\eta {g}^{k}\oslash \sqrt{{M}^{k}+\varepsilon }.\)

Step 4. Check the condition \(\left\Vert {x}^{k+1}-{x}^{k}\right\Vert <{\varepsilon }_{1}.\)

If this condition is satisfied, then x* = xk+1.  Otherwise set k = k + 1, and return to Step 2.
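A sketch of the RMSProp iteration above, under the same assumptions (a user-supplied grad_f and an iteration cap as a safeguard):

```python
import numpy as np

def rmsprop(grad_f, x0, gamma=0.9, eta=0.001, eps=1e-8, eps1=1e-6, max_iter=10_000):
    """RMSProp, following Steps 1-4 above; M is a running average of squared gradients."""
    x = np.asarray(x0, dtype=float)
    M = np.zeros_like(x)                        # M^{-1} = o
    for k in range(max_iter):
        g = grad_f(x)
        M = gamma * M + (1.0 - gamma) * g * g   # M^k = gamma*M^{k-1} + (1-gamma)*G^k
        x_new = x - eta * g / np.sqrt(M + eps)  # element-wise division
        if np.linalg.norm(x_new - x) < eps1:    # Step 4 stopping test
            return x_new
        x = x_new
    return x
```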

F. Adaptive Moment Estimation (Adam) for solving the problem M[f(x)] → min, where M[⋅] denotes the expectation and the random samples f1(x), f2(x), …, fK(x) are available.

Step 1. Specify the following parameters: the step value α = 0.001; the moment estimation parameters β1 = 0.9 and β2 = 0.999; an initial point x0 ∈ Rn; the smoothing parameter ε = 10−8; ε1 > 0; the initial value m0 = o of the first vector of moments M[∇f(x)]; the initial value v0 = o of the second vector of moments M[∇f(x) ⊙ ∇f(x)]. 

Set k = 0. 

Step 2. Set k = k + 1, 

$$\begin{array}{ccc}{g}^{k}=\nabla {f}^{k}({x}^{k-1});\quad {m}^{k}={\beta }_{1}{m}^{k-1}+(1-{\beta }_{1}){g}^{k};\\ {G}^{k}={g}^{k}\odot {g}^{k};\quad {v}^{k}={\beta }_{2}{v}^{k-1}+(1-{\beta }_{2}){G}^{k};\\ {\hat{m}}^{k}=\frac{{m}^{k}}{1-{{\beta }_{1}}^{k}};\quad {\hat{v}}^{k}=\frac{{v}^{k}}{1-{{\beta }_{2}}^{k}}.\end{array}$$

Step 3. Calculate \({x}^{k}={x}^{k-1}-\alpha {\hat{m}}^{k}\oslash \sqrt{{\hat{v}}^{k}+\varepsilon }\).

Step 4. Check the condition \(\left\Vert {x}^{k}-{x}^{k-1}\right\Vert <{\varepsilon }_{1}\).

If this condition is satisfied, then x* = xk.  Otherwise, return to Step 2.
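A sketch of the Adam iteration above; grad_f may return a stochastic (sampled) gradient, and the iteration cap is an added safeguard.

```python
import numpy as np

def adam(grad_f, x0, alpha=0.001, beta1=0.9, beta2=0.999,
         eps=1e-8, eps1=1e-6, max_iter=10_000):
    """Adam, following Steps 1-4 above."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)                        # first moment estimate, m^0 = o
    v = np.zeros_like(x)                        # second moment estimate, v^0 = o
    for k in range(1, max_iter + 1):
        g = grad_f(x)                           # g^k = grad f^k(x^{k-1})
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** k)          # bias-corrected moment estimates
        v_hat = v / (1.0 - beta2 ** k)
        x_new = x - alpha * m_hat / np.sqrt(v_hat + eps)
        if np.linalg.norm(x_new - x) < eps1:    # Step 4 stopping test
            return x_new
        x = x_new
    return x
```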

G. Adam Method Modification (Adamax) for solving the problem M[f(x)] → min, where f(x) ∈ C1 and the random samples f1(x), f2(x), …, fK(x) are available.

Step 1. Specify the following parameters: the step value α = 0.002; the moment estimation parameters β1 = 0.9 and β2 = 0.999, β2 ∈ [0, 1); an initial point x0 ∈ Rn; the smoothing parameter ε = 10−8; ε1 > 0; the initial value m0 = o of the first vector of moments M[∇f(x)]; u0 = o.

Set k = 0.

Step 2. Set k = k + 1,

$$\begin{array}{l}{g}^{k}=\nabla {f}^{k}({x}^{k-1});\quad {m}^{k}={\beta }_{1}{m}^{k-1}+(1-{\beta }_{1}){g}^{k},\\ {u}^{k}={\rm{max}}\left\{{\beta }_{2}{u}^{k-1},\left|{g}^{k}\right|\right\}\quad (\,\text{max is element-wise}\,).\end{array}$$

Step 3. Calculate \({x}^{k}={x}^{k-1}-\frac{\alpha }{1-{{\beta }_{1}}^{k}}{m}^{k}\oslash {u}^{k}\).

Step 4. Check the condition \(\left\Vert {x}^{k}-{x}^{k-1}\right\Vert <{\varepsilon }_{1}\).

If this condition is satisfied, then x* = xk.  Otherwise, return to Step 2.
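A sketch of the Adamax iteration above; the small guard added before the element-wise division is a numerical safeguard of the sketch, not part of the printed formula.

```python
import numpy as np

def adamax(grad_f, x0, alpha=0.002, beta1=0.9, beta2=0.999,
           eps1=1e-6, max_iter=10_000):
    """Adamax, following Steps 1-4 above; u accumulates an element-wise max of |g|."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)                          # m^0 = o
    u = np.zeros_like(x)                          # u^0 = o
    for k in range(1, max_iter + 1):
        g = grad_f(x)
        m = beta1 * m + (1.0 - beta1) * g
        u = np.maximum(beta2 * u, np.abs(g))      # element-wise maximum
        safe_u = np.maximum(u, 1e-12)             # guard against division by zero (not in the printed formula)
        x_new = x - (alpha / (1.0 - beta1 ** k)) * m / safe_u
        if np.linalg.norm(x_new - x) < eps1:      # Step 4 stopping test
            return x_new
        x = x_new
    return x
```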

H. Nesterov-accelerated Adaptive Moment Estimation (Nadam).

Step 1. Specify the following parameters: the step value α = 0.002; the moment estimation parameters β1 = 0.975 and β2 = 0.999; an initial point x0 ∈ Rn; the smoothing parameter ε = 10−8; the initial value m0 = o of the first vector of moments M[ ∇ f(x)]; the initial value v0 = o of the second vector of moments M[ ∇ f(x) ⊙ ∇ f(x)].

Set k = 0. 

Step 2. Set k = k + 1,

$$\begin{array}{ccc}{g}^{k}=\nabla {f}^{k}({x}^{k-1});\quad {m}^{k}={\beta }_{1}{m}^{k-1}+(1-{\beta }_{1}){g}^{k};\\ {G}^{k}={g}^{k}\odot {g}^{k};\quad {v}^{k}={\beta }_{2}{v}^{k-1}+(1-{\beta }_{2}){G}^{k};\\ {\hat{m}}^{k}=\frac{{\beta }_{1}{m}^{k}}{1-{{\beta }_{1}}^{k+1}}+\frac{(1-{\beta }_{1}){g}^{k}}{1-{{\beta }_{1}}^{k}};\quad {\hat{v}}^{k}=\frac{{\beta }_{2}{v}^{k}}{1-{{\beta }_{2}}^{k}}.\end{array}$$

Step 3. Calculate \({x}^{k}={x}^{k-1}-\alpha {\hat{m}}^{k}\oslash \sqrt{{\hat{v}}^{k}+\varepsilon }\).

Step 4. Check the condition \(\left\Vert {x}^{k}-{x}^{k-1}\right\Vert <{\varepsilon }_{1}\).

If this condition is satisfied, then x* = xk.  Otherwise, return to Step 2.
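A sketch of the Nadam iteration above, under the same assumptions as for Adam (user-supplied grad_f, added iteration cap):

```python
import numpy as np

def nadam(grad_f, x0, alpha=0.002, beta1=0.975, beta2=0.999,
          eps=1e-8, eps1=1e-6, max_iter=10_000):
    """Nadam, coded directly from the recurrences above."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)                          # m^0 = o
    v = np.zeros_like(x)                          # v^0 = o
    for k in range(1, max_iter + 1):
        g = grad_f(x)                             # g^k = grad f^k(x^{k-1})
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        # Nesterov-style bias-corrected moment estimates, as in Step 2 above.
        m_hat = beta1 * m / (1.0 - beta1 ** (k + 1)) + (1.0 - beta1) * g / (1.0 - beta1 ** k)
        v_hat = beta2 * v / (1.0 - beta2 ** k)
        x_new = x - alpha * m_hat / np.sqrt(v_hat + eps)
        if np.linalg.norm(x_new - x) < eps1:      # Step 4 stopping test
            return x_new
        x = x_new
    return x
```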

I. Mini-batch Gradient Descent:

$${\theta }^{k+1}={\theta }^{k}-{\alpha }_{k}{\nabla }_{\theta }\ \bar{Q}({\theta }^{k}),\quad {\alpha }_{k}>0,\quad k=0,1,\ldots ,$$

where αk is the step value (learning rate),

$$\overline{Q}(\theta )=\frac{1}{m}\sum _{i\in {J}_{m}}\underbrace{{\left[\mathop{\sum }\limits_{j=1}^{n}{\theta }_{j}{f}_{j}({x}_{i})-{y}_{i}\right]}^{2}}_{L(\theta ,{x}_{i},{y}_{i})}=\frac{1}{m}\sum _{i\in {J}_{m}}L(\theta ,{x}_{i},{y}_{i}),$$

where Jm is a set of m indices of arbitrary elements (xi, yi) ∈ Xl of the training sample (e.g., m consecutive elements). A single parameter update thus requires only a small part of the dataset rather than the whole of it (as a rule, from 50 to 256 elements in applications).
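A sketch of mini-batch gradient descent for the quadratic criterion above, assuming a linear-in-parameters model with design matrix F[i, j] = fj(xi); the batch size, learning rate, and epoch count are illustrative assumptions.

```python
import numpy as np

def minibatch_gd(F, y, theta0, alpha=0.01, m=64, epochs=100, seed=None):
    """Mini-batch gradient descent for the criterion Q_bar above.

    F is the l-by-n design matrix with F[i, j] = f_j(x_i), y the vector of
    targets y_i; each update uses a random index set J_m of about m elements."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    l = F.shape[0]
    n_batches = max(1, l // m)
    for _ in range(epochs):
        for J_m in np.array_split(rng.permutation(l), n_batches):
            residual = F[J_m] @ theta - y[J_m]               # theta^T f(x_i) - y_i over the batch
            grad = (2.0 / len(J_m)) * F[J_m].T @ residual    # gradient of Q_bar restricted to J_m
            theta = theta - alpha * grad                     # theta^{k+1} = theta^k - alpha_k * grad
    return theta
```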


Cite this article

Panteleev, A., Lobanov, A. Mini-Batch Adaptive Random Search Method for the Parametric Identification of Dynamic Systems. Autom Remote Control 81, 2026–2045 (2020). https://doi.org/10.1134/S0005117920110065
