Abstract
A method for estimating the unknown parameters of dynamic models described by differential-algebraic equations is considered. The parameters are estimated from observations of the mathematical model. The parameter values are found by minimizing a criterion written as the sum of squared deviations of the state vector's coordinates from their measured values at different time instants. Parallelepiped-type constraints are imposed on the parameter values. To solve the optimization problem, a mini-batch adaptive random search method is proposed, which further develops the ideas of optimization methods used in machine learning. The method is applied to three model problems, and the results are compared with those obtained by gradient optimization methods from machine learning and by metaheuristic algorithms.
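Below is a minimal Python sketch of the identification criterion described above, assuming a toy one-parameter exponential model in place of the paper's differential-algebraic test problems; the names simulate, y_meas, and the box bounds are illustrative.

```python
import numpy as np

def simulate(theta, t_grid):
    # Stand-in for integrating the model equations with parameters theta;
    # here a toy scalar model x(t) = exp(-theta[0] * t).
    return np.exp(-theta[0] * t_grid)

t_grid = np.linspace(0.0, 1.0, 11)               # measurement time instants
y_meas = np.exp(-0.5 * t_grid)                   # "measured" state values (noise-free here)
lower, upper = np.array([0.0]), np.array([2.0])  # parallelepiped (box) constraints

def criterion(theta):
    # Sum of squared deviations of the simulated state from the measurements.
    theta = np.clip(theta, lower, upper)         # keep the parameters inside the box
    return np.sum((simulate(theta, t_grid) - y_meas) ** 2)

print(criterion(np.array([0.3])), criterion(np.array([0.5])))  # the second value is 0
```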
Appendix
A. Stochastic Gradient Descent (SGD):

\({\theta }^{k+1}={\theta }^{k}-{\alpha }_{k}{\nabla }_{\theta }{f}_{{t}_{j}}({\theta }^{k}),\)

where αk > 0, k = 0, 1, …, is the step value; \({f}_{{t}_{j}}\) is the term of the minimized criterion corresponding to the measurement at the time instant tj; tj is a random time instant from the set T, selected anew at each kth iteration; ∇θ denotes the gradient with respect to the parameter vector.
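A minimal NumPy sketch of this update on a toy one-parameter least-squares problem (the model, grad_single, and the step value are illustrative, not the paper's test problems):

```python
import numpy as np

rng = np.random.default_rng(0)
t_grid = np.linspace(0.0, 1.0, 20)          # set T of measurement time instants
y_meas = np.exp(-0.7 * t_grid)              # "measurements" of x(t) = exp(-theta * t)

def grad_single(theta, j):
    # Gradient of the squared deviation at the single time instant t_j.
    r = np.exp(-theta[0] * t_grid[j]) - y_meas[j]
    return np.array([-2.0 * r * t_grid[j] * np.exp(-theta[0] * t_grid[j])])

theta, alpha = np.array([0.2]), 0.5
for k in range(500):
    j = rng.integers(len(t_grid))           # random time instant chosen anew at each iteration
    theta = theta - alpha * grad_single(theta, j)
print(theta)                                # close to the true value 0.7
```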
B. Classical Momentum (ClassMom):

\({v}^{k+1}=\beta {v}^{k}+{\alpha }_{k}{\nabla }_{\theta }f({\theta }^{k}),\qquad {\theta }^{k+1}={\theta }^{k}-{v}^{k+1},\)

where v0 = o is a zero column vector and β = 0.9.
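A minimal NumPy sketch of the momentum update, using a toy quadratic f(θ) = ||θ − c||² in place of the identification criterion (the function and the step value are illustrative):

```python
import numpy as np

c = np.array([1.0, -2.0])
grad = lambda theta: 2.0 * (theta - c)   # gradient of the toy quadratic

theta, v = np.zeros(2), np.zeros(2)      # v^0 = o
beta, alpha = 0.9, 0.05
for k in range(300):
    v = beta * v + alpha * grad(theta)   # accumulate the velocity
    theta = theta - v                    # move against the accumulated direction
print(theta)                             # close to c
```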
C. Nesterov Accelerated Gradient (NAG) for solving the problem \(f({x}^{* })={\rm{min}}_{x\in {{\rm{R}}}^{n}}\ f(x)\):
Step 1. Specify the following parameters: the previous weight update γ, where γ ∈ (0, 1), (e.g., γ = 0.9); the learning rate η; an initial point x0 ∈ Rn; v0 = o; ε1 > 0.
Set k = 0.
Step 2. Set k = k + 1, and execute \({v}^{k}=\gamma {v}^{k-1}+\eta \nabla f({x}^{k-1}-\gamma {v}^{k-1})\).
Step 3. Calculate xk = xk−1 − vk.
Step 4. Check the condition \(\left\Vert {x}^{k}-{x}^{k-1}\right\Vert <{\varepsilon }_{1}\).
If this condition is satisfied, then x* = xk. Otherwise, return to Step 2.
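A minimal NumPy sketch of Steps 1-4 on a toy quadratic (the test function and parameter values are illustrative):

```python
import numpy as np

c = np.array([3.0, -1.0])
grad = lambda x: 2.0 * (x - c)                  # gradient of the toy quadratic

gamma, eta, eps1 = 0.9, 0.05, 1e-8
x, v = np.zeros(2), np.zeros(2)                 # Step 1: x^0, v^0 = o
for k in range(1, 10000):                       # Step 2: k = k + 1
    v = gamma * v + eta * grad(x - gamma * v)   # gradient at the look-ahead point
    x_new = x - v                               # Step 3
    if np.linalg.norm(x_new - x) < eps1:        # Step 4: stopping condition
        x = x_new
        break
    x = x_new
print(x)                                        # close to c
```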
D. Adaptive Gradient Method (AdaGrad) for solving the problem \(f({x}^{* })={\rm{min}}_{x\in {{\rm{R}}}^{n}}\ f(x)\).
Step 1. Specify the following parameters: the previous weight update γ, where γ ∈ (0, 1) (e.g., γ = 0.9); the learning rate η (as a rule η = 0.01); an initial point x0 ∈ Rn; the smoothing parameter ε = 10−6 ÷ 10−8 (also called the fuzz factor); ε1 > 0; G−1 = o.
Set k = 0.
Step 2. Set \({g}^{k}=\nabla f({x}^{k}),\ {G}^{k}={G}^{k-1}+{g}^{k}\odot {g}^{k}\),

where ⊙ is the Hadamard product of matrices.

Step 3. Calculate \({x}^{k+1}={x}^{k}-\eta \,{g}^{k}\oslash \sqrt{{G}^{k}+\varepsilon }\),

where ⊘ is the operation of element-wise matrix division.
Step 4. Check the condition \(\left\Vert {x}^{k+1}-{x}^{k}\right\Vert <{\varepsilon }_{1}\).
If this condition is satisfied, then x* = xk+1. Otherwise set k = k + 1, and return to Step 2.
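A minimal NumPy sketch of Steps 2-4 on a toy quadratic; a larger learning rate than the textbook default η = 0.01 is taken here only to keep the run short (all values are illustrative):

```python
import numpy as np

c = np.array([2.0, -3.0])
grad = lambda x: 2.0 * (x - c)

eta, eps, eps1 = 0.5, 1e-8, 1e-8
x, G = np.zeros(2), np.zeros(2)                 # x^0, G^{-1} = o
for k in range(10000):
    g = grad(x)
    G = G + g * g                               # Step 2: accumulate squared gradients (Hadamard product)
    x_new = x - eta * g / np.sqrt(G + eps)      # Step 3: element-wise division
    if np.linalg.norm(x_new - x) < eps1:        # Step 4
        break
    x = x_new
print(x_new)                                    # close to c
```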
E. Root Mean Square Propagation (RMSProp) for solving the problem \(f({x}^{* })={\rm{min}}_{x\in {{\rm{R}}}^{n}}\ f(x)\).
Step 1. Specify the following parameters: the previous weight update γ, where γ ∈ (0, 1) (e.g., γ = 0.9); an initial point x0 ∈ Rn; the smoothing parameter ε = 10−6 ÷ 10−8; ε1 > 0; the step value η (as a rule, η = 0.001); M−1 = o.
Set k = 0.
Step 2. Set gk = ∇ fk(xk), Gk = gk ⊙ gk, and Mk = γMk−1 + (1 − γ)Gk.
Step 3. Calculate \({x}^{k+1}={x}^{k}-\eta {g}^{k}\oslash \sqrt{{M}^{k}+\varepsilon }.\)
Step 4. Check the condition \(\left\Vert {x}^{k+1}-{x}^{k}\right\Vert <{\varepsilon }_{1}.\)
If this condition is satisfied, then x* = xk+1. Otherwise set k = k + 1, and return to Step 2.
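A minimal NumPy sketch of Steps 2-3 on a toy quadratic; with a constant step the iterates settle into a small neighborhood of the minimum rather than converging exactly, so a fixed iteration budget is used instead of the stopping test (all values are illustrative):

```python
import numpy as np

c = np.array([2.0, -3.0])
grad = lambda x: 2.0 * (x - c)

gamma, eta, eps = 0.9, 0.01, 1e-8               # eta larger than the default 0.001 to shorten the run
x, M = np.zeros(2), np.zeros(2)                 # x^0, M^{-1} = o
for k in range(3000):
    g = grad(x)
    M = gamma * M + (1.0 - gamma) * g * g       # Step 2: running average of squared gradients
    x = x - eta * g / np.sqrt(M + eps)          # Step 3: element-wise division
print(x)                                        # within a small neighborhood of c
```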
F. Adaptive Moment Estimation (Adam) for solving the problem M[f(x)] → min, given the random samples f1(x), f2(x), …, fK(x).
Step 1. Specify the following parameters: the step value α = 0.001; the moment estimation parameters β1 = 0.9 and β2 = 0.999; an initial point x0 ∈ Rn; the smoothing parameter ε = 10−8; ε1 > 0; the initial value m0 = o of the first vector of moments M[∇f(x)]; the initial value v0 = o of the second vector of moments M[∇f(x) ⊙ ∇f(x)].
Set k = 0.
Step 2. Set k = k + 1, \({g}^{k}=\nabla {f}_{k}({x}^{k-1})\), \({m}^{k}={\beta }_{1}{m}^{k-1}+(1-{\beta }_{1}){g}^{k}\), \({v}^{k}={\beta }_{2}{v}^{k-1}+(1-{\beta }_{2}){g}^{k}\odot {g}^{k}\), \({\hat{m}}^{k}={m}^{k}/(1-{\beta }_{1}^{k})\), \({\hat{v}}^{k}={v}^{k}/(1-{\beta }_{2}^{k})\).
Step 3. Calculate \({x}^{k}={x}^{k-1}-\alpha {\hat{m}}^{k}\oslash \sqrt{{\hat{v}}^{k}+\varepsilon }\).
Step 4. Check the condition \(\left\Vert {x}^{k}-{x}^{k-1}\right\Vert <{\varepsilon }_{1}\).
If this condition is satisfied, then x* = xk. Otherwise, return to Step 2.
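A minimal NumPy sketch of the Adam iteration; a deterministic toy quadratic stands in for the random samples f_k, and the target point is illustrative:

```python
import numpy as np

c = np.array([1.0, -2.0])
grad = lambda x: 2.0 * (x - c)

alpha, beta1, beta2, eps = 0.001, 0.9, 0.999, 1e-8
x, m, v = np.zeros(2), np.zeros(2), np.zeros(2)   # x^0, m^0 = o, v^0 = o
for k in range(1, 5001):                          # Step 2: k = k + 1
    g = grad(x)
    m = beta1 * m + (1.0 - beta1) * g             # first moment estimate
    v = beta2 * v + (1.0 - beta2) * g * g         # second moment estimate
    m_hat = m / (1.0 - beta1 ** k)                # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** k)                # bias-corrected second moment
    x = x - alpha * m_hat / np.sqrt(v_hat + eps)  # Step 3
print(x)                                          # approximately c
```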
G. Adam Method Modification (Adamax) for solving the problem M[f(x)] → min, where f(x) ∈ C1, given the random samples f1(x), f2(x), …, fK(x).
Step 1. Specify the following parameters: the step value α = 0.002; the moment estimation parameters β1 = 0.9 and β2 = 0.999, β2 ∈ [0, 1); an initial point x0 ∈ Rn; the smoothing parameter ε = 10−8; ε1 > 0; the initial value m0 = o of the first vector of moments M[∇f(x)]; u0 = o.
Set k = 0.
Step 2. Set k = k + 1, \({g}^{k}=\nabla {f}_{k}({x}^{k-1})\), \({m}^{k}={\beta }_{1}{m}^{k-1}+(1-{\beta }_{1}){g}^{k}\), \({u}^{k}=\max ({\beta }_{2}{u}^{k-1},| {g}^{k}| )\).
Step 3. Calculate \({x}^{k}={x}^{k-1}-\frac{\alpha }{1-{{\beta }_{1}}^{k}}{m}^{k}\oslash {u}^{k}\).
Step 4. Check the condition \(\left\Vert {x}^{k}-{x}^{k-1}\right\Vert <{\varepsilon }_{1}\).
If this condition is satisfied, then x* = xk. Otherwise, return to Step 2.
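A minimal NumPy sketch of the Adamax iteration; a deterministic toy quadratic stands in for the random samples f_k (all values are illustrative):

```python
import numpy as np

c = np.array([1.0, -2.0])
grad = lambda x: 2.0 * (x - c)

alpha, beta1, beta2 = 0.002, 0.9, 0.999
x, m, u = np.zeros(2), np.zeros(2), np.zeros(2)   # x^0, m^0 = o, u^0 = o
for k in range(1, 5001):                          # Step 2: k = k + 1
    g = grad(x)
    m = beta1 * m + (1.0 - beta1) * g             # first moment estimate
    u = np.maximum(beta2 * u, np.abs(g))          # exponentially weighted infinity norm
    x = x - (alpha / (1.0 - beta1 ** k)) * m / u  # Step 3
print(x)                                          # approximately c
```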
H. Nesterov-accelerated Adaptive Moment Estimation (Nadam).
Step 1. Specify the following parameters: the step value α = 0.002; the moment estimation parameters β1 = 0.975 and β2 = 0.999; an initial point x0 ∈ Rn; the smoothing parameter ε = 10−8; the initial value m0 = o of the first vector of moments M[ ∇ f(x)]; the initial value v0 = o of the second vector of moments M[ ∇ f(x) ⊙ ∇ f(x)].
Set k = 0.
Step 2. Set k = k + 1, \({g}^{k}=\nabla f({x}^{k-1})\), \({m}^{k}={\beta }_{1}{m}^{k-1}+(1-{\beta }_{1}){g}^{k}\), \({v}^{k}={\beta }_{2}{v}^{k-1}+(1-{\beta }_{2}){g}^{k}\odot {g}^{k}\), \({\hat{m}}^{k}={\beta }_{1}{m}^{k}/(1-{\beta }_{1}^{k+1})+(1-{\beta }_{1}){g}^{k}/(1-{\beta }_{1}^{k})\), \({\hat{v}}^{k}={v}^{k}/(1-{\beta }_{2}^{k})\).
Step 3. Calculate \({x}^{k}={x}^{k-1}-\alpha {\hat{m}}^{k}\oslash \sqrt{{\hat{v}}^{k}+\varepsilon }\).
Step 4. Check the condition \(\left\Vert {x}^{k}-{x}^{k-1}\right\Vert <{\varepsilon }_{1}\).
If this condition is satisfied, then x* = xk. Otherwise, return to Step 2.
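A minimal NumPy sketch of a standard Nadam iteration consistent with Step 3 above, with the Nesterov correction folded into the bias-corrected first moment; the exact bias-correction form and the toy quadratic are assumptions, not taken from the paper:

```python
import numpy as np

c = np.array([3.0, -2.0])
grad = lambda x: 2.0 * (x - c)

alpha, beta1, beta2, eps = 0.002, 0.975, 0.999, 1e-8
x, m, v = np.zeros(2), np.zeros(2), np.zeros(2)   # x^0, m^0 = o, v^0 = o
for k in range(1, 10001):                         # Step 2: k = k + 1
    g = grad(x)
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g
    # Nesterov-style bias-corrected first moment, Adam-style second moment
    m_hat = beta1 * m / (1.0 - beta1 ** (k + 1)) + (1.0 - beta1) * g / (1.0 - beta1 ** k)
    v_hat = v / (1.0 - beta2 ** k)
    x = x - alpha * m_hat / np.sqrt(v_hat + eps)  # Step 3
print(x)                                          # oscillates in a small neighborhood of c
```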
I. Mini-batch Gradient Descent:

\({\theta }^{k+1}={\theta }^{k}-{\alpha }_{k}\frac{1}{m}\sum _{i\in {J}_{m}}{\nabla }_{\theta }{f}_{i}({\theta }^{k}),\)

where αk is the step value (learning rate); fi is the loss on the ith component of the learning sample; Jm is the set of m indices of arbitrary components (xi, yi) ∈ Xl of the learning sample (e.g., m consecutive elements). A single parameter update thus requires only a small part of the dataset rather than the whole of it (as a rule, from 50 to 256 components in applications).
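A minimal NumPy sketch of mini-batch gradient descent on a toy linear least-squares problem; the learning sample, batch size, and learning rate are illustrative, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))                    # learning sample of l = 1000 components
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                                    # noise-free targets

w = np.zeros(3)
alpha, m = 0.1, 64                                # learning rate and mini-batch size
for k in range(500):
    J = rng.choice(len(X), size=m, replace=False) # indices J_m of an arbitrary mini-batch
    Xb, yb = X[J], y[J]
    grad = 2.0 / m * Xb.T @ (Xb @ w - yb)         # gradient of the mean squared deviation on the batch
    w = w - alpha * grad
print(w)                                          # approaches w_true
```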