Towards on-line tuning of adaptive-agent’s multivariate meta-parameter

  • Original Article
  • Published in International Journal of Machine Learning and Cybernetics

Abstract

A decision-making (DM) agent models its environment and quantifies its DM preferences. An adaptive agent models them only locally, near the realised behaviour of the closed DM loop. Owing to this, a simple tool set often suffices for solving complex dynamic DM tasks. The inspected Bayesian agent relies on a unified learning and optimisation framework, which works well when tailored through a range of case-specific options. Many of them can be made off-line; these concern the sets of involved variables, the elicitation of knowledge and preferences, structure estimation, etc. Still, some meta-parameters need an on-line choice: for instance, a weight balancing exploration with exploitation, a weight reflecting the agent’s willingness to cooperate, or a discounting factor. Such options influence DM quality, often vitally, so their adaptive tuning is needed. Specific solutions exist, for instance a data-dependent choice of the forgetting factor used to track parameter changes, but a general methodology is missing. This paper opens a pathway to one. The proposed solution uses a hierarchical feedback exploiting a generic, DM-related, observable mismodelling indicator. The paper presents and justifies the theoretical concept, then outlines and illustrates its use.
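To make the mechanism above concrete, here is a minimal sketch, assuming a scalar Gaussian autoregression as the environment model, the forgetting factor as the tuned meta-parameter, and the negative one-step-ahead predictive log-density as the observable mismodelling indicator. The class ForgettingRLS and the helper mismodelling are hypothetical names introduced for this illustration, not constructs from the paper.

```python
import numpy as np

# Minimal illustrative sketch (not the paper's algorithm): an upper-level
# feedback tunes a forgetting factor on-line by accumulating an observable
# mismodelling indicator: the negative one-step-ahead predictive log-density.
# The scalar Gaussian autoregression, ForgettingRLS and mismodelling() are
# assumptions made purely for illustration.

class ForgettingRLS:
    """Recursive least squares with exponential forgetting for y_t = theta * y_{t-1} + e_t."""

    def __init__(self, lam):
        self.lam = lam      # forgetting factor: the tuned meta-parameter
        self.theta = 0.0    # point estimate of the regression coefficient
        self.P = 100.0      # scaled estimate covariance (scalar case)
        self.s2 = 1.0       # noise-variance estimate
        self.n = 1.0        # effective sample counter under forgetting

    def predict(self, x):
        """Mean and variance of the one-step-ahead predictive density."""
        return self.theta * x, self.s2 * (1.0 + self.P * x * x)

    def update(self, x, y):
        """Standard forgetting-RLS update with the datum (x, y)."""
        e = y - self.theta * x
        gain = self.P * x / (self.lam + self.P * x * x)
        self.theta += gain * e
        self.P = (self.P - gain * x * self.P) / self.lam
        self.n = self.lam * self.n + 1.0       # past data are discounted, too
        self.s2 += (e * e - self.s2) / self.n  # crude tracked noise variance


def mismodelling(est, x, y):
    """Observable mismodelling indicator: negative predictive log-density."""
    m, v = est.predict(x)
    return 0.5 * (np.log(2.0 * np.pi * v) + (y - m) ** 2 / v)


rng = np.random.default_rng(0)
candidates = [ForgettingRLS(lam) for lam in (0.90, 0.95, 0.99, 1.00)]
log_w = np.zeros(len(candidates))  # upper-level log-weights over the meta-parameter

y_prev, theta_true = 0.0, 0.6
for t in range(2000):
    if t == 1000:
        theta_true = -0.4          # abrupt change: forgetting (lam < 1) should win
    y = theta_true * y_prev + rng.normal()
    for i, est in enumerate(candidates):
        log_w[i] -= mismodelling(est, y_prev, y)  # good predictors gain weight
        est.update(y_prev, y)
    y_prev = y

print("preferred forgetting factor:", candidates[int(np.argmax(log_w))].lam)
```

Under these assumptions, the accumulated indicator realises a prequential comparison: after the simulated change, candidates with \(\lambda < 1\) predict better and their log-weights take over, which is the upper-level feedback acting on the meta-parameter.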

Notes

  1. The prefix “meta” marks a task about a task, DM about DM, an option about an option, etc. Note that all abbreviations are summarised in Table 2 at the end of the paper.

  2. The agent’s prior knowledge \(k^{0}\) implicitly conditions all pds involved. The knowledge \(k^{t}\) is also called the information state. \((o_{t},a_{t})_{t\in\boldsymbol{\{}t\boldsymbol{\}}}\) is often referred to as the (closed DM loop) trajectory or the observed behaviour; a formalisation is sketched after these notes.

  3. KLD, formerly called cross-entropy, Kullback and Leibler [42], now relative entropy, is the DM-rules-dependent expectation of the loss \(\ln(\mathsf{j}/\mathsf{j}_{\mathfrak{i}})\), spelled out after these notes.

  4. The usual MDP deals with the reward \(-\mathsf{L}\) and maximises its expectation.

  5. This reflects its interpretation as a meta-action in the upper-level feedback, cf. Fig. 1 and Sect. 5.

  6. The same choice is faced when dealing with the usual exploration techniques, Ouyang et al. [53].

  7. The term trust has a narrower meaning here than in the numerous studies focused on it, Li and Song [47].

  8. This form of Bayes’ rule is valid for the considered DM rules, for which the parameter pointing to the “best” model, [4], is unknown, cf. the natural conditions of control in Peterka [56]; the resulting recursion is sketched after these notes.

  9. Extensive references on the whole approach can be found in the cited paper. The chapter by Dietrich and List [10] is a good starting point to the pooling problems that are at the core of such cooperation.

  10. In this context, Shannon’s sampling theorem, Shannon [66], provides no guide.

  11. The dependence of pds on the horizon h is made explicit here.

  12. For a pd \(\mathsf{s}\) on \(\boldsymbol{\{}x\boldsymbol{\}}\), its support is \(\mathrm{supp}[\mathsf{s}]\equiv\{x\in\boldsymbol{\{}x\boldsymbol{\}}:\,\mathsf{s}(x)>0\}\).

  13. The proof tailors and refines results in Algoet and Cover [1, 4].
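The following sketches expand notes 2, 3 and 8; each is a plausible reconstruction under stated assumptions, not text from the full paper.

For note 2, assuming (as is usual) that knowledge accumulates recursively from the prior \(k^{0}\), the information state obeys

\[k^{t}\equiv(k^{t-1},a_{t},o_{t})=\bigl(k^{0},(o_{\tau},a_{\tau})_{\tau\le t}\bigr).\]

For note 3, with \(b\) ranging over the closed-loop behaviours on which the pds \(\mathsf{j}\), \(\mathsf{j}_{\mathfrak{i}}\) live (the symbol \(b\) is introduced here only for illustration), the standard definition reads

\[\mathrm{KLD}\bigl(\mathsf{j}\,\big\Vert\,\mathsf{j}_{\mathfrak{i}}\bigr)\equiv\int\mathsf{j}(b)\,\ln\frac{\mathsf{j}(b)}{\mathsf{j}_{\mathfrak{i}}(b)}\,\mathrm{d}b\;\ge\;0,\]

with equality iff \(\mathsf{j}=\mathsf{j}_{\mathfrak{i}}\) almost everywhere.

For note 8, the natural conditions of control give \(\mathsf{p}(\Theta\,|\,a_{t},k^{t-1})=\mathsf{p}(\Theta\,|\,k^{t-1})\) for the unknown parameter \(\Theta\) (the symbol is an assumption of this sketch), so Bayes’ rule takes the recursive form

\[\mathsf{p}(\Theta\,|\,k^{t})\;\propto\;\mathsf{p}(o_{t}\,|\,a_{t},\Theta,k^{t-1})\,\mathsf{p}(\Theta\,|\,k^{t-1}).\]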

References

  1. Algoet P, Cover T (1988) A sandwich proof of the Shannon-McMillan-Breiman theorem. Ann Probab 16:899–909

  2. Åström K, Wittenmark B (1994) Adaptive control, 2nd edn. Addison-Wesley, New York

  3. Beckenbach L, Osinenko P, Streif S (2020) A Q-learning predictive control scheme with guaranteed stability. Eur J Control 56:167–178

  4. Berec L, Kárný M (1997) Identification of reality in Bayesian context. In: Kárný M, Warwick K (eds) Computer-intensive methods in control and signal processing. Birkhäuser, Basel, pp 181–193

  5. Berger J (1985) Statistical decision theory and Bayesian analysis. Springer, Berlin

  6. Bernardo J (1979) Expected information as expected utility. Ann Stat 7:686–690

  7. Bertsekas D (2017) Dynamic programming and optimal control. Athena Scientific, Nashua

  8. Bogdan P, Pedram M (2018) Toward enabling automated cognition and decision-making in complex cyber-physical systems. In: 2018 IEEE ISCAS, pp 1–4

  9. Diebold F, Shin M (2019) Machine learning for regularized survey forecast combination: Partially-egalitarian LASSO and its derivatives. Int J Forecast 35:1679–1691

  10. Dietrich F, List C (2016) Probabilistic opinion pooling. In: Hitchcock C, Hajek A (eds) Oxford handbook of philosophy and probability. Oxford University Press, Oxford

  11. Doob J (1953) Stochastic processes. Wiley, Hoboken

  12. Doyle J (2013) Survey of time preference, delay discounting models. Judgm Decis Mak 8:116–135

  13. Duvenaud D (2014) Automatic model construction with Gaussian processes. PhD thesis, Pembroke College, University of Cambridge

  14. Feldbaum A (1961) Theory of dual control. Autom Remote Control 22:3–19

  15. Gaitsgory V, Grüne L, Höger M, Kellett C, Weller S (2018) Stabilization of strictly dissipative discrete time systems with discounted optimal control. Automatica 93:311–320. https://doi.org/10.1016/j.automatica.2018.03.076

  16. Ghavamzadeh M, Mannor S, Pineau J, Tamar A (2015) Bayesian reinforcement learning: a survey. Found Trends Mach Learn 8(5–6):359–483. https://doi.org/10.1561/2200000049

  17. Grünwald P, Langford J (2007) Suboptimal behavior of Bayes and MDL in classification under misspecification. Mach Learn 66(2–3):119–149

  18. Guan P, Raginsky M, Willett R (2014) Online Markov decision processes with Kullback-Leibler control cost. IEEE Trans Autom Control 59(6):1423–1438

  19. Guy TV, Kárný M (2000) Design of an adaptive controller of LQG type: spline-based approach. Kybernetika 36(2):255–262

  20. Hebb D (2005) The organization of behavior: a neuropsychological theory. Taylor & Francis. https://books.google.cz/books?id=uyV5AgAAQBAJ. Accessed 15 Dec 2019

  21. Hospedales T, Antoniou A, Micaelli P, Storkey A (2020) Meta-learning in neural networks: a survey. arXiv:2004.05439v1 [cs.LG]. Accessed 11 Apr 2020

  22. Ishii S, Yoshida W, Yoshimoto J (2002) Control of exploitation-exploration meta-parameter in reinforcement learning. Neural Netw 15(4–6):665–687

  23. Jacobs O, Patchell J (1972) Caution and probing in stochastic control. Int J Control 16(1):189–199

  24. Jazwinski A (1970) Stochastic processes and filtering theory. Academic Press, Pleasantville

  25. Kandasamy K, Schneider J, Póczos B (2015) High dimensional Bayesian optimisation and bandits via additive models. In: Proceedings of the 32nd International Conference on Machine Learning, PMLR, vol 37

  26. Kárný M (1991) Estimation of control period for selftuners. Automatica 27(2):339–348 (extended version of the paper presented at the 11th IFAC World Congress, Tallinn)

  27. Kárný M (1996) Towards fully probabilistic control design. Automatica 32(12):1719–1722

  28. Kárný M (2020) Axiomatisation of fully probabilistic design revisited. Syst Con Lett. https://doi.org/10.1016/j.sysconle.2020.104719

  29. Kárný M (2020) Minimum expected relative entropy principle. In: Proceedings of the 18th ECC, IFAC, Saint Petersburg, pp 35–40

  30. Kárný M, Alizadeh Z (2019) Towards fully probabilistic cooperative decision making. In: Slavkovik M (ed) Multi-agent systems, EUMAS 2018, vol LNAI 11450. Springer Nature, Dordrecht, pp 1–16

  31. Kárný M, Guy T (2012) On support of imperfect Bayesian participants. In: Guy T et al (eds) Decision making with imperfect decision makers, Intelligent Systems Reference Library, vol 28. Springer, Berlin, pp 29–56

  32. Kárný M, Guy T (2019) Preference elicitation within framework of fully probabilistic design of decision strategies. In: IFAC International Workshop on Adaptive and Learning Control Systems, vol 52. pp 239–244

  33. Kárný M, Hůla F (2019) Balancing exploitation and exploration via fully probabilistic design of decision policies. In: Proceedings of the 11th International Conference on Agents and Artificial Intelligence: ICAART, vol 2. pp 857–864

  34. Kárný M, Kroupa T (2012) Axiomatisation of fully probabilistic design. Inf Sci 186(1):105–113

  35. Kárný M, Halousková A, Böhm J, Kulhavý R, Nedoma P (1985) Design of linear quadratic adaptive control: theory and algorithms for practice. Kybernetika 21(supp. Nos 3–6):1–96

  36. Kárný M, Böhm J, Guy T, Jirsa L, Nagy I, Nedoma P, Tesař L (2006) Optimized Bayesian dynamic advising: theory and algorithms. Springer, London

  37. Kárný M, Bodini A, Guy T, Kracík J, Nedoma P, Ruggeri F (2014) Fully probabilistic knowledge expression and incorporation. Stat Interface 7(4):503–515

  38. Klenske E, Hennig P (2016) Dual control for approximate Bayesian reinforcement learning. J Mach Learn Res 17:1–30

  39. Kober J, Peters J (2011) Policy search for motor primitives in robotics. Mach Learn 84(1):171–203. https://doi.org/10.1007/s10994-010-5223-6

  40. Kracík J, Kárný M (2005) Merging of data knowledge in Bayesian estimation. In: Filipe J et al (eds) Proceedings of the 2nd International Conference on informatics in control, automation and robotics, Barcelona, pp 229–232

  41. Kulhavý R, Zarrop MB (1993) On a general concept of forgetting. Int J Control 58(4):905–924

  42. Kullback S, Leibler R (1951) On information and sufficiency. Ann Math Stat 22:79–87

  43. Kumar EV, Jerome J, Srikanth K (2014) Algebraic approach for selecting the weighting matrices of linear quadratic regulator. In: 2014 International Conference on green computing communication and electrical engineering (ICGCCEE), pp 1–6. https://doi.org/10.1109/ICGCCEE.2014.6922382

  44. Kumar P (1985) A survey on some results in stochastic adaptive control. SIAM J Control Appl 23:399–409

  45. Larsson D, Braun D, Tsiotrasz P (2017) Hierarchical state abstractions for decision-making problems with computational constraints. arXiv:1710.07990v1 [cs.AI], Accessed 22 Oct 2017

  46. Lee K, Kim G, Ortega P, Lee D, Kim K (2019) Bayesian optimistic Kullback-Leibler exploration. Mach Learn 108(5):765–783. https://doi.org/10.1007/s10994-018-5767-4

  47. Li W, Song H (2016) ART: an attack-resistant trust management scheme for securing vehicular ad hoc networks. IEEE Trans Intell Transport Syst 17:960–969

  48. Liao Y, Deschamps F, Loures E, Ramos L (2017) Past, present and future of industry 4.0—a systematic literature review and research agenda proposal. Int J Prod Res 55(12):3609–3629

  49. Mayne D (2014) Model predictive control: recent developments and future promise. Automatica 50:2967–2986

  50. Meditch J (1969) Stochastic optimal linear estimation and control. McGraw Hill, New York

  51. Mesbah A (2018) Stochastic model predictive control with active uncertainty learning: a survey on dual control. Ann Rev Control 45:107–117. https://doi.org/10.1016/j.arcontrol.2017.11.001

  52. Moerland TM, Broekens J, Jonker CM (2018) Emotion in reinforcement learning agents and robots: a survey. Mach Learn 107(2):443–480. https://doi.org/10.1007/s10994-017-5666-0

  53. Ouyang Y, Gagrani M, Nayyar A, Jain R (2017) Learning unknown Markov decision processes: a Thompson sampling approach. In: von Luxburg U, Guyon I, Bengio S, Wallach H, Fergus R (eds) Advances in neural information processing systems 30. Curran Associates, Inc., pp 1333–1342

  54. Peterka V (1972) On steady-state minimum variance control strategy. Kybernetika 8:219–231

  55. Peterka V (1975) A square-root filter for real-time multivariable regression. Kybernetika 11:53–67

  56. Peterka V (1981) Bayesian system identification. In: Eykhoff P (ed) Trends and progress in system identification. Pergamon Press, pp 239–304

  57. Peterka V (1991) Adaptation for LQG control design to engineering needs. In: Warwick K, Kárný M, Halousková A (eds) Advanced methods in adaptive control for industrial application (Joint UK-CS seminar), Lecture Notes, vol 158. Springer-Verlag, New York

  58. Peterka V, Åström K (1973) Control of multivariable systems with unknown but constant parameters. In: Preprints of the 3rd IFAC Symposium on identification and process parameter estimation, IFAC, The Hague/Delft, pp 534–544

  59. Puterman M (2005) Markov decision processes: discrete stochastic dynamic programming. Wiley, Hoboken

  60. Quinn A, Kárný M, Guy T (2016) Fully probabilistic design of hierarchical Bayesian models. Inf Sci 369:532–547

  61. Rao M (1987) Measure theory and integration. Wiley, Hoboken

  62. Rohrs C, Valavani L, Athans M, Stein G (1982) Robustness of adaptive control algorithms in the presence of unmodeled dynamics. In: IEEE Conference on Decision and Control, Orlando, FL, vol 1, pp 3–11

  63. Sandholm T (1999) Distributed rational decision making. In: Weiss G (ed) Multiagent systems—a modern approach to distributed artificial intelligence. MIT Press, Cambridge, pp 201–258

  64. Savage L (1954) Foundations of statistics. Wiley, Hoboken

  65. Schweighofer N, Doya K (2003) Meta-learning in reinforcement learning. Neural Netw 16(1):5–9. https://doi.org/10.1016/S0893-6080(02)00228-9

  66. Shannon C (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423, 623–656

  67. Shore J, Johnson R (1980) Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Trans Inf Th 26(1):26–37

  68. Si J, Barto A, Powell W, Wunsch D (eds) (2004) Handbook of learning and approximate dynamic programming. Wiley-IEEE Press, Hoboken

  69. Tanner M (1993) Tools for statistical inference. Springer Verlag, New York

  70. Tao G (2014) Multivariable adaptive control: a survey. Automatica 50(11):2737–2764

  71. Ullrich M (1964) Optimum control of some stochastic systems. In: Prepr. of the VIII-th conf. ETAN, Beograd

  72. Wolpert D, Macready W (1997) No free lunch theorems for optimization. IEEE Trans Evol Comput 1(1):67–82

  73. Wu H, Guo X, Liu X (2017) Adaptive exploration-exploitation tradeoff for opportunistic bandits. Preprint at arXiv:1709.04004

  74. Yang Z, Wang C, Zhang Z, Li J (2019) Mini-batch algorithms with online step size. Knowl-Based Syst 165:228–240

Author information

Correspondence to Miroslav Kárný.

Ethics declarations

Conflicts of interest

The author has no affiliation with any organization with a direct or indirect financial interest in the subject matter discussed in the manuscript. This manuscript has not been submitted to, nor is it under review at, another journal or other publishing venue.

Funding

The reported research has been supported by MŠMT ČR LTC18075 and EU-COST Action CA16228.

Availability of data and material

Not applicable

Code availability

The source code of the example is available upon request.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kárný, M. Towards on-line tuning of adaptive-agent’s multivariate meta-parameter. Int. J. Mach. Learn. & Cyber. 12, 2717–2731 (2021). https://doi.org/10.1007/s13042-021-01358-w
