Abstract
The state-action space of an individual agent in a multiagent team fundamentally dictates how the individual interacts with the rest of the team. Thus, how an agent is defined in the context of its domain has a significant effect on team performance when learning to coordinate. In this work, we explore the trade-offs associated with these design choices: for example, a smaller team in which each agent can process and act on a wider scope of information about the world, versus a larger team in which each agent observes and acts within a more local region of the domain. We focus our study on a traffic management domain and highlight the trends in learning performance under different agent definitions. In addition, we analyze the impact of agent failure for different agent definitions and investigate the ability of the team to learn new coordination strategies when individual agents become unresponsive.
Notes
Here we use “agent definition” to avoid confusion with other uses of the term “factorization” in multiagent literature.
We use a logistic activation function at each layer and the final network output is scaled by the base traversal time of the longest edge in the graph.
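For concreteness, a minimal sketch of such a network is given below. The layer sizes, parameter names, and the max_base_traversal_time argument are our own illustrative choices, not taken from the paper; only the logistic activation at each layer and the output scaling are described in the note above.

```python
import numpy as np

def logistic(x):
    """Logistic (sigmoid) activation, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-x))

def policy_forward(x, w1, b1, w2, b2, max_base_traversal_time):
    """Single-hidden-layer policy network with a logistic activation
    at each layer. The final output, which lies in (0, 1), is scaled
    by the base traversal time of the longest edge in the graph."""
    h = logistic(x @ w1 + b1)            # hidden layer
    y = logistic(h @ w2 + b2)            # output layer, in (0, 1)
    return y * max_base_traversal_time   # scale to traversal-time units

# Illustrative usage with arbitrary dimensions:
rng = np.random.default_rng(0)
x = rng.standard_normal(10)
w1, b1 = rng.standard_normal((10, 40)), rng.standard_normal(40)
w2, b2 = rng.standard_normal((40, 10)), rng.standard_normal(10)
out = policy_forward(x, w1, b1, w2, b2, max_base_traversal_time=5.0)
```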
References
Agogino, A., & Tumer, K. (2004). Efficient evaluation functions for multi-rover systems. In Genetic and evolutionary computation conference (pp. 1–11). Seattle, WA: Springer.
Bernstein, D. S., Givan, R., Immerman, N., & Zilberstein, S. (2002). The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 27(4), 819–840.
Busoniu, L., Babuska, R., & De Schutter, B. (2008). A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 38(2), 156–172.
Castellini, J., Oliehoek, F. A., Savani, R., & Whiteson, S. (2019). The representational capacity of action-value networks for multi-agent reinforcement learning. In Proceedings of the 18th international conference on autonomous agents and multiagent systems (pp. 1862–1864).
Chung, J. J., Chow, S., & Tumer, K. (2018). When less is more: Reducing agent noise with probabilistically learning agents. In Proceedings of the 17th international conference on autonomous agents and multiagent systems, Stockholm, Sweden. Extended abstract (pp. 1900–1902).
Chung, J. J., Miklić, D., Sabattini, L., Tumer, K., & Siegwart, R. (2019). The impact of agent definitions and interactions on multiagent learning for coordination. In Proceedings of the 18th international conference on autonomous agents and multiagent systems (pp. 1752–1760).
Chung, J. J., Rebhuhn, C., Yates, C., Hollinger, G. A., & Tumer, K. (2019). A multiagent framework for learning dynamic traffic management strategies. Autonomous Robots, 43(6), 1375–1391.
Claes, D., Oliehoek, F., Baier, H., & Tuyls, K. (2017). Decentralised online planning for multi-robot warehouse commissioning. In Proceedings of the 16th conference on autonomous agents and multiagent systems (pp. 492–500).
Claus, C., & Boutilier, C. (1998). The dynamics of reinforcement learning in cooperative multiagent systems. In AAAI/IAAI, Madison, WI (pp. 746–752).
Colby, M., Yliniemi, L., Pezzini, P., Tucker, D., Bryden, K. M., & Tumer, K. (2016). Multiobjective neuroevolutionary control for a fuel cell turbine hybrid energy system. In Proceedings of the genetic and evolutionary computation conference (pp. 877–884). Denver, CO: ACM.
Devlin, S., & Kudenko, D. (2011). Theoretical considerations of potential-based reward shaping for multi-agent systems. In The 10th international conference on autonomous agents and multiagent systems, Taipei, Taiwan (Vol. 1, pp. 225–232).
Digani, V., Sabattini, L., & Secchi, C. (2016). A probabilistic Eulerian traffic model for the coordination of multiple AGVs in automatic warehouses. IEEE Robotics and Automation Letters, 1(1), 26–32.
Digani, V., Sabattini, L., Secchi, C., & Fantuzzi, C. (2015). Ensemble coordination approach in multi-AGV systems applied to industrial warehouses. IEEE Transactions on Automation Science and Engineering, 12(3), 922–934.
Ficici, S. G., Melnik, O., & Pollack, J. B. (2005). A game-theoretic and dynamical-systems analysis of selection methods in coevolution. IEEE Transactions on Evolutionary Computation, 9(6), 580–602.
Hulse, D., Tumer, K., Hoyle, C., & Tumer, I. (2018). Modeling multidisciplinary design with multiagent learning. Artificial Intelligence for Engineering Design, Analysis and Manufacturing, FirstView, 1–15.
Kitano, H. (Ed.). (1998). RoboCup-97: Robot soccer world cup I. New York: Springer.
Littman, M.L., Dean, T.L., & Kaelbling, L.P. (1995). On the complexity of solving Markov decision problems. In Proceedings of the eleventh conference on uncertainty in artificial intelligence (pp. 394–402).
Mannion, P., Duggan, J., & Howley, E. (2016). An experimental review of reinforcement learning algorithms for adaptive traffic signal control. In Autonomic road transport support systems (pp. 47–66). New York: Springer.
Panait, L., & Luke, S. (2005). Cooperative multi-agent learning: The state of the art. Autonomous Agents and Multi-agent Systems, 11(3), 387–434.
Parunak, H. V. D. (1996). Applications of distributed artificial intelligence in industry. Foundations of Distributed Artificial Intelligence, 2, 1–18.
Rosas, F., Chen, K. C., & Gündüz, D. (2018). Social learning for resilient data fusion against data falsification attacks. Computational Social Networks, 5(1), 10.
Sen, S., & Weiss, G. (1999). Learning in multiagent systems. In G. Weiss (Ed.), Multiagent systems: A modern approach to distributed artificial intelligence (pp. 259–298). Cambridge, MA: MIT Press.
Stone, P., Kaminka, G. A., Kraus, S., & Rosenschein, J. S. (2010). Ad hoc autonomous agent teams: Collaboration without pre-coordination. In AAAI conference on artificial intelligence, Atlanta, GA (pp. 1504–1509).
Stone, P., & Veloso, M. (2000). Multiagent systems: A survey from a machine learning perspective. Autonomous Robots, 8(3), 345–383.
Sycara, K. P. (1998). Multiagent systems. AI Magazine, 19(2), 79.
Tsitsiklis, J. N. (1993). Decentralized detection. Advances in Statistical Signal Processing, 2(2), 297–344.
Veeravalli, V. V., & Varshney, P. K. (2012). Distributed inference in wireless sensor networks. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 370(1958), 100–117.
Weiß, G. (1996). Adaptation and learning in multi-agent systems: Some remarks and a bibliography. In G. Weiß & S. Sen (Eds.), IJCAI'95 workshop on adaption and learning in multi-agent systems (pp. 1–21). Berlin: Springer.
Wolpert, D. H., Wheeler, K. R., & Tumer, K. (1999). General principles of learning-based multi-agent systems. In Proceedings of the third annual conference on autonomous agents (pp. 77–83). ACM.
Ye, D., Zhang, M., & Yang, Y. (2015). A multi-agent framework for packet routing in wireless sensor networks. Sensors, 15(5), 10026–10047.
Acknowledgements
This paper is an extension of our AAMAS paper [6]. We provide additional experiments analyzing the differences in team performance after agent failure and also expand our discussion with notes for practitioners. This work was partially supported by the EU H2020 project CROWDBOT under Grant No. 779942 and by the National Science Foundation under Grant No. IIS-1815886.
Appendix: Additional experiments on the centralized agent network architecture
We conducted additional experiments to assess the performance of the centralized learner when a more complex network architecture was available. In these experiments, the centralized agent uses the same relative number of hidden neurons as the intersection agents, i.e. four times the number of edges, which for 38 edges gives 152 hidden neurons. Thus, the centralized\(_{\text{rel}}\) agent has 11,590 weights defining its control policy, while the centralized\(_{\text{t,rel}}\) agent with time information has 17,366 weights. This is over nine times more parameters than the originally tested centralized and centralized\(_{\text{t}}\) agent policies (see Table 1). Results are shown in Figs. 10, 11 and 12 below, where the original centralized policies are shown in green and cyan (without and with time information, respectively), and the larger neural network policies are shown in dark grey and light grey, respectively.
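The weight counts above are consistent with a single-hidden-layer, fully connected network with one bias per hidden and output neuron. The sketch below reproduces the reported counts; the input dimensions (37, and 75 with time information) and the 38-dimensional output are our own inferences from those counts, not figures stated in the paper.

```python
def mlp_param_count(n_in: int, n_hidden: int, n_out: int) -> int:
    """Number of weights in a single-hidden-layer fully connected
    network, counting one bias per hidden and output neuron."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

# Hidden layer size: four times the number of edges (4 * 38 = 152).
N_HIDDEN = 4 * 38

# Assumed input/output sizes, chosen because they reproduce the
# parameter counts reported in the text.
print(mlp_param_count(37, N_HIDDEN, 38))  # 11590 -> centralized_rel
print(mlp_param_count(75, N_HIDDEN, 38))  # 17366 -> centralized_t,rel
```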
From these results, we see that, in practice, additional policy parameters do not enable substantially better learning performance. The centralized learner with time information improves when using the more complex network architecture; however, the centralized learner without time information actually degrades when more parameters are introduced. Of course, a more comprehensive sweep of possible network architectures would be needed to assess whether there exists a “sweet spot” that balances network representational capacity and learning complexity for each of these cases. Nevertheless, the basic centralized learner with no time information and only 1254 network weights still outperforms all other centralized policy architectures under all computed metrics. A quantitative comparison of the means and medians of the final distributions is given in Table 4.
Finally, we note that in these experiments we ran our learning algorithm for only 500 learning epochs. However, the learning trends suggest that achieving substantial improvements in policy performance (e.g. beyond what is currently achieved by the intersection, intersection\(_{\text{t}}\) and link\(_{\text{t}}\) agents) will require significantly greater learning effort. The bottleneck may lie in the underlying evolutionary algorithm used to train the policies. Thus, future work may consider adapting or replacing this routine with techniques tailored to efficiently training neural networks with large numbers of parameters.
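To illustrate the kind of routine the appendix refers to, below is a minimal sketch of neuroevolution over a flattened weight vector via Gaussian mutation. This is a toy (1+\(\lambda\))-style hill climber under our own assumptions; the paper's actual evolutionary algorithm, operators, and hyperparameters may differ.

```python
import numpy as np

def evolve(init_weights, evaluate, n_epochs=500, pop_size=20, sigma=0.1):
    """Toy (1+lambda)-style neuroevolution over a flat weight vector.

    `evaluate` maps a weight vector to a scalar fitness (higher is
    better). Illustrative sketch only, not the paper's algorithm.
    """
    rng = np.random.default_rng(0)
    best, best_fit = init_weights, evaluate(init_weights)
    for _ in range(n_epochs):
        for _ in range(pop_size):
            # Gaussian perturbation of the current best policy weights.
            child = best + sigma * rng.standard_normal(best.shape)
            fit = evaluate(child)
            if fit > best_fit:
                best, best_fit = child, fit
    return best, best_fit
```

As the appendix notes, the cost of such a routine grows with the number of weights being perturbed, which is one reason a nine-fold increase in parameters may demand far more than 500 epochs.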
Cite this article
Chung, J.J., Miklić, D., Sabattini, L. et al. The impact of agent definitions and interactions on multiagent learning for coordination in traffic management domains. Auton Agent Multi-Agent Syst 34, 21 (2020). https://doi.org/10.1007/s10458-020-09442-1