Abstract
The state-action space of an individual agent in a multiagent team fundamentally dictates how the individual interacts with the rest of the team. Thus, how an agent is defined in the context of its domain has a significant effect on team performance when learning to coordinate. In this work, we explore the trade-offs associated with these design choices: for example, a smaller team in which each agent can process and act on a wider scope of information about the world, versus a larger team in which each agent observes and acts within a more local region of the domain. We focus our study on a traffic management domain and highlight the trends in learning performance under different agent definitions. In addition, we analyze the impact of agent failure for different agent definitions and investigate the ability of the team to learn new coordination strategies when individual agents become unresponsive.
Notes
Here we use “agent definition” to avoid confusion with other uses of the term “factorization” in multiagent literature.
We use a logistic activation function at each layer and the final network output is scaled by the base traversal time of the longest edge in the graph.
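For concreteness, a minimal sketch of such a network is given below. The layer sizes, parameter names, and the max_base_traversal_time argument are our own illustrative choices, not taken from the paper; only the logistic activation at each layer and the output scaling are described in the note above.

```python
import numpy as np

def logistic(x):
    """Logistic (sigmoid) activation, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-x))

def policy_forward(x, w1, b1, w2, b2, max_base_traversal_time):
    """Single-hidden-layer policy network with a logistic activation
    at each layer. The final output, which lies in (0, 1), is scaled
    by the base traversal time of the longest edge in the graph."""
    h = logistic(x @ w1 + b1)            # hidden layer
    y = logistic(h @ w2 + b2)            # output layer, in (0, 1)
    return y * max_base_traversal_time   # scale to traversal-time units

# Illustrative usage with arbitrary dimensions:
rng = np.random.default_rng(0)
x = rng.standard_normal(10)
w1, b1 = rng.standard_normal((10, 40)), rng.standard_normal(40)
w2, b2 = rng.standard_normal((40, 10)), rng.standard_normal(10)
out = policy_forward(x, w1, b1, w2, b2, max_base_traversal_time=5.0)
```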
References
Agogino, A., & Tumer, K. (2004). Efficient evaluation functions for multi-rover systems. In Genetic and evolutionary computation conference (pp. 1–11). Seattle, WA: Springer.
Bernstein, D. S., Givan, R., Immerman, N., & Zilberstein, S. (2002). The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 27(4), 819–840.
Busoniu, L., Babuska, R., & De Schutter, B. (2008). A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 38(2), 156–172.
Castellini, J., Oliehoek, F. A., Savani, R., & Whiteson, S. (2019). The representational capacity of action-value networks for multi-agent reinforcement learning. In Proceedings of the 18th international conference on autonomous agents and multiagent systems (pp. 1862–1864).
Chung, J. J., Chow, S., & Tumer, K. (2018). When less is more: Reducing agent noise with probabilistically learning agents. In Proceedings of the 17th international conference on autonomous agents and multiagent systems, Stockholm, Sweden. Extended abstract (pp. 1900–1902).
Chung, J. J., Miklić, D., Sabattini, L., Tumer, K., & Siegwart, R. (2019). The impact of agent definitions and interactions on multiagent learning for coordination. In Proceedings of the 18th international conference on autonomous agents and multiagent systems (pp. 1752–1760).
Chung, J. J., Rebhuhn, C., Yates, C., Hollinger, G. A., & Tumer, K. (2019). A multiagent framework for learning dynamic traffic management strategies. Autonomous Robots, 43(6), 1375–1391.
Claes, D., Oliehoek, F., Baier, H., & Tuyls, K. (2017). Decentralised online planning for multi-robot warehouse commissioning. In Proceedings of the 16th conference on autonomous agents and multiagent systems (pp. 492–500).
Claus, C., & Boutilier, C. (1998). The dynamics of reinforcement learning in cooperative multiagent systems. In AAAI/IAAI, Madison, WI (pp. 746–752).
Colby, M., Yliniemi, L., Pezzini, P., Tucker, D., Bryden, K. M., & Tumer, K. (2016). Multiobjective neuroevolutionary control for a fuel cell turbine hybrid energy system. In Proceedings of the genetic and evolutionary computation conference (pp. 877–884). Denver, CO: ACM.
Devlin, S., & Kudenko, D. (2011). Theoretical considerations of potential-based reward shaping for multi-agent systems. In The 10th international conference on autonomous agents and multiagent systems, Taipei, Taiwan (Vol. 1, pp. 225–232).
Digani, V., Sabattini, L., & Secchi, C. (2016). A probabilistic Eulerian traffic model for the coordination of multiple AGVs in automatic warehouses. IEEE Robotics and Automation Letters, 1(1), 26–32.
Digani, V., Sabattini, L., Secchi, C., & Fantuzzi, C. (2015). Ensemble coordination approach in multi-AGV systems applied to industrial warehouses. IEEE Transactions on Automation Science and Engineering, 12(3), 922–934.
Ficici, S. G., Melnik, O., & Pollack, J. B. (2005). A game-theoretic and dynamical-systems analysis of selection methods in coevolution. IEEE Transactions on Evolutionary Computation, 9(6), 580–602.
Hulse, D., Tumer, K., Hoyle, C., & Tumer, I. (2018). Modeling multidisciplinary design with multiagent learning. Artificial Intelligence for Engineering Design, Analysis and Manufacturing, FirstView, 1–15.
Kitano, H. (Ed.). (1998). RoboCup-97: Robot soccer world cup I. New York: Springer.
Littman, M.L., Dean, T.L., & Kaelbling, L.P. (1995). On the complexity of solving Markov decision problems. In Proceedings of the eleventh conference on uncertainty in artificial intelligence (pp. 394–402).
Mannion, P., Duggan, J., & Howley, E. (2016). An experimental review of reinforcement learning algorithms for adaptive traffic signal control. In Autonomic road transport support systems (pp. 47–66). New York: Springer.
Panait, L., & Luke, S. (2005). Cooperative multi-agent learning: The state of the art. Autonomous Agents and Multi-agent Systems, 11(3), 387–434.
Parunak, H. V. D. (1996). Applications of distributed artificial intelligence in industry. Foundations of Distributed Artificial Intelligence, 2, 1–18.
Rosas, F., Chen, K. C., & Gündüz, D. (2018). Social learning for resilient data fusion against data falsification attacks. Computational Social Networks, 5(1), 10.
Sen, S., & Weiss, G. (1999). Learning in multiagent systems. In G. Weiss (Ed.), Multiagent systems: A modern approach to distributed artificial intelligence (pp. 259–298). Cambridge, MA: MIT Press.
Stone, P., Kaminka, G. A., Kraus, S., & Rosenschein, J. S. (2010). Ad hoc autonomous agent teams: Collaboration without pre-coordination. In AAAI conference on artificial intelligence, Atlanta, GA (pp. 1504–1509).
Stone, P., & Veloso, M. (2000). Multiagent systems: A survey from a machine learning perspective. Autonomous Robots, 8(3), 345–383.
Sycara, K. P. (1998). Multiagent systems. AI Magazine, 19(2), 79.
Tsitsiklis, J. N. (1993). Decentralized detection. Advances in Statistical Signal Processing, 2(2), 297–344.
Veeravalli, V. V., & Varshney, P. K. (2012). Distributed inference in wireless sensor networks. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 370(1958), 100–117.
Weiß, G. (1996). Adaptation and learning in multi-agent systems: Some remarks and a bibliography. In G. Weiß & S. Sen (Eds.), IJCAI'95 workshop on adaption and learning in multi-agent systems (pp. 1–21). Berlin: Springer.
Wolpert, D. H., Wheeler, K. R., & Tumer, K. (1999). General principles of learning-based multi-agent systems. In Proceedings of the third annual conference on autonomous agents (pp. 77–83). ACM.
Ye, D., Zhang, M., & Yang, Y. (2015). A multi-agent framework for packet routing in wireless sensor networks. Sensors, 15(5), 10026–10047.
Acknowledgements
This paper is an extension of our AAMAS paper [6]. We provide additional experiments analyzing the differences in team performance after agent failure and also expand our discussion with notes for practitioners. This work was partially supported by the EU H2020 project CROWDBOT under Grant No. 779942 and by the National Science Foundation under Grant No. IIS-1815886.
Appendix: Additional experiments on the centralized agent network architecture
We conducted additional experiments to assess the performance of the centralized learner when a more complex network architecture was available. In these experiments, the centralized agent uses the same relative number of hidden neurons as the intersection agents, i.e. four times the number of edges, which for 38 edges gives 152 hidden neurons. Thus, the centralized\(_{\text{rel}}\) agent has 11,590 weights defining its control policy, while the centralized\(_{\text{t,rel}}\) agent with time information has 17,366 weights. This is over nine times more parameters than the originally tested centralized and centralized\(_{\text{t}}\) agent policies (see Table 1). Results are shown in Figs. 10, 11 and 12 below, where the original centralized policies are shown in green and cyan (without and with time information, respectively), and the larger neural network policies are shown in dark grey and light grey, respectively.
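The weight counts above are consistent with a single-hidden-layer, fully connected network with one bias per hidden and output neuron. The sketch below reproduces the reported counts; the input dimensions (37, and 75 with time information) and the 38-dimensional output are our own inferences from those counts, not figures stated in the paper.

```python
def mlp_param_count(n_in: int, n_hidden: int, n_out: int) -> int:
    """Number of weights in a single-hidden-layer fully connected
    network, counting one bias per hidden and output neuron."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

# Hidden layer size: four times the number of edges (4 * 38 = 152).
N_HIDDEN = 4 * 38

# Assumed input/output sizes, chosen because they reproduce the
# parameter counts reported in the text.
print(mlp_param_count(37, N_HIDDEN, 38))  # 11590 -> centralized_rel
print(mlp_param_count(75, N_HIDDEN, 38))  # 17366 -> centralized_t,rel
```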
From these results, we see that, in practice, additional policy parameters do not enable substantially better learning performance. The centralized learner with time information improves when using the more complex network architecture; however, the centralized learner without time information actually degrades when more parameters are introduced. Of course, a more comprehensive sweep of possible network architectures would be needed to assess whether there exists a “sweet spot” that balances network representational capacity and learning complexity for each of these cases. Nevertheless, the basic centralized learner with no time information and only 1254 network weights still outperforms all other centralized policy architectures under all computed metrics. A quantitative comparison of the means and medians of the final distributions is given in Table 4.
Finally, we note that in these experiments we ran our learning algorithm for only 500 learning epochs. However, the learning trends suggest that achieving substantial improvements in policy performance (e.g. beyond what is currently achieved by the intersection, intersection\(_{\text{t}}\) and link\(_{\text{t}}\) agents) will require significantly greater learning effort. The bottleneck may lie in the underlying evolutionary algorithm used to train the policies. Thus, future work may consider adapting or replacing this routine with techniques tailored to efficiently training neural networks with large numbers of parameters.
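To illustrate the kind of routine the appendix refers to, below is a minimal sketch of neuroevolution over a flattened weight vector via Gaussian mutation. This is a toy (1+\(\lambda\))-style hill climber under our own assumptions; the paper's actual evolutionary algorithm, operators, and hyperparameters may differ.

```python
import numpy as np

def evolve(init_weights, evaluate, n_epochs=500, pop_size=20, sigma=0.1):
    """Toy (1+lambda)-style neuroevolution over a flat weight vector.

    `evaluate` maps a weight vector to a scalar fitness (higher is
    better). Illustrative sketch only, not the paper's algorithm.
    """
    rng = np.random.default_rng(0)
    best, best_fit = init_weights, evaluate(init_weights)
    for _ in range(n_epochs):
        for _ in range(pop_size):
            # Gaussian perturbation of the current best policy weights.
            child = best + sigma * rng.standard_normal(best.shape)
            fit = evaluate(child)
            if fit > best_fit:
                best, best_fit = child, fit
    return best, best_fit
```

As the appendix notes, the cost of such a routine grows with the number of weights being perturbed, which is one reason a nine-fold increase in parameters may demand far more than 500 epochs.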
Cite this article
Chung, J.J., Miklić, D., Sabattini, L. et al. The impact of agent definitions and interactions on multiagent learning for coordination in traffic management domains. Auton Agent Multi-Agent Syst 34, 21 (2020). https://doi.org/10.1007/s10458-020-09442-1