
The impact of agent definitions and interactions on multiagent learning for coordination in traffic management domains

Autonomous Agents and Multi-Agent Systems

Abstract

The state-action space of an individual agent in a multiagent team fundamentally dictates how that individual interacts with the rest of the team. Thus, how an agent is defined in the context of its domain has a significant effect on team performance when learning to coordinate. In this work, we explore the trade-offs associated with these design choices: for example, a smaller team of agents that can each process and act on a wider scope of information about the world, versus a larger team of agents that each observe and act within a more local region of the domain. We focus our study on a traffic management domain and highlight the trends in learning performance under different agent definitions. In addition, we analyze the impact of agent failure for different agent definitions and investigate the ability of the team to learn new coordination strategies when individual agents become unresponsive.



Notes

  1. Here we use “agent definition” to avoid confusion with other uses of the term “factorization” in multiagent literature.

  2. We use a logistic activation function at each layer and the final network output is scaled by the base traversal time of the longest edge in the graph.
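As a concrete illustration of Note 2, here is a minimal sketch of such a policy network. Only the logistic activations and the scaling of the output by the base traversal time of the longest edge come from the note; the single hidden layer, the layer sizes and the example edge costs are illustrative assumptions.

```python
# Minimal sketch of the policy network described in Note 2. Assumptions: a
# single hidden layer, illustrative sizes and edge costs; only the logistic
# activations and the output scaling come from the note itself.
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def policy_output(state, w_hidden, w_out, base_traversal_times):
    """Map an observed state to per-edge outputs in [0, max base traversal time]."""
    hidden = logistic(state @ w_hidden)      # logistic activation at the hidden layer
    raw = logistic(hidden @ w_out)           # logistic activation at the output layer, values in (0, 1)
    return raw * max(base_traversal_times)   # scale by the base traversal time of the longest edge

# Hypothetical example: 4 observed features, 8 hidden neurons, 4 graph edges.
rng = np.random.default_rng(0)
state = rng.random(4)
w_hidden = rng.normal(size=(4, 8))
w_out = rng.normal(size=(8, 4))
base_traversal_times = [3.0, 5.0, 2.5, 4.0]
print(policy_output(state, w_hidden, w_out, base_traversal_times))
```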

References

  1. Agogino, A., & Tumer, K. (2004). Efficient evaluation functions for multi-rover systems. In Genetic and evolutionary computation conference (pp. 1–11). Seattle, WA: Springer.

  2. Bernstein, D. S., Givan, R., Immerman, N., & Zilberstein, S. (2002). The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 27(4), 819–840.


  3. Busoniu, L., Babuska, R., & De Schutter, B. (2008). A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 38(2), 156–172.


  4. Castellini, J., Oliehoek, F. A., Savani, R., & Whiteson, S. (2019). The representational capacity of action-value networks for multi-agent reinforcement learning. In Proceedings of the 18th international conference on autonomous agents and multiagent systems (pp. 1862–1864).

  5. Chung, J.J., Chow, S., & Tumer, K. (2018). When less is more: Reducing agent noise with probabilistically learning agents. In Proceedings of the 17th international conference on autonomous agents and multiagent systems, International foundation for autonomous agents and multiagent systems, Stockholm, Sweden. Extended abstract (pp. 1900–1902).

  6. Chung, J. J., Miklić, D., Sabattini, L., Tumer, K., & Siegwart, R. (2019). The impact of agent definitions and interactions on multiagent learning for coordination. In Proceedings of the 18th international conference on autonomous agents and multiagent systems (pp. 1752–1760).

  7. Chung, J. J., Rebhuhn, C., Yates, C., Hollinger, G. A., & Tumer, K. (2019). A multiagent framework for learning dynamic traffic management strategies. Autonomous Robots, 43(6), 1375–1391.


  8. Claes, D., Oliehoek, F., Baier, H., & Tuyls, K. (2017). Decentralised online planning for multi-robot warehouse commissioning. In Proceedings of the 16th conference on autonomous agents and multiagent systems. International foundation for autonomous agents and multiagent systems (pp. 492–500).

  9. Claus, C., & Boutilier, C. (1998). The dynamics of reinforcement learning in cooperative multiagent systems. In AAAI/IAAI, Madison, WI (pp. 746–752).

  10. Colby, M., Yliniemi, L., Pezzini, P., Tucker, D., Bryden, K. M., & Tumer, K. (2016). Multiobjective neuroevolutionary control for a fuel cell turbine hybrid energy system. In Proceedings of the genetic and evolutionary computation conference (pp. 877–884). Denver, CO: ACM.

  11. Devlin, S., & Kudenko, D. (2011). Theoretical considerations of potential-based reward shaping for multi-agent systems. In The 10th international conference on autonomous agents and multiagent systems, international foundation for autonomous agents and multiagent systems, Taipei, Taiwan (Vol. 1, pp. 225–232).

  12. Digani, V., Sabattini, L., & Secchi, C. (2016). A probabilistic Eulerian traffic model for the coordination of multiple AGVs in automatic warehouses. IEEE Robotics and Automation Letters, 1(1), 26–32.


  13. Digani, V., Sabattini, L., Secchi, C., & Fantuzzi, C. (2015). Ensemble coordination approach in multi-AGV systems applied to industrial warehouses. IEEE Transactions on Automation Science and Engineering, 12(3), 922–934.


  14. Ficici, S. G., Melnik, O., & Pollack, J. B. (2005). A game-theoretic and dynamical-systems analysis of selection methods in coevolution. IEEE Transactions on Evolutionary Computation, 9(6), 580–602.


  15. Hulse, D., Tumer, K., Hoyle, C., & Tumer, I. (2018). Modeling multidisciplinary design with multiagent learning. Artificial Intelligence for Engineering Design, Analysis and Manufacturing (pp. 1–15). FirstView.

  16. Kitano, H. (1998). RoboCup-97: Robot soccer world cup I. New York: Springer.


  17. Littman, M.L., Dean, T.L., & Kaelbling, L.P. (1995). On the complexity of solving Markov decision problems. In Proceedings of the eleventh conference on uncertainty in artificial intelligence (pp. 394–402).

  18. Mannion, P., Duggan, J., & Howley, E. (2016). An experimental review of reinforcement learning algorithms for adaptive traffic signal control. In Autonomic road transport support systems (pp. 47–66). New York: Springer.

  19. Panait, L., & Luke, S. (2005). Cooperative multi-agent learning: The state of the art. Autonomous Agents and Multi-agent Systems, 11(3), 387–434.


  20. Parunak, H. V. D. (1996). Applications of distributed artificial intelligence in industry. Foundations of Distributed Artificial Intelligence, 2, 1–18.


  21. Rosas, F., Chen, K. C., & Gündüz, D. (2018). Social learning for resilient data fusion against data falsification attacks. Computational Social Networks, 5(1), 10.


  22. Sen, S., & Weiss, G. (1999). Learning in multiagent systems. In G. Weiss (Ed.), Multiagent systems: A modern approach to distributed artificial intelligence (pp. 259–298). Cambridge, MA: MIT Press.


  23. Stone, P., Kaminka, G. A., Kraus, S., & Rosenschein, J. S. (2010). Ad hoc autonomous agent teams: Collaboration without pre-coordination. In AAAI conference on artificial intelligence, Atlanta, GA (pp. 1504–1509).

  24. Stone, P., & Veloso, M. (2000). Multiagent systems: A survey from a machine learning perspective. Autonomous Robots, 8(3), 345–383.


  25. Sycara, K. P. (1998). Multiagent systems. AI Magazine, 19(2), 79.


  26. Tsitsiklis, J. N. (1993). Decentralized detection. Advances in Statistical Signal Processing, 2(2), 297–344.


  27. Veeravalli, V. V., & Varshney, P. K. (2012). Distributed inference in wireless sensor networks. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 370(1958), 100–117.


  28. Weiß, G. (1996). Adaptation and learning in multi-agent systems: Some remarks and a bibliography. In G. Weiß & S. Sen (Eds.), IJCAI’95 workshop on adaption and learning in multi-agent systems (pp. 1–21). Berlin: Springer.

  29. Wolpert, D. H., Wheeler, K. R., & Tumer, K. (1999). General principles of learning-based multi-agent systems. In Proceedings of the third annual conference on autonomous agents (pp. 77–83). ACM.

  30. Ye, D., Zhang, M., & Yang, Y. (2015). A multi-agent framework for packet routing in wireless sensor networks. Sensors, 15(5), 10026–10047.



Acknowledgements

This paper is an extension of our AAMAS paper [6]. We provide additional experiments analyzing the differences in team performance after agent failure and also expand our discussion with notes for practitioners. This work was partially supported by the EU H2020 project CROWDBOT under Grant Nr. 779942 and by the National Science Foundation under Grant No. IIS-1815886.


Corresponding author

Correspondence to Jen Jen Chung.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Additional experiments on the centralized agent network architecture

We conducted additional experiments to assess the performance of the centralized learner when a more complex network architecture was available. In these experiments, the centralized agent uses the same relative number of hidden neurons as the intersection agents, i.e. four times the number of edges, which for 38 edges gives 152 hidden neurons. Thus, the centralized\(_{\text {rel}}\) agent has 11,590 weights defining its control policy while the centralized\(_{\text{t,rel}}\) agent with time information has 17,366 weights. This is over nine times more parameters than the originally tested centralized and centralized\(_{\text {t}}\) agent policies (see Table 1). Results are shown in Figs. 10, 11 and 12 below, where the original centralized policies are in green and cyan (no time information and with time information, respectively), while the performance of the larger neural network policies is shown in dark grey and light grey, respectively.
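For readers who wish to verify these counts, the sketch below reproduces them under one plausible single-hidden-layer weight layout (input-to-hidden connections without bias terms, hidden-to-output connections with bias terms). The layout and the 16 hidden neurons inferred for the original centralized agents are assumptions, not statements from the paper, but they recover the figures quoted here and in Table 1.

```python
# Hedged sketch: a layer layout that reproduces the parameter counts quoted in
# this appendix. The exact layout (biases on the output layer only) and the
# 16 hidden neurons for the original centralized agents are inferred.
def n_weights(n_inputs, n_hidden, n_outputs):
    return n_inputs * n_hidden + (n_hidden + 1) * n_outputs

n_edges = 38
print(n_weights(n_edges, 16, n_edges))               # 1254  -> centralized
print(n_weights(2 * n_edges, 16, n_edges))           # 1862  -> centralized_t (inferred doubling of inputs for time)
print(n_weights(n_edges, 4 * n_edges, n_edges))      # 11590 -> centralized_rel (152 hidden neurons)
print(n_weights(2 * n_edges, 4 * n_edges, n_edges))  # 17366 -> centralized_t,rel
```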

Fig. 10

Comparison of the average team performance across training epochs for different centralized agent policy architectures. Green and cyan plots are equivalent to those shown in Figs. 4, 5 and 6, while the grey curves show the performance of centralized agents whose neural network control policies have over nine times more parameters. Mean and one standard deviation (from 30 statistical runs) are shown for each set of experiments with increasing numbers of AGVs from (a)–(d). Best viewed in color (Color figure online)

Fig. 11

Best team performance across training epochs. Best viewed in color (Color figure online)

From these results, we see that, in practice, additional policy parameters do not enable substantially better learning performance. The centralized learner with time information is able to improve its learning performance when using the more complex network architecture; however, the centralized learner without time information actually degrades when more parameters are introduced. Of course, a more comprehensive sweep of possible network architectures would be needed to assess whether there exists a “sweet spot” that balances network representational capacity and learning complexity for each of these cases. Nevertheless, the basic centralized learner with no time information and only 1254 network weights still outperforms all other policy architectures under all computed metrics. A quantitative comparison of the means and medians of the final distributions is given in Table 4.

Table 4 Percentage improvement of the basic centralized agent against all other centralized policies

Finally, we note that in these experiments, we ran our learning algorithm for only 500 learning epochs. However, the learning trends suggest that achieving substantial improvements in policy performance (e.g. beyond what is currently achieved by the intersection, intersection\(_{\text {t}}\) and link\(_{\text {t}}\) agents) will require a significantly greater learning effort. The bottleneck may lie in the underlying evolutionary algorithm that is used to train the policies. Thus, future work may consider adapting or replacing this routine with techniques that are tailored to efficiently training neural networks with large numbers of parameters.
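For context, the sketch below shows a generic mutation-based neuroevolution loop over a flat vector of policy weights. It is not the specific algorithm used in this work; the population size, mutation scale and selection scheme are illustrative assumptions, and the evaluation function stands in for a full traffic simulation. The dimensionality of the search space grows directly with the number of policy weights, which is consistent with the training bottleneck noted above.

```python
import numpy as np

def evolve_policy(evaluate, n_weights, pop_size=20, n_epochs=500, sigma=0.1, seed=0):
    """Generic (mu + lambda)-style neuroevolution over a flat weight vector.

    `evaluate` maps a weight vector to a scalar team performance score
    (higher is better); in this domain it would wrap a traffic simulation.
    """
    rng = np.random.default_rng(seed)
    population = [rng.normal(size=n_weights) for _ in range(pop_size)]
    for _ in range(n_epochs):
        # Mutate every parent with Gaussian noise to create one offspring each.
        offspring = [p + rng.normal(scale=sigma, size=n_weights) for p in population]
        # Evaluate parents and offspring together, keep the best pop_size policies.
        combined = population + offspring
        scores = [evaluate(w) for w in combined]
        best_indices = np.argsort(scores)[::-1][:pop_size]
        population = [combined[i] for i in best_indices]
    return max(population, key=evaluate)
```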

Fig. 12

Violin plots of the team performance distributions at the end of 500 training epochs. The ‘+’ symbol represents the mean and the ‘\(\times\)’ represents the median. The centralized\(_{\text {t,rel}}\) agent with 17,366 weights is able to improve on the performance of the centralized\(_{\text {t}}\) agent, which only has 1862 weights. However, the basic centralized agent with only 1254 neural network weights produces the highest mean and median performance under all four tested domain complexities


Cite this article

Chung, J.J., Miklić, D., Sabattini, L. et al. The impact of agent definitions and interactions on multiagent learning for coordination in traffic management domains. Auton Agent Multi-Agent Syst 34, 21 (2020). https://doi.org/10.1007/s10458-020-09442-1
