
Research on Action Strategies and Simulations of DRL and MCTS-based Intelligent Round Game

  • Regular Papers
  • Control Theory and Applications
  • Published in: International Journal of Control, Automation and Systems

Abstract

The reinforcement learning problem of complex action control in multiplayer online battlefield games has attracted considerable interest in the deep learning field. The problem involves larger state and action spaces than traditional confrontation games, making it difficult to search for any strategy with human-level performance. This paper presents a deep reinforcement learning model that addresses the problem from the perspectives of game simulation and algorithm implementation. A reverse reinforcement learning model based on high-level players’ training data is established to support downstream algorithms; with less training data, it converges faster and is more consistent with the action strategies of high-level players. An intelligent deduction algorithm based on DDQN (double deep Q-network) is then developed to achieve better generalization under the guidance of a given reward function. At the game simulation level, a Monte Carlo tree search (MCTS) intelligent decision model for turn-based antagonistic deduction games is constructed to generate next-step actions. Furthermore, a prototype game simulator that combines offline and online functions is implemented to verify the performance of the proposed model and algorithms. The experiments show that the proposed approach not only provides a useful reference for antagonistic environments with incomplete information, but is also accurate and effective in predicting the return value. Moreover, this work provides a theoretical validation platform and testbed for related research on game AI for deductive games.
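
For readers unfamiliar with the DDQN algorithm named in the abstract, the following is a minimal, hypothetical sketch of the double DQN bootstrap target, in which the online network selects the greedy next action and the target network evaluates it. The function name, network objects, tensor shapes, and discount factor are illustrative assumptions, not the authors' implementation.

    import torch

    def ddqn_target(reward, next_state, done, online_net, target_net, gamma=0.99):
        # Double DQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).
        # Decoupling action selection (online net) from action evaluation
        # (target net) reduces the value overestimation of vanilla DQN.
        with torch.no_grad():
            next_action = online_net(next_state).argmax(dim=1, keepdim=True)
            next_q = target_net(next_state).gather(1, next_action).squeeze(1)
            return reward + gamma * (1.0 - done) * next_q

In training, the squared difference between this target and the online network's Q-value for the taken action would serve as the loss, with the target network periodically synchronized to the online network.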



Author information

Correspondence to Yuxiang Sun or Xianzhong Zhou.

Additional information

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported in part by the Innovation and Creativity Research Program for Doctoral Students of Nanjing University (grant number CXCY19-19), by the CSC scholarship, and by the National Natural Science Foundation under Grant 61876079.

Yuxiang Sun is a doctoral candidate at Nanjing University, Nanjing, China. He mainly focuses on system modeling and reinforcement learning.

Bo Yuan is a Lecturer in the School of Computing and Engineering at the University of Derby, UK. His research interests include artificial intelligence, machine learning, distributed and decentralized computing, and big data analytics.

Yongliang Zhang is an associate professor at the Army Engineering University of PLA, Nanjing, China. He mainly focuses on command and control simulation and reinforcement learning.

Wanwen Zheng is a master's candidate at Nanjing University, Nanjing, China. She mainly focuses on intelligent information processing and intelligent systems.

Qingfeng Xia is a doctoral candidate at Nanjing University and an associate professor at Nanjing University of Information Science and Technology Binjiang College, Wuxi, China. He mainly focuses on intelligent control and multi-robot cooperation.

Bojian Tang is a master's candidate at Nanjing University, Nanjing, China. He mainly focuses on system modeling and reinforcement learning.

Xianzhong Zhou is a professor at Nanjing University, Nanjing, China. He mainly focuses on the cooperation and task planning of hybrid intelligent systems.

About this article

Cite this article

Sun, Y., Yuan, B., Zhang, Y. et al. Research on Action Strategies and Simulations of DRL and MCTS-based Intelligent Round Game. Int. J. Control Autom. Syst. 19, 2984–2998 (2021). https://doi.org/10.1007/s12555-020-0277-0
