
Research on Action Strategies and Simulations of DRL and MCTS-based Intelligent Round Game

  • Regular Papers
  • Control Theory and Applications
  • Published in: International Journal of Control, Automation and Systems

Abstract

The reinforcement learning problem of complex action control in multiplayer online battlefield games has attracted considerable interest in the deep learning field. The problem involves larger state and action spaces than traditional confrontation games, making it difficult to search for any strategy with human-level performance. This paper presents a deep reinforcement learning model that addresses the problem from the perspectives of game simulation and algorithm implementation. A reverse reinforcement learning model based on high-level players’ training data is established to support downstream algorithms; with less training data, it converges faster and is more consistent with the action strategies of high-level players. An intelligent deduction algorithm based on DDQN (double deep Q-network) is then developed to achieve better generalization under the guidance of a given reward function. At the game simulation level, a Monte Carlo tree search (MCTS) intelligent decision model for turn-based antagonistic deduction games is constructed to generate next-step actions. Furthermore, a prototype game simulator that combines offline and online functions is implemented to verify the performance of the proposed model and algorithms. The experiments show that the proposed approach not only provides a useful reference for antagonistic environments with incomplete information, but is also accurate and effective in predicting the return value. Moreover, this work provides a theoretical validation platform and testbed for related research on game AI for deductive games.
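
For readers unfamiliar with the DDQN algorithm named in the abstract, the following is a minimal, hypothetical sketch of the double DQN bootstrap target, in which the online network selects the greedy next action and the target network evaluates it. The function name, network objects, tensor shapes, and discount factor are illustrative assumptions, not the authors' implementation.

    import torch

    def ddqn_target(reward, next_state, done, online_net, target_net, gamma=0.99):
        # Double DQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).
        # Decoupling action selection (online net) from action evaluation
        # (target net) reduces the value overestimation of vanilla DQN.
        with torch.no_grad():
            next_action = online_net(next_state).argmax(dim=1, keepdim=True)
            next_q = target_net(next_state).gather(1, next_action).squeeze(1)
            return reward + gamma * (1.0 - done) * next_q

In training, the squared difference between this target and the online network's Q-value for the taken action would serve as the loss, with the target network periodically synchronized to the online network.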



Author information

Correspondence to Yuxiang Sun or Xianzhong Zhou.

Additional information

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported in part by the Innovation and Creativity Research Program for Doctoral Students of Nanjing University (grant number CXCY19-19), by the CSC scholarship, and by the National Natural Science Foundation under Grant 61876079.

Yuxiang Sun is a doctoral candidate at Nanjing University, Nanjing, China. He mainly focuses on system modeling and reinforcement learning.

Bo Yuan is a Lecturer in the School of Computing and Engineering at the University of Derby, UK. His research interests include artificial intelligence, machine learning, distributed and decentralized computing, and big data analytics.

Yongliang Zhang is an associate professor at the Army Engineering University of PLA, Nanjing, China. He mainly focuses on command and control simulation and reinforcement learning.

Wanwen Zheng is a master's candidate at Nanjing University, Nanjing, China. She mainly focuses on intelligent information processing and intelligent systems.

Qingfeng Xia is a doctoral candidate at Nanjing University and an associate professor at Nanjing University of Information Science and Technology Binjiang College, Wuxi, China. He mainly focuses on intelligent control and multi-robot cooperation.

Bojian Tang is a master's candidate at Nanjing University, Nanjing, China. He mainly focuses on system modeling and reinforcement learning.

Xianzhong Zhou is a professor at Nanjing University, Nanjing, China. He mainly focuses on the cooperation and task planning of hybrid intelligent systems.

About this article

Cite this article

Sun, Y., Yuan, B., Zhang, Y. et al. Research on Action Strategies and Simulations of DRL and MCTS-based Intelligent Round Game. Int. J. Control Autom. Syst. 19, 2984–2998 (2021). https://doi.org/10.1007/s12555-020-0277-0
