Abstract
Reinforcement learning typically requires millions of steps when training from scratch, owing to the agent's limited observational experience; in particular, the representation approximated by a single deep network is often insufficient for reinforcement learning agents. In this paper, we propose a novel multi-view deep attention network (MvDAN), which introduces multi-view representation learning into the reinforcement learning framework for the first time. Under this multi-view function-approximation scheme, the proposed model approximates multiple view-specific policy or value functions in parallel by estimating middle-level representations, and integrates these functions through attention mechanisms to produce a comprehensive strategy. Furthermore, we develop multi-view generalized policy improvement, which jointly optimizes all policies rather than a single one. Experimental results on eight Atari benchmarks show that, compared with the single-view function-approximation scheme of existing reinforcement learning methods, MvDAN outperforms state-of-the-art approaches and converges faster with greater training stability.
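The core idea of combining parallel view-specific value functions through attention can be illustrated with a minimal sketch. The function and variable names below (`attentive_q`, `view_q_values`, `attention_scores`) are illustrative assumptions, not the paper's actual implementation; the real MvDAN learns both the view-specific heads and the attention scores end-to-end with deep networks.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attentive_q(view_q_values, attention_scores):
    """Combine view-specific Q-value estimates into one comprehensive
    estimate via attention weights.

    view_q_values:    (n_views, n_actions) array, one Q-vector per view head
    attention_scores: (n_views,) unnormalized relevance score per view
    """
    weights = softmax(attention_scores)   # (n_views,) weights summing to 1
    return weights @ view_q_values        # (n_actions,) weighted combination

# Toy example: three view-specific heads over four actions
q = np.array([[1.0, 0.2, 0.0, 0.5],
              [0.8, 0.9, 0.1, 0.4],
              [0.2, 0.1, 1.2, 0.3]])
scores = np.array([2.0, 0.5, 0.1])
combined = attentive_q(q, scores)
print(combined.shape)  # (4,)
```

Because the attention weights are a convex combination, each combined Q-value stays within the range spanned by the corresponding view-specific estimates, so a view judged irrelevant for the current state is smoothly down-weighted rather than discarded.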
Acknowledgements
This work is supported by the National Natural Science Foundation of China under Projects 61673179, 61751311, and 61825305, Shanghai Knowledge Service Platform Project (No. ZF1213), and the Fundamental Research Funds for the Central Universities. Prof. Shiliang Sun is the corresponding author of this paper.
Cite this article
Hu, Y., Sun, S., Xu, X. et al. Attentive multi-view reinforcement learning. Int. J. Mach. Learn. & Cyber. 11, 2461–2474 (2020). https://doi.org/10.1007/s13042-020-01130-6