
Attentive multi-view reinforcement learning

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

Reinforcement learning typically requires millions of steps when training from scratch, owing to the agent's limited observational experience. In particular, the representation approximated by a single deep network is usually too limited for reinforcement learning agents. In this paper, we propose a novel multi-view deep attention network (MvDAN), which introduces multi-view representation learning into the reinforcement learning framework for the first time. Following a multi-view scheme of function approximation, the proposed model approximates multiple view-specific policy or value functions in parallel, each estimating its own middle-level representation, and integrates these functions through an attention mechanism to produce a comprehensive strategy. Furthermore, we develop a multi-view generalized policy improvement procedure that jointly optimizes all policies rather than a single one. Experimental results on eight Atari benchmarks show that, compared with single-view function approximation schemes, MvDAN outperforms state-of-the-art methods while converging faster and training more stably.
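As a concrete reading of the architecture, the sketch below approximates several view-specific Q-functions in parallel and fuses them with a learned attention over the view representations. This is a minimal illustration in PyTorch, not the authors' implementation: the module names (ViewEncoder, MvDAN), the fully connected encoders, the default of three views, and the softmax scoring of view representations are all assumptions based only on the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewEncoder(nn.Module):
    """One view-specific branch: a middle-level representation plus a Q-head."""
    def __init__(self, obs_dim: int, hidden_dim: int, n_actions: int):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.q_head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs):
        h = self.encode(obs)       # view-specific middle-level representation
        return h, self.q_head(h)   # representation and view-specific Q-values

class MvDAN(nn.Module):
    """Attention-weighted integration of several view-specific Q-functions."""
    def __init__(self, obs_dim: int, hidden_dim: int, n_actions: int,
                 n_views: int = 3):
        super().__init__()
        self.views = nn.ModuleList(
            [ViewEncoder(obs_dim, hidden_dim, n_actions) for _ in range(n_views)]
        )
        self.attn = nn.Linear(hidden_dim, 1)  # scores each view's representation

    def forward(self, obs):
        feats, qs = zip(*(view(obs) for view in self.views))
        feats = torch.stack(feats, dim=1)             # (batch, n_views, hidden_dim)
        qs = torch.stack(qs, dim=1)                   # (batch, n_views, n_actions)
        weights = F.softmax(self.attn(feats), dim=1)  # attention over the views
        return (weights * qs).sum(dim=1)              # integrated Q-values

# Example: integrated Q-values for a batch of 32 vector observations.
net = MvDAN(obs_dim=128, hidden_dim=256, n_actions=6)
q = net(torch.randn(32, 128))  # shape (32, 6)
```

The multi-view generalized policy improvement mentioned above plausibly extends generalized policy improvement (Barreto et al., NeurIPS 2017), which acts greedily with respect to the per-action maximum over the available value functions. A one-function sketch under that reading:

```python
def gpi_action(view_qs: torch.Tensor) -> int:
    """view_qs: (n_views, n_actions) Q-values for a single state.
    Greedy action w.r.t. the per-action maximum over all views."""
    return int(view_qs.max(dim=0).values.argmax())
```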




Acknowledgements

This work is supported by the National Natural Science Foundation of China under Projects 61673179, 61751311, and 61825305, Shanghai Knowledge Service Platform Project (No. ZF1213), and the Fundamental Research Funds for the Central Universities. Prof. Shiliang Sun is the corresponding author of this paper.

Author information


Corresponding author

Correspondence to Shiliang Sun.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Hu, Y., Sun, S., Xu, X. et al. Attentive multi-view reinforcement learning. Int. J. Mach. Learn. & Cyber. 11, 2461–2474 (2020). https://doi.org/10.1007/s13042-020-01130-6
