
Predicting user visual attention in virtual reality with a deep learning model

  • Original Article
  • Published in: Virtual Reality

A Correction to this article was published on 28 April 2021

Abstract

Recent studies show that users' visual attention during virtual reality museum navigation can be effectively estimated with deep learning models. However, these models rely on large-scale datasets that are usually structurally complex and context specific, which makes them challenging for nonspecialist researchers and designers to use. We therefore present a deep learning model, ALRF, that generalises real-time prediction of user visual attention in virtual reality contexts. The model combines two parallel deep learning streams to process a compact dataset of temporal–spatial salient features of the user's eye movements together with virtual object coordinates. Its prediction accuracy outperformed state-of-the-art deep learning models, reaching a record-high 91.03%. Importantly, with quick parametric tuning, the model proved flexibly applicable across different environments of the virtual reality museum and outdoor scenes. We discuss how the proposed model may be implemented as a generalisable tool for adaptive virtual reality application design and evaluation.
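The two-parallel-stream design described in the abstract can be sketched as follows. This is a minimal illustration only, with hypothetical feature dimensions and randomly initialised weights; it is not the authors' ALRF implementation, whose architecture and parameters are detailed in the full article.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_relu(x, w, b):
    # A single fully connected layer with ReLU activation.
    return np.maximum(0.0, x @ w + b)

# Hypothetical inputs: 8 temporal-spatial gaze features, 3-D object coordinates.
gaze = rng.normal(size=(1, 8))    # eye-movement feature vector
coords = rng.normal(size=(1, 3))  # virtual object coordinates

# Stream 1: processes the eye-movement features.
w1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
h_gaze = dense_relu(gaze, w1, b1)

# Stream 2: processes the virtual object coordinates in parallel.
w2, b2 = rng.normal(size=(3, 16)), np.zeros(16)
h_coords = dense_relu(coords, w2, b2)

# Fusion: concatenate both streams and predict an attention probability.
fused = np.concatenate([h_gaze, h_coords], axis=1)        # shape (1, 32)
w3, b3 = rng.normal(size=(32, 1)), np.zeros(1)
p_attention = 1.0 / (1.0 + np.exp(-(fused @ w3 + b3)))    # sigmoid output

print(fused.shape)  # (1, 32)
```

The design choice illustrated here is that each input modality is encoded by its own stream before fusion, so either stream can be retuned for a new environment without retraining the other from scratch.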


Availability of data and material

Derived data supporting the findings of this study are available from the corresponding author on request.

Code availability

Software applications used in the study are based on publicly available open-source software, and the code used in this study is available from the corresponding author on request.

Acknowledgements

This work was supported by the Natural Science Foundation of China (61802341) and the ZJU-SUTD IDEA programme (IDEA006).

Author information

Corresponding author

Correspondence to Xiangdong Li.

Ethics declarations

Conflict of interest

The authors report no conflicts of interest.

Ethical approval

This study was approved by the university human research ethics committee, and all procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original article has been corrected: the fifth author's name was corrected to Praben Hansen.

About this article

Cite this article

Li, X., Shan, Y., Chen, W. et al. Predicting user visual attention in virtual reality with a deep learning model. Virtual Reality 25, 1123–1136 (2021). https://doi.org/10.1007/s10055-021-00512-7
