Learning to infer human attention in daily activities
Pattern Recognition (IF 8) Pub Date: 2020-07-01, DOI: 10.1016/j.patcog.2020.107314
Zhixiong Nan, Tianmin Shu, Ran Gong, Shu Wang, Ping Wei, Song-Chun Zhu, Nanning Zheng

Abstract The first attention model in the computer science community was proposed in 1998, and human attention has been studied intensively in the years since. However, these studies mainly treat human attention as the image regions that draw the attention of a human (outside the image) who is looking at the image. In this paper, we infer the attention of a human inside a third-person-view video in which the human is performing a task, and we define human attention as the attentional objects that coincide with that task. To infer human attention, we propose a deep neural network model that fuses a low-level human pose cue with a high-level task encoding cue. Because appropriate public datasets for studying this problem are lacking, we collect a new video dataset in complex Virtual-Reality (VR) scenes. In the experiments, we extensively compare our method with three other methods on this VR dataset. In addition, we re-annotate a public real-world dataset and conduct additional experiments on it. The experimental results validate the effectiveness of our method.

Updated: 2020-07-01