Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition
IEEE Transactions on Cognitive and Developmental Systems (IF 5.0), Pub Date: 2020-06-01, DOI: 10.1109/tcds.2019.2927941
Kalin Stefanov, Jonas Beskow, Giampiero Salvi

This paper presents a self-supervised method for visual detection of the active speaker in a multiperson spoken interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings. The proposed method is intended to complement acoustic detection of the active speaker, thus improving system robustness in noisy conditions. The method can detect an arbitrary number of possibly overlapping active speakers based exclusively on visual information about their faces. Furthermore, the method does not rely on external annotations, which keeps it consistent with cognitive development. Instead, it uses information from the auditory modality to support learning in the visual domain. This paper reports an extensive evaluation of the proposed method on a large multiperson face-to-face interaction data set. The results show good performance in a speaker-dependent setting. However, in a speaker-independent setting the proposed method yields significantly lower performance. We believe that the proposed method represents an essential component of any artificial cognitive system or robotic platform engaging in social interactions.
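
The idea of letting the auditory modality supervise the visual one can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how audio is turned into training targets, so the energy-threshold detector, the assumption of one audio channel per person, and all names below (`frame_energy`, `audio_pseudo_labels`, `build_training_pairs`) are hypothetical choices made only for illustration.

```python
from typing import List, Sequence, Tuple


def frame_energy(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame of audio."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def audio_pseudo_labels(
    samples: Sequence[float], frame_len: int, threshold: float
) -> List[int]:
    """Label a frame 1 (speaking) when its energy exceeds the threshold.

    Assumption: a simple energy threshold stands in for whatever acoustic
    detector is actually used; no human annotation is involved.
    """
    return [1 if e > threshold else 0 for e in frame_energy(samples, frame_len)]


def build_training_pairs(
    face_features: Sequence[Sequence[float]], labels: Sequence[int]
) -> List[Tuple[Sequence[float], int]]:
    """Pair each visual feature vector with the audio-derived label.

    Assumption: face features and audio frames are already time-aligned,
    one feature vector per audio frame, for a single person.
    """
    n = min(len(face_features), len(labels))
    return list(zip(face_features[:n], labels[:n]))


# Hypothetical data for one person: silence, speech, near-silence.
audio = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.01] * 4
labels = audio_pseudo_labels(audio, frame_len=4, threshold=0.01)
pairs = build_training_pairs([[0.1], [0.9], [0.2]], labels)
```

Because every person receives an independent binary label per frame, overlapping speech is represented naturally: several people can be labeled active in the same frame. A visual classifier trained on such pairs could then be applied to face features alone at test time.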

Updated: 2020-06-01