Speech Driven Gaze in a Face-to-Face Interaction
Frontiers in Neurorobotics (IF 2.6), Pub Date: 2021-01-25, DOI: 10.3389/fnbot.2021.598895
Ülkü Arslan Aydin 1 , Sinan Kalkan 2 , Cengiz Acartürk 1, 3

Gaze and language are major pillars of multimodal communication. Gaze is a non-verbal mechanism that conveys crucial social signals in face-to-face conversation. However, compared to language, gaze has been less studied as a communication modality. The purpose of the present study is twofold: (i) to investigate gaze direction (i.e., aversion and face gaze) and its relation to speech in a face-to-face interaction, and (ii) to propose a computational model for multimodal communication that predicts gaze direction using high-level speech features. Twenty-eight pairs of participants took part in data collection. The experimental setting was a mock job interview. Eye movements were recorded for both participants. The speech data were annotated according to the ISO 24617-2 Standard for Dialogue Act Annotation, as well as with manual tags based on previous social gaze studies. A comparative analysis was conducted using Convolutional Neural Network (CNN) models with specific architectures, namely VGGNet and ResNet. The results showed that the frequency and duration of gaze differ significantly depending on the role of the participant. Moreover, the ResNet models achieved higher than 70% accuracy in predicting gaze direction.
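
To make the two gaze measures named in the abstract concrete, the following is a minimal sketch of how frequency and duration could be derived from a per-frame sequence of gaze labels. It is an illustration only, not the authors' pipeline: the label names `"face"` and `"aversion"`, the 20 ms frame interval, and the helper names `gaze_events` and `summarize` are all assumptions introduced here.

```python
from itertools import groupby


def gaze_events(labels, frame_ms):
    """Collapse consecutive identical per-frame labels into (label, duration_ms) events."""
    return [(label, sum(1 for _ in run) * frame_ms) for label, run in groupby(labels)]


def summarize(events):
    """Return {label: (event_count, mean_duration_ms)} for a list of gaze events."""
    totals = {}
    for label, duration in events:
        count, total = totals.get(label, (0, 0))
        totals[label] = (count + 1, total + duration)
    return {label: (count, total / count) for label, (count, total) in totals.items()}


# Hypothetical per-frame labels for one participant (assumed 20 ms per frame).
frames = ["face", "face", "face", "aversion", "aversion", "face"]
events = gaze_events(frames, frame_ms=20)
summary = summarize(events)
print(summary)
```

Computing such a summary separately for each role in the interview would give the per-role frequencies and durations that the abstract reports as differing significantly; the statistical comparison itself is not sketched here.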

Updated: 2021-03-17