Estimating Presentation Competence using Multimodal Nonverbal Behavioral Cues
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2021-05-06, DOI: arxiv-2105.02636
Ömer Sümer, Cigdem Beyan, Fabian Ruth, Olaf Kramer, Ulrich Trautwein, Enkelejda Kasneci

Public speaking and presentation competence play an essential role in many areas of social interaction in our educational, professional, and everyday lives. Since our intention during a speech can differ from what is actually understood by the audience, the ability to appropriately convey our message requires a complex set of skills. Presentation competence is cultivated in the early school years and continuously developed over time. One approach that can promote efficient development of presentation competence is the automated analysis of human behavior during a speech based on visual and audio features and machine learning. Furthermore, this analysis can be used to suggest improvements and the development of skills related to presentation competence. In this work, we investigate the contribution of different nonverbal behavioral cues, namely facial, body pose-based, and audio-related features, to the estimation of presentation competence. The analyses were performed on videos of 251 students, while the automated assessment was based on manual ratings according to the Tübingen Instrument for Presentation Competence (TIP). Our classification results reached the best performance with early fusion in the same-dataset evaluation (accuracy of 71.25%) and with late fusion of speech, face, and body pose features in the cross-dataset evaluation (accuracy of 78.11%). Similarly, regression results performed best with fusion strategies.
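As a concrete illustration of the two fusion strategies compared in the abstract, the following is a minimal sketch in Python of early fusion (concatenating per-modality features before a single classifier) versus late fusion (averaging the outputs of per-modality classifiers). The feature dimensions, synthetic data, and logistic-regression classifier are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch of early vs. late fusion on synthetic stand-ins for the
# face, body-pose, and audio feature vectors. Dimensions, labels, and the
# classifier choice are illustrative assumptions, not the paper's setup.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 251  # number of presentation videos in the dataset

# Hypothetical per-video feature vectors for each modality.
face = rng.normal(size=(n, 64))
pose = rng.normal(size=(n, 32))
audio = rng.normal(size=(n, 48))
y = rng.integers(0, 2, size=n)  # binary competence label (illustrative)

idx_train, idx_test = train_test_split(np.arange(n), test_size=0.25,
                                       random_state=0)

# Early fusion: concatenate all modality features, train one classifier.
X_early = np.hstack([face, pose, audio])
clf_early = LogisticRegression(max_iter=1000).fit(X_early[idx_train],
                                                  y[idx_train])
acc_early = accuracy_score(y[idx_test], clf_early.predict(X_early[idx_test]))

# Late fusion: train one classifier per modality, then average their
# predicted probabilities at decision time.
probs = []
for X in (face, pose, audio):
    clf = LogisticRegression(max_iter=1000).fit(X[idx_train], y[idx_train])
    probs.append(clf.predict_proba(X[idx_test])[:, 1])
pred_late = (np.mean(probs, axis=0) > 0.5).astype(int)
acc_late = accuracy_score(y[idx_test], pred_late)

print(f"early fusion accuracy: {acc_early:.3f}")
print(f"late fusion accuracy:  {acc_late:.3f}")
```

On random features both variants perform at chance; the sketch only shows where the fusion happens: before training (early) or at decision time (late).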

Updated: 2021-05-07