Predicting multimodal presentation skills based on instance weighting domain adaptation
Journal on Multimodal User Interfaces (IF 2.2), Pub Date: 2021-02-18, DOI: 10.1007/s12193-021-00367-x
Yutaro Yagi, Shogo Okada, Shota Shiobara, Sota Sugimura

Presentation skills assessment is one of the central challenges of multimodal modeling. Presentation skills comprise verbal and nonverbal components, but because people demonstrate these skills in widely varying ways, the observed multimodal features also vary widely. Owing to these differences, when test samples are drawn from a distribution that differs from the training distribution, the prediction accuracy of the skills often degrades. In machine learning theory, this problem of biased training (source) data is known as instance selection bias or covariate shift. To address it, this paper presents an instance weighting adaptation method for estimating the presentation skills of each participant from multimodal (verbal and nonverbal) features. For this purpose, we collect a novel multimodal presentation dataset that includes audio signals, body motion sensor data, and text transcripts of the speech content for participants observed in 58 presentation sessions. The dataset also includes verbal and nonverbal presentation skill scores assessed by two external experts from a human resources department. We extract multimodal features, such as spoken utterances, acoustic features, and the amount of body motion, to estimate the presentation skills. We propose two approaches, early fusion and late fusion, for regression models based on multimodal instance weighting adaptation. The experimental results show that the early fusion regression model with instance weighting adaptation achieves a Pearson correlation of \(\rho = 0.39\) when predicting the clarity of presentation goal elements. In the best case, instance weighting adaptation improves the accuracy (correlation coefficient) from \(-0.34\) to \(+0.35\).
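To make the instance weighting idea concrete, the following is a minimal Python sketch, not the authors' pipeline: it assumes precomputed per-modality feature matrices, approximates early fusion by concatenating them, and estimates importance weights \(p_{\text{target}}(x)/p_{\text{source}}(x)\) with a logistic-regression domain discriminator (a common density-ratio surrogate for covariate shift correction). All function names and parameters here are illustrative assumptions.

```python
# Sketch of covariate-shift correction via instance weighting (assumed setup,
# not the paper's exact method): a domain classifier estimates importance
# weights, which then weight a regressor fitted on early-fused features.
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

def estimate_importance_weights(X_source, X_target):
    """Estimate w(x) = p_target(x) / p_source(x) with a domain classifier."""
    X = np.vstack([X_source, X_target])
    # Domain labels: 0 = source (training) samples, 1 = target (test) samples.
    d = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    p_target = clf.predict_proba(X_source)[:, 1]
    # Bayes' rule converts P(domain=target | x) into a density ratio,
    # rescaled by the source/target sample-size ratio.
    ratio = p_target / np.clip(1.0 - p_target, 1e-6, None)
    ratio *= len(X_source) / len(X_target)
    return np.clip(ratio, 0.0, 10.0)  # clip extreme weights for stability

def fit_early_fusion_regressor(audio, motion, text, y, X_target):
    """Early fusion: concatenate modality features, then weighted regression."""
    X_source = np.hstack([audio, motion, text])
    weights = estimate_importance_weights(X_source, X_target)
    model = Ridge(alpha=1.0)
    model.fit(X_source, y, sample_weight=weights)  # importance-weighted fit
    return model
```

A late-fusion variant would instead fit one weighted regressor per modality and combine their skill predictions afterwards; the weighting step itself is unchanged.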



