Multimodal assessment of apparent personality using feature attention and error consistency constraint
Image and Vision Computing (IF 4.2), Pub Date: 2021-03-24, DOI: 10.1016/j.imavis.2021.104163
Süleyman Aslan, Uğur Güdükbay, Hamdi Dibeklioğlu

Personality computing and affective computing, where the recognition of personality traits is essential, have recently gained increasing attention in many research areas. We propose a novel approach to recognize the Big Five personality traits of people from videos. To this end, we use four different modalities, namely, ambient appearance (scene), facial appearance, voice, and transcribed speech. Through a specialized subnetwork for each of these modalities, our model learns reliable modality-specific representations and fuses them using an attention mechanism that re-weights each dimension of these representations to obtain an optimal combination of multimodal information. A novel loss function enforces equal importance for each of the personality traits to be estimated, through a consistency constraint that keeps the trait-specific errors as close to each other as possible. To further enhance the reliability of our model, we employ (pre-trained) state-of-the-art architectures (i.e., ResNet, VGGish, ELMo) as the backbones of the modality-specific subnetworks, complemented by multilayered Long Short-Term Memory networks to capture temporal dynamics. To minimize the computational complexity of multimodal optimization, we use two-stage modeling, where the modality-specific subnetworks are first trained individually and the whole network is then fine-tuned to jointly model multimodal data. On the large-scale ChaLearn First Impressions V2 challenge dataset, we evaluate the reliability of our model and investigate the informativeness of the considered modalities. Experimental results show the effectiveness of the proposed attention mechanism and the error consistency constraint. While facial information yields the best performance among the individual modalities, with all four modalities combined our model achieves a mean accuracy of 91.8%, improving the state of the art in automatic personality analysis.
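The two mechanisms highlighted in the abstract, a dimension-wise attention that re-weights fused modality representations and a loss term that keeps trait-specific errors consistent, can be illustrated with a minimal sketch. The code below assumes PyTorch; the module names, layer choices, and the weight of the consistency term (`lam`) are hypothetical simplifications for illustration, not the authors' implementation.

```python
# Minimal, hypothetical sketch (not the authors' code) of two ideas from the
# abstract: dimension-wise attention over modality-specific features and an
# error-consistency term that keeps per-trait errors close to each other.
import torch
import torch.nn as nn


class FeatureAttentionFusion(nn.Module):
    """Re-weights each dimension of each modality's representation before fusing."""

    def __init__(self, num_modalities: int, feat_dim: int, num_traits: int = 5):
        super().__init__()
        # One attention score per (modality, feature-dimension) pair.
        self.attn = nn.Linear(num_modalities * feat_dim, num_modalities * feat_dim)
        self.head = nn.Linear(num_modalities * feat_dim, num_traits)

    def forward(self, feats: list) -> torch.Tensor:
        # feats: list of (batch, feat_dim) tensors, one per modality.
        x = torch.cat(feats, dim=1)               # (batch, M * D)
        weights = torch.sigmoid(self.attn(x))     # per-dimension attention weights
        fused = weights * x                       # re-weighted multimodal features
        return torch.sigmoid(self.head(fused))    # Big Five estimates in [0, 1]


def error_consistency_loss(pred: torch.Tensor, target: torch.Tensor,
                           lam: float = 0.1) -> torch.Tensor:
    """Mean absolute error plus a penalty that keeps trait-specific errors close."""
    per_trait_err = (pred - target).abs().mean(dim=0)   # (num_traits,)
    base = per_trait_err.mean()                         # standard regression loss
    consistency = ((per_trait_err - per_trait_err.mean()) ** 2).mean()
    return base + lam * consistency
```

In this sketch the consistency term is simply the variance of the per-trait mean absolute errors, so minimizing it pushes the errors for the five traits toward a common level, which is one plausible way to realize the constraint described in the abstract.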




Updated: 2021-04-09