EEG-Video Emotion-Based Summarization: Learning With EEG Auxiliary Signals
IEEE Transactions on Affective Computing (IF 9.6), Pub Date: 2022-09-21, DOI: 10.1109/taffc.2022.3208259
Wai-Cheong Lincoln Lew, Di Wang, Kai Keng Ang, Joo-Hwee Lim, Chai Quek, Ah-Hwee Tan

Video summarization is the process of selecting a subset of informative keyframes to expedite storytelling with limited loss of information. In this article, we propose an EEG-Video Emotion-based Summarization (EVES) model based on a multimodal deep reinforcement learning (DRL) architecture that leverages neural signals to learn visual interestingness and thereby produce quantitatively and qualitatively better video summaries. As such, EVES learns from multimodal signals rather than expensive human annotations. Furthermore, to ensure temporal alignment and minimize the modality gap between the visual and EEG modalities, we introduce a Time Synchronization Module (TSM) that uses an attention mechanism to transform the EEG representations into the visual representation space. We evaluate the performance of EVES on the TVSum and SumMe datasets. Based on rank order statistics benchmarks, the experimental results show that EVES outperforms the unsupervised models and narrows the performance gap with supervised models. Furthermore, human evaluation scores show that EVES receives a higher rating than the state-of-the-art DRL model DR-DSN by 11.4% on coherency of the content and 7.4% on emotion-evoking content. Thus, our work demonstrates the potential of EVES in selecting interesting content that is both coherent and emotion-evoking.
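The abstract describes the TSM only at a high level: an attention mechanism that maps EEG representations into the visual representation space so the two modalities are temporally aligned. As a rough illustration of what such a cross-modal attention step could look like, here is a minimal PyTorch sketch. It is an assumption, not the authors' implementation; the class name TimeSyncAttention, the dimensions, and the choice of visual features as queries and EEG features as keys/values are all hypothetical.

import torch
import torch.nn as nn


class TimeSyncAttention(nn.Module):
    """Hypothetical sketch: align EEG features to the visual feature space
    with scaled dot-product attention (an assumed formulation of a TSM)."""

    def __init__(self, eeg_dim: int, vis_dim: int):
        super().__init__()
        self.query = nn.Linear(vis_dim, vis_dim)  # video frames form the queries
        self.key = nn.Linear(eeg_dim, vis_dim)    # EEG windows form the keys
        self.value = nn.Linear(eeg_dim, vis_dim)  # ...and the values
        self.scale = vis_dim ** -0.5

    def forward(self, vis: torch.Tensor, eeg: torch.Tensor) -> torch.Tensor:
        # vis: (T_v, vis_dim) frame features; eeg: (T_e, eeg_dim) EEG features
        q, k, v = self.query(vis), self.key(eeg), self.value(eeg)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v  # (T_v, vis_dim): EEG information aligned per video frame


# Toy usage: align 20 EEG windows to 30 video frames.
tsm = TimeSyncAttention(eeg_dim=64, vis_dim=128)
aligned = tsm(torch.randn(30, 128), torch.randn(20, 64))
print(aligned.shape)  # torch.Size([30, 128])

Because each video frame attends over all EEG windows, the output sequence has one EEG-derived vector per frame, which is one plausible way to both synchronize the two streams in time and close the modality gap in a shared representation space.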

Updated: 2024-08-28