Leveraging Audio Gestalt to Predict Media Memorability, arXiv - CS - Multimedia


Leveraging Audio Gestalt to Predict Media Memorability
arXiv - CS - Multimedia, Pub Date: 2020-12-31, DOI: arxiv-2012.15635
Lorin Sweeney, Graham Healy, Alan F. Smeaton

Memorability determines what evanesces into emptiness, and what worms its way into the deepest furrows of our minds. It is the key to curating more meaningful media content as we wade through daily digital torrents. The Predicting Media Memorability task in MediaEval 2020 aims to address the question of media memorability by setting the task of automatically predicting video memorability. Our approach is a multimodal deep learning-based late fusion that combines visual, semantic, and auditory features. We used audio gestalt to estimate the influence of the audio modality on overall video memorability, and accordingly inform which combination of features would best predict a given video's memorability scores.
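The abstract does not say how the fusion is carried out, so the following is only a minimal sketch of the general idea: per-modality memorability scores are combined by late fusion, and an audio gestalt score decides whether the audio modality takes part. The function names, the modality keys (`visual`, `semantic`, `audio`), the equal default weights, the assumption that audio gestalt is a scalar in [0, 1], and the `threshold` value are all hypothetical and are not taken from the paper.

```python
def late_fusion(scores, weights=None):
    """Weighted average of per-modality memorability scores.

    `scores` maps a modality name to a predicted score. Equal weights are
    an assumption made for this sketch, not the authors' configuration.
    """
    if not scores:
        raise ValueError("at least one modality score is required")
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total


def predict_memorability(scores, audio_gestalt, threshold=0.5):
    """Fuse modality scores, using audio only when audio gestalt is high.

    `audio_gestalt` is assumed to be a scalar in [0, 1]; `threshold` is a
    hypothetical cut-off chosen for illustration.
    """
    if audio_gestalt >= threshold:
        used = dict(scores)
    else:
        # Low audio gestalt: drop the audio modality from the fusion.
        used = {name: s for name, s in scores.items() if name != "audio"}
    return late_fusion(used)


if __name__ == "__main__":
    example = {"visual": 0.8, "semantic": 0.6, "audio": 0.4}
    print(predict_memorability(example, audio_gestalt=0.7))  # all three modalities
    print(predict_memorability(example, audio_gestalt=0.2))  # visual and semantic only
```

A hard threshold is the simplest way to express "which combination of features" to use; a soft weighting of the audio score by the gestalt value would be an equally plausible reading of the abstract.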


Updated: 2021-01-01
