Themes Informed Audio-visual Correspondence Learning
arXiv - CS - Multimedia | Pub Date: 2020-09-14 | arXiv: 2009.06573
Runze Su, Fei Tao, Xudong Liu, Haoran Wei, Xiaorong Mei, Zhiyao Duan, Lei Yuan, Ji Liu, Yuying Xie

Applications of short-form user-generated video (UGV), such as Snapchat and YouTube short videos, have boomed recently, raising many multimodal machine learning tasks. Among them, learning the correspondence between audio and visual information in videos is a challenging one. Most previous work on audio-visual correspondence (AVC) learning has investigated only constrained videos or simple settings, which may not fit UGV applications. In this paper, we propose new principles for AVC and introduce a new framework that takes a video's theme into account to facilitate AVC learning. We also release the KWAI-AD-AudVis corpus, which contains 85,432 user-made short advertisement videos (around 913 hours). We evaluate our proposed approach on this corpus, where it outperforms the baseline by an absolute margin of 23.15%.

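For context, AVC learning is commonly framed as a binary classification problem: given an audio clip and a visual clip, decide whether they come from the same video. The sketch below is a minimal PyTorch illustration of that generic setup, not the authors' themes-informed framework; the two-branch architecture, feature dimensions, and layer sizes are all hypothetical assumptions for illustration only.

```python
# Minimal, illustrative audio-visual correspondence (AVC) sketch (assumed setup,
# not the paper's model): two encoders map audio and visual features into a
# shared space, and a classifier predicts whether the pair is from the same video.
import torch
import torch.nn as nn

class AVCModel(nn.Module):
    def __init__(self, audio_dim=128, visual_dim=2048, embed_dim=256):
        super().__init__()
        # Audio branch: projects precomputed audio features (e.g., log-mel statistics).
        self.audio_net = nn.Sequential(nn.Linear(audio_dim, embed_dim), nn.ReLU())
        # Visual branch: projects precomputed frame features (e.g., CNN pooled output).
        self.visual_net = nn.Sequential(nn.Linear(visual_dim, embed_dim), nn.ReLU())
        # Correspondence head: 2-way decision (match / mismatch) on the fused embedding.
        self.classifier = nn.Sequential(
            nn.Linear(2 * embed_dim, 128), nn.ReLU(), nn.Linear(128, 2)
        )

    def forward(self, audio_feat, visual_feat):
        a = self.audio_net(audio_feat)
        v = self.visual_net(visual_feat)
        return self.classifier(torch.cat([a, v], dim=-1))

# Training pairs: positives pair audio and visuals from the same video, negatives
# are mismatched pairs; cross-entropy on the 2-way output drives learning.
model = AVCModel()
audio = torch.randn(8, 128)    # batch of audio feature vectors
visual = torch.randn(8, 2048)  # batch of visual feature vectors
labels = torch.randint(0, 2, (8,))
loss = nn.CrossEntropyLoss()(model(audio, visual), labels)
loss.backward()
```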
Last updated: 2020-10-20