Audio Retrieval with Natural Language Queries
arXiv - CS - Information Retrieval. Pub Date: 2021-05-05, DOI: arxiv-2105.02192
Andreea-Maria Oncescu, A. Sophia Koepke, João F. Henriques, Zeynep Akata, Samuel Albanie

We consider the task of retrieving audio using free-form natural language queries. To study this problem, which has received limited attention in the existing literature, we introduce challenging new benchmarks for text-based audio retrieval using text annotations sourced from the AudioCaps and Clotho datasets. We then employ these benchmarks to establish baselines for cross-modal audio retrieval, where we demonstrate the benefits of pre-training on diverse audio tasks. We hope that our benchmarks will inspire further research into cross-modal text-based audio retrieval with free-form text queries.

Updated: 2021-05-06