AudioPairBank: towards a large-scale tag-pair-based audio content analysis
EURASIP Journal on Audio, Speech, and Music Processing (IF 1.7), Pub Date: 2018-09-15, DOI: 10.1186/s13636-018-0137-5
Sebastian Säger, Benjamin Elizalde, Damian Borth, Christian Schulze, Bhiksha Raj, Ian Lane

Recently, sound recognition has been used to identify sounds, such as the sound of a car or a river. However, sounds have nuances that may be better described by adjective-noun pairs such as “slow car” and verb-noun pairs such as “flying insects,” and such pairs remain underexplored. Therefore, this work investigates the relationship between audio content and both adjective-noun pairs and verb-noun pairs. Due to the lack of datasets with these kinds of annotations, we collected and processed the AudioPairBank corpus, which consists of a combined total of 1123 pairs and over 33,000 audio files. In this paper, we include previously unavailable documentation of the challenges and implications of collecting audio recordings with these types of labels. We also show the degree of correlation between the audio content and the labels through classification experiments, which yielded 70% accuracy. The results and study in this paper encourage further exploration of the nuances in sounds and are meant to complement similar research performed on images and text in multimedia analysis.

Updated: 2018-09-15