Audiovisual transfer learning for audio tagging and sound event detection
arXiv - CS - Sound, Pub Date: 2021-06-09, DOI: arxiv-2106.05408
Wim Boes, Hugo Van hamme

We study the merit of transfer learning for two sound recognition problems, i.e., audio tagging and sound event detection. Employing feature fusion, we adapt a baseline system utilizing only spectral acoustic inputs to also make use of pretrained auditory and visual features, extracted from networks built for different tasks and trained with external data. We perform experiments with these modified models on an audiovisual multi-label data set, of which the training partition contains a large number of unlabeled samples and a smaller amount of clips with weak annotations, indicating the clip-level presence of 10 sound categories without specifying the temporal boundaries of the active auditory events. For clip-based audio tagging, this transfer learning method grants marked improvements. Addition of the visual modality on top of audio also proves to be advantageous in this context. When it comes to generating transcriptions of audio recordings, the benefit of pretrained features depends on the requested temporal resolution: for coarse-grained sound event detection, their utility remains notable. But when more fine-grained predictions are required, performance gains are strongly reduced due to a mismatch between the problem at hand and the goals of the models from which the pretrained vectors were obtained.
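As a rough illustration of the feature fusion idea described above, the following minimal sketch concatenates frame-level spectral features with pretrained audio and visual embeddings. The abstract does not specify how fusion is performed, so plain concatenation is an assumption here, as are the function names (`fuse_features`, `clip_level_tags`), the feature dimensions, and the use of max pooling over time to turn frame-level scores into clip-level tags.

```python
import numpy as np


def fuse_features(spectral, audio_emb, visual_emb):
    """Concatenate frame-level spectral features with clip-level embeddings.

    spectral:   (T, F) array of spectral frames (e.g. log-mel bands).
    audio_emb:  (Da,) pretrained audio embedding for the whole clip.
    visual_emb: (Dv,) pretrained visual embedding for the whole clip.
    Returns an array of shape (T, F + Da + Dv).

    Assumption: fusion by concatenation; the paper may use another scheme.
    """
    num_frames = spectral.shape[0]
    clip_vec = np.concatenate([audio_emb, visual_emb])
    # Repeat the clip-level vector so every frame sees the same embedding.
    tiled = np.tile(clip_vec, (num_frames, 1))
    return np.concatenate([spectral, tiled], axis=1)


def clip_level_tags(frame_probs, threshold=0.5):
    """Pool frame-level class probabilities (T, C) into clip-level tags.

    Assumption: max pooling over time followed by a fixed threshold.
    """
    clip_probs = frame_probs.max(axis=0)
    return clip_probs >= threshold


# Hypothetical sizes: 500 frames, 64 spectral bands, 128-d audio and
# 512-d visual embeddings. Random values stand in for real features.
rng = np.random.default_rng(0)
spectral = rng.normal(size=(500, 64))
audio_emb = rng.normal(size=128)
visual_emb = rng.normal(size=512)

fused = fuse_features(spectral, audio_emb, visual_emb)
print(fused.shape)  # (500, 704)
```

In this sketch the pretrained embeddings are clip-level vectors, which mirrors the abstract's observation that such features help clip-based tagging more than fine-grained event detection: a single vector per clip carries no information about temporal boundaries.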

Updated: 2021-06-11