Multi-Task Learning For Acoustic Event Detection Using Event and Frame Position Information
IEEE Transactions on Multimedia ( IF 7.3 ) Pub Date : 2020-03-01 , DOI: 10.1109/tmm.2019.2933330
Xianjun Xia , Roberto Togneri , Ferdous Sohel , Yuanjun Zhao , Defeng Huang

Acoustic event detection (AED) analyzes acoustic signals to determine the sound type and to estimate the boundaries of audio events. Multi-label classification approaches are commonly used to detect frame-wise event types, with a median filter applied to determine which acoustic events are occurring. However, these multi-label classifiers are trained only on the acoustic event types, ignoring each frame's position within an audio event. To address this, this paper proposes a joint-learning-based multi-task system: the first task performs acoustic event type detection, and the second predicts frame position information. By sharing representations between the two tasks, the acoustic models are implicitly regularized, averaging out the tasks' respective noise patterns and generalizing better than the original classifier. Experimental results on the monophonic UPC-TALP and polyphonic TUT Sound Events datasets demonstrate the superior performance of the joint learning method, which achieves a lower error rate and a higher F-score than the baseline AED system.
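The multi-task setup the abstract describes can be sketched as a shared representation feeding two heads: a multi-label event-type head (sigmoid outputs) and a frame-position head (softmax outputs), trained with a weighted sum of the two losses. The layer sizes, the tanh shared layer, the three position classes, and the weighting factor `alpha` below are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical sizes: 40-dim frame features, 64 shared units,
# 10 event classes, 3 frame-position classes (e.g. start/middle/end).
N_FEAT, N_SHARED, N_EVENTS, N_POS = 40, 64, 10, 3

# Randomly initialised parameters, just to make the forward pass runnable.
W_shared = rng.standard_normal((N_FEAT, N_SHARED)) * 0.1
W_event = rng.standard_normal((N_SHARED, N_EVENTS)) * 0.1
W_pos = rng.standard_normal((N_SHARED, N_POS)) * 0.1

def forward(frames):
    """Forward pass: one shared hidden layer feeds both task heads."""
    h = np.tanh(frames @ W_shared)        # shared representation
    event_probs = sigmoid(h @ W_event)    # task 1: multi-label event types
    pos_probs = softmax(h @ W_pos)        # task 2: frame-position classes
    return event_probs, pos_probs

def joint_loss(event_probs, event_targets, pos_probs, pos_targets, alpha=0.5):
    """Weighted sum of binary cross-entropy (events) and
    categorical cross-entropy (frame positions)."""
    bce = -np.mean(event_targets * np.log(event_probs + 1e-9)
                   + (1 - event_targets) * np.log(1 - event_probs + 1e-9))
    cce = -np.mean(np.sum(pos_targets * np.log(pos_probs + 1e-9), axis=-1))
    return alpha * bce + (1 - alpha) * cce

frames = rng.standard_normal((5, N_FEAT))  # 5 dummy audio frames
ev, pos = forward(frames)
```

Because the shared layer must serve both objectives, gradients from the frame-position task act as an auxiliary regularizer on the representation used for event-type detection, which is the generalization benefit the abstract claims.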

Updated: 2020-03-01