Polyphonic sound event detection based on convolutional recurrent neural networks with semi-supervised loss function for DCASE challenge 2020 task 4
arXiv - CS - Sound, Pub Date: 2020-07-02, DOI: arxiv-2007.00947
Nam Kyun Kim, Hong Kook Kim

This report proposes a polyphonic sound event detection (SED) method for the DCASE 2020 Challenge Task 4. The proposed method uses semi-supervised learning to handle different combinations of training datasets, such as a weakly labeled dataset, an unlabeled dataset, and a strongly labeled synthetic dataset. Specifically, the target label of each audio clip from the weakly labeled or unlabeled dataset is first predicted using the mean teacher model, which is the DCASE 2020 baseline. The data with predicted labels are then used to train the proposed SED model, which consists of CNNs with skip connections and a self-attention mechanism, followed by RNNs. To compensate for erroneous label predictions on the weakly labeled and unlabeled data, a semi-supervised loss function is employed for the proposed SED model. Several versions of the proposed SED model are implemented and evaluated on the validation set under different parameter settings for the semi-supervised loss function, and an ensemble that combines five-fold validation models is selected as the final model.
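
The abstract does not give the exact form of the semi-supervised loss, so the following is only a minimal NumPy sketch of the general ingredients it names, under stated assumptions: a mean-teacher style exponential moving average of weights, a loss that adds a consistency term between student and teacher predictions to a supervised term on labeled clips, and averaging of five fold models. All function names, the `alpha` and `weight` parameters, and the 0.5 threshold are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def ema_update(teacher, student, alpha=0.999):
    """Mean-teacher style update: teacher <- alpha * teacher + (1 - alpha) * student.

    `teacher` and `student` are dicts of parameter arrays (hypothetical layout).
    """
    return {k: alpha * teacher[k] + (1.0 - alpha) * student[k] for k in teacher}


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy over clips and classes."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(pred) + (1 - target) * np.log(1 - pred))))


def semi_supervised_loss(student_pred, teacher_pred, target, labeled_mask, weight=1.0):
    """Assumed form: BCE on labeled clips plus a weighted MSE consistency term.

    student_pred, teacher_pred, target: arrays of shape (clips, classes).
    labeled_mask: boolean array of shape (clips,), True where a label exists.
    """
    supervised = 0.0
    if labeled_mask.any():
        supervised = bce(student_pred[labeled_mask], target[labeled_mask])
    consistency = float(np.mean((student_pred - teacher_pred) ** 2))
    return supervised + weight * consistency


def ensemble(fold_probs, threshold=0.5):
    """Average per-fold class probabilities, then threshold (assumed combination rule)."""
    mean_probs = np.mean(fold_probs, axis=0)
    return mean_probs, mean_probs >= threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Five hypothetical fold models, 4 clips, 10 event classes.
    fold_probs = rng.random((5, 4, 10))
    mean_probs, decisions = ensemble(fold_probs)
    print(mean_probs.shape, decisions.dtype)
```

In this sketch the `weight` argument plays the role of a tunable loss parameter; the report evaluates several settings of its own loss parameters on the validation set, which may differ from this one.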

Updated: 2020-07-03