Semi-Supervised NMF-CNN For Sound Event Detection
arXiv - CS - Sound, Pub Date: 2020-07-02, DOI: arxiv-2007.00908
Chan Teck Kai, Chin Cheng Siong, and Li Ye

In this paper, a combined approach using Nonnegative Matrix Factorization (NMF) and a Convolutional Neural Network (CNN) is proposed for Sound Event Detection (SED) in audio clips. NMF is first used to approximate strong labels for the weakly labeled data. Using the approximated strongly labeled data, two different CNNs are then trained in a semi-supervised framework: one CNN performs clip-level prediction and the other performs frame-level prediction. With this approach, our model achieves an event-based F1-score of 45.7% on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge Task 4 validation dataset. Ensembling models by averaging their posterior outputs increases the event-based F1-score to 48.6%. Compared with the baseline model, our proposed models outperform it by over 8%. On the DCASE 2020 Challenge Task 4 test set, our models achieve an event-based F1-score of 44.4%, and our ensembled system achieves 46.3%. These results exceed the baseline system by a margin of at least 7%, which demonstrates the robustness of the proposed method across different datasets.
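
The abstract does not say how NMF turns weak (clip-level) labels into approximate strong labels. The sketch below shows one possible heuristic under stated assumptions; it is not the authors' method. A nonnegative spectrogram `V` (frequency bins × frames) of a weakly labeled clip is factorized with the Lee and Seung multiplicative updates for the Euclidean cost. The columns of `W` are normalized so that the summed activations in `H` equal the per-frame sum of the reconstruction, and every class in the clip's weak label is assigned to frames whose normalized activation exceeds a threshold. The function names `nmf` and `approximate_strong_labels`, the rank of 4, the threshold of 0.3, and the toy spectrogram are all hypothetical choices for illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factorize a nonnegative matrix V (F x T) as W @ H with the
    Lee and Seung multiplicative updates for the Euclidean cost."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Normalize each column of W to sum to 1 and rescale H so W @ H is unchanged.
    scale = W.sum(axis=0) + eps
    return W / scale, H * scale[:, None]

def approximate_strong_labels(V, clip_labels, rank=4, threshold=0.3):
    """HYPOTHETICAL heuristic: spread clip-level (weak) labels over the frames
    whose summed NMF activation is high. Returns a (T, C) 0/1 matrix."""
    _, H = nmf(V, rank)
    activation = H.sum(axis=0)                  # per-frame sum of the reconstruction
    activation /= activation.max() + 1e-10      # scale to [0, 1]
    active = activation >= threshold            # (T,) boolean frame mask
    # Every class tagged in the weak label is assigned to every active frame.
    return np.outer(active, clip_labels).astype(np.float32)

# Toy clip: 64 frequency bins x 100 frames, near-silent except frames 20-39.
rng = np.random.default_rng(1)
V = 1e-3 * rng.random((64, 100))
V[:, 20:40] += 1.0 + rng.random((64, 20))
weak = np.array([1.0, 0.0, 1.0], dtype=np.float32)  # classes 0 and 2 present
strong = approximate_strong_labels(V, weak)
print(strong.shape)  # (100, 3)
```

Because this heuristic only uses the summed activations, it behaves like an energy detector on the NMF reconstruction. Separating overlapping classes would need class-specific spectral templates in `W`, which the abstract does not describe.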

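The abstract states that models are ensembled by averaging their posterior outputs. The minimal NumPy sketch below shows that averaging, followed by one common way to turn averaged frame-level posteriors into timed events before event-based scoring. The helper names `average_posteriors` and `frames_to_events`, the 0.5 decision threshold, the 20 ms frame hop (`hop_s`), and the posterior values are hypothetical; the abstract does not specify the post-processing, and practical systems often apply smoothing such as median filtering before thresholding.

```python
import numpy as np

def average_posteriors(posteriors):
    """Ensemble by averaging: posteriors is a list of (T, C) arrays of
    frame-level class probabilities, one array per model."""
    return np.mean(np.stack(posteriors, axis=0), axis=0)

def frames_to_events(active, hop_s):
    """Convert a 1-D boolean frame-activity vector into (onset, offset)
    pairs in seconds, assuming a fixed hop of hop_s seconds per frame.
    Offsets are exclusive: they mark the first inactive frame."""
    padded = np.concatenate([[False], active, [False]]).astype(np.int8)
    edges = np.diff(padded)                  # +1 at onsets, -1 at offsets
    onsets = np.flatnonzero(edges == 1)
    offsets = np.flatnonzero(edges == -1)
    return [(on * hop_s, off * hop_s) for on, off in zip(onsets, offsets)]

# Hypothetical posteriors from two models: 4 frames x 2 classes.
model_a = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.7], [0.1, 0.9]])
model_b = np.array([[0.7, 0.3], [0.4, 0.4], [0.4, 0.9], [0.3, 0.5]])
avg = average_posteriors([model_a, model_b])
decisions = avg >= 0.5                        # hypothetical fixed threshold
for c in range(decisions.shape[1]):
    print(c, frames_to_events(decisions[:, c], hop_s=0.02))
```

Averaging probabilities rather than hard decisions lets a confident model outvote an uncertain one on a given frame, which is one reason posterior averaging is a common ensembling choice.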

Updated: 2020-09-22
