Capturing scattered discriminative information using a deep architecture in acoustic scene classification
arXiv - CS - Sound. Pub Date: 2020-07-09, DOI: arxiv-2007.04631
Hye-jin Shim, Jee-weon Jung, Ju-ho Kim, Ha-jin Yu

Frequently misclassified pairs of classes that share many common acoustic properties exist in acoustic scene classification (ASC). To distinguish such pairs of classes, trivial details scattered throughout the data can be vital clues. However, these details are less noticeable and are easily removed by conventional non-linear activations (e.g., ReLU). Furthermore, design choices that emphasize trivial details can easily lead to overfitting if the system is not sufficiently generalized. In this study, based on an analysis of the ASC task's characteristics, we investigate various methods to capture discriminative information while simultaneously mitigating the overfitting problem. We adopt the max feature map method to replace conventional non-linear activations in a deep neural network, thereby applying an element-wise comparison between different filters of a convolution layer's output. Two data augmentation methods and two deep architecture modules are further explored to reduce overfitting and sustain the system's discriminative power. Various experiments are conducted on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Task 1-A dataset to validate the proposed methods. Our results show that the proposed system consistently outperforms the baseline, with the single best-performing system achieving an accuracy of 70.4% compared to the baseline's 65.1%.
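The key architectural change described above is replacing ReLU with a max feature map (MFM) activation, which compares filters of a convolution layer's output element-wise and keeps only the larger response. The sketch below shows how such an activation is commonly implemented (splitting the channel dimension in half and taking an element-wise maximum, as in Light CNN-style MFM); the framework (PyTorch), channel counts, and input shape are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of a max feature map (MFM) activation, assuming a
# Light CNN-style formulation: split the channels of a convolution
# output into two halves and keep the element-wise maximum.
import torch
import torch.nn as nn


class MaxFeatureMap(nn.Module):
    """Element-wise max over two halves of the channel dimension."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, freq, time); channels must be even.
        a, b = torch.chunk(x, 2, dim=1)
        return torch.maximum(a, b)


# Example: a convolution followed by MFM halves the channel count (64 -> 32).
layer = nn.Sequential(nn.Conv2d(1, 64, kernel_size=3, padding=1), MaxFeatureMap())
spec = torch.randn(8, 1, 128, 431)  # batch of log-mel-spectrogram-like inputs (assumed shape)
out = layer(spec)
print(out.shape)  # torch.Size([8, 32, 128, 431])
```

Because competing filters, rather than a fixed zero threshold, decide which values pass, small but informative responses are not automatically discarded the way they can be under ReLU.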

Updated: 2020-07-10