Robust Deep Learning Frameworks for Acoustic Scene and Respiratory Sound Classification,arXiv - CS - Sound

arXiv - CS - Sound. Pub Date: 2021-07-20, DOI: arxiv-2107.09268
Lam Pham

This thesis addresses the task of acoustic scene classification (ASC) and then applies the techniques developed for ASC to a real-life application: detecting respiratory disease. To deal with ASC challenges, the thesis addresses three main factors that directly affect the performance of an ASC system. First, it explores input features, using multiple spectrograms (log-mel, Gamma, and CQT) for low-level feature extraction to tackle the issue of insufficiently discriminative or descriptive input features. Next, a novel Encoder network architecture is introduced. The Encoder first transforms each low-level spectrogram into high-level intermediate features, or embeddings, and then combines these high-level features to form a highly distinctive composite feature. The classification performance of this composite feature is then evaluated with different Decoders such as Random Forest (RF), Multilayer Perceptron (MLP), and Mixture of Experts (MoE). This Encoder-Decoder framework helps to reduce the computation cost of the inference process in ASC systems that use multiple spectrogram inputs. Because the proposed techniques proved highly effective on general ASC tasks, they were then applied to a specific real-life problem, namely the respiratory sound dataset of the 2017 International Conference on Biomedical and Health Informatics (ICBHI). Building upon the proposed ASC framework, the ICBHI tasks were tackled with a deep learning framework, and the resulting system was shown to be capable of detecting respiratory anomaly cycles and diseases.
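The Encoder-Decoder data flow described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract gives no architectural details, so the encoder below is a placeholder (time-averaging instead of a trained network), the combination step is assumed to be simple concatenation, the Mixture of Experts decoder uses random untrained weights, and the input spectrograms are random numbers. All function names and sizes are hypothetical.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

def random_matrix(rows, cols, rng):
    """Matrix of uniform random values, standing in for real data or weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def encode(spectrogram):
    """Placeholder encoder (assumption): average each frequency bin over time.
    A trained network would produce the embedding in a real system."""
    n_frames = len(spectrogram)
    n_bins = len(spectrogram[0])
    return [sum(frame[b] for frame in spectrogram) / n_frames for b in range(n_bins)]

def combine(embeddings):
    """Concatenate per-spectrogram embeddings into one composite feature
    (concatenation is an assumption; the abstract only says 'combines')."""
    return [value for emb in embeddings for value in emb]

def moe_decode(feature, gate_w, expert_ws):
    """Mixture of Experts: gate-weighted sum of per-expert class probabilities."""
    gate = softmax(linear(gate_w, feature))
    expert_probs = [softmax(linear(w, feature)) for w in expert_ws]
    n_classes = len(expert_probs[0])
    return [sum(g * p[c] for g, p in zip(gate, expert_probs)) for c in range(n_classes)]

rng = random.Random(0)
N_FRAMES, N_BINS, N_CLASSES, N_EXPERTS = 10, 8, 4, 3  # hypothetical sizes

# Random stand-ins for the log-mel, Gamma, and CQT spectrograms of one clip.
spectrograms = {
    name: random_matrix(N_FRAMES, N_BINS, rng)
    for name in ("log-mel", "Gamma", "CQT")
}

# One embedding per spectrogram, then a single composite feature.
embeddings = [encode(s) for s in spectrograms.values()]
composite = combine(embeddings)

# Untrained decoder weights, for illustration only.
gate_w = random_matrix(N_EXPERTS, len(composite), rng)
expert_ws = [random_matrix(N_CLASSES, len(composite), rng) for _ in range(N_EXPERTS)]
probs = moe_decode(composite, gate_w, expert_ws)

print(len(composite), [round(p, 3) for p in probs])
```

Because the decoder only sees the composite feature, it can be swapped (for example, for a Random Forest or an MLP) without recomputing the embeddings, which is one plausible reading of how such a framework keeps the cost of multi-spectrogram systems down.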

Updated: 2021-07-21
