Two-level fusion-based acoustic scene classification
Applied Acoustics (IF 3.4), Pub Date: 2020-12-01, DOI: 10.1016/j.apacoust.2020.107502
Shefali Waldekar, Goutam Saha

Abstract Growing demands from applications like surveillance, archiving, and context-aware devices have fuelled research towards efficient extraction of useful information from environmental sounds. Acoustic scene classification (ASC) deals with assigning a textual label to an audio segment based on the general characteristics of the location or situation in which it was recorded. Because audio scenes differ in nature, a single feature-classifier pair may not efficiently discriminate among environments. Moreover, the set of acoustic scenes may vary with the problem under investigation. However, for most ASC applications, a general estimate of the type of surroundings (e.g., indoor or outdoor) may be enough, rather than explicit scene labels (like home, park, etc.). In this paper, we propose a two-level hierarchical framework for ASC in which finer labels follow a coarse classification. At the first level, texture features extracted from the time-frequency representation of the audio samples are used to generate the coarse labels. The system then explores combinations of six well-known spectral features, used successfully in different audio-processing fields, for second-level classification that gives the finer details of the audio scene. The performance of the proposed system is compared with baseline methods on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 and 2017 ASC databases and is found to be superior in terms of classification accuracy. Additionally, the proposed hierarchical method provides important intermediate results, in the form of coarse labels, that may be useful in certain applications.
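The abstract describes the coarse-then-fine pipeline without implementation detail. Below is a minimal sketch of such a two-level hierarchy, assuming SVM back-ends and placeholder feature extractors (`texture_features` for level 1, `spectral_features` for level 2); these names, the synthetic data, and the classifier choice are hypothetical stand-ins, not the paper's actual features or fusion rule.

```python
# Minimal sketch of a two-level (coarse -> fine) hierarchical ASC pipeline.
# Feature extractors and SVM back-ends below are illustrative assumptions only.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# --- Stand-ins for real features -------------------------------------------
# In practice: texture descriptors of a time-frequency image (level 1) and a
# fusion of several spectral features (level 2), as the abstract describes.
def texture_features(n):      # hypothetical level-1 descriptors
    return rng.normal(size=(n, 64))

def spectral_features(n):     # hypothetical fused level-2 descriptors
    return rng.normal(size=(n, 128))

# Coarse labels (e.g., indoor / outdoor / vehicle) and fine scene labels.
coarse = rng.integers(0, 3, size=200)
fine = coarse * 5 + rng.integers(0, 5, size=200)   # 15 scene classes in total

X1, X2 = texture_features(200), spectral_features(200)

# --- Level 1: coarse classifier on texture features -------------------------
coarse_clf = SVC().fit(X1, coarse)

# --- Level 2: one fine classifier per coarse group, on spectral features ----
fine_clfs = {c: SVC().fit(X2[coarse == c], fine[coarse == c])
             for c in np.unique(coarse)}

# --- Inference: the coarse prediction routes the sample to a fine classifier
def predict(x1, x2):
    c = coarse_clf.predict(x1.reshape(1, -1))[0]
    return c, fine_clfs[c].predict(x2.reshape(1, -1))[0]

print(predict(X1[0], X2[0]))   # -> (coarse label, fine scene label)
```

At inference time the coarse prediction both routes the sample to the matching fine classifier and serves as the intermediate result mentioned in the abstract, which can be reported on its own when an indoor/outdoor-style estimate is sufficient.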

Updated: 2020-12-01