A novel spiral pattern and 2D M4 pooling based environmental sound classification method
Applied Acoustics ( IF 3.4 ) Pub Date : 2020-12-01 , DOI: 10.1016/j.apacoust.2020.107508
Turker Tuncer , Abdulhamit Subasi , Fatih Ertam , Sengul Dogan

Abstract Environmental sound classification (ESC) is one of the crucial problems in signal processing, digital forensics, and machine learning. Several ESC methods have been presented to obtain highly accurate models. In this work, a novel multilevel ESC method is presented. The method uses two novel algorithms: Spiral Pattern and two-dimensional maximum, minimum, median, and mean (2D-M4) pooling. Together, these two algorithms form a nine-level feature generation approach. Since the proposed Spiral Pattern has nine arrows, it extracts 9 bits using the signum function and 18 bits using the ternary function. As a result, 1536 features are extracted at each level, and 15,360 features in total are generated from levels 0 through 9. To select the discriminative features, neighbourhood component analysis (NCA) is used, and the 700 most distinctive features are selected. In the classification phase, a deep neural network is trained and tested on the ESC-10 and ESC-50 datasets. Average classification accuracies of 98.75% and 85.75% were achieved with 10-fold cross-validation on the ESC-10 and ESC-50 datasets, respectively. The experimental results reveal that the proposed Spiral Pattern and 2D-M4 pooling based ESC method is superior to the human auditory system (HAS) for environmental sound classification.
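
The abstract names the building blocks but does not give their definitions, so the following Python sketch only illustrates the general idea and should not be read as the authors' implementation. It assumes that 2D-M4 pooling is applied over non-overlapping 2 x 2 windows, and that the signum and ternary functions follow the usual local binary/ternary pattern convention (one bit per compared pair for signum, two bits per pair for ternary). The function names, the window size, and the threshold parameter are all hypothetical.

```python
import numpy as np


def m4_pool_2d(x, block=2):
    """Pool a 2D array over non-overlapping block x block windows.

    Returns the max, min, median, and mean maps.
    ASSUMPTION: window size and non-overlapping layout are illustrative;
    the abstract does not specify them.
    """
    x = np.asarray(x, dtype=float)
    rows = (x.shape[0] // block) * block  # drop trailing rows that do not fill a window
    cols = (x.shape[1] // block) * block  # drop trailing columns likewise
    x = x[:rows, :cols]
    # Gather each window's values along the last axis.
    w = x.reshape(rows // block, block, cols // block, block)
    w = w.transpose(0, 2, 1, 3).reshape(rows // block, cols // block, block * block)
    return {
        "max": w.max(axis=-1),
        "min": w.min(axis=-1),
        "median": np.median(w, axis=-1),
        "mean": w.mean(axis=-1),
    }


def signum_bit(a, b):
    """One bit per compared pair: 1 if a >= b, else 0 (assumed convention)."""
    return int(a >= b)


def ternary_bits(a, b, threshold):
    """Two bits per compared pair: (upper, lower) (assumed convention)."""
    d = a - b
    return int(d > threshold), int(d < -threshold)


if __name__ == "__main__":
    pooled = m4_pool_2d(np.arange(16).reshape(4, 4))
    for name, values in pooled.items():
        print(name, values.tolist())
    print(signum_bit(3, 3), ternary_bits(5, 1, threshold=2))
```

Under these assumptions, nine compared pairs (one per arrow) give 9 signum bits and 2 x 9 = 18 ternary bits, which matches the counts in the abstract. The stated total also checks out arithmetically: 1536 features per level over levels 0 through 9 (ten levels) gives 1536 x 10 = 15,360.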

Updated: 2020-12-01