Ensemble of convolutional neural networks to improve animal audio classification
EURASIP Journal on Audio, Speech, and Music Processing (IF 2.4), Pub Date: 2020-05-26, DOI: 10.1186/s13636-020-00175-3
Loris Nanni, Yandre M. G. Costa, Rafael L. Aguiar, Rafael B. Mangolin, Sheryl Brahnam, Carlos N. Silla

In this work, we present an ensemble for automated audio classification that fuses different types of features extracted from audio files. These features are evaluated, compared, and fused with the goal of producing better classification accuracy than other state-of-the-art approaches without ad hoc parameter optimization. We present an ensemble of classifiers that performs competitively on different types of animal audio datasets using the same set of classifiers and parameter settings. To produce this general-purpose ensemble, we ran a large number of experiments that fine-tuned pretrained convolutional neural networks (CNNs) for different audio classification tasks (bird, bat, and whale audio datasets). Six different CNNs were tested, compared, and combined. Moreover, a further CNN, trained from scratch, was tested and combined with the fine-tuned CNNs. To the best of our knowledge, this is the largest study on CNNs in animal audio classification. Our results show that several CNNs can be fine-tuned and fused for robust and generalizable audio classification. Finally, the ensemble of CNNs is combined with handcrafted texture descriptors obtained from spectrograms for further improvement of performance. The MATLAB code used in our experiments will be provided to other researchers for future comparisons at https://github.com/LorisNanni .
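The abstract does not say how the individual classifiers are fused, so the following is only a minimal sketch of one common option, score-level fusion by the (optionally weighted) sum rule. The function names, the example scores, and the choice of the sum rule itself are assumptions made for illustration; none of this is taken from the authors' MATLAB code.

```python
from typing import List, Optional, Sequence


def normalize(scores: Sequence[float]) -> List[float]:
    """Rescale one classifier's non-negative scores so they sum to 1."""
    total = sum(scores)
    if total <= 0:
        # Degenerate case: fall back to a uniform distribution.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def fuse_sum_rule(
    per_model_scores: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Fuse per-class scores from several classifiers with a weighted average.

    per_model_scores[m][k] is the score model m assigns to class k.
    Assumption: every model scores the same classes in the same order.
    """
    if not per_model_scores:
        raise ValueError("need at least one classifier")
    n_classes = len(per_model_scores[0])
    if any(len(s) != n_classes for s in per_model_scores):
        raise ValueError("all classifiers must score the same classes")
    if weights is None:
        weights = [1.0] * len(per_model_scores)
    if len(weights) != len(per_model_scores):
        raise ValueError("one weight per classifier is required")

    fused = [0.0] * n_classes
    for w, scores in zip(weights, per_model_scores):
        for k, p in enumerate(normalize(scores)):
            fused[k] += w * p
    total_weight = sum(weights)
    return [f / total_weight for f in fused]


def predict(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda k: fused[k])


# Hypothetical scores for one audio clip over three classes.
cnn_a = [0.6, 0.3, 0.1]    # a fine-tuned CNN
cnn_b = [0.2, 0.5, 0.3]    # another fine-tuned CNN
texture = [0.5, 0.4, 0.1]  # a classifier on handcrafted texture descriptors

fused_scores = fuse_sum_rule([cnn_a, cnn_b, texture])
predicted_class = predict(fused_scores)
```

In this toy example one of the CNNs, taken alone, prefers class 1, while the fused scores select class 0. This illustrates why combining classifiers that make different errors can be more robust than relying on any single one.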

Updated: 2020-05-26