Data augmentation approaches for improving animal audio classification
Ecological Informatics (IF 5.8), Pub Date: 2020-03-19, DOI: 10.1016/j.ecoinf.2020.101084
Loris Nanni, Gianluca Maguolo, Michelangelo Paci

In this paper we present ensembles of classifiers for automated animal audio classification, exploiting different data augmentation techniques for training Convolutional Neural Networks (CNNs). The specific animal audio classification problems are (i) bird sounds and (ii) cat sounds, both with freely available datasets. We train five different CNNs on the original datasets and on versions augmented by four augmentation protocols, which work either on the raw audio signals or on their spectrogram representations. We compare our best approaches with the state of the art and show that we obtain the best recognition rate on the same datasets, without ad hoc parameter optimization. Our study shows that different CNNs can be trained for animal audio classification and that their fusion works better than the stand-alone classifiers. To the best of our knowledge, this is the largest study on data augmentation for CNNs on animal audio classification datasets using the same set of classifiers and parameters. Our MATLAB code is available at https://github.com/LorisNanni.
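The two ideas in the abstract, augmenting the raw audio signal and fusing several classifiers, can be illustrated with a minimal sketch. This is not the authors' MATLAB code. The abstract does not name the four augmentation protocols or the fusion rule, so the random time shift, the additive Gaussian noise, and the sum-rule (score averaging) fusion below are assumptions chosen for illustration, and all function names and scores are hypothetical.

```python
import random


def time_shift(signal, max_shift, rng):
    """Circularly shift the waveform by a random offset in [-max_shift, max_shift]."""
    if not signal:
        return []
    k = rng.randint(-max_shift, max_shift) % len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


def add_noise(signal, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to each sample."""
    return [x + rng.gauss(0.0, sigma) for x in signal]


def augment(signal, n_copies, rng, max_shift=4, sigma=0.01):
    """Return the original clip plus n_copies randomly perturbed variants."""
    out = [list(signal)]
    for _ in range(n_copies):
        out.append(add_noise(time_shift(signal, max_shift, rng), sigma, rng))
    return out


def fuse_sum_rule(score_lists):
    """Average per-class scores across classifiers; return (fused scores, argmax)."""
    n_models = len(score_lists)
    n_classes = len(score_lists[0])
    fused = [sum(s[c] for s in score_lists) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: fused[c])
    return fused, best


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy waveform standing in for one recording (assumption: real clips are much longer).
    clip = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
    variants = augment(clip, n_copies=3, rng=rng)
    print(len(variants), "training clips from one recording")

    # Hypothetical class scores from three classifiers for one test clip.
    scores = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.4, 0.3]]
    fused, label = fuse_sum_rule(scores)
    print("fused scores:", fused, "predicted class:", label)
```

In the toy scores, the first classifier alone would pick class 0, while the averaged scores favour class 1. This is only meant to show how a fused decision can differ from a stand-alone one; it says nothing about the accuracy figures reported in the paper.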



Updated: 2020-03-19