Multispecies bioacoustic classification using transfer learning of deep convolutional neural networks with pseudo-labeling
Applied Acoustics (IF 3.4), Pub Date: 2020-09-01, DOI: 10.1016/j.apacoust.2020.107375
Ming Zhong, Jack LeBien, Marconi Campos-Cerqueira, Rahul Dodhia, Juan Lavista Ferres, Julian P. Velev, T. Mitchell Aide

Abstract In this study, we evaluated deep convolutional neural networks for classifying the calls of 24 bird and amphibian species detected in ambient field recordings from the tropical mountains of Puerto Rico. Training data were collected using a template-based detection algorithm followed by a manual validation process. Because preparing sufficient training data is a major challenge for many deep learning applications, we propose a novel approach that combines transfer learning of a pre-trained deep convolutional neural network (CNN) model with a semi-supervised pseudo-labeling method that uses a custom loss function. The proposed methodology enables the network to be trained in a supervised fashion on labeled and unlabeled data simultaneously, which effectively increases the size of the training set and thus boosts model performance. In classifying a test set of manually validated positive and negative template-based detections, the proposed model achieves 97.7% sensitivity (true positive rate), 96.4% specificity (true negative rate), and 99.5% area under the curve (AUC). This multi-label, multi-species classification methodology and its framework can be readily adopted for other acoustic classification problems.
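
For readers less familiar with the reported metrics, the following minimal Python sketch shows how sensitivity (true positive rate) and specificity (true negative rate) can be computed from validated detections for a single species. The function name `sensitivity_specificity` and the label lists are hypothetical illustrations and are not taken from the paper; the actual evaluation covers 24 species and may aggregate results differently.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    # Guard against a class with no examples.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical example: 4 validated positive and 4 validated negative detections.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In a multi-label setting such as the one described, a function like this could be applied per species and the results averaged, though the abstract does not state which averaging scheme was used.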


