BirdNET: A deep learning solution for avian diversity monitoring
Ecological Informatics (IF 5.8) Pub Date: 2021-01-27, DOI: 10.1016/j.ecoinf.2021.101236
Stefan Kahl, Connor M. Wood, Maximilian Eibl, Holger Klinck

Variation in avian diversity in space and time is commonly used as a metric to assess environmental changes. Conventionally, such data were collected by expert observers, but passively collected acoustic data is rapidly emerging as an alternative survey technique. However, efficiently extracting accurate species richness data from large audio datasets has proven challenging. Recent advances in deep artificial neural networks (DNNs) have transformed the field of machine learning, frequently outperforming traditional signal processing techniques in the domain of acoustic event detection and classification. We developed a DNN, called BirdNET, capable of identifying 984 North American and European bird species by sound. Our task-specific model architecture was derived from the family of residual networks (ResNets), consisted of 157 layers with more than 27 million parameters, and was trained using extensive data pre-processing, augmentation, and mixup. We tested the model against three independent datasets: (a) 22,960 single-species recordings; (b) 286 h of fully annotated soundscape data collected by an array of autonomous recording units in a design analogous to what researchers might use to measure avian diversity in a field setting; and (c) 33,670 h of soundscape data from a single high-quality omnidirectional microphone deployed near four eBird hotspots frequented by expert birders. We found that domain-specific data augmentation is key to building models that are robust against high ambient noise levels and can cope with overlapping vocalizations. Task-specific model designs and training regimes for audio event recognition perform on par with very complex architectures used in other domains (e.g., object detection in images). We also found that high temporal resolution of input spectrograms (short FFT window length) improves the classification performance for bird sounds.
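The abstract names mixup as part of training but does not say which variant was used. As a rough illustration only, the sketch below shows the generic mixup formulation, in which two examples and their label vectors are blended with a weight drawn from a Beta distribution. The function name `mixup`, the `alpha` value, and the toy inputs are assumptions made for this example and are not taken from BirdNET.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two examples (generic mixup; not necessarily BirdNET's variant).

    x1, x2: equal-length lists of floats (e.g. flattened spectrogram values).
    y1, y2: equal-length label vectors (e.g. one-hot species labels).
    alpha:  Beta(alpha, alpha) shape parameter; 0.2 is an assumed value.
    """
    if len(x1) != len(x2) or len(y1) != len(y2):
        raise ValueError("inputs and labels must have matching lengths")
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Toy usage: two "recordings" with different one-hot labels.
rng = random.Random(0)
mixed_x, mixed_y, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], rng=rng)
```

Because the blended label vector keeps weight on both source classes, a model trained this way sees examples that resemble overlapping vocalizations, which is one plausible reason such augmentation helps on soundscape data.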
In summary, BirdNET achieved a mean average precision of 0.791 for single-species recordings, an F0.5 score of 0.414 for annotated soundscapes, and an average correlation of 0.251 with hotspot observation across 121 species and 4 years of audio data. By enabling the efficient extraction of the vocalizations of many hundreds of bird species from potentially vast amounts of audio data, BirdNET and similar tools have the potential to add tremendous value to existing and future passively collected audio datasets and may transform the field of avian ecology and conservation.
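For readers unfamiliar with the F0.5 score reported above, the sketch below computes the standard F-beta measure from detection counts. With beta = 0.5, precision is weighted more heavily than recall. The abstract does not state how the score was aggregated (for example per species or per time segment), and the counts used here are hypothetical.

```python
def f_beta(tp, fp, fn, beta=0.5):
    """Standard F-beta score from true positives, false positives, false negatives."""
    if tp == 0:
        return 0.0  # no correct detections: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 50 correct detections, 10 false alarms, 50 misses.
score_f05 = f_beta(50, 10, 50)           # precision-weighted (beta = 0.5)
score_f1 = f_beta(50, 10, 50, beta=1.0)  # balanced F1, for comparison
```

With these counts precision (about 0.83) is higher than recall (0.50), so the F0.5 value comes out above the F1 value, which shows how the metric rewards detectors that make few false alarms.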




Updated: 2021-02-09