“How many images do I need?” Understanding how sample size per class affects deep learning model performance metrics for balanced designs in autonomous wildlife monitoring
Ecological Informatics (IF 5.1), Pub Date: 2020-03-19, DOI: 10.1016/j.ecoinf.2020.101085
Saleh Shahinfar, Paul Meek, Greg Falzon

Deep learning (DL) algorithms are the state of the art in automated classification of wildlife camera trap images. The challenge is that ecologists cannot know in advance how many images per species they need to collect for model training to achieve their desired classification accuracy. In fact, there is limited empirical evidence in the context of camera trapping that increasing sample size leads to improved accuracy.

In this study we explore in depth how deep learning model performance changes as the per-class (per-species) sample size is progressively increased. We also provide ecologists with an approximation formula to estimate a priori how many images per animal species they need to reach a given accuracy level. This will help ecologists allocate resources and effort optimally and design efficient studies.
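The abstract does not spell out the form of the approximation formula. Assuming, purely for illustration, that a metric follows a log-linear curve in the number of images per class n, metric ≈ a + b·ln(n), the curve can be inverted to give n = exp((target − a)/b). The helper `images_needed` and any coefficients passed to it are hypothetical; in practice a and b would come from a curve fitted to one's own pilot results.

```python
import math


def images_needed(target: float, a: float, b: float) -> int:
    """Smallest whole number of images per class n with a + b * ln(n) >= target.

    Assumes (hypothetically) a log-linear relationship between a performance
    metric and the per-class sample size; a and b are fitted coefficients.
    """
    if b <= 0:
        # With a non-positive slope, adding images never improves the metric.
        raise ValueError("slope b must be positive for the metric to improve with n")
    # Invert metric = a + b * ln(n)  =>  n = exp((target - a) / b)
    n = math.exp((target - a) / b)
    # Round up so the target is met, and never return fewer than one image.
    return max(1, math.ceil(n))
```

For example, `images_needed(0.90, a, b)` would return the per-class sample size at which the assumed curve first reaches 0.90. Extrapolating far beyond the range of sample sizes used to fit a and b is unreliable.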

To investigate the effect of the number of training images, seven training sets with 10, 20, 50, 150, 500, 1000 images per class were designed. Six deep learning architectures, namely ResNet-18, ResNet-50, ResNet-152, DenseNet-121, DenseNet-161, and DenseNet-201, were trained on these sets and tested on a common, held-out test set of 250 images per class that was excluded from all training sets. The whole experiment was repeated on three similar datasets from Australia, Africa and North America, and the results were compared. Simple regression equations are provided for practitioners to approximate model performance metrics. Generalised additive models (GAMs) are shown to be effective in modelling DL performance metrics as a function of the number of training images per class, the tuning scheme and the dataset.
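One way to build balanced designs of this kind is sketched below, under stated assumptions: image labels are available as a list, the per-class test images are drawn first so the test set stays disjoint from every training set, and smaller training sets are nested inside larger ones. The nesting, the random seed, and the function name `balanced_splits` are hypothetical choices for illustration; the abstract does not say how the subsets were sampled.

```python
import random
from collections import defaultdict


def balanced_splits(labels, train_sizes, test_per_class, seed=0):
    """Build nested, class-balanced training sets plus one shared test set.

    labels: class label for each image, indexed by image position.
    Returns (train_sets, test) where train_sets maps n -> list of image indices.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    test, pools = [], {}
    max_n = max(train_sizes)
    for lab, idxs in sorted(by_class.items()):
        if len(idxs) < test_per_class + max_n:
            raise ValueError(f"class {lab!r} has only {len(idxs)} images")
        idxs = idxs[:]  # copy before shuffling
        rng.shuffle(idxs)
        # Reserve the test images first so they never appear in training.
        test.extend(idxs[:test_per_class])
        pools[lab] = idxs[test_per_class:]

    # Taking prefixes of the same shuffled pool makes smaller sets nested in larger ones.
    train_sets = {
        n: [i for lab in sorted(pools) for i in pools[lab][:n]] for n in train_sizes
    }
    return train_sets, test
```

Usage with the sizes listed above would be `balanced_splits(labels, [10, 20, 50, 150, 500, 1000], test_per_class=250)`, which needs at least 1250 images per class. Nesting keeps comparisons between sizes from being confounded by drawing entirely different images; independent draws per size are an equally defensible alternative.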

Overall, our trained models classified images with 0.94 accuracy (ACC), 0.73 precision (PRC), 0.72 true positive rate (TPR), and 0.03 false positive rate (FPR). Model performance varied among datasets, species and deep learning architectures; these differences are examined in detail in the Discussion section. Ordinary least squares (OLS) regression models explained 57%, 54%, 52%, and 34% of the variation in ACC, PRC, TPR, and FPR, respectively, as a function of the number of images available for training. Generalised additive models explained 77%, 69%, 70%, and 53% of the deviance for ACC, PRC, TPR, and FPR, respectively.
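The abstract reports four metrics without stating how they are aggregated across species. Assuming each metric is computed one-vs-rest for every class and then macro-averaged (an assumption, though it is consistent with an accuracy of 0.94 sitting alongside a TPR of 0.72 on a balanced test set), they could be computed from true and predicted labels as follows. The function name `ovr_metrics` is hypothetical.

```python
def ovr_metrics(y_true, y_pred):
    """Macro-averaged one-vs-rest ACC, PRC, TPR and FPR for multiclass labels."""
    classes = sorted(set(y_true) | set(y_pred))
    total = len(y_true)
    sums = {"ACC": 0.0, "PRC": 0.0, "TPR": 0.0, "FPR": 0.0}
    for c in classes:
        # Treat class c as "positive" and every other class as "negative".
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        tn = total - tp - fp - fn
        sums["ACC"] += (tp + tn) / total
        # Guard against empty denominators (e.g. a class that is never predicted).
        sums["PRC"] += tp / (tp + fp) if tp + fp else 0.0
        sums["TPR"] += tp / (tp + fn) if tp + fn else 0.0
        sums["FPR"] += fp / (fp + tn) if fp + tn else 0.0
    k = len(classes)
    return {name: value / k for name, value in sums.items()}
```

Because each one-vs-rest accuracy counts the many true negatives from other species, ACC under this scheme is typically much higher than TPR when there are several classes.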

Predictive models were developed linking the number of training images per class, the model architecture and the dataset to performance metrics. The ordinary least squares regression and generalised additive models developed here provide a practical toolbox for estimating model performance for different numbers of training images.
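The abstract does not give the regression equations themselves. As a minimal sketch, assuming a single metric is regressed on the natural log of the number of training images per class (the log transform is an assumption chosen here only for illustration), ordinary least squares with one predictor and its R² can be computed with the standard library alone. The function name `fit_log_ols` is hypothetical, and any inputs passed to it are the user's own results, not data from the study.

```python
import math


def fit_log_ols(sizes, scores):
    """OLS fit of score = a + b * ln(size); returns (a, b, r_squared)."""
    x = [math.log(n) for n in sizes]
    m = len(x)
    mean_x = sum(x) / m
    mean_y = sum(scores) / m
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("need at least two distinct sample sizes")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, scores))
    b = sxy / sxx            # slope: change in metric per unit of ln(n)
    a = mean_y - b * mean_x  # intercept
    # R^2 = 1 - residual sum of squares / total sum of squares
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, scores))
    ss_tot = sum((yi - mean_y) ** 2 for yi in scores)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return a, b, r2
```

For example, `fit_log_ols([10, 20, 50, 150, 500, 1000], scores)` would return an intercept, slope and R² for one metric, and the fitted a and b could then be inverted to solve for the n that reaches a target value. The study's models also relate performance to architecture and dataset, which this one-predictor sketch omits; capturing those, or a GAM's smooth terms, would need a fuller modelling library.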




Updated: 2020-03-19