A new method to control error rates in automated species identification with deep learning algorithms (Scientific Reports)



Scientific Reports (IF 4.6), Pub Date: 2020-07-03, DOI: 10.1038/s41598-020-67573-7
Sébastien Villon (1, 2), David Mouillot (1, 3), Marc Chaumont (2, 4), Gérard Subsol (2), Thomas Claverie (1, 5), Sébastien Villéger (1)


Processing data from surveys using photos or videos remains a major bottleneck in ecology. Deep Learning Algorithms (DLAs) have been increasingly used to automatically identify organisms on images. However, despite recent advances, it remains difficult to control the error rate of such methods. Here, we proposed a new framework to control the error rate of DLAs. More precisely, for each species, a confidence threshold was automatically computed using a training dataset independent from the one used to train the DLAs. These species-specific thresholds were then used to post-process the outputs of the DLAs, assigning classification scores to each class for a given image including a new class called “unsure”. We applied this framework to a study case identifying 20 fish species from 13,232 underwater images on coral reefs. The overall rate of species misclassification decreased from 22% with the raw DLAs to 2.98% after post-processing using the thresholds defined to minimize the risk of misclassification. This new framework has the potential to unclog the bottleneck of information extraction from massive digital data while ensuring a high level of accuracy in biodiversity assessment.
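The post-processing step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how each threshold is chosen, so the rule below is an assumption: for each species, take the lowest confidence at which the error rate on a held-out dataset stays within a chosen tolerance. The names `compute_thresholds`, `post_process`, and `max_error` are hypothetical and do not come from the paper.

```python
UNSURE = "unsure"  # extra class for predictions that fall below their threshold


def argmax(scores):
    """Index of the highest score in a list of class scores."""
    return max(range(len(scores)), key=scores.__getitem__)


def compute_thresholds(held_out_scores, held_out_labels, n_classes, max_error=0.0):
    """Assumed rule (not from the paper): for each class, pick the lowest
    confidence threshold whose retained held-out predictions have an
    error rate of at most max_error."""
    thresholds = []
    for c in range(n_classes):
        # (confidence, is_correct) for each held-out image predicted as class c
        preds = [
            (s[c], y == c)
            for s, y in zip(held_out_scores, held_out_labels)
            if argmax(s) == c
        ]
        chosen = float("inf")  # fallback: every prediction of class c becomes "unsure"
        for candidate in sorted({conf for conf, _ in preds}):
            kept = [ok for conf, ok in preds if conf >= candidate]
            error_rate = kept.count(False) / len(kept)
            if error_rate <= max_error:
                chosen = candidate
                break
        thresholds.append(chosen)
    return thresholds


def post_process(scores, thresholds):
    """Return the predicted class index, or UNSURE when the top score
    is below the threshold of the predicted class."""
    c = argmax(scores)
    return c if scores[c] >= thresholds[c] else UNSURE


# Toy held-out set with two classes (made-up scores, for illustration only)
held_out_scores = [
    [0.9, 0.1], [0.6, 0.4], [0.8, 0.2],
    [0.3, 0.7], [0.45, 0.55], [0.2, 0.8],
]
held_out_labels = [0, 1, 0, 1, 0, 1]

thresholds = compute_thresholds(held_out_scores, held_out_labels, n_classes=2)
decision_low = post_process([0.7, 0.3], thresholds)     # below class 0 threshold
decision_high = post_process([0.15, 0.85], thresholds)  # above class 1 threshold
```

With these toy values, a score of 0.7 for class 0 is rejected as "unsure" because the only held-out error for that class occurred at a lower confidence than the chosen threshold, while a score of 0.85 for class 1 is accepted. Raising `max_error` trades fewer "unsure" outputs for a higher tolerated misclassification rate.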


Updated: 2020-07-03
