Semi‐ and fully supervised quantification techniques to improve population estimates from machine classifiers
Limnology and Oceanography: Methods (IF 2.1), Pub Date: 2020-10-22, DOI: 10.1002/lom3.10399
Eric C. Orenstein 1 , Kasia M. Kenitz 1 , Paul L.D. Roberts 2 , Peter J.S. Franks 1 , Jules S. Jaffe 1 , Andrew D. Barton 1, 3

Modern in situ digital imaging systems collect vast numbers of images of marine organisms and suspended particles. Automated methods to classify objects in these images – largely supervised machine learning techniques – are now used to deal with this onslaught of biological data. Though such techniques can minimize the human cost of analyzing the data, they also have important limitations. In training automated classifiers, we implicitly program them with an inflexible understanding of the environment they are observing. When the relationship between the classifier and the population changes, the computer's performance degrades, potentially decreasing the accuracy of the estimate of community composition. This limitation of automated classifiers is known as “dataset shift.” Here, we describe techniques for addressing dataset shift. We then apply them to the output of a binary deep neural network searching for diatom chains in data generated by the Scripps Plankton Camera System (SPCS) on the Scripps Pier. In particular, we describe a supervised quantification approach to adjust a classifier's output using a small number of human-corrected images to estimate the system error in a time frame of interest. This method yielded an 80% improvement in mean absolute error over the raw classifier output on a set of 41 independent samples from the SPCS. The technique can be extended to adjust the output of multi‐category classifiers and other in situ observing systems.
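To make the idea of correcting a classifier's output with a small human-corrected subset concrete, the following is a minimal sketch of one standard quantification correction, often called the adjusted count. It is an illustration under stated assumptions and not necessarily the procedure used in the paper: the function names `estimate_rates` and `adjusted_count` are hypothetical, the classifier is assumed to be binary (for example, diatom chain vs. everything else), and the error rates measured on the corrected subset are assumed to hold for the whole time frame of interest.

```python
def estimate_rates(human_labels, classifier_labels):
    """Estimate true- and false-positive rates from a human-corrected subset.

    Both arguments are equal-length sequences of booleans
    (True = target class). Hypothetical helper for illustration only.
    """
    tp = fp = fn = tn = 0
    for truth, pred in zip(human_labels, classifier_labels):
        if truth and pred:
            tp += 1
        elif truth and not pred:
            fn += 1
        elif not truth and pred:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("subset must contain both classes")
    return tp / (tp + fn), fp / (fp + tn)


def adjusted_count(n_positive, n_total, tpr, fpr):
    """Correct the raw positive fraction using estimated error rates.

    Solves raw = tpr * p + fpr * (1 - p) for the true prevalence p,
    then clips the result to the valid range [0, 1].
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if tpr <= fpr:
        raise ValueError("tpr must exceed fpr for the correction to be defined")
    raw = n_positive / n_total
    adjusted = (raw - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, adjusted))


if __name__ == "__main__":
    # Made-up labels for ten human-corrected images (illustrative, not SPCS data).
    human = [True, True, True, True, False, False, False, False, False, False]
    machine = [True, True, True, False, True, False, False, False, False, False]
    tpr, fpr = estimate_rates(human, machine)
    # Suppose the classifier flagged 300 of 1000 images in the same time frame.
    print(adjusted_count(300, 1000, tpr, fpr))
```

The correction is only as good as the error-rate estimates, so the corrected subset should be drawn from the same time frame as the sample being adjusted; rates measured under different conditions are exactly what dataset shift invalidates.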

Updated: 2020-12-12