Benchmarking Human Performance in Semi-Automated Image Segmentation
Interacting with Computers (IF 1.3), Pub Date: 2020-08-11, DOI: 10.1093/iwcomp/iwaa017
Mark Eramian 1, Christopher Power 2, Stephen Rau 1, Pulkit Khandelwal 3

Semi-automated segmentation algorithms hold promise for improving the extraction and identification of objects in images, such as tumors in medical images of human tissue, plants or flowers counted for crop yield prediction, or other tasks where object numbers and appearance vary from image to image. By blending markup from human annotators with algorithmic classifiers, the accuracy and reproducibility of image segmentation can be raised to very high levels. At least, that is the promise of this approach, but the reality is less than clear. In this paper, we review the state of the art in performance assessment for semi-automated image segmentation and demonstrate that it lacks the experimental rigour needed to ensure that claims about algorithm accuracy and reproducibility can be considered valid. We follow this review with two experiments that vary the type of markup that annotators make on images, either points or strokes, under tightly controlled experimental conditions, in order to investigate the effect that this one particular source of variation has on the accuracy of these types of systems. In both experiments, we found that accuracy increases substantially when participants use a stroke-based interaction. In light of these results, the validity of claims about algorithm performance is brought into sharp focus, and we reflect on the need for far more control over variables when benchmarking the impact of annotators and their context on these types of systems.

Updated: 2020-08-11