Biological data annotation via a human-augmenting AI-based labeling system, npj Digital Medicine

Biological data annotation via a human-augmenting AI-based labeling system
npj Digital Medicine (IF 12.4), Pub Date: 2021-10-07, DOI: 10.1038/s41746-021-00520-6
Douwe van der Wal₁, Iny Jhun₂, Israa Laklouk₃, Jeff Nirschl₂, Lara Richer₃, Rebecca Rojansky₂, Talent Theparee₃, Joshua Wheeler₂, Jörg Sander₄, Felix Feng₃, Osama Mohamad₃, Silvio Savarese₁, Richard Socher₁, Andre Esteva₁

Biology has become a prime area for the deployment of deep learning and artificial intelligence (AI), enabled largely by the massive data sets that the field can generate. Key to most AI tasks is the availability of a sufficiently large, labeled data set with which to train AI models. In the context of microscopy, it is easy to generate image data sets containing millions of cells and structures. However, it is challenging to obtain large-scale high-quality annotations for AI models. Here, we present HALS (Human-Augmenting Labeling System), a human-in-the-loop data labeling AI, which begins uninitialized and learns annotations from a human, in real-time. Using a multi-part AI composed of three deep learning models, HALS learns from just a few examples and immediately decreases the workload of the annotator, while increasing the quality of their annotations. Using a highly repetitive use-case—annotating cell types—and running experiments with seven pathologists—experts at the microscopic analysis of biological specimens—we demonstrate a manual work reduction of 90.60%, and an average data-quality boost of 4.34%, measured across four use-cases and two tissue stain types.
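
The human-in-the-loop idea described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the HALS implementation: the `CentroidLabeler` class, the 1-D features, the label names `type_a` and `type_b`, and the way work reduction is computed here (the share of items whose suggestion was accepted unchanged) are all hypothetical, and they do not reflect the paper's three deep learning models or its metric definitions.

```python
from collections import defaultdict


class CentroidLabeler:
    """Hypothetical stand-in for a learned model: nearest class centroid on 1-D features."""

    def __init__(self):
        # The labeler starts uninitialized: no classes, no examples.
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def update(self, feature, label):
        """Learn from one confirmed annotation."""
        self.sums[label] += feature
        self.counts[label] += 1

    def suggest(self, feature):
        """Return the label with the closest centroid, or None if nothing was learned yet."""
        if not self.counts:
            return None
        return min(
            self.counts,
            key=lambda lab: abs(feature - self.sums[lab] / self.counts[lab]),
        )


def label_with_assistance(items, ask_human):
    """Simulate an annotator who accepts correct suggestions and fixes wrong ones.

    Returns the final labels and the fraction of items that needed no manual
    labeling (an assumed, simplified notion of work reduction).
    """
    model = CentroidLabeler()
    labels = []
    manual = 0
    for feature in items:
        suggestion = model.suggest(feature)
        truth = ask_human(feature)  # the annotator's judgment for this item
        if suggestion != truth:
            manual += 1  # missing or wrong suggestion: annotator labels by hand
        labels.append(truth)
        model.update(feature, truth)  # the model learns from every confirmed label
    reduction = 1 - manual / len(items)
    return labels, reduction


if __name__ == "__main__":
    # Hypothetical data: features below 0.5 are "type_a", the rest "type_b".
    features = [0.1, 0.9, 0.2, 0.8, 0.15, 0.85]
    labels, reduction = label_with_assistance(
        features, lambda x: "type_b" if x > 0.5 else "type_a"
    )
    print(labels, f"{reduction:.2%}")
```

In this toy run the annotator labels by hand only until the model has seen one example of each class; every later suggestion is accepted. Real cell images would need a learned feature extractor and classifier rather than a 1-D centroid rule.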

Updated: 2021-10-07
