RootPainter: deep learning segmentation of biological images with corrective annotation
New Phytologist (IF 8.3). Pub Date: 2022-07-18. DOI: 10.1111/nph.18387
Abraham George Smith, Eusun Han, Jens Petersen, Niels Alvin Faircloth Olsen, Christian Giese, Miriam Athmann, Dorte Bodin Dresbøll, Kristian Thorup-Kristensen

Introduction

Plant research is important because we need to find ways to feed a growing population whilst limiting damage to the environment (Lynch, 2007). Plant studies often involve the measurement of traits from images, which may be used in phenotyping for genome-wide association studies (Rebolledo et al., 2016), comparing cultivars for traditional breeding (Walter et al., 2019), or testing a hypothesis related to plant physiology (Rasmussen et al., 2020). Plant image analysis has been identified as a bottleneck in plant research (Minervini et al., 2015). A variety of software exists to quantify plant images (Lobet et al., 2013) but is typically limited to a specific type of data or task, such as leaf counting (Ubbens et al., 2018), pollen counting (Tello et al., 2018), or root architecture extraction (Yasrab et al., 2019).

Convolutional neural networks (CNNs) are a class of deep-learning models that represent the state of the art for image analysis and are currently among the most popular methods in computer vision research. They are a type of multilayered neural network that uses convolution in at least one layer and are designed to process grid-like data, such as images (LeCun et al., 2015). CNNs, such as U-Net (Ronneberger et al., 2015), receive an image as input and then output another image, with each pixel in the output image representing a prediction for each pixel in the input image. CNNs excel at tasks such as segmentation and classification. They have been found to be effective for various tasks in agricultural research (Kamilaris & Prenafeta-Boldú, 2018; Santos et al., 2020) and plant image analysis, including plant stress phenotyping (Jiang & Li, 2020), wheat spike counting (Pound et al., 2017), leaf counting (Ubbens & Stavness, 2017), and accession classification (Taghavi Namin et al., 2018).
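To make this input-to-output mapping concrete, the following minimal sketch (in PyTorch, far simpler than U-Net, with purely illustrative layer sizes) shows a fully convolutional network whose output contains one class-score vector per input pixel:

```python
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """Toy fully convolutional network producing per-pixel class scores."""
    def __init__(self, in_channels=3, num_classes=2):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # a 1x1 convolution maps the features to per-pixel class scores
        self.classify = nn.Conv2d(32, num_classes, kernel_size=1)

    def forward(self, x):
        return self.classify(self.encode(x))

net = TinySegNet()
image = torch.randn(1, 3, 256, 256)           # a batch of one RGB image
scores = net(image)                           # shape: (1, 2, 256, 256)
assert scores.shape[-2:] == image.shape[-2:]  # one prediction per input pixel
```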

For a CNN model to perform a particular task, it must be trained on a suitable dataset of examples. These examples are referred to as training data and typically consist of input images, each paired with the desired output for that image. In the case of segmentation, each input image is paired with a set of labels corresponding to each of its pixels. The process of creating such labelled training data is referred to as annotation. It can be time consuming: annotating complex images is labour intensive, and many images may be needed, since larger training datasets typically improve trained model performance (Nakkiran et al., 2021).
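As an illustration of this pairing (the field names and classes below are hypothetical, not RootPainter's internal format), a segmentation training example can be represented as an image together with a label mask of the same spatial size:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TrainingExample:
    image: np.ndarray    # (H, W, 3) input photograph
    labels: np.ndarray   # (H, W) one class index per pixel, e.g. 0 = soil, 1 = root

example = TrainingExample(
    image=np.zeros((256, 256, 3), dtype=np.uint8),
    labels=np.zeros((256, 256), dtype=np.int64),
)
assert example.labels.shape == example.image.shape[:2]  # one label per pixel
```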

Convolutional neural network training uses a process called stochastic gradient descent (SGD) to optimize the parameters of a model such that its error is reduced. The error, commonly referred to as the loss, is a measure of the difference between the model's predictions and the correct labels for the training data examples. A separate validation dataset, consisting of similar examples, is used to provide information on the performance of the model, during the training procedure, on examples that have not been used in SGD optimization. Validation set performance may be used to decide when to stop training or to assist in tuning hyperparameters, variables that control fundamental aspects of the model and are not directly optimized by SGD.
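A minimal sketch of this procedure is given below, using PyTorch with dummy data; the stand-in model, hyperparameters, and stopping rule are illustrative assumptions rather than the configuration used in this paper:

```python
import torch
import torch.nn.functional as F

# stand-in segmentation model: 3 input channels -> 2 per-pixel class scores
net = torch.nn.Conv2d(3, 2, kernel_size=3, padding=1)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

# dummy (image, per-pixel label) pairs standing in for annotated data
def make_pair():
    return torch.randn(1, 3, 64, 64), torch.randint(0, 2, (1, 64, 64))

train_examples = [make_pair() for _ in range(8)]
val_examples = [make_pair() for _ in range(2)]

def mean_loss(examples, train=False):
    total = 0.0
    for image, labels in examples:
        loss = F.cross_entropy(net(image), labels)  # per-pixel loss
        if train:
            optimizer.zero_grad()
            loss.backward()   # gradients of the loss w.r.t. the parameters
            optimizer.step()  # SGD update reduces the training error
        total += loss.item()
    return total / len(examples)

best_val = float('inf')
for epoch in range(50):
    mean_loss(train_examples, train=True)
    val = mean_loss(val_examples)  # performance on examples unseen by SGD
    if val >= best_val:
        break                      # simple validation-based early stopping
    best_val = val
```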

Developing a CNN-based system for a new image analysis task or dataset is challenging, because dataset design, model training, and hyperparameter tuning are time-consuming tasks requiring competencies in both programming and machine learning.

Three questions that need answering when attempting a supervised learning project such as training a CNN are: how to split the data between training, validation, and test datasets; how to manually annotate or label the data; and how to decide how much data needs to be collected, labelled, and used for training in order to obtain a model with acceptable performance. The choices of optimal hyperparameters and network architecture are also considered to be a ‘black art’ requiring years of experience, and a need has been recognized to make the application of deep learning easier in practice (Smith, 2018).

The question of how much data to use in training and validation is explored in theoretical work that gives indications of a model's generalization performance based on dataset size and number of parameters (Vapnik, 2000). These theoretical insights may be useful for simpler models but provide an inadequate account of the behaviour of CNNs in practice (Zhang et al., 2017).

Manual annotation may be challenging: the tools used are sometimes proprietary and not freely available (Xu et al., 2019), which can increase the skill set required. Creating dense per-pixel annotations for training is often a time-consuming process. It has been argued that tens of thousands of images are required, making small-scale plant-image datasets unsuitable for training deep-learning models (Ubbens et al., 2018).

The task of collecting datasets for the effective training of models is further complicated by the unique attributes of each dataset. Not all data are created equal: the utility of each annotated pixel for the model training process varies greatly (Kellenberger et al., 2019). It may be necessary to add harder examples after observing weaknesses in an initial trained model (Soltaninejad et al., 2019), or to correct for a class imbalance in which many examples exist of a majority class (Buda et al., 2018).
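One generic remedy for class imbalance (cf. Buda et al., 2018) is to up-weight the minority class in the per-pixel loss; the sketch below uses an invented weight ratio purely for illustration:

```python
import torch
import torch.nn.functional as F

# hypothetical imbalance: background pixels vastly outnumber root pixels,
# so the minority class is given a larger weight in the loss
class_weights = torch.tensor([1.0, 20.0])  # [background, root]; illustrative
scores = torch.randn(1, 2, 64, 64)         # per-pixel class scores
labels = torch.randint(0, 2, (1, 64, 64))  # per-pixel ground truth
loss = F.cross_entropy(scores, labels, weight=class_weights)
```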

Interactive segmentation methods using CNNs (e.g. Hu et al., 2018; Sakinis et al., 2019) provide ways to improve the annotation procedure by allowing user input to be used in the inference process and can be an effective way to create large high-quality datasets in less time (Benenson et al., 2019).

When used in a semi-automatic setting, such tools speed up the labelling process but may still be unsuitable for situations where the speed and consistency of a fully automated solution are required, for example when processing data from large-scale root phenotyping facilities (e.g. Svane et al., 2019), where on the order of 100 000 images or more need to be analysed.

In this study we present and evaluate our software RootPainter, which makes the process of creating a dataset, training a neural network, and using it for plant image analysis accessible to ordinary computer users by facilitating all required operations with a cross-platform, open-source, and freely available user interface. The RootPainter software was initially developed for quantification of roots in images from rhizotron-based root studies. However, we found its versatility to be much broader, with an ability to be trained to recognize many different types of structures in a set of images.

Although more root-specific tools (Smith et al., 2020d; Gaggion et al., 2021; Narisetti et al., 2021) and more generalist segmentation tools, such as Fiji (Schindelin et al., 2012) via DeepImageJ (Gómez-de Mariscal et al., 2021), make it possible to run trained deep-learning models for segmentation, they do not provide easy-to-use model training functionality, which is the purpose of the RootPainter software presented here.

RootPainter allows a user to inspect model performance during the annotation process, so they can make a more informed decision about how much and what data need to be labelled in order to train a model to an acceptable accuracy. It allows annotations to be targeted towards areas where the current model shows weakness (Fig. 1), streamlining the creation of a dataset sufficient for a desired level of performance. RootPainter can operate in a semi-automatic way, with a user assigning corrections to each segmented image whilst the model learns from the assigned corrections, reducing the time requirement for each image as the process continues. It can also operate in a fully automatic way: either the model generated from the interactive procedure is used to process a larger dataset without further interaction, or, in a more classical workflow, a model is trained from dense per-pixel annotations, which can also be created via the user interface.
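The corrective learning idea can be sketched as a loss restricted to the pixels a user has corrected. The following is one plausible formulation rather than our exact implementation, and the encoding of corrections is an assumption made for the example:

```python
import torch
import torch.nn.functional as F

def corrective_loss(scores, corrections):
    """scores: (N, 2, H, W) per-pixel class scores from the model.
    corrections: (N, H, W) with 1 = marked foreground, 0 = marked
    background and -1 = pixels the user left uncorrected."""
    corrected = corrections >= 0  # loss is defined only where the user drew
    if not corrected.any():
        return scores.sum() * 0.0  # no corrections yet, so no gradient
    per_pixel = F.cross_entropy(scores, corrections.clamp(min=0),
                                reduction='none')  # shape (N, H, W)
    return per_pixel[corrected].mean()

# example: a user corrects a handful of pixels in one 64 x 64 tile
scores = torch.randn(1, 2, 64, 64, requires_grad=True)
corrections = torch.full((1, 64, 64), -1, dtype=torch.long)
corrections[0, 10:14, 20:24] = 1  # false negatives marked as foreground
corrections[0, 40:44, 50:54] = 0  # false positives marked as background
corrective_loss(scores, corrections).backward()  # gradients only from drawn pixels
```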

Fig. 1 RootPainter corrective annotation concept. (a) Roots in soil. (b) Artificial intelligence (AI) root predictions (segmentation) shown in bright blue overlaid on the photograph. (c) Human corrections of the initial segmentation, with corrections of false negatives shown in red and corrections of false positives shown in green. (d) After a period of training, the AI learns from the corrections provided; the updated segmentation is shown in bright blue.

We evaluate the effectiveness of RootPainter by training models for three different types of data and tasks without dataset-specific programming or hyperparameter tuning. We evaluate the effectiveness on a set of rhizotron root images and, in order to evaluate the versatility of the system, on two other types of data, both involving objects in the images quite different from roots: a biopores dataset and a legume root nodules dataset.

For each dataset we compare the performance of models trained using the dense and corrective annotation strategies on images not used during the training procedure. If annotation is too time consuming, then RootPainter will be unfeasible for many projects. To investigate the possibility of rapid and convenient model training we use no prior knowledge and restrict annotation time to a maximum of 2 h for each model. We make two hypotheses. First, in a limited time period, RootPainter will be able to segment the objects of interest to an acceptable accuracy in three datasets including roots, biopores, and root nodules, demonstrated by a strong correlation between the measurements obtained from RootPainter and manual methods. Second, a corrective annotation strategy will result in a more accurate model compared with dense annotations, given the same time for annotation.
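The agreement test behind the first hypothesis can be illustrated with a Pearson correlation between per-image measurements; the numbers below are invented, and SciPy's pearsonr is used as one common choice:

```python
from scipy.stats import pearsonr

# hypothetical per-image measurements (values invented for illustration)
rootpainter_counts = [12, 30, 7, 45, 22, 18]  # e.g. counts from segmentations
manual_counts      = [14, 28, 6, 47, 20, 19]  # counts from the manual method

r, p_value = pearsonr(rootpainter_counts, manual_counts)
print(f"Pearson r = {r:.3f} (p = {p_value:.4f})")
```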

Training with corrective annotation is a type of interactive machine learning, as it uses a human in the loop in the model training procedure. As opposed to active learning, which involves the learner automatically selecting which examples the user labels (Settles, 2009), interactive machine learning involves a human deciding which examples should be added for future iterations of training (Amershi et al., 2014).

Prior work on interactive training for segmentation includes Gonda et al. (2017) and Kontogianni et al. (2019). Gonda et al. (2017) evaluated their method on neuronal structures imaged with electron microscopy and found that the interactively trained model produced better segmentations than a model trained using exhaustive ground-truth labels.

Kontogianni et al. (2019) combined interactive segmentation with interactive training by using the user feedback in model updates. Their training approach requires an initial dataset with full ground-truth segmentations, whereas our method requires no prior labelled data, which was a design choice we made to increase the applicability of our method to plant researchers looking to quantify new objects in a captured image dataset.

As opposed to Gonda et al. (2017), we use a more modern, fully convolutional network model, which we expect to provide substantial efficiency benefits when dealing with larger images. Our work is novel in that we evaluate an interactive corrective annotation procedure in terms of the annotation time needed to reach a certain accuracy on real-world plant-image datasets. Synthetic data are often used to evaluate interactive segmentation methods (Benard & Gygli, 2017; Li et al., 2018; Mahadevan et al., 2018); to provide more realistic measurements of annotation time, we use real human annotators in our experiments. As opposed to many competing deep-learning methods for segmentation, we provide a graphical user interface through which all operations can be completed, an essential feature for ensuring uptake in the plant image analysis community.

Roots in soil

Plant roots are responsible for uptake of water and nutrients. This makes understanding root system development critical for developing resource-efficient crop production systems. For this purpose, we need to study roots under real-life conditions in the field, studying the effects of crop genotypes and their management (Rasmussen et al., 2015; Rasmussen & Thorup-Kristensen, 2016), cover crops (Thorup-Kristensen, 2001), crop rotation (Thorup-Kristensen et al., 2012), and other factors. We need to study deep rooting, as this is critical for the use of agriculturally important resources, such as water and nitrogen (N) (Thorup-Kristensen & Kirkegaard, 2016; Thorup-Kristensen et al., 2020a).

Rhizotron-based root research is an important example of plant research. Acquisition of root images from rhizotrons is widely adopted (Rewald et al., 2012), as it allows repeated and nondestructive quantification of root growth and often to the full depth of the root systems. Traditionally, the method for root quantification in such studies involves a lengthy procedure to determine the root density on acquired images by counting intersections with grid lines (Thorup-Kristensen, 2006).

Manual methods require substantial resources and can introduce undesired inter-annotator variation in root density estimates; therefore, a faster and more consistent method is required. More recently, fully automatic approaches using CNNs have been proposed (Smith et al., 2020d); although effective, such methods may be challenging for root scientists without the required programming expertise to repurpose to different datasets. A method that made the retraining process more accessible and convenient would accelerate the adoption of CNNs within the root research community.

Biopores

Biopores are tubular or round-shaped continuous voids formed by root penetration and earthworm movement (Kautz, 2015). They function as preferential pathways for root growth (Han et al., 2015b) and are therefore important for plant resource acquisition (Köpke et al., 2015; Han et al., 2017). Investigation of soil biopores is often done by manually tracing them onto transparent sheets laid on an excavated soil surface (Han et al., 2015a). This manual approach is time consuming and precludes more in-depth analysis of detailed information, including diameter, surface area, or distribution patterns such as clustering.

Root nodules

Growing legumes with N-fixing capacity reduces the use of fertilizer (Kessel et al., 2000); hence, there is increased demand for legume-based intercropping (Hauggaard-Nielsen et al., 2001) and for precropping to exploit carryover effects. Roots of legumes form associations with rhizobia, producing nodules on the roots where the N fixation occurs. Understanding the nodulation process is therefore important for understanding this symbiosis and N fixation. However, counting nodules on excavated roots is a cumbersome and time-consuming procedure, especially for species with many small nodules, such as clovers (Trifolium spp.).


