On Its Way to Primetime: Artificial Intelligence in Flow Cytometry Diagnostics. Cytometry Part A

Cytometry Part A (IF 2.5). Pub Date: 2020-07-20. DOI: 10.1002/cyto.a.24191
Stefan W Krause

In this issue of Cytometry A, Zhao et al. (pages 1073–1080) report on their work to diagnose leukemic B‐cell non‐Hodgkin lymphoma (B‐NHL) from flow cytometry (FCM) raw data of blood and bone marrow samples using a dedicated computational approach that assigns one of eight B‐cell lymphoma diagnoses or “normal” to each sample. A remarkable level of classification performance was achieved in the validation set. For the “true” classification of B‐cell lymphomas, conventional diagnostics had incorporated morphology, FCM and, if needed, additional information from histology and genetics, whereas the computer diagnosis was derived from FCM data alone. In this context, uncertainty in delineating, for example, monoclonal B‐cell lymphocytosis from chronic lymphocytic leukemia, or in subclassifying a B‐cell malignancy as either mantle cell lymphoma or prolymphocytic leukemia, is not an outright error but rather reflects the limitations of FCM itself. Furthermore, cell populations tagged as abnormal by the algorithm and color‐coded accordingly in conventional plots can help human diagnosticians to review and fine‐tune the diagnosis. However, some lymphomas (most prominently follicular lymphoma) were classified as normal by the algorithm. Conversely, only a few samples classified as “normal” by human diagnosticians were classified as lymphoma by the algorithm. Thus, a clinically relevant deficit in sensitivity exists.

Computer support is instrumental for the analysis of FCM data, because nobody is able to draw conclusions from raw list mode files. However, conventional FCM computer programs execute relatively simple tasks to support the workflow of a human researcher or diagnostician. In a typical workflow, several sequential steps have to be performed (Fig. 1, left side). Fluorescence spillover compensation is calculated from control samples. One‐dimensional transformation of raw data (logarithmic, logicle, possibly a shift of zero and negative values to some defined minimum, etc.) is routinely performed on fluorescence channels. Data are displayed in histograms or two‐dimensional plots. Starting gates are used to look for artifacts and to remove debris and cells not of interest. A considerable number of plots are necessary if several fluorochromes are used and several populations are of interest. Data from several samples with an identical panel may be displayed in parallel in an overlay. Cells are tagged according to gates in these plots and may then be displayed separately and/or color‐coded. Hierarchical and/or Boolean gating strategies are used to define cell populations and subpopulations of interest. Cell numbers and antigen expression of these populations constitute the readout of a single tube. A final result or diagnosis is derived by assessing this readout or the synopsis of the readouts of several tubes.
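The first two steps of this workflow, spillover compensation and a one‐dimensional transformation, can be sketched in a few lines. This is a minimal illustration, not the method of any particular FCM program: the two‐channel spillover matrix, the helper names `compensate` and `asinh_transform`, and the cofactor of 150 are all assumptions chosen for the example. The arcsinh transform is used here as one common scaling that tolerates zero and negative values.

```python
import math

def invert_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("spillover matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def compensate(observed, spillover):
    """Recover per-dye signals from detector readings.

    Convention (an assumption of this sketch): spillover[i][j] is the
    fraction of dye i's signal seen in detector j, so that
    observed = true @ spillover and true = observed @ inverse(spillover).
    """
    inv = invert_2x2(spillover)
    return [
        observed[0] * inv[0][0] + observed[1] * inv[1][0],
        observed[0] * inv[0][1] + observed[1] * inv[1][1],
    ]

def asinh_transform(value, cofactor=150.0):
    """Scale a compensated value; near-linear around zero, log-like far from it."""
    return math.asinh(value / cofactor)

# Hypothetical spillover: 20% of dye 0 leaks into detector 1, 5% of dye 1 into detector 0.
spillover = [[1.0, 0.2], [0.05, 1.0]]
# A cell carrying only dye 0 (true signal 1000) is measured as [1000, 200].
compensated = compensate([1000.0, 200.0], spillover)
scaled = [asinh_transform(v) for v in compensated]
```

After compensation the apparent signal in the second detector disappears, which is why this step has to precede any gating on that channel.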

**Figure 1**

FCM workflow. A conventional workflow is depicted on the left. Gating steps in multiple plots are performed sequentially and classification results are derived from hierarchical or Boolean analysis. Several cell populations may be defined with one antibody panel. Computer algorithms may perform one or several steps of such a workflow. Algorithms dealing directly with higher dimensional data are depicted on the right. Tasks that are always performed by automated software are depicted in red frames. Tasks that are applied to a whole set of data in parallel are depicted on green background. [Color figure can be viewed at wileyonlinelibrary.com]

All of the calculations in such a manual workflow are based on straightforward “if A then B” logic, operating on a maximum of two parameters concurrently. Conventional FCM computer support aims at displaying data clearly to the human operator, especially the effects of manipulations in two‐parameter plots upon plots of other parameters, but not at automation. The most advanced process in standard applications is the calculation of fluorescence spillover compensation, which nowadays is usually performed in some (semi‐)automated fashion. However, although every single step in this procedure is quite straightforward, important information may be missed because of the multitude of plots and gates needed for current 10‐ to 14‐parameter FCM data.
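The “if A then B” character of conventional gating can be made concrete with a short sketch. The events, parameter names (FSC, SSC, CD19, kappa, lambda) and all cutoffs below are invented for illustration and do not come from any real panel; each gate looks at no more than two parameters, and populations are defined by applying gates hierarchically and combining them with Boolean logic.

```python
def rect_gate(x, y, x_range, y_range):
    """Return a gate that is true for events inside a rectangle on two parameters."""
    def gate(event):
        return (x_range[0] <= event[x] < x_range[1]
                and y_range[0] <= event[y] < y_range[1])
    return gate

def threshold_gate(param, cutoff):
    """Return a gate that is true for events at or above a cutoff on one parameter."""
    return lambda event: event[param] >= cutoff

# Hypothetical, hand-made events (arbitrary units).
events = [
    {"FSC": 300, "SSC": 100, "CD19": 900, "kappa": 800, "lambda": 20},
    {"FSC": 320, "SSC": 120, "CD19": 850, "kappa": 30, "lambda": 700},
    {"FSC": 310, "SSC": 90, "CD19": 10, "kappa": 5, "lambda": 5},    # non-B lymphocyte
    {"FSC": 50, "SSC": 20, "CD19": 0, "kappa": 0, "lambda": 0},      # debris
    {"FSC": 305, "SSC": 110, "CD19": 950, "kappa": 900, "lambda": 15},
]

lymph_gate = rect_gate("FSC", "SSC", (200, 500), (50, 200))
b_cell_gate = threshold_gate("CD19", 500)
kappa_gate = threshold_gate("kappa", 500)
lambda_gate = threshold_gate("lambda", 500)

# Hierarchical gating: each population is a subset of its parent.
lymphocytes = [e for e in events if lymph_gate(e)]
b_cells = [e for e in lymphocytes if b_cell_gate(e)]

# Boolean gating: combine two one-parameter gates.
kappa_b = [e for e in b_cells if kappa_gate(e) and not lambda_gate(e)]
lambda_b = [e for e in b_cells if lambda_gate(e) and not kappa_gate(e)]

readout = {
    "lymphocytes": len(lymphocytes),
    "b_cells": len(b_cells),
    "kappa_b": len(kappa_b),
    "lambda_b": len(lambda_b),
}
```

Even this toy panel needs four gates for one readout, which hints at how quickly the number of plots grows with 10 to 14 parameters.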

In recent decades, many attempts have been reported to introduce more advanced computation methods into histology, cytopathology, image cytometry and conventional FCM analysis (1, 2). These algorithms will be called artificial intelligence (AI) from here on, although some of them do not deserve this name in its strict sense. Two strategies, sometimes overlapping, are applied in these attempts. Firstly, AI may be used to automate conventional data processing and analysis as described above in order to reduce the workload for the investigator, reduce bias by using standardized procedures, and speed up analyses. To this end, regarding FCM, algorithms search for minima in distributions to define optimal gate positions for dividing populations, or search for appropriate cutoff values to gate out debris. Furthermore, normalization algorithms can be applied to level out differences due to instrument settings or biological variation in sets of multiple similar data. Many of these algorithms are available in the Bioconductor “flowCore” FCM package implemented in R (3). Secondly, new methods were introduced that go beyond the sequential analysis of two‐dimensional plots and base calculations on more parameters of the higher‐dimensional space in parallel. This is a crucial need nowadays, when standard cytometers report 10 to 14 parameters per cell and dedicated research instruments report more than 100. Such algorithms can either substitute for conventional strategies, for example, to gate cell populations and read out antigen expression levels, or they can be used to extract information from the raw data that is not accessible by conventional gating (4).
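The idea of searching for a minimum in a distribution to place a gate can be illustrated with a simple histogram valley search. This is a sketch under stated assumptions, not the algorithm of flowCore or any other package: the function names, bin count, smoothing window and the simulated bimodal marker are all hypothetical.

```python
import random

def histogram(values, lo, hi, n_bins):
    """Count values per equal-width bin on [lo, hi)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts, width

def smooth(counts, half_window=2):
    """Moving average to suppress bin-to-bin noise."""
    out = []
    for i in range(len(counts)):
        window = counts[max(0, i - half_window): i + half_window + 1]
        out.append(sum(window) / len(window))
    return out

def valley_cutoff(values, lo, hi, n_bins=50, min_separation=10):
    """Place a cutoff at the lowest point between the two main modes."""
    counts, width = histogram(values, lo, hi, n_bins)
    density = smooth(counts)
    first = max(range(n_bins), key=lambda i: density[i])
    # The second mode is the highest bin far enough from the first one.
    far = [i for i in range(n_bins) if abs(i - first) >= min_separation]
    second = max(far, key=lambda i: density[i])
    left, right = sorted((first, second))
    valley = min(range(left, right + 1), key=lambda i: density[i])
    return lo + (valley + 0.5) * width

# Simulated transformed marker: a negative and a positive population.
rng = random.Random(0)
negative = [rng.gauss(1.0, 0.5) for _ in range(2000)]
positive = [rng.gauss(5.0, 0.5) for _ in range(1000)]
cutoff = valley_cutoff(negative + positive, lo=-2.0, hi=8.0)
```

The sketch assumes exactly two well‐separated modes; real data with shoulders or more than two populations would need a more careful peak search.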

One of the prominent tasks within an FCM workflow is to define cell populations that may be of interest for research or diagnosis within a mixture of different cells (“clustering”). AI can directly use higher‐dimensional data as input for cell clustering, or it can perform dimensionality reduction and data visualization, for example, by tSNE or one of its variants (5, 6) or by SOM (7), the latter already including some clustering of the data. After dimensionality reduction, population clustering can be added by separate AI algorithms, or a human operator can take over this task, integrating the output of the dimensionality reduction and conventional gating. Many different algorithms are able to solve the clustering task in an automated fashion, either by a two‐step procedure that combines dimensionality reduction with subsequent clustering or by direct clustering of higher‐dimensional data. However, as shown in the FlowCAP challenges, results are not unequivocal, especially if the number of clusters is not defined a priori, and differences remain between different algorithms and human experts. Up to now, no perfect automatic solution for cell clustering exists, although many solutions perform quite well (7). In addition, clustering that reveals further information on the relatedness between populations has been suggested for a multitude of research questions, for example, cellular developmental trajectories, and has been optimized for these special tasks (further references in 4). Finally, metadata extracted from raw FCM data may also be clustered, for example, in order to define diagnostic or prognostic subgroups (8).
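Direct clustering of higher‐dimensional data can be sketched with k‐means, used here only as a simple stand‐in for the many algorithms mentioned above; it is not one of the methods evaluated in the cited work. The four simulated markers, the farthest‐point initialization and the function names are assumptions of this example. Note that the number of clusters `k` has to be supplied a priori, which is exactly the situation in which automated results tend to be more stable.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two events."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    """Deterministic start: each new centroid is the point farthest from the others."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(points, k, n_iter=20):
    """Assign each event to one of k clusters using all parameters at once."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Two simulated populations measured on four hypothetical markers.
rng = random.Random(1)
pop_a = [[rng.gauss(m, 0.3) for m in (1, 1, 5, 5)] for _ in range(100)]
pop_b = [[rng.gauss(m, 0.3) for m in (5, 5, 1, 1)] for _ in range(100)]
labels, centroids = kmeans(pop_a + pop_b, k=2)
```

With overlapping populations or an unknown `k`, different initializations and algorithms can disagree, which mirrors the variability reported for automated clustering.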

Whereas unsupervised clustering can be helpful for many exploratory research questions to identify cell populations and subpopulations, supervised AI methods have been described for medical diagnostic purposes. These use external information such as diagnoses or outcome to train the AI, for example, using support vector machines or neural networks. All of these strategies rely on a large dataset for training and may incorporate more or fewer steps from a conventional workflow (4, 9, 10). Manual gating and tagging of cell populations may be used for training of the AI (11), or the AI may be trained using only the final results, that is, the diagnosis, as described, for example, in Ref. (12) or in the work by Zhao et al. discussed here. Several AI strategies were able to discern overt acute myeloid leukemia (AML) from normal samples with a high success rate in the second FlowCAP challenge (7). However, this can be considered quite a simple task, since overt AML is easily characterized by a large abnormal population of blasts or sometimes monocytic cells. In contrast, separating AML from myelodysplastic syndromes or from acute lymphoblastic leukemia, both everyday questions in diagnostics, is less trivial.

In contrast to simplified “yes or no” tasks, Zhao et al. tackled a much more realistic question: to deduce a specific diagnosis from FCM panels as they are used in conventional diagnostics. They achieved this goal without attempting to mimic a conventional human FCM workflow. They transformed the FCM data by self‐organizing maps (SOM) and classified these representations by a convolutional neural network (CNN), dealing with each tube separately first and finally with data from all three tubes. The researchers took advantage of a very large database of patient sample FCM data. Data from more than 18,000 samples analyzed in a uniform fashion with identical antibody combinations, including more than 200 samples of the rarest lymphoma subtype, could be used to train the CNN. In order to gain some insight into the CNN “black box,” they checked which markers were most important for the AI to classify a specific diagnosis correctly. They also had the cell populations tagged that the algorithm detected as abnormal and discriminative for the respective disease, both to understand the AI's decision and to allow a possible refinement by a human diagnostician in practical diagnostic use in the future.
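How a SOM turns a variable number of events into a fixed‐size, image‐like representation can be sketched as follows. This is a generic, minimal SOM written for illustration only; it is not the implementation of Zhao et al., and the grid size, learning schedule, simulated three‐marker events and function names are all assumptions. The resulting grid of per‐node event counts is the kind of fixed‐shape input a CNN could in principle classify.

```python
import math
import random

def best_matching_unit(weights, event):
    """Grid position of the node whose weight vector is closest to the event."""
    return min(weights, key=lambda node: sum(
        (w - x) ** 2 for w, x in zip(weights[node], event)))

def train_som(events, rows, cols, n_steps=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Fit a rows x cols self-organizing map by online updates."""
    rng = random.Random(seed)
    dim = len(events[0])
    # Initialize each node with a randomly chosen event.
    weights = {(r, c): list(rng.choice(events))
               for r in range(rows) for c in range(cols)}
    for step in range(n_steps):
        frac = step / n_steps
        lr = lr0 * (1.0 - frac)                      # learning rate decays to 0
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac   # neighborhood shrinks to 0.5
        event = rng.choice(events)
        bmu = best_matching_unit(weights, event)
        for node, w in weights.items():
            grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            pull = lr * math.exp(-grid_d2 / (2.0 * sigma ** 2))
            for i in range(dim):
                w[i] += pull * (event[i] - w[i])
    return weights

def som_count_map(weights, events, rows, cols):
    """Fixed-size 2D grid: number of events mapped to each node."""
    grid = [[0] * cols for _ in range(rows)]
    for event in events:
        r, c = best_matching_unit(weights, event)
        grid[r][c] += 1
    return grid

# Simulated sample: two populations on three hypothetical markers.
rng = random.Random(0)
events = ([[rng.gauss(m, 0.3) for m in (1, 1, 4)] for _ in range(150)]
          + [[rng.gauss(m, 0.3) for m in (4, 4, 1)] for _ in range(50)])
weights = train_som(events, rows=4, cols=4)
count_map = som_count_map(weights, events, rows=4, cols=4)
```

Whatever the number of events in a tube, the map has the same shape, which is what makes such a representation convenient as network input.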

As described above, the results of their approach are remarkable, but a problem of sensitivity in detecting all true lymphoma cases remains, which is most prominent for follicular lymphoma. Perhaps the CNN could be trained in such a way that the correct distinction of B‐NHL of any type versus normal is assigned a higher weight than B‐NHL subtyping. If we inspect the importance of single markers for AI performance in Supporting Figure 5, we note that some diagnosis assignments rely heavily on a few markers, whereas other diagnoses seem to depend rather on the distribution of many markers. Interestingly, the latter diagnoses, which do not depend on dominant markers, have the highest rate of being falsely categorized as normal (follicular lymphoma, marginal zone lymphoma, lymphoplasmacytic lymphoma). Furthermore, for a human diagnostician, an imbalance of kappa versus lambda light chain expression on B cells is a very important clue for a diagnosis of B‐cell lymphoma, whereas the CNN of Zhao et al. does not seem to rely heavily on this information. In a different approach, to detect minimal residual disease in childhood acute leukemia, conventional gating was used to train a machine learning algorithm based on Gaussian mixture models (11). Thus, for the non‐AI expert, the question arises whether some information from a conventional workflow, collected by an automated application, could be “injected” into a CNN algorithm.
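One way to read the suggestion of weighting the lymphoma‐versus‐normal distinction more heavily is as a loss with two terms. The sketch below is purely illustrative and is not taken from Zhao et al.: the function `combined_loss`, the weights, the class labels and the example probabilities are assumptions. It adds a binary cross‐entropy term (any lymphoma versus normal) to the usual subtype cross‐entropy, so that calling a lymphoma “normal” costs more than confusing two subtypes.

```python
import math

def combined_loss(probs, true_label, normal_label="normal",
                  w_binary=3.0, w_subtype=1.0, eps=1e-12):
    """Weighted sum of a lymphoma-vs-normal term and a subtype term.

    probs maps each class label to its predicted probability.
    """
    p_normal = probs[normal_label]
    if true_label == normal_label:
        binary = -math.log(max(p_normal, eps))
    else:
        binary = -math.log(max(1.0 - p_normal, eps))
    subtype = -math.log(max(probs[true_label], eps))
    return w_binary * binary + w_subtype * subtype

# True diagnosis: follicular lymphoma (FL). MZL = marginal zone lymphoma.
missed = {"normal": 0.7, "FL": 0.2, "MZL": 0.1}         # lymphoma called normal
wrong_subtype = {"normal": 0.1, "FL": 0.2, "MZL": 0.7}  # lymphoma, wrong subtype

loss_missed = combined_loss(missed, "FL")
loss_wrong_subtype = combined_loss(wrong_subtype, "FL")
```

Both predictions give the true class the same probability, so a plain cross‐entropy would score them equally; only the binary term separates them.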

If we assume that the problem of sensitivity will be tackled by improved versions in the near future, the AI solution of Zhao et al. will in fact be able to perform at “hematologist level” and may even deliver B‐NHL subtyping competence exceeding the results of conventional FCM alone. However, further problems have to be solved for a broader uptake of such a method: different laboratories work with different antibody panels, and even antibodies recognizing the same cluster of differentiation antigen behave differently because of different antibody clones, different fluorochromes and different spillover from other fluorochromes in the panel. Thus, some methods of knowledge transfer are needed if we want to avoid starting again with a training sample of more than 10,000 cases for every new antibody panel. If researchers are able to solve these problems, AI for diagnostic FCM may finally leave the “proof of concept” stage and enter routine diagnostics.

Updated: 2020-07-20
