Identification of areas of grading difficulties in prostate cancer and comparison with artificial intelligence assisted grading. Virchows Archiv



Virchows Archiv (IF 3.4), Pub Date: 2020-06-15, DOI: 10.1007/s00428-020-02858-w
Lars Egevad, Daniela Swanberg, Brett Delahunt, Peter Ström, Kimmo Kartasalo, Henrik Olsson, Dan M. Berney, David G. Bostwick, Andrew J. Evans, Peter A. Humphrey, Kenneth A. Iczkowski, James G. Kench, Glen Kristiansen, Katia R. M. Leite, Jesse K. McKenney, Jon Oxley, Chin-Chen Pan, Hemamali Samaratunga, John R. Srigley, Hiroyuki Takahashi, Toyonori Tsuzuki, Theo van der Kwast, Murali Varma, Ming Zhou, Mark Clements, Martin Eklund

The International Society of Urological Pathology (ISUP) hosts a reference image database supervised by experts with the purpose of establishing an international standard in prostate cancer grading. Here, we aimed to identify areas of grading difficulties and compare the results with those obtained from an artificial intelligence system trained in grading. In a series of 87 needle biopsies of cancers selected to include problematic cases, experts failed to reach a 2/3 consensus in 41.4% (36/87). Among consensus and non-consensus cases, the weighted kappa was 0.77 (range 0.68–0.84) and 0.50 (range 0.40–0.57), respectively. Among the non-consensus cases, four main causes of disagreement were identified: the distinction between Gleason score 3 + 3 with tangential cutting artifacts vs. Gleason score 3 + 4 with poorly formed or fused glands (13 cases), Gleason score 3 + 4 vs. 4 + 3 (7 cases), Gleason score 4 + 3 vs. 4 + 4 (8 cases) and the identification of a small component of Gleason pattern 5 (6 cases). The AI system obtained a weighted kappa value of 0.53 among the non-consensus cases, placing it as the observer with the sixth best reproducibility out of a total of 24. AI may serve as a decision support and decrease inter-observer variability by its ability to make consistent decisions. The grading of these cancer patterns that best predicts outcome and guides treatment warrants further clinical and genetic studies. Results of such investigations should be used to improve calibration of AI systems.
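The agreement statistics quoted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The abstract does not say which weighting scheme was used for the weighted kappa, how per-observer values were aggregated, or exactly how the 2/3 consensus was counted, so the linear weights, the pairwise two-rater form, the "at least two thirds of votes for one grade" rule, and the function names `weighted_kappa` and `has_consensus` are all assumptions made for illustration.

```python
from collections import Counter


def weighted_kappa(a, b, n_categories, weights="linear"):
    """Cohen's weighted kappa for two raters.

    a, b: equal-length lists of ordinal grades coded 0..n_categories-1.
    weights: "linear" (assumed here) or "quadratic" disagreement weights.
    """
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n, k = len(a), n_categories

    # Observed joint proportions of (rater A grade, rater B grade).
    observed = [[0.0] * k for _ in range(k)]
    for i, j in zip(a, b):
        observed[i][j] += 1.0 / n

    # Marginal grade distributions of each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    obs = sum(disagreement(i, j) * observed[i][j] for i, j in cells)
    exp = sum(disagreement(i, j) * row[i] * col[j] for i, j in cells)
    if exp == 0:
        return float("nan")  # undefined when chance disagreement is zero
    return 1.0 - obs / exp


def has_consensus(votes, num=2, den=3):
    """True if the most common grade gets at least num/den of the votes.

    Assumed reading of "2/3 consensus"; integer arithmetic avoids
    floating-point edge cases at the exact threshold.
    """
    top_count = Counter(votes).most_common(1)[0][1]
    return top_count * den >= num * len(votes)


# Share of non-consensus cases reported in the abstract: 36 of 87.
non_consensus_pct = round(100 * 36 / 87, 1)
```

With these definitions, two raters who agree on every case get a kappa of 1.0, and two raters whose grades are statistically independent get a kappa of 0. The final line reproduces the 41.4% figure reported for the 36 non-consensus cases out of 87.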


Updated: 2020-06-15
