Automated deep-learning system for Gleason grading of prostate cancer using biopsies: a diagnostic study.
The Lancet Oncology (IF 51.1), Pub Date: 2020-01-08, DOI: 10.1016/s1470-2045(19)30739-9
Wouter Bulten¹, Hans Pinckaers¹, Hester van Boven², Robert Vink³, Thomas de Bel¹, Bram van Ginneken⁴, Jeroen van der Laak¹, Christina Hulsbergen-van de Kaa³, Geert Litjens¹

BACKGROUND The Gleason score is the strongest correlating predictor of recurrence for prostate cancer, but it has substantial inter-observer variability, which limits its usefulness for individual patients. Specialised urological pathologists have greater concordance; however, such expertise is not widely available. Prostate cancer diagnostics could thus benefit from robust, reproducible Gleason grading. We aimed to investigate the potential of deep learning to perform automated Gleason grading of prostate biopsies.

METHODS In this retrospective study, we developed a deep-learning system to grade prostate biopsies following the Gleason grading standard. The system was developed using randomly selected biopsies, sampled by the biopsy Gleason score, from patients at the Radboud University Medical Center (pathology reports dated between Jan 1, 2012, and Dec 31, 2017). A semi-automatic labelling technique was used to circumvent the need for manual annotations by pathologists, using pathologists' reports as the reference standard during training. The system was developed to delineate individual glands, assign Gleason growth patterns, and determine the biopsy-level grade. For validation of the method, a consensus reference standard was set by three expert urological pathologists on an independent test set of 550 biopsies. Of these 550, 100 were used in an observer experiment, in which the system, 13 pathologists, and two pathologists in training were compared with respect to the reference standard. The system was also evaluated on an external test dataset of 886 cores, which contained 245 cores from a different centre that were independently graded by two pathologists.

FINDINGS We collected 5759 biopsies from 1243 patients. The developed system achieved high agreement with the reference standard (quadratic Cohen's kappa 0·918, 95% CI 0·891–0·941) and scored highly at clinical decision thresholds: benign versus malignant (area under the curve 0·990, 95% CI 0·982–0·996), grade group 2 or higher (0·978, 0·966–0·988), and grade group 3 or higher (0·974, 0·962–0·984). In the observer experiment, the deep-learning system scored higher (kappa 0·854) than the panel (median kappa 0·819), outperforming 10 of the 15 pathologist observers. On the external test dataset, the system achieved high agreement with the reference standards set independently by the two pathologists (quadratic Cohen's kappa 0·723 and 0·707), comparable to the inter-observer agreement between those two pathologists (kappa 0·71).

INTERPRETATION Our automated deep-learning system achieved performance similar to that of pathologists in Gleason grading and could contribute to prostate cancer diagnosis. The system could potentially assist pathologists by screening biopsies, providing second opinions on the grade group, and presenting quantitative measurements of volume percentages.

FUNDING Dutch Cancer Society.
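The headline agreement metric above is the quadratic (weighted) Cohen's kappa, which penalises a disagreement by the squared distance between two ordinal grades: kappa_w = 1 - sum(w_ij * O_ij) / sum(w_ij * E_ij), where O is the observed rater-by-rater count table, E is the table expected by chance from each rater's label frequencies, and w_ij = (i - j)^2 / (K - 1)^2. The pure-Python sketch below is illustrative only; it assumes grades are encoded as integers 0 (benign) to 5 (grade group 5), and both that encoding and the function name `quadratic_weighted_kappa` are hypothetical, not taken from the study's code.

```python
from collections import Counter
from typing import Sequence


def quadratic_weighted_kappa(
    rater_a: Sequence[int], rater_b: Sequence[int], num_classes: int = 6
) -> float:
    """Cohen's kappa with quadratic disagreement weights for ordinal labels.

    Labels are integers in [0, num_classes); the default of 6 assumes the
    hypothetical encoding 0 = benign and 1-5 = grade groups.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same non-empty set of cases")
    k = num_classes
    if k < 2 or any(not 0 <= label < k for label in (*rater_a, *rater_b)):
        raise ValueError(f"need num_classes >= 2 and labels in [0, {k})")
    n = len(rater_a)

    # Observed joint counts: observed[i][j] = cases rated i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    # Per-rater label frequencies (marginals) for the chance-expected table.
    hist_a = Counter(rater_a)
    hist_b = Counter(rater_b)

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic weight: 0 on the diagonal, 1 for the largest possible gap.
            weight = (i - j) ** 2 / (k - 1) ** 2
            # Count expected if the raters were independent with these marginals.
            expected = hist_a[i] * hist_b[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Both raters gave one identical label to every case; treated here as
        # perfect agreement (some libraries return NaN instead).
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical ratings of six biopsies, not data from the study.
reference = [0, 1, 2, 3, 4, 5]
off_by_one = [1, 1, 2, 3, 4, 5]   # one benign biopsy called grade group 1
off_by_five = [5, 1, 2, 3, 4, 5]  # one benign biopsy called grade group 5
print(round(quadratic_weighted_kappa(reference, off_by_one), 3))   # 0.968
print(round(quadratic_weighted_kappa(reference, off_by_five), 3))  # 0.286
```

Both hypothetical ratings contain exactly one disagreement, yet the off-by-five error lowers kappa far more than the off-by-one error. That asymmetry is why quadratic weighting is a natural choice for ordinal labels such as grade groups.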

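The clinical decision thresholds in the abstract (benign versus malignant, grade group 2 or higher, grade group 3 or higher) turn the ordinal grade into binary labels, and an area under the ROC curve is then computed for each cut-off. The sketch below shows one way such an evaluation could be set up. It assumes the standard ISUP mapping from primary and secondary Gleason patterns to grade groups, benign biopsies encoded as 0, and a continuous per-biopsy model score; the abstract does not say how the system's scores were produced, and the function names, score values, and data are hypothetical. AUC is computed here with the Mann-Whitney formulation, not necessarily the authors' procedure.

```python
from typing import Sequence


def grade_group(primary: int, secondary: int) -> int:
    """ISUP grade group from the primary and secondary Gleason patterns."""
    if not (3 <= primary <= 5 and 3 <= secondary <= 5):
        raise ValueError("biopsy Gleason patterns are expected to be 3, 4 or 5")
    score = primary + secondary
    if score <= 6:
        return 1                         # 3+3
    if score == 7:
        return 2 if primary == 3 else 3  # 3+4 versus 4+3
    if score == 8:
        return 4                         # 4+4, 3+5, 5+3
    return 5                             # 4+5, 5+4, 5+5


def binarize(grades: Sequence[int], threshold: int) -> list[int]:
    """1 where the grade (0 = benign, 1-5 = grade group) is >= threshold, else 0."""
    return [int(g >= threshold) for g in grades]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive outscores random negative); ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Clinical decision thresholds from the abstract, as grade-group cut-offs.
THRESHOLDS = {
    "benign vs malignant": 1,
    "grade group >= 2": 2,
    "grade group >= 3": 3,
}

# Hypothetical reference grades (0 = benign) and continuous model scores,
# e.g. an expected grade group per biopsy. Not data from the study.
reference = [0, 0, grade_group(3, 3), grade_group(3, 4),
             grade_group(4, 3), grade_group(4, 4), grade_group(5, 5), grade_group(3, 4)]
scores = [0.1, 0.4, 1.2, 1.9, 3.1, 3.8, 4.7, 3.3]

for name, cut in THRESHOLDS.items():
    print(name, round(roc_auc(binarize(reference, cut), scores), 3))
# benign vs malignant 1.0
# grade group >= 2 1.0
# grade group >= 3 0.933
```

The pairwise Mann-Whitney loop is quadratic in the number of biopsies. That cost is acceptable for an illustration, but a rank-based or library implementation would be preferable at the scale of a full test set.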
Updated: 2020-01-31