Artificial intelligence for diagnosis and grading of prostate cancer in biopsies: a population-based, diagnostic study.

The Lancet Oncology (IF 51.1). Pub Date: 2020-01-08. DOI: 10.1016/s1470-2045(19)30738-7
Peter Ström, Kimmo Kartasalo, Henrik Olsson, Leslie Solorzano, Brett Delahunt, Daniel M Berney, David G Bostwick, Andrew J Evans, David J Grignon, Peter A Humphrey, Kenneth A Iczkowski, James G Kench, Glen Kristiansen, Theodorus H van der Kwast, Katia R M Leite, Jesse K McKenney, Jon Oxley, Chin-Chen Pan, Hemamali Samaratunga, John R Srigley, Hiroyuki Takahashi, Toyonori Tsuzuki, Murali Varma, Ming Zhou, Johan Lindberg, Cecilia Lindskog, Pekka Ruusuvuori, Carolina Wählby, Henrik Grönberg, Mattias Rantalainen, Lars Egevad, Martin Eklund

BACKGROUND An increasing volume of prostate biopsies and a worldwide shortage of urological pathologists put a strain on pathology departments. Additionally, the high intra-observer and inter-observer variability in grading can result in overtreatment and undertreatment of prostate cancer. To alleviate these problems, we aimed to develop an artificial intelligence (AI) system with clinically acceptable accuracy for prostate cancer detection, localisation, and Gleason grading.

METHODS We digitised 6682 slides from needle core biopsies from 976 randomly selected participants aged 50-69 in the Swedish prospective and population-based STHLM3 diagnostic study done between May 28, 2012, and Dec 30, 2014 (ISRCTN84445406), and another 271 from 93 men from outside the study. The resulting images were used to train deep neural networks for assessment of prostate biopsies. The networks were evaluated by predicting the presence, extent, and Gleason grade of malignant tissue for an independent test dataset comprising 1631 biopsies from 246 men from STHLM3 and an external validation dataset of 330 biopsies from 73 men. We also evaluated grading performance on 87 biopsies individually graded by 23 experienced urological pathologists from the International Society of Urological Pathology. We assessed discriminatory performance by receiver operating characteristics and tumour extent predictions by correlating predicted cancer length against measurements by the reporting pathologist. We quantified the concordance between grades assigned by the AI system and the expert urological pathologists using Cohen's kappa.

FINDINGS The AI achieved an area under the receiver operating characteristics curve of 0·997 (95% CI 0·994-0·999) for distinguishing between benign (n=910) and malignant (n=721) biopsy cores on the independent test dataset and 0·986 (0·972-0·996) on the external validation dataset (benign n=108, malignant n=222). The correlation between cancer length predicted by the AI and assigned by the reporting pathologist was 0·96 (95% CI 0·95-0·97) for the independent test dataset and 0·87 (0·84-0·90) for the external validation dataset. For assigning Gleason grades, the AI achieved a mean pairwise kappa of 0·62, which was within the range of the corresponding values for the expert pathologists (0·60-0·73).

INTERPRETATION An AI system can be trained to detect and grade cancer in prostate needle biopsy samples at a ranking comparable to that of international experts in prostate pathology. Clinical application could reduce pathology workload by reducing the assessment of benign biopsies and by automating the task of measuring cancer length in positive biopsy cores. An AI system with expert-level grading performance might contribute a second opinion, aid in standardising grading, and provide pathology expertise in parts of the world where it does not exist.

FUNDING Swedish Research Council, Swedish Cancer Society, Swedish eScience Research Center, EIT Health.
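The mean pairwise kappa reported in the findings can be illustrated with a minimal sketch. It assumes unweighted Cohen's kappa (the abstract does not say whether a weighted variant was used), and the grade labels, the function names cohen_kappa and mean_pairwise_kappa, and the toy data are all hypothetical, not taken from the study.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(target, others):
    """Average kappa between one rater and each of the other raters."""
    return sum(cohen_kappa(target, o) for o in others) / len(others)


# Hypothetical toy grades (labels 1-5) for six biopsies; not study data.
ai = [1, 2, 2, 3, 5, 4]
pathologists = [
    [1, 2, 3, 3, 5, 4],
    [1, 2, 2, 3, 4, 4],
]
print(round(mean_pairwise_kappa(ai, pathologists), 3))
```

Computing the same average for each pathologist against the remaining raters would give a range of values against which a system's own mean pairwise kappa could be compared, which is the kind of comparison the findings describe.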


Updated: 2020-01-31
