Patterns of Metastatic Disease in Patients with Cancer Derived from Natural Language Processing of Structured CT Radiology Reports over a 10-year Period
Radiology (IF 19.7), Pub Date: 2021-08-03, DOI: 10.1148/radiol.2021210043
Richard K G Do, Kaelan Lupton, Pamela I Causa Andrieu, Anisha Luthra, Michio Taya, Karen Batch, Huy Nguyen, Prachi Rahurkar, Lior Gazit, Kevin Nicholas, Christopher J Fong, Natalie Gangai, Nikolaus Schultz, Farhana Zulkernine, Varadan Sevilimedu, Krishna Juluru, Amber Simpson, Hedvig Hricak

Background

Patterns of metastasis in cancer are increasingly relevant to prognostication and treatment planning but have historically been documented by means of autopsy series.

Purpose

To show the feasibility of using natural language processing (NLP) to gather accurate data from radiology reports for assessing spatial and temporal patterns of metastatic spread in a large patient cohort.

Materials and Methods

In this retrospective longitudinal study, consecutive patients who underwent CT from July 2009 to April 2019 and whose CT reports followed a departmental structured template were included. Three radiologists manually curated a sample of 2219 reports for the presence or absence of metastases across 13 organs; these manually curated reports were used to develop three NLP models with an 80%-20% split for training and test sets. A separate random sample of 448 manually curated reports was used for validation. Model performance was measured by accuracy, precision, and recall for each organ. The best-performing NLP model was used to generate a final database of metastatic disease across all patients. For each cancer type, statistical descriptive reports were provided by analyzing the frequencies of metastatic disease at the report and patient levels.
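
The evaluation described above (an 80%-20% split, then accuracy, precision, and recall per organ) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the random seed, and the toy labels are hypothetical, and the abstract does not state how the split was randomized or how undefined ratios were handled. Labels are assumed to be 1 when metastasis is present in an organ and 0 when absent.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_reports(report_ids: Sequence[str], test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle report IDs and split them into training and test sets."""
    ids = list(report_ids)
    random.Random(seed).shuffle(ids)  # seed is an arbitrary assumption
    n_test = int(round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]


def organ_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, and recall for one organ's binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Ratios with an empty denominator are reported as 0.0 (an assumption).
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def per_organ_metrics(truth: Dict[str, Sequence[int]],
                      pred: Dict[str, Sequence[int]]) -> Dict[str, Dict[str, float]]:
    """Apply organ_metrics to every organ in the manually curated labels."""
    return {organ: organ_metrics(truth[organ], pred[organ]) for organ in truth}


# Toy labels for five test reports and two organs (made up for illustration).
truth = {"liver": [1, 0, 1, 0, 1], "lungs": [0, 0, 1, 1, 0]}
pred = {"liver": [1, 0, 0, 0, 1], "lungs": [0, 1, 1, 1, 0]}
metrics = per_organ_metrics(truth, pred)

train_ids, test_ids = split_reports([f"r{i}" for i in range(10)])
```

Scoring each organ separately, as in the abstract, keeps a rare site from being masked by a common one, which a single pooled accuracy figure would do.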

Results

In 91 665 patients (mean age ± standard deviation, 61 years ± 15; 46 939 women), 387 359 reports were labeled. The best-performing NLP model achieved accuracies from 90% to 99% across all organs. Metastases were most frequently reported in abdominopelvic (23.6% of all reports) and thoracic (17.6%) nodes, followed by lungs (14.7%), liver (13.7%), and bones (9.9%). Metastatic disease tropism is distinct among common cancers, with the most common first site being bones in prostate and breast cancers and liver among pancreatic and colorectal cancers.
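
The report-level and patient-level frequencies, and the notion of a first metastatic site, can be sketched as follows. This is an illustration under stated assumptions rather than the study's method: the `Report` record, its field names, and the toy data are hypothetical, and "first site" is assumed here to mean the organs flagged in a patient's earliest report that shows any metastasis, which the abstract does not spell out.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, FrozenSet, List, NamedTuple, Set


class Report(NamedTuple):
    """One labeled CT report (hypothetical structure)."""
    patient_id: str
    report_date: date
    organs: FrozenSet[str]  # organs labeled positive for metastasis


def report_level_frequency(reports: List[Report]) -> Dict[str, float]:
    """Fraction of all reports that mention metastasis in each organ."""
    counts = Counter(organ for r in reports for organ in r.organs)
    n = len(reports)
    return {organ: c / n for organ, c in counts.items()} if n else {}


def patient_level_frequency(reports: List[Report]) -> Dict[str, float]:
    """Fraction of patients with metastasis in each organ on any report."""
    organs_by_patient: Dict[str, Set[str]] = defaultdict(set)
    for r in reports:
        organs_by_patient[r.patient_id] |= r.organs
    n = len(organs_by_patient)
    counts = Counter(o for organs in organs_by_patient.values() for o in organs)
    return {organ: c / n for organ, c in counts.items()} if n else {}


def first_metastatic_sites(reports: List[Report]) -> Dict[str, FrozenSet[str]]:
    """Organs flagged in each patient's earliest report with any metastasis."""
    first: Dict[str, FrozenSet[str]] = {}
    for r in sorted(reports, key=lambda rep: rep.report_date):
        if r.organs and r.patient_id not in first:
            first[r.patient_id] = r.organs
    return first


# Toy data: two patients, four reports (made up for illustration).
reports = [
    Report("p1", date(2015, 1, 10), frozenset()),
    Report("p1", date(2015, 6, 1), frozenset({"liver"})),
    Report("p1", date(2016, 2, 3), frozenset({"liver", "lungs"})),
    Report("p2", date(2017, 3, 5), frozenset({"bones"})),
]
by_report = report_level_frequency(reports)
by_patient = patient_level_frequency(reports)
first_sites = first_metastatic_sites(reports)
```

The two denominators differ: a patient with many follow-up scans contributes many reports but counts once at the patient level, so the same organ can show quite different frequencies in the two views.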

Conclusion

Natural language processing may be applied to cancer patients’ CT reports to generate a large database of metastatic phenotypes. Such a database could be combined with genomic studies and used to explore prognostic imaging phenotypes with relevance to treatment planning.

© RSNA, 2021

Online supplemental material is available for this article.




Updated: 2021-09-21