Prediction of breast cancer distant recurrence using natural language processing and knowledge-guided convolutional neural network
Artificial Intelligence in Medicine (IF 7.5), Pub Date: 2020-11-01, DOI: 10.1016/j.artmed.2020.101977
Hanyin Wang 1, Yikuan Li 1, Seema A Khan 2, Yuan Luo 1

Distant recurrence of breast cancer carries a high lifetime risk and a low 5-year survival rate. Early prediction of distant recurrence could facilitate intervention and improve patients' quality of life. In this study, we designed an EHR-based predictive model to estimate the probability of distant recurrence in breast cancer patients. We studied the pathology reports and progress notes of 6,447 patients who were diagnosed with breast cancer at Northwestern Memorial Hospital between 2001 and 2015. Clinical notes were mapped to Concept Unique Identifiers (CUIs) using natural language processing tools. Bag-of-words representations and pre-trained embeddings were employed to vectorize word and CUI sequences. These features, combined with clinical features from structured data, were fed into conventional machine learning classifiers and a Knowledge-guided Convolutional Neural Network (K-CNN). The best configuration of our model yielded an AUC of 0.888 and an F1-score of 0.5. Our work provides an automated method to predict distant recurrence of breast cancer using natural language processing and deep learning. We expect that more advanced feature engineering could further improve predictive performance.
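As a rough illustration of two ideas the abstract mentions, the sketch below builds bag-of-words count vectors from CUI sequences and computes AUC and F1 from predicted scores. It is a minimal stdlib-only sketch, not the authors' pipeline: the function names, the 0.5 decision threshold, and the placeholder identifiers (`C0000001` and so on, not real concepts) are all assumptions made for the example.

```python
from collections import Counter


def bag_of_words(sequences):
    """Turn token sequences (words or CUIs) into count vectors over a shared vocabulary."""
    vocab = sorted({tok for seq in sequences for tok in seq})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for seq in sequences:
        vec = [0] * len(vocab)
        for tok, count in Counter(seq).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocab, vectors


def auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(labels, scores, threshold=0.5):
    """F1 at a fixed decision threshold (the threshold is an assumption of this sketch)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Placeholder CUI sequences for two hypothetical notes.
    notes = [["C0000001", "C0000002", "C0000001"], ["C0000002"]]
    print(bag_of_words(notes))

    # Toy labels and scores: 2 recurrences among 8 patients.
    labels = [1, 1, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1]
    print(auc(labels, scores), f1_score(labels, scores))
```

AUC depends only on how the scores rank patients, while F1 depends on a chosen threshold and on class balance, so the two metrics can differ substantially on the same predictions.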




Updated: 2020-11-06