Natural Language Processing in Dutch Free Text Radiology Reports: Challenges in a Small Language Area Staging Pulmonary Oncology. Journal of Digital Imaging



Journal of Digital Imaging (IF 2.9), Pub Date: 2020-02-19, DOI: 10.1007/s10278-020-00327-z
J Martijn Nobel¹,², Sander Puts³, Frans C H Bakers¹, Simon G F Robben¹,², André L A J Dekker³


Reports are the standard means of communication between the radiologist and the referring clinician. Efforts are being made to improve this communication by, for instance, introducing standardization and structured reporting. Natural Language Processing (NLP) is another promising tool that can improve and enhance the radiological report by processing free text. NLP adds structure to the report and exposes its information, which in turn can be used for further analysis. This paper describes pre-processing and processing steps and highlights important challenges to overcome in order to successfully implement a free text mining algorithm using NLP tools and machine learning in a small language area, such as Dutch. A rule-based algorithm was constructed to classify the T-stage of pulmonary oncology from the original free text radiological report, based on the items tumor size, presence and involvement according to the 8th TNM classification system. PyContextNLP, spaCy and regular expressions were used as tools to extract the correct information and process the free text. Overall accuracy of the algorithm for evaluating T-stage was 0.83 in the training set and 0.87 in the validation set, which shows that the approach in this pilot study is promising. Future research with larger datasets and external validation is needed to be able to introduce more machine learning approaches and perhaps to reduce the required input of domain-specific knowledge. However, a hybrid NLP approach will probably achieve the best results.
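To make the rule-based idea concrete, here is a minimal sketch of the size component only, using plain regular expressions. It is not the authors' algorithm: the function names, the pattern, the sample Dutch sentences, and the choice to take the largest measurement in the report are all assumptions made for illustration, and the presence and involvement items (as well as negation handling of the kind PyContextNLP provides) are left out. The size cut-offs are the tumor-size criteria of the 8th TNM edition for lung cancer.

```python
import re

# Hypothetical pattern: a number with an optional decimal part (Dutch reports
# commonly use a decimal comma), followed by a unit of mm or cm.
SIZE_PATTERN = re.compile(r"(\d+(?:[.,]\d+)?)\s*(mm|cm)\b", re.IGNORECASE)

# Size-only T categories of the 8th TNM edition for lung cancer:
# (inclusive upper bound in cm, category). Anything above 7 cm is T4.
SIZE_STAGES = [
    (1.0, "T1a"),
    (2.0, "T1b"),
    (3.0, "T1c"),
    (4.0, "T2a"),
    (5.0, "T2b"),
    (7.0, "T3"),
]


def extract_sizes_cm(text):
    """Return every measurement found in the text, converted to centimetres."""
    sizes = []
    for value, unit in SIZE_PATTERN.findall(text):
        number = float(value.replace(",", "."))
        sizes.append(number / 10 if unit.lower() == "mm" else number)
    return sizes


def t_stage_from_size(size_cm):
    """Map a tumor size in centimetres to a size-based T category."""
    for upper_bound, stage in SIZE_STAGES:
        if size_cm <= upper_bound:
            return stage
    return "T4"


def classify_report(text):
    """Classify a report by its largest measurement (an assumption of this
    sketch); return None when no measurement is found."""
    sizes = extract_sizes_cm(text)
    if not sizes:
        return None
    return t_stage_from_size(max(sizes))


if __name__ == "__main__":
    print(classify_report("Massa in de rechter bovenkwab van 3,4 cm."))
    print(classify_report("Nodus van 8 mm in de linker onderkwab."))
```

Taking the largest measurement is a crude heuristic: a real report may also give sizes for lymph nodes or unrelated findings, which is one reason the abstract pairs regular expressions with context-aware tools and domain knowledge.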


Updated: 2020-03-07
