Natural Language Processing in Dutch Free Text Radiology Reports: Challenges in a Small Language Area Staging Pulmonary Oncology

Nobel, J. Martijn; Puts, Sander; Bakers, Frans C. H.; Robben, Simon G. F.; Dekker, André L. A. J.

doi:10.1007/s10278-020-00327-z

Original Paper
Published: 19 February 2020

Journal of Digital Imaging, Volume 33, pages 1002–1008 (2020)

J. Martijn Nobel¹,² (ORCID: orcid.org/0000-0003-3379-7290),
Sander Puts³,
Frans C. H. Bakers¹,
Simon G. F. Robben¹,² &
André L. A. J. Dekker³

Abstract

Reports are the standard means of communication between the radiologist and the referring clinician. Efforts are made to improve this communication by, for instance, introducing standardization and structured reporting. Natural Language Processing (NLP) is another promising tool that can improve and enhance the radiological report by processing free text. NLP adds structure to the report and exposes its information, which in turn can be used for further analysis. This paper describes the pre-processing and processing steps and highlights important challenges to overcome in order to successfully implement a free text mining algorithm using NLP tools and machine learning in a small language area, such as Dutch. A rule-based algorithm was constructed to classify the T-stage of pulmonary oncology from the original free text radiological report, based on the items tumor size, presence and involvement according to the 8th TNM classification system. pyConTextNLP, spaCy and regular expressions were used as tools to extract the correct information and process the free text. Overall accuracy of the algorithm for evaluating T-stage was 0.83 in the training set and 0.87 in the validation set, which shows that the approach in this pilot study is promising. Future research with larger datasets and external validation is needed to be able to introduce more machine learning approaches and perhaps to reduce the required input of domain-specific knowledge. However, a hybrid NLP approach will probably achieve the best results.
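To make the rule-based idea concrete, the following is a minimal sketch of the size item only: a regular expression pulls tumor measurements out of Dutch free text, and the largest one is mapped to a T-stage using the size thresholds of the 8th TNM edition. It is not the authors' implementation. The pattern, the function names and the example sentences are assumptions made for illustration, and the sketch ignores the presence and involvement items as well as the context handling (for example negation) for which the paper uses pyConTextNLP and spaCy.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical pattern: matches "34 mm", "3,4 cm" or "3,4 x 2,1 cm".
# Dutch reports often use a decimal comma, so both "," and "." are accepted.
NUMBER = r"\d+(?:[.,]\d+)?"
SIZE_PATTERN = re.compile(
    rf"({NUMBER})(?:\s*[x×]\s*({NUMBER}))?\s*(mm|cm)\b",
    re.IGNORECASE,
)

# Size thresholds of the 8th TNM edition for lung tumors (upper bound in mm).
SIZE_THRESHOLDS = [
    (10, "T1a"), (20, "T1b"), (30, "T1c"),
    (40, "T2a"), (50, "T2b"), (70, "T3"),
]


def largest_size_mm(text: str) -> Optional[Decimal]:
    """Return the largest measurement found in the text, in millimetres."""
    sizes = []
    for first, second, unit in SIZE_PATTERN.findall(text):
        factor = Decimal(10) if unit.lower() == "cm" else Decimal(1)
        for value in (first, second):
            if value:  # the second dimension is optional
                sizes.append(Decimal(value.replace(",", ".")) * factor)
    return max(sizes) if sizes else None


def t_stage_from_size(size_mm: Optional[Decimal]) -> Optional[str]:
    """Map a tumor size in mm to a T-stage, using size alone."""
    if size_mm is None:
        return None
    for upper_bound, stage in SIZE_THRESHOLDS:
        if size_mm <= upper_bound:
            return stage
    return "T4"  # larger than 70 mm


# Made-up example sentence, not taken from the study data.
report = "Massa in de rechter bovenkwab van 3,4 x 2,1 cm."
print(t_stage_from_size(largest_size_mm(report)))  # T2a
```

Sizes are handled as `Decimal` values in millimetres so that boundary cases such as 3,0 cm fall exactly on the threshold. Taking the largest measurement in the report is a simplification: a real report may also contain sizes of lymph nodes or other findings, which is one reason the size item alone is not sufficient for staging.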

Author information

J. Martijn Nobel and Sander Puts contributed equally to this work.

Authors and Affiliations

Department of Radiology and Nuclear Medicine, Maastricht University Medical Center+, P.O. Box 5800, 6202 AZ Maastricht, Netherlands
J. Martijn Nobel, Frans C. H. Bakers & Simon G. F. Robben
School of Health Professions Education, Maastricht University, Maastricht, Netherlands
J. Martijn Nobel & Simon G. F. Robben
Department of Radiation Oncology (MAASTRO), GROW School for Oncology and Developmental Biology, Maastricht University Medical Center+, Maastricht, Netherlands
Sander Puts & André L. A. J. Dekker

Corresponding author

Correspondence to J. Martijn Nobel.

Additional information

Electronic supplementary material

ESM 1

(DOCX 25 kb)

About this article

Cite this article

Nobel, J.M., Puts, S., Bakers, F.C.H. et al. Natural Language Processing in Dutch Free Text Radiology Reports: Challenges in a Small Language Area Staging Pulmonary Oncology. J Digit Imaging 33, 1002–1008 (2020). https://doi.org/10.1007/s10278-020-00327-z

Published: 19 February 2020
Issue Date: August 2020
DOI: https://doi.org/10.1007/s10278-020-00327-z
