LATTE: A knowledge-based method to normalize various expressions of laboratory test results in free text of Chinese electronic health records



Journal of Biomedical Informatics (IF 4.5), Pub Date: 2019-12-31, DOI: 10.1016/j.jbi.2019.103372
Kun Jiang₁, Tao Yang₂, Chunyan Wu₃, Luming Chen₄, Longfei Mao₄, Yongyou Wu₂, Lizong Deng₄, Taijiao Jiang₄


BACKGROUND A wealth of clinical information is buried in free text of electronic health records (EHR), and converting clinical information to machine-understandable form is crucial for the secondary use of EHRs. Laboratory test results, as one of the most important types of clinical information, are written in various styles in free text of EHRs. This has brought great difficulties for data integration and utilization of EHRs. Therefore, developing technology to normalize different expressions of laboratory test results in free text is indispensable for the secondary use of EHRs.

METHODS In this study, we developed a knowledge-based method named LATTE (transforming lab test results), which could transform various expressions of laboratory test results into a normalized and machine-understandable format. We first identified the analyte of a laboratory test result with a dictionary-based method and then designed a series of rules to detect information associated with the analyte, including its specimen, measured value, unit of measure, conclusive phrase and sampling factor. We determined whether a test result is normal or abnormal by understanding the meaning of conclusive phrases or by comparing its measured value with an appropriate normal range. Finally, we converted various expressions of laboratory test results, either in numeric or textual form, into a normalized form as "specimen-analyte-abnormality". With this method, a laboratory test with the same type of abnormality would have the same representation, regardless of the way that it is mentioned in free text.

RESULTS LATTE was developed and optimized on a training set including 8894 laboratory test results from 756 EHRs, and evaluated on a test set including 3740 laboratory test results from 210 EHRs. Compared to experts' annotations, LATTE achieved a precision of 0.936, a recall of 0.897 and an F1 score of 0.916 on the training set, and a precision of 0.892, a recall of 0.843 and an F1 score of 0.867 on the test set. For 223 laboratory tests with at least two different expression forms in the test set, LATTE transformed 85.7% (2870/3350) of laboratory test results into a normalized form. Besides, LATTE achieved F1 scores above 0.8 for EHRs from 18 of 21 different hospital departments, indicating its generalization capabilities in normalizing laboratory test results.

CONCLUSION In conclusion, LATTE is an effective method for normalizing various expressions of laboratory test results in free text of EHRs. LATTE will facilitate EHR-based applications such as cohort querying, patient clustering and machine learning.

AVAILABILITY LATTE is freely available for download on GitHub (https://github.com/denglizong/LATTE).
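The pipeline described in the abstract (dictionary lookup of the analyte, rules for the associated value, unit and conclusive phrase, then mapping to "specimen-analyte-abnormality") might be sketched roughly as follows. This is an illustrative toy, not the LATTE implementation: the dictionary entries, normal range, phrase list, the abnormality labels (`low`, `high`, `positive`, `normal`) and the function name `normalize` are all assumptions made for the example, and sampling factors are not modeled.

```python
import re
from typing import Optional

# Hypothetical mini-dictionary: surface form -> (specimen, canonical analyte).
ANALYTE_DICT = {
    "血红蛋白": ("blood", "hemoglobin"),
    "Hb": ("blood", "hemoglobin"),
    "尿蛋白": ("urine", "protein"),
}

# Hypothetical normal range, keyed by (specimen, analyte, unit).
NORMAL_RANGES = {
    ("blood", "hemoglobin", "g/L"): (120.0, 160.0),
}

# Hypothetical conclusive phrases and the label each one implies.
CONCLUSIVE_PHRASES = {
    "升高": "high", "偏高": "high",
    "降低": "low", "偏低": "low",
    "阳性": "positive",
    "阴性": "normal", "正常": "normal",
}

# A number, optionally followed by a simple unit such as "g/L".
VALUE_UNIT = re.compile(r"(\d+(?:\.\d+)?)\s*([A-Za-zμ%/]+)?")


def normalize(text: str) -> Optional[str]:
    """Return 'specimen-analyte-abnormality', or None if nothing is recognized."""
    # Step 1: dictionary-based analyte detection, longest surface form first.
    for surface in sorted(ANALYTE_DICT, key=len, reverse=True):
        pos = text.find(surface)
        if pos != -1:
            break
    else:
        return None
    specimen, analyte = ANALYTE_DICT[surface]
    context = text[pos + len(surface):]

    # Step 2a: a conclusive phrase decides the abnormality directly.
    for phrase, label in CONCLUSIVE_PHRASES.items():
        if phrase in context:
            return f"{specimen}-{analyte}-{label}"

    # Step 2b: otherwise compare the measured value with a normal range.
    match = VALUE_UNIT.search(context)
    if match and match.group(2):
        value, unit = float(match.group(1)), match.group(2)
        bounds = NORMAL_RANGES.get((specimen, analyte, unit))
        if bounds:
            low, high = bounds
            label = "low" if value < low else "high" if value > high else "normal"
            return f"{specimen}-{analyte}-{label}"
    return None


if __name__ == "__main__":
    for mention in ["血红蛋白 95g/L", "Hb偏低", "尿蛋白阳性"]:
        print(mention, "->", normalize(mention))
```

In this sketch the numeric mention "血红蛋白 95g/L" and the textual mention "Hb偏低" both map to the same string, which is the property the abstract emphasizes: the same type of abnormality gets the same representation however it is written.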


Updated: 2019-12-31
