A frame semantic overview of NLP-based information extraction for cancer-related EHR notes.
Journal of Biomedical Informatics (IF 4.5), Pub Date: 2019-10-04, DOI: 10.1016/j.jbi.2019.103301
Surabhi Datta¹, Elmer V. Bernstam², Kirk Roberts¹

Objective

Electronic Health Record (EHR) notes contain a large amount of cancer-related information that can be useful for biomedical research, provided that natural language processing (NLP) methods are available to extract and structure it. In this paper, we present a scoping review of the existing clinical NLP literature on cancer.

Methods

We searched PubMed, Google Scholar, the ACL Anthology, and existing reviews for studies describing an NLP method that extracts specific cancer-related information from EHR sources. Two exclusion criteria were applied: we excluded articles whose extraction techniques were too broad to be represented as frames (e.g., document classification) and articles that used very low-level extraction methods (e.g., simply identifying clinical concepts). In total, 78 articles were included in the final review. We organized the extracted information according to frame semantic principles to help identify common areas of overlap and potential gaps.

Results

From the reviewed articles, we created frames for cancer information such as cancer diagnosis, tumor description, cancer procedure, breast cancer diagnosis, prostate cancer diagnosis, and pain in prostate cancer patients. Each frame includes a definition as well as specific frame elements (i.e., extractable attributes). Cancer diagnosis was the most common frame among the reviewed papers (36 of 78), while recent work has focused on extracting information related to treatment and breast cancer diagnosis.
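A frame as described above (a definition plus a set of frame elements) maps naturally onto a small data structure. The Python sketch below is illustrative only: the `Frame` and `FrameInstance` classes, the element names in `CANCER_DIAGNOSIS`, the example note values, and the toy paper-to-frame mapping are all hypothetical and are not taken from the review.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Frame:
    """A frame: a name, a definition, and its frame elements (extractable attributes)."""
    name: str
    definition: str
    elements: List[str] = field(default_factory=list)


@dataclass
class FrameInstance:
    """One filled-in frame for a single note; an absent element has no value."""
    frame: Frame
    values: Dict[str, Optional[str]]

    def missing_elements(self) -> List[str]:
        # Elements the extractor did not fill, in the frame's declared order.
        return [e for e in self.frame.elements if self.values.get(e) is None]


# Hypothetical frame for illustration; these element names are assumptions,
# not the frame elements defined in the review.
CANCER_DIAGNOSIS = Frame(
    name="Cancer Diagnosis",
    definition="A statement that a patient has, or is suspected to have, a cancer.",
    elements=["cancer_type", "anatomical_site", "histology", "stage", "grade"],
)

# Hypothetical extraction from a made-up note sentence.
example = FrameInstance(
    frame=CANCER_DIAGNOSIS,
    values={
        "anatomical_site": "left breast",
        "histology": "invasive ductal carcinoma",
        "grade": "2",
    },
)


def frame_frequencies(papers: Dict[str, List[str]]) -> Counter:
    """Count how many papers extract each frame (each frame counted once per paper)."""
    return Counter(frame for frames in papers.values() for frame in set(frames))


# Toy paper-to-frame mapping (hypothetical data, not the review's 78 articles).
toy_papers = {
    "paper_a": ["Cancer Diagnosis", "Tumor Description"],
    "paper_b": ["Cancer Diagnosis"],
    "paper_c": ["Cancer Procedure"],
}

if __name__ == "__main__":
    print(example.missing_elements())  # ['cancer_type', 'stage']
    print(frame_frequencies(toy_papers).most_common(1))  # [('Cancer Diagnosis', 2)]
```

Counting each frame at most once per paper matches how a figure such as "36 of 78 papers" would be tallied, assuming each reviewed paper is annotated with the set of frames it extracts.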

Conclusion

The list of common frames described in this paper identifies important cancer-related information extracted by existing NLP techniques and serves as a useful resource for future researchers who need cancer information extracted from EHR notes. We also argue that, given the heavy duplication among cancer NLP systems, a general-purpose resource of annotated cancer frames and corresponding NLP tools would be valuable.



Updated: 2019-10-04