Family member information extraction via neural sequence labeling models with different tag schemes.
BMC Medical Informatics and Decision Making (IF 3.5) Pub Date: 2019-12-27, DOI: 10.1186/s12911-019-0996-4
Hong-Jie Dai

BACKGROUND Family history information (FHI) described in unstructured electronic health records (EHRs) is a valuable information source for patient care and scientific research. Since FHI is usually written as free text, the full FHI extraction process consists of several steps: section segmentation, family member and clinical observation extraction, and relation discovery between the extracted members and their observations. The extraction step involves recognizing FHI concepts along with their properties, such as the family side attribute of a family member concept.

METHODS This study focuses on the extraction step and formulates it as a sequence labeling problem. We employed a neural sequence labeling model with different tag schemes to distinguish family members and their observations. For each tag scheme, the identified entities were aggregated and processed by a corresponding algorithm to determine the required properties.

RESULTS We studied the effectiveness of encoding the required properties in the tag schemes by evaluating their performance on the dataset released by the BioCreative/OHNLP challenge 2018. The proposed side scheme, together with the developed features and neural network architecture, achieved an overall F1-score of 0.849 on the test set, which ranked second in the FHI entity recognition subtask.

CONCLUSIONS The developed neural network-based models performed significantly better than conditional random field models. However, our error analysis revealed two challenging issues with the current approach. One is that some properties require cross-sentence inference. The other is that the current model cannot distinguish between narratives describing the family members of the patient and those specifying the relatives of the patient's family members.
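The idea of encoding a property such as family side directly in the tag scheme can be illustrated with a minimal sketch. The label names used here (`FM_Maternal`, `OBS`), the BIO convention, and the helper functions `decode_bio` and `split_side` are all hypothetical and chosen for illustration only; the abstract does not specify the actual tag inventory or the aggregation algorithms used in the study.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Aggregate BIO-tagged tokens into (entity_text, label) pairs.

    An I- tag that does not continue an open entity of the same label
    is treated like O and dropped (a simplifying assumption).
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity that is still open.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_label))
            current_tokens = [token]
            current_label = tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continue the open entity.
            current_tokens.append(token)
        else:
            # O tag or inconsistent I- tag: close any open entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_label))
    return entities


def split_side(label: str) -> Tuple[str, str]:
    """Split a hypothetical side-encoded label such as 'FM_Maternal'
    into (entity_type, side); labels without a side get 'NA'."""
    base, _, side = label.partition("_")
    return base, side or "NA"


# Hypothetical example sentence and tags (not taken from the challenge data).
tokens = ["Her", "maternal", "grandmother", "had", "breast", "cancer", "."]
tags = ["O", "B-FM_Maternal", "I-FM_Maternal", "O", "B-OBS", "I-OBS", "O"]

for text, label in decode_bio(tokens, tags):
    print(text, split_side(label))
```

With a scheme like this, the side property is read straight off the predicted label, whereas a scheme with a plain family-member tag would need a separate step to determine the side after the entity has been recognized.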

Updated: 2019-12-27