Recent advances in Swedish and Spanish medical entity recognition in clinical texts using deep neural approaches.
BMC Medical Informatics and Decision Making (IF 3.5), Pub Date: 2019-12-23, DOI: 10.1186/s12911-019-0981-y
Rebecka Weegar 1, Alicia Pérez 2, Arantza Casillas 2, Maite Oronoz 2

BACKGROUND: Text mining and natural language processing of clinical text, such as notes from electronic health records, require specific consideration of the specialized characteristics of these texts. Deep learning methods could potentially mitigate domain-specific challenges such as limited access to in-domain tools and data sets.

METHODS: A bi-directional Long Short-Term Memory network is applied to clinical notes in Spanish and Swedish for the task of medical named entity recognition. Several types of embeddings, generated from both in-domain and out-of-domain text corpora, and a number of generation and combination strategies for embeddings have been evaluated in order to investigate different input representations and the influence of domain on the final results.

RESULTS: For Spanish, a micro-averaged F1-score of 75.25 was obtained, and for Swedish the corresponding score was 76.04. The best results for both languages were achieved using embeddings generated from in-domain corpora extracted from electronic health records, but embeddings generated from related domains were also found to be beneficial.

CONCLUSIONS: A recurrent neural network with in-domain embeddings improved medical named entity recognition compared to shallow learning methods, showing this combination to be suitable for entity recognition in clinical text for both languages.
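
The micro-averaged F1-score reported in the results pools true positives, false positives, and false negatives over all entity types before computing precision and recall. The following is a minimal sketch of that calculation, not the authors' evaluation script. It assumes exact-match scoring on `(start, end, label)` spans, which the abstract does not specify, and the function name `micro_f1`, the example spans, and the labels `Disease` and `Drug` are hypothetical.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over entity spans.

    gold, pred: lists with one set per sentence; each set holds
    (start, end, label) tuples. A predicted span counts as correct
    only if it matches a gold span exactly (an assumption here).
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # spans found in both gold and prediction
        fp += len(p - g)   # predicted spans with no gold match
        fn += len(g - p)   # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical two-sentence example (labels are illustrative only).
gold = [{(0, 1, "Disease"), (3, 4, "Drug")}, {(2, 3, "Disease")}]
pred = [{(0, 1, "Disease")}, {(2, 3, "Drug")}]
score = micro_f1(gold, pred)
print(f"micro F1 = {100 * score:.2f}")
```

In this toy example one of two predictions is correct and one of three gold entities is found, so precision is 1/2 and recall is 1/3. Multiplying the returned fraction by 100 gives a value on the same scale as the scores quoted in the abstract.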

Updated: 2019-12-23