Annotating patient clinical records with syntactic chunks and named entities: the Harvey Corpus.
Language Resources and Evaluation (IF 1.7), Pub Date: 2016-01-11, DOI: 10.1007/s10579-015-9330-7
Aleksandar Savkov 1, John Carroll 1, Rob Koeling 1, Jackie Cassell 2

The free-text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult for existing natural language analysis tools to process because they are highly telegraphic (omitting many words) and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free-text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.

Updated: 2016-01-11