Fine-tuning coreference resolution for different styles of clinical narratives
Journal of Biomedical Informatics (IF 4.5) Pub Date: 2023-12-18, DOI: 10.1016/j.jbi.2023.104578
Yuxiang Liao , Hantao Liu , Irena Spasić

Objective

Coreference resolution (CR) is a natural language processing (NLP) task concerned with finding all expressions within a single document that refer to the same entity. This makes it crucial in supporting downstream NLP tasks such as summarization, question answering and information extraction. Despite great progress in CR, our experiments have highlighted the substandard performance of existing open-source CR tools in the clinical domain. We set out to explore practical solutions for fine-tuning their performance on clinical data.

Methods

We first explored the possibility of automatically producing silver standards, following the success of such an approach in other clinical NLP tasks. We designed an ensemble approach that leverages multiple models to automatically annotate co-referring mentions. Subsequently, we looked into other ways of incorporating human feedback to improve the performance of an existing neural network approach. We proposed a semi-automatic annotation process to facilitate manual annotation. We also compared the effectiveness of active learning relative to random sampling in an effort to further reduce the cost of manual annotation.
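The ensemble-based silver-standard step described above could, for instance, retain only those co-referring mention pairs that a majority of models agree on. The sketch below is a minimal illustration of such majority voting over mention pairs; the function name, the mention-span representation, and the agreement threshold are all hypothetical, not taken from the paper:

```python
from collections import Counter
from itertools import combinations

def ensemble_silver_standard(model_outputs, min_agreement=2):
    """Keep only co-referring mention pairs that at least
    `min_agreement` models agree on (simple majority voting).

    model_outputs: one list of coreference clusters per model;
    each cluster is a list of mention spans, e.g. (start, end).
    """
    votes = Counter()
    for clusters in model_outputs:
        for cluster in clusters:
            # Count a vote for every mention pair inside the cluster.
            for pair in combinations(sorted(cluster), 2):
                votes[pair] += 1
    return {pair for pair, n in votes.items() if n >= min_agreement}

# Three hypothetical models' predicted clusters for one document:
m1 = [[(0, 2), (10, 11)], [(20, 22), (30, 31)]]
m2 = [[(0, 2), (10, 11), (15, 16)]]
m3 = [[(0, 2), (10, 11)], [(20, 22), (40, 41)]]

silver = ensemble_silver_standard([m1, m2, m3], min_agreement=2)
# Only the pair all three models agree on survives the vote.
```

Pairs proposed by a single model are discarded, so the resulting silver annotations trade recall for precision, which is the usual motivation for agreement-based filtering.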

Results

Our experiments demonstrated that the silver standard approach was ineffective in fine-tuning the CR models. Our results indicated that active learning should also be applied with caution. The semi-automatic annotation approach combined with continued training was found to be well suited for the rapid transfer of CR models under low-resource conditions. The ensemble approach demonstrated a potential to further improve accuracy by leveraging multiple fine-tuned models.

Conclusion

Overall, we have effectively transferred a general CR model to the clinical domain. Our findings, based on extensive experimentation, have been summarized into practical suggestions for the rapid transfer of CR models across different styles of clinical narratives.




Updated: 2023-12-22