Dependency parsing of biomedical text with BERT
BMC Bioinformatics (IF 3) Pub Date: 2020-12-29, DOI: 10.1186/s12859-020-03905-8
Jenna Kanerva, Filip Ginter, Sampo Pyysalo

Syntactic analysis, or parsing, is a key task in natural language processing and a required component of many text mining approaches. In recent years, Universal Dependencies (UD) has emerged as the leading formalism for dependency parsing. While a number of recent shared tasks centered on UD have substantially advanced the state of the art in multilingual parsing, there has been comparatively little study of parsing text from specialized domains such as biomedicine. We explore the application of state-of-the-art neural dependency parsing methods to biomedical text using the recently introduced CRAFT-SA shared task dataset. The CRAFT-SA task broadly follows the UD representation and recent UD task conventions, allowing us to fine-tune the UD-compatible Turku Neural Parser and UDify parsers to the task. We further evaluate the effect of transfer learning using a broad selection of BERT models, including several pre-trained specifically for biomedical text processing. We find that recently introduced neural parsing technology is capable of generating highly accurate analyses of biomedical text, substantially improving on the best performance reported in the original CRAFT-SA shared task. We also find that initialization with a deep transfer learning model pre-trained on in-domain text is key to maximizing the performance of the parsing methods.
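Both parsers fine-tuned here build a graph-based dependency scorer over a BERT encoder in the style of Dozat and Manning's deep biaffine attention. The sketch below is a minimal illustration of that idea, not the authors' implementation: the checkpoint name, layer dimensions, and example sentence are placeholder assumptions, and swapping in an in-domain checkpoint (e.g., a BioBERT model) is the transfer-learning variation the paper evaluates.

```python
# A minimal sketch, assuming a Hugging Face BERT encoder feeding a
# biaffine arc scorer; not the authors' code.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BiaffineArcScorer(nn.Module):
    """Scores every (dependent, head) token pair for a dependency arc."""
    def __init__(self, encoder_name="bert-base-cased", arc_dim=256):
        super().__init__()
        # An in-domain checkpoint would be substituted here to test the
        # effect of biomedical pre-training.
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.dep_mlp = nn.Linear(hidden, arc_dim)    # dependent-side projection
        self.head_mlp = nn.Linear(hidden, arc_dim)   # head-side projection
        self.biaffine = nn.Parameter(torch.randn(arc_dim + 1, arc_dim) * 0.01)

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state
        deps = torch.relu(self.dep_mlp(h))           # (B, T, d)
        heads = torch.relu(self.head_mlp(h))         # (B, T, d)
        # Bias term on the dependent side, as in the biaffine parser.
        ones = torch.ones(*deps.shape[:-1], 1, device=deps.device)
        deps = torch.cat([deps, ones], dim=-1)       # (B, T, d+1)
        # scores[b, i, j]: score of token j heading token i. A real parser
        # would align subword pieces to words and decode a tree; omitted here.
        return deps @ self.biaffine @ heads.transpose(1, 2)

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = BiaffineArcScorer("bert-base-cased")
batch = tokenizer(["Protein kinase C phosphorylates the receptor."],
                  return_tensors="pt")
scores = model(batch["input_ids"], batch["attention_mask"])
print(scores.shape)  # (1, sequence_length, sequence_length)
```

Training such a scorer on the CRAFT-SA trees with a cross-entropy loss over candidate heads, then decoding a spanning tree at inference time, would complete the pipeline; the paper's central finding is that the choice of encoder checkpoint matters most.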

Updated: 2020-12-29