Improving Sequence Tagging for Vietnamese Text Using Transformer-based Neural Models
arXiv - CS - Computation and Language. Pub Date: 2020-06-29, DOI: arxiv-2006.15994
Viet Bui The, Oanh Tran Thi, Phuong Le-Hong

This paper describes our study on using multilingual BERT embeddings and several new neural models to improve sequence tagging tasks for the Vietnamese language. We propose new model architectures and evaluate them extensively on two named entity recognition datasets, VLSP 2016 and VLSP 2018, and on two part-of-speech tagging datasets, VLSP 2010 and VLSP 2013. Our proposed models outperform existing methods and achieve new state-of-the-art results. In particular, we push the accuracy of part-of-speech tagging to 95.40% on the VLSP 2010 corpus and to 96.77% on the VLSP 2013 corpus, and the F1 score of named entity recognition to 94.07% on the VLSP 2016 corpus and to 90.31% on the VLSP 2018 corpus. Our code and pre-trained models viBERT and vELECTRA are released as open source to facilitate adoption and further research.
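The setup the abstract describes, contextual Transformer embeddings feeding a token-level classifier, can be sketched with the HuggingFace transformers API. This is a minimal sketch, assuming the released viBERT checkpoint is published on the HuggingFace hub; the model identifier and the NER label set below are illustrative assumptions, not details taken from the paper.

```python
# Minimal token-classification sketch for Vietnamese NER using a
# pretrained Transformer encoder. The model identifier and label set
# are hypothetical placeholders, not the authors' released artifacts.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_ID = "vibert-base-cased"  # hypothetical hub identifier
labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_ID, num_labels=len(labels)
)

# Tag a single Vietnamese sentence (untrained head: predictions are
# random until the classifier is fine-tuned on a VLSP-style corpus).
sentence = "Ông Nguyễn Văn A sống ở Hà Nội ."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, num_labels)
predictions = logits.argmax(dim=-1).squeeze(0)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred in zip(tokens, predictions):
    print(f"{token}\t{labels[pred]}")
```

In practice one would fine-tune this head on the VLSP training splits and map subword predictions back to word-level tags before computing F1.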

Updated: 2020-09-28