Encoding multi-granularity structural information for joint Chinese word segmentation and POS tagging, Pattern Recognition Letters

Encoding multi-granularity structural information for joint Chinese word segmentation and POS tagging
Pattern Recognition Letters (IF 5.1) Pub Date: 2020-07-13, DOI: 10.1016/j.patrec.2020.07.017
Ling Zhao, Ailian Zhang, Ying Liu, Hao Fei

Recent studies show that performing Chinese word segmentation and POS tagging jointly strengthens the interaction between the two tasks and improves performance on both. However, existing joint methods fail to take effective advantage of information at multiple granularities (e.g., character, word, and subword), which has proven highly useful. In this paper, we propose to improve the joint task by leveraging such multi-granularity information, exploiting lattice-LSTM and Graph Convolutional Network (GCN) models to encode the graph information effectively. On five benchmark datasets, our proposed model shows highly competitive performance, achieving new state-of-the-art results. Further analysis reveals that multi-granularity information can relieve the out-of-vocabulary and long-range dependency issues. In addition, the GCN structure is more effective than the lattice structure for encoding the multi-granularity graph information.
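
The abstract gives no implementation detail, so the following is only an illustrative sketch of the general idea, not the authors' model. It assumes a character-word graph in which every character is a node, every lexicon match of two or more characters becomes an extra word node linked to the characters it covers, and a single standard GCN layer with symmetric normalization propagates information over that graph. The function names, the toy lexicon, the example sentence, and the feature size are all hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def build_lattice_graph(sentence, lexicon, max_word_len=4):
    """Return (nodes, adjacency) for a hypothetical character-word graph.

    Nodes 0..n-1 are the characters. Each lexicon match of length >= 2
    becomes an extra word node linked to the characters it covers.
    """
    n = len(sentence)
    nodes = list(sentence)
    edges = [(i, i + 1) for i in range(n - 1)]  # adjacent characters
    for start in range(n):
        for end in range(start + 2, min(n, start + max_word_len) + 1):
            word = sentence[start:end]
            if word in lexicon:
                word_id = len(nodes)
                nodes.append(word)
                edges.extend((word_id, i) for i in range(start, end))
    adj = np.eye(len(nodes))  # identity adds a self-loop to every node
    for a, b in edges:
        adj[a, b] = adj[b, a] = 1.0  # undirected edges
    return nodes, adj


def gcn_layer(adj, features, weight):
    """One graph-convolution step: ReLU(D^-1/2 A D^-1/2 X W)."""
    deg_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    norm_adj = adj * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]
    return np.maximum(norm_adj @ features @ weight, 0.0)


# Toy inputs (assumptions for illustration only).
sentence = "南京市长江大桥"
lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}

nodes, adj = build_lattice_graph(sentence, lexicon)

rng = np.random.default_rng(0)
features = rng.normal(size=(len(nodes), 8))  # random stand-in embeddings
weight = rng.normal(size=(8, 8))             # untrained layer weights

hidden = gcn_layer(adj, features, weight)
char_states = hidden[: len(sentence)]  # per-character states for a tagger
```

In a setup like this, each character's state mixes in evidence from every candidate word that covers it, which is one plausible reading of how word-level information could help with out-of-vocabulary and long-range dependency cases; a real system would learn the embeddings and weights and feed `char_states` to a joint segmentation and tagging decoder.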


Updated: 2020-07-20
