A Hierarchical Sequence-to-Sequence Model for Korean POS Tagging
ACM Transactions on Asian and Low-Resource Language Information Processing (IF 2), Pub Date: 2021-04-23, DOI: 10.1145/3421762
Guozhe Jin, Zhezhou Yu

Part-of-speech (POS) tagging is a fundamental task in natural language processing. Korean POS tagging consists of two subtasks: morphological analysis and POS tagging. In recent years, scholars have tended to use the seq2seq model to solve this problem. The full context of a sentence is considered in these seq2seq-based Korean POS tagging methods. However, Korean morphological analysis relies more on local contextual information, and in many cases, there exists one-to-one matching between morpheme surface form and base form. To make better use of these characteristics, we propose a hierarchical seq2seq model. In our model, the low-level Bi-LSTM encodes the syllable sequence, whereas the high-level Bi-LSTM models the context information of the whole sentence, and the decoder generates the morpheme base form syllables as well as the POS tags. To improve the accuracy of morpheme base-form recovery, we introduce a convolution layer and an attention mechanism into our model. The experimental results on the Sejong corpus show that our model outperforms strong baseline systems in both morpheme-level F1-score and eojeol-level accuracy, achieving state-of-the-art performance.
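
The two levels described in the abstract can be pictured with a small data-layout sketch. This is not the authors' code: the helper names (`to_hierarchy`, `to_target`, `is_one_to_one`), the "/TAG" token format, and the example sentence with its hand-written Sejong-style analysis are all assumptions made for illustration. It only shows how a sentence might be split into eojeols and syllables for the two encoders, what a decoder target made of base-form syllables plus POS tags could look like, and how the one-to-one surface/base match mentioned above can be checked.

```python
# Illustrative sketch only: the data layout below is an assumption,
# not the format used in the paper.

def to_hierarchy(sentence):
    """Split a sentence into eojeols (space-delimited units),
    each represented as a list of syllables (one character each)."""
    return [list(eojeol) for eojeol in sentence.split()]


def to_target(analysis):
    """Flatten one eojeol's (base_form, tag) pairs into a target sequence:
    the base-form syllables of each morpheme followed by its POS tag token."""
    target = []
    for base_form, tag in analysis:
        target.extend(base_form)   # base-form syllables
        target.append("/" + tag)   # hypothetical tag token format
    return target


def is_one_to_one(syllables, analysis):
    """True when the surface syllables equal the concatenated base forms."""
    return syllables == [s for base_form, _ in analysis for s in base_form]


# Hypothetical example sentence with a hand-written analysis.
sentence = "나는 학교에 갔다"
gold = [
    [("나", "NP"), ("는", "JX")],
    [("학교", "NNG"), ("에", "JKB")],
    [("가", "VV"), ("았", "EP"), ("다", "EF")],
]

sources = to_hierarchy(sentence)            # syllables grouped by eojeol
targets = [to_target(a) for a in gold]      # one target sequence per eojeol
matches = [is_one_to_one(s, a) for s, a in zip(sources, gold)]

for src, tgt, same in zip(sources, targets, matches):
    print(src, "->", tgt, "| one-to-one:", same)
```

In this example the first two eojeols keep their surface syllables unchanged, while the third (갔다) needs base-form recovery (가 + 았 + 다), which is the case the abstract says the convolution layer and attention mechanism are meant to help with.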

Updated: 2021-04-23