A Tibetan Language Model That Considers the Relationship Between Suffixes and Functional Words
IEEE Signal Processing Letters (IF 3.9), Pub Date: 2021-02-11, DOI: 10.1109/lsp.2021.3058896
Kuntharrgyal Khysru , Di Jin , Yuxiao Huang , Hui Feng , Jianwu Dang

The complete semantic representation of a Tibetan sentence is mainly determined by the addition of a specific functional word. The choice of Tibetan functional words is mainly influenced (both explicitly and implicitly) by the sequence of Tibetan suffixes. In this article, we propose an RNN-based Tibetan radical suffix unit (TRSU) to consider this relationship. Specifically, for the Tibetan radical suffix unit-explicit (TRSU-E) method, the fixed suffix in Tibetan is used to determine the virtual functional words. For the Tibetan radical suffix unit-implicit (TRSU-I) method, the decision is assisted by adding a specific suffix. To test the method, we design a standard Tibetan corpus, which consists of different genres. Our experimental results show that the complexity of our method is reduced by up to 22.2% relative to the best baseline. Furthermore, with the hidden semantic information and implicit suffix, TRSU-I outperforms TRSU-E by reducing the perplexity (PPL) by 3%. Moreover, good results are achieved on the English Penn Treebank data set.
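
The abstract reports its results as relative reductions in perplexity. As a minimal sketch of how such figures are computed, assuming the standard definition of perplexity as the exponential of the average negative log-likelihood per token: the function names and all numbers below are hypothetical illustrations, not values or code from the paper.

```python
import math


def perplexity(token_log_probs):
    """Perplexity: exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_reduction(baseline_ppl, model_ppl):
    """Fraction by which model_ppl is lower than baseline_ppl."""
    return (baseline_ppl - model_ppl) / baseline_ppl


# Hypothetical example: a model that assigns probability 0.25 to every token
# is as uncertain as a uniform choice among four options.
uniform_ppl = perplexity([math.log(0.25)] * 8)

# Hypothetical example: a drop from PPL 100.0 to PPL 77.8 is a 22.2% relative reduction.
reduction = relative_reduction(100.0, 77.8)
```

Lower perplexity means the model assigns higher probability to the held-out text, so a relative reduction is reported against whichever baseline has the lowest perplexity.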

Updated: 2021-03-12