Learning Word-vector Quantization
ACM Transactions on Asian and Low-Resource Language Information Processing (IF 1.8), Pub Date: 2020-06-18, DOI: 10.1145/3397967
Umut Orhan, Enıs Arslan

We introduce a new classifier, Learning Word-vector Quantization (LWQ), to resolve morphological ambiguities in Turkish, an agglutinative language. First, a new morphologically annotated corpus is built, and its datasets are then prepared through a series of processing steps. Using these datasets, LWQ finds optimal word-vector positions by moving the vectors in Euclidean space. LWQ performs morphological disambiguation in two steps: first, it generates all candidate analyses of an ambiguous word using a morphological analyzer; second, it chooses the best candidate according to its total distance to the neighboring words that are not ambiguous. To assess LWQ's performance, we conducted many tests on the corpus, taking the consistency of classification into account. In the experiments, LWQ selects the correct parse output in 98.4% of cases, which is a high level relative to results reported in the literature.
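
The second step described in the abstract, choosing the candidate with the smallest total distance to unambiguous neighbors, can be illustrated with a minimal sketch. The abstract does not give the vector dimensions, the neighborhood size, or the rule used to move vectors during training, so everything below is an assumption for illustration: the function names `choose_parse` and `lvq_style_update`, the labels `parse_a` and `parse_b`, the toy 2-D vectors, and in particular the LVQ-style update, which is borrowed from classical Learning Vector Quantization and may differ from what LWQ actually does.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def choose_parse(candidates, neighbor_vectors):
    """Pick the candidate parse closest, in total, to the unambiguous neighbors.

    candidates: dict mapping a parse label to its word vector (hypothetical format).
    neighbor_vectors: vectors of nearby words that are not ambiguous.
    """
    def total_distance(label):
        return sum(euclidean(candidates[label], n) for n in neighbor_vectors)

    return min(candidates, key=total_distance)


def lvq_style_update(vector, neighbor, correct, rate=0.1):
    """ASSUMPTION: an LVQ-style step, not confirmed by the abstract.

    Moves `vector` toward `neighbor` when the parse is correct,
    and away from it otherwise.
    """
    sign = 1.0 if correct else -1.0
    return [x + sign * rate * (n - x) for x, n in zip(vector, neighbor)]


# Toy example: two candidate parses and two unambiguous neighbors.
candidates = {"parse_a": [0.0, 0.0], "parse_b": [3.0, 4.0]}
neighbors = [[0.0, 1.0], [1.0, 0.0]]
best = choose_parse(candidates, neighbors)
```

In this toy setup `parse_a` lies at total distance 2 from the neighbors, while `parse_b` is farther away, so `choose_parse` selects `parse_a`. A real system would obtain the candidate set from a morphological analyzer rather than a hand-written dictionary.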

Updated: 2020-06-18