Towards Computational Linguistics in Minangkabau Language: Studies on Sentiment Analysis and Machine Translation
arXiv - CS - Computation and Language. Pub Date: 2020-09-19, DOI: arxiv-2009.09309
Fajri Koto, Ikhwan Koto

Although some linguists (Rusmali et al., 1985; Crouch, 2009) have attempted to define the morphology and syntax of Minangkabau, information processing in this language is still absent because annotated resources are scarce. In this work, we release two Minangkabau corpora, one for sentiment analysis and one for machine translation, harvested and constructed from Twitter and Wikipedia. We conduct the first computational linguistics study of the Minangkabau language, employing classic machine learning and sequence-to-sequence models such as LSTM and Transformer. Our first experiments show that classification performance on Minangkabau text drops significantly when the model is trained on Indonesian. In the machine translation experiment, by contrast, a simple word-to-word translation using a bilingual dictionary outperforms the LSTM and Transformer models in terms of BLEU score.
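
As a rough illustration of the word-to-word baseline mentioned in the abstract, the following minimal sketch replaces each token with its bilingual-dictionary entry and leaves unknown tokens unchanged. It is not the authors' implementation: the function name, the whitespace tokenization, the Minangkabau-to-Indonesian direction, and the toy dictionary entries are all assumptions made for illustration and are not taken from the released corpora.

```python
def translate_word_by_word(sentence, dictionary):
    """Translate token by token; tokens missing from the dictionary are copied as-is."""
    # Assumption: lowercase, whitespace-separated tokens are a good enough unit.
    tokens = sentence.lower().split()
    return " ".join(dictionary.get(token, token) for token in tokens)


# Hypothetical toy dictionary (Minangkabau -> Indonesian), for illustration only.
toy_dictionary = {
    "ambo": "saya",
    "indak": "tidak",
    "pai": "pergi",
    "ka": "ke",
    "pasa": "pasar",
}

print(translate_word_by_word("Ambo indak pai ka pasa", toy_dictionary))
```

Copying unknown tokens through unchanged is one plausible design choice for closely related languages that share much of their vocabulary; whether the paper's baseline handles out-of-dictionary words this way is not stated in the abstract.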

Updated: 2020-09-22