word.alignment: an R package for computing statistical word alignment and its evaluation
Computational Statistics (IF 1.0), Pub Date: 2020-03-23, DOI: 10.1007/s00180-020-00979-z
Neda Daneshgar, Majid Sarmad

Word alignment has many applications in natural language processing (NLP) tasks. As far as we are aware, there is no word alignment package in the R environment. This paper introduces word.alignment, a new R software package that implements a statistical word alignment model as an unsupervised learning method. It uses IBM Model 1 as the machine translation model, relying on the EM algorithm and Viterbi search to find the best alignment. It also provides symmetric alignment using three heuristic methods: union, intersection, and grow-diag. In addition, it can build an automatic bilingual dictionary by applying an innovative rule. The generated dictionary is suitable for a number of NLP tasks. The package also provides functions for measuring the quality of a word alignment by comparing it with a gold standard alignment on five metrics. It is easy to install and runs on the most widely used platforms. It is also easy to use, and we show that its results are almost everywhere better than those of some other word alignment tools. Finally, some examples illustrating the use of word.alignment are provided.
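To make the pipeline described in the abstract concrete, here is a minimal sketch of IBM Model 1 training with EM, Viterbi alignment, and the union and intersection symmetrization heuristics. It is written in Python for illustration only and is not the word.alignment API: the function names (`train_ibm1`, `viterbi_align`, `align_both_ways`), the `<NULL>` token, and the toy corpus are all assumptions. The grow-diag heuristic, the dictionary rule, and the evaluation metrics are not shown.

```python
from collections import defaultdict

NULL = "<NULL>"  # hypothetical placeholder for the empty source word


def train_ibm1(pairs, iterations=20):
    """Estimate t(f | e) by EM from (source_tokens, target_tokens) pairs."""
    target_vocab = {f for _, fs in pairs for f in fs}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # uniform start; keys are (f, e)
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e aligned to anything
        for es, fs in pairs:
            es_null = [NULL] + list(es)
            for f in fs:
                # E-step: posterior over which source word generated f
                z = sum(t[(f, e)] for e in es_null)
                for e in es_null:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise the expected counts per source word
        t = {fe: c / total[fe[1]] for fe, c in count.items()}
    return t


def viterbi_align(es, fs, t):
    """Best Model 1 alignment as a set of (source_index, target_index) links.

    Under Model 1 each target word picks its source word independently,
    so the Viterbi search reduces to a per-word argmax.
    """
    es_null = [NULL] + list(es)
    links = set()
    for j, f in enumerate(fs):
        best = max(range(len(es_null)),
                   key=lambda i: t.get((f, es_null[i]), 0.0))
        if best > 0:  # index 0 is NULL, i.e. the word stays unaligned
            links.add((best - 1, j))
    return links


def align_both_ways(pairs, iterations=20):
    """Train both directions and combine them by intersection and union."""
    t_fwd = train_ibm1(pairs, iterations)
    t_bwd = train_ibm1([(fs, es) for es, fs in pairs], iterations)
    results = []
    for es, fs in pairs:
        fwd = viterbi_align(es, fs, t_fwd)
        # The reverse model yields (target, source) links; flip them
        bwd = {(i, j) for j, i in viterbi_align(fs, es, t_bwd)}
        results.append({"intersection": fwd & bwd, "union": fwd | bwd})
    return results


# Toy parallel corpus (illustrative data, not from the paper)
corpus = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]

for (es, fs), links in zip(corpus, align_both_ways(corpus)):
    print(es, fs, sorted(links["intersection"]), sorted(links["union"]))
```

The intersection keeps only links that both directional models agree on, which favours precision, while the union keeps every link proposed by either model, which favours recall. Heuristics such as grow-diag start from the intersection and add selected links from the union.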

Updated: 2020-03-23