Evaluating cross-lingual textual similarity on dictionary alignment problem
Language Resources and Evaluation (IF 1.7) Pub Date: 2020-06-29, DOI: 10.1007/s10579-020-09498-1
Yiğit Sever, Gönenç Ercan

Bilingual or even polylingual word embeddings created many possibilities for tasks involving multiple languages. While some tasks like cross-lingual information retrieval aim to satisfy users’ multilingual information needs, some enable transferring valuable information from resource-rich languages to resource-poor ones. In any case, it is important to build and evaluate methods that operate in a cross-lingual setting. In this paper, Wordnet definitions in 7 different languages are used to create a semantic textual similarity testbed to evaluate cross-lingual textual semantic similarity methods. A document alignment task is created to be used between Wordnet glosses of synsets in 7 different languages. Unsupervised textual similarity methods—Wasserstein distance, Sinkhorn distance and cosine similarity—are compared with a supervised Siamese deep learning model. The task is modeled both as a retrieval task and an alignment task to investigate the hubness of the semantic similarity functions. Our findings indicate that considering the problem as a retrieval and alignment problem has a detrimental effect on the results. Furthermore, we show that cross-lingual textual semantic similarity can be used as an automated Wordnet construction method.
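
To make the distinction between the retrieval and alignment formulations concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. The toy 2-dimensional vectors stand in for gloss representations (assumed here to be something like averaged cross-lingual word embeddings), the helper names are hypothetical, and the brute-force search over permutations is a stand-in for a proper assignment solver that only works at toy sizes. Retrieval lets every source gloss pick its nearest target independently, so a single "hub" target can be chosen more than once; alignment forces a one-to-one matching.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(source, target):
    """sim[i][j] = cosine similarity of source gloss i and target gloss j."""
    return [[cosine(s, t) for t in target] for s in source]


def retrieve(sim):
    """Retrieval view: each source gloss independently picks its best target."""
    return [max(range(len(row)), key=row.__getitem__) for row in sim]


def align(sim):
    """Alignment view: the one-to-one matching with the highest total similarity.

    Brute force over all permutations, so only usable for toy sizes;
    a real setup would use an assignment solver instead (assumption).
    """
    n = len(sim)
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(sim[i][perm[i]] for i in range(n)),
    )
    return list(best)


# Hypothetical gloss vectors; the intended gold pairing is source[i] <-> target[i].
source = [[1.0, 0.0], [1.0, 0.2], [0.0, 1.0]]
target = [[1.0, 0.1], [1.0, 1.0], [0.1, 1.0]]

sim = similarity_matrix(source, target)
retrieved = retrieve(sim)  # target 0 acts as a hub and is picked twice
aligned = align(sim)       # the one-to-one constraint recovers the gold pairing
print(retrieved, aligned)
```

In this toy case the second source gloss is closer to target 0 than to its gold target, so retrieval assigns target 0 twice, while the one-to-one constraint resolves the collision.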




Updated: 2020-07-24