A large and evolving cognate database
Language Resources and Evaluation (IF 1.7), Pub Date: 2021-05-30, DOI: 10.1007/s10579-021-09544-6
Khuyagbaatar Batsuren, Gábor Bella, Fausto Giunchiglia

We present CogNet, a large-scale, automatically built database of sense-tagged cognates, that is, words of common origin and meaning across languages. CogNet is continuously evolving: its current version contains over 8 million cognate pairs across 338 languages and 35 writing systems, and new releases are already in preparation. The paper presents the algorithm and input resources used to compute it, an evaluation of the result, and a quantitative analysis of the cognate data that leads to novel insights into language diversity. Furthermore, as an example of how large-scale cross-lingual knowledge bases can improve the quality of multilingual applications, we present a case study on using CogNet for bilingual lexicon induction within a cross-lingual transfer learning framework.




Updated: 2021-05-30