The Cross-Lingual Arabic Information REtrieval (CLAIRE) System
arXiv - CS - Information Retrieval. Pub Date: 2021-07-29, DOI: arxiv-2107.13751
Zhizhong Chen, Carsten Eickhoff

Despite advances in neural machine translation, cross-lingual retrieval tasks, in which queries and documents live in different natural language spaces, remain challenging. Although neural translation models may offer an intuitive approach to the cross-lingual problem, their resource-intensive training and complex model structures can complicate the overall retrieval pipeline and reduce user engagement. In this paper, we build an end-to-end Cross-Lingual Arabic Information REtrieval (CLAIRE) system based on cross-lingual word embeddings, in which searchers are assumed to have a passable passive understanding of Arabic and various supporting information in English is provided to aid the retrieval experience. The proposed system has three major advantages: (1) using an English-Arabic word embedding simplifies the overall pipeline and avoids potential errors introduced by machine translation; (2) CLAIRE can incorporate arbitrary word-embedding-based neural retrieval models without structural modification; and (3) early empirical results on an Arabic news collection show promising performance.
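The key idea behind advantage (1) is that English query words and Arabic document words are mapped into one shared vector space, so they can be compared directly with no translation step. The sketch below illustrates only that idea under loudly stated assumptions: the vectors in `SHARED_EMBEDDINGS` are hypothetical hand-made toy values (real cross-lingual embeddings would be learned), and the mean-of-vectors plus cosine scorer is a simple stand-in, not the retrieval model used in CLAIRE. All names (`embed`, `cosine`, `rank`) are illustrative.

```python
import math
from typing import Dict, List, Sequence

# HYPOTHETICAL toy vectors: English and Arabic words placed in ONE shared space.
# Real cross-lingual embeddings would be learned; these numbers are made up.
SHARED_EMBEDDINGS: Dict[str, List[float]] = {
    "economy": [0.9, 0.1, 0.0],
    "اقتصاد": [0.85, 0.15, 0.05],   # Arabic "economy"
    "football": [0.0, 0.9, 0.1],
    "كرة": [0.05, 0.8, 0.2],        # Arabic "ball"
    "قدم": [0.1, 0.7, 0.3],         # Arabic "foot"
    "oil": [0.6, 0.0, 0.7],
    "نفط": [0.55, 0.05, 0.75],      # Arabic "oil"
}
DIM = 3


def embed(tokens: Sequence[str]) -> List[float]:
    """Average the shared-space vectors of known tokens; unknown tokens are skipped."""
    vecs = [SHARED_EMBEDDINGS[t] for t in tokens if t in SHARED_EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def rank(query_tokens: Sequence[str], docs: Dict[str, List[str]]) -> List[str]:
    """Rank Arabic documents for an English query by similarity in the shared space.

    No machine translation happens here: both sides are embedded with the same table.
    """
    q = embed(query_tokens)
    return sorted(docs, key=lambda doc_id: cosine(q, embed(docs[doc_id])), reverse=True)


if __name__ == "__main__":
    docs = {
        "doc_sport": ["كرة", "قدم"],
        "doc_econ": ["اقتصاد", "نفط"],
    }
    print(rank(["economy"], docs))
    print(rank(["football"], docs))
```

With these toy vectors, the English query `economy` ranks `doc_econ` first and `football` ranks `doc_sport` first. Because the ranker only consumes vectors, any other embedding-based scoring model could be swapped in for `cosine` without changing the rest of the pipeline, which is the spirit of advantage (2).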

Updated: 2021-07-30