Instance-based Transfer Learning for Multilingual Deep Retrieval
arXiv - CS - Information Retrieval. Pub Date: 2019-11-08, DOI: arxiv-1911.06111
Andrew O. Arnold, William W. Cohen

Perhaps the simplest type of multilingual transfer learning is instance-based transfer learning, in which data from the target language and the auxiliary languages are pooled, and a single model is learned from the pooled data. It is not immediately obvious when instance-based transfer learning will improve performance in this multilingual setting: for instance, a plausible conjecture is that this kind of transfer learning would help only if the auxiliary languages were very similar to the target. Here we show that at large scale, this method is surprisingly effective, leading to positive transfer on all 35 target languages and both tasks tested. We analyze this improvement and argue that the most natural explanation, namely direct vocabulary overlap between languages, only partially explains the performance gains: in fact, we demonstrate that target-language improvement can occur after adding data from an auxiliary language with no vocabulary in common with the target. This surprising result is due to the effect of transitive vocabulary overlaps between pairs of auxiliary and target languages.
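
The two ideas in the abstract, pooling instances across languages and transitive vocabulary overlap, are easy to sketch. Below is a minimal, hypothetical Python illustration: the toy corpora, the language labels, and the `train_single_model` stand-in are all invented for this sketch and do not reproduce the paper's actual retrieval models or data.

```python
# Minimal sketch (not the paper's code): instance-based pooling plus a toy
# check of transitive vocabulary overlap. All corpora, language labels, and
# the train_single_model stand-in below are invented for illustration.

def vocabulary(pairs):
    """Set of whitespace tokens appearing in a language's (query, doc) pairs."""
    return {tok for q, d in pairs for tok in (q + " " + d).split()}

# Toy (query, document) pairs per language.
corpora = {
    "target": [("como treinar", "guia de treino")],          # Portuguese-like
    "aux_1":  [("cómo entrenar", "guía de entrenamiento")],  # Spanish-like
    "aux_2":  [("guía rápida", "manual breve")],             # overlaps aux_1 only
}

# Instance-based transfer: pool every language's instances, fit one model.
pooled = [pair for pairs in corpora.values() for pair in pairs]
# model = train_single_model(pooled)  # hypothetical trainer, not a real API

# Transitive overlap: aux_2 shares no tokens with the target directly,
# yet it overlaps aux_1, and aux_1 overlaps the target.
vocab = {lang: vocabulary(pairs) for lang, pairs in corpora.items()}
print(vocab["target"] & vocab["aux_2"])  # set()    -> no direct overlap
print(vocab["target"] & vocab["aux_1"])  # {'de'}   -> direct link
print(vocab["aux_1"] & vocab["aux_2"])   # {'guía'} -> transitive link
```

In this toy setup, "aux_2" has no vocabulary in common with the target, mirroring the paper's surprising case: its instances can still help because a single model trained on the pooled data connects them to the target through "aux_1", which overlaps both.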

Updated: 2020-04-08