Cross-Domain Generalization Through Memorization: A Study of Nearest Neighbors in Neural Duplicate Question Detection



arXiv - CS - Computation and Language. Pub Date: 2020-11-22, DOI: arxiv-2011.11090
Yadollah Yaghoobzadeh, Alexandre Rochette, Timothy J. Hazen

Duplicate question detection (DQD) is important for increasing the efficiency of community and automatic question answering systems. Unfortunately, gathering supervised data in a domain is time-consuming and expensive, and our ability to leverage annotations across domains is minimal. In this work, we leverage neural representations and study nearest neighbors for cross-domain generalization in DQD. We first encode question pairs of the source and target domains in a rich representation space; then, using a k-nearest-neighbor retrieval-based method, we aggregate the neighbors' labels and distances to rank pairs. We observe robust performance of this method in different cross-domain scenarios of the StackExchange, Spring and Quora datasets, outperforming cross-entropy classification in multiple cases.
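The retrieval step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the encoder, the distance metric, or the aggregation formula, so everything below is an assumption made for illustration: question pairs are taken to be already encoded as plain vectors, distance is cosine distance, and each neighbor votes with weight 1 / (1 + distance). The function names (cosine_distance, knn_score, rank_pairs) are hypothetical and are not taken from the paper.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def knn_score(query, source_vectors, source_labels, k=3):
    """Score one target-domain pair from its k nearest labeled source pairs.

    Labels are 1 (duplicate) or 0 (not duplicate). Each neighbor casts a
    signed vote weighted by 1 / (1 + distance); this weighting is an
    illustrative assumption, not the paper's formula.
    The result lies in [-1, 1]; higher means more likely duplicate.
    """
    neighbors = sorted(
        (cosine_distance(query, vec), label)
        for vec, label in zip(source_vectors, source_labels)
    )[:k]
    weights = [1.0 / (1.0 + dist) for dist, _ in neighbors]
    signed = sum(
        w if label == 1 else -w
        for w, (_, label) in zip(weights, neighbors)
    )
    return signed / sum(weights)


def rank_pairs(target_vectors, source_vectors, source_labels, k=3):
    """Return target indices ordered from most to least likely duplicate."""
    scores = [
        knn_score(q, source_vectors, source_labels, k) for q in target_vectors
    ]
    return sorted(range(len(target_vectors)), key=lambda i: scores[i], reverse=True)


# Toy data: two duplicate and two non-duplicate source pairs (made-up vectors).
source_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
source_labels = [1, 1, 0, 0]
target_vectors = [[0.2, 0.8], [0.8, 0.2]]

print(rank_pairs(target_vectors, source_vectors, source_labels, k=2))
```

In this toy setup the second target pair lies closest to the source pairs labeled as duplicates, so it is ranked first. A real system would replace the hand-written vectors with encoder outputs and would likely use an approximate nearest-neighbor index rather than a full sort.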


Updated: 2020-11-25
