Cross-Domain Generalization Through Memorization: A Study of Nearest Neighbors in Neural Duplicate Question Detection
arXiv - CS - Computation and Language, Pub Date: 2020-11-22, DOI: arxiv-2011.11090
Yadollah Yaghoobzadeh, Alexandre Rochette, Timothy J. Hazen

Duplicate question detection (DQD) is important for increasing the efficiency of community and automatic question answering systems. Unfortunately, gathering supervised data in a domain is time-consuming and expensive, and our ability to leverage annotations across domains is minimal. In this work, we leverage neural representations and study nearest neighbors for cross-domain generalization in DQD. We first encode question pairs from the source and target domains in a rich representation space. Then, using a k-nearest-neighbor retrieval-based method, we aggregate the neighbors' labels and distances to rank the pairs. We observe robust performance of this method in different cross-domain scenarios on the StackExchange, Spring and Quora datasets, outperforming cross-entropy classification in multiple cases.
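
The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes each question pair has already been encoded as a fixed-length vector by some neural encoder (not shown here). The abstract does not specify the aggregation rule, so the inverse-distance weighted vote below, the Euclidean metric, and the function names `knn_duplicate_score` and `rank_pairs` are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_duplicate_score(
    query: Sequence[float],
    source_vectors: Sequence[Sequence[float]],
    source_labels: Sequence[int],
    k: int = 3,
    eps: float = 1e-8,
) -> float:
    """Score one target-domain pair from its k nearest labeled source pairs.

    ASSUMPTION: labels (1 = duplicate, 0 = not duplicate) are combined with
    inverse-distance weights. The paper's exact aggregation may differ.
    """
    # Sort all labeled source pairs by distance to the query and keep k.
    neighbors = sorted(
        (euclidean(query, vec), label)
        for vec, label in zip(source_vectors, source_labels)
    )[:k]
    # Closer neighbors get larger weights; eps avoids division by zero.
    weights = [1.0 / (dist + eps) for dist, _ in neighbors]
    weighted_votes = sum(w * label for w, (_, label) in zip(weights, neighbors))
    return weighted_votes / sum(weights)


def rank_pairs(
    target_vectors: Sequence[Sequence[float]],
    source_vectors: Sequence[Sequence[float]],
    source_labels: Sequence[int],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (target index, score) tuples, most likely duplicates first."""
    scored = [
        (i, knn_duplicate_score(vec, source_vectors, source_labels, k))
        for i, vec in enumerate(target_vectors)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In this sketch the labeled source-domain pairs act as a memory: a target-domain pair is scored only by the labels of its nearest stored neighbors, so no classifier needs to be trained on the target domain.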

Updated: 2020-11-25