Cross-Domain Generalization Through Memorization: A Study of Nearest Neighbors in Neural Duplicate Question Detection
Yadollah Yaghoobzadeh, Alexandre Rochette, Timothy J. Hazen
arXiv - CS - Computation and Language, Pub Date: 2020-11-22, DOI: arxiv-2011.11090
Duplicate question detection (DQD) is important for increasing the efficiency of
community and automatic question answering systems. Unfortunately, gathering
supervised data in a domain is time-consuming and expensive, and our ability to
leverage annotations across domains is minimal. In this work, we leverage
neural representations and study nearest neighbors for cross-domain
generalization in DQD. We first encode question pairs from the source and target
domains in a rich representation space. Then, using a k-nearest-neighbor
retrieval-based method, we aggregate the neighbors' labels and distances to
rank pairs. We observe robust performance of this method in different
cross-domain scenarios on the StackExchange, Spring and Quora datasets,
outperforming cross-entropy classification in multiple cases.
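
The retrieval step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `knn_duplicate_scores`, the Euclidean distance, the softmax weighting over negative distances, and the `temperature` parameter are all assumptions made for illustration, since the abstract says only that neighbors' labels and distances are aggregated. The encoder is also left out; the inputs are assumed to be already-encoded question-pair vectors.

```python
import numpy as np


def knn_duplicate_scores(source_vecs, source_labels, target_vecs, k=3, temperature=1.0):
    """Score target pairs by a distance-weighted vote of their k nearest source pairs.

    Hypothetical helper: the softmax-over-negative-distance weighting is an
    assumed aggregation rule, not one confirmed by the abstract.
    """
    source_vecs = np.asarray(source_vecs, dtype=float)
    source_labels = np.asarray(source_labels, dtype=float)  # 1 = duplicate, 0 = not
    target_vecs = np.asarray(target_vecs, dtype=float)

    # Pairwise Euclidean distances, shape (n_target, n_source).
    diffs = target_vecs[:, None, :] - source_vecs[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))

    # Indices, distances, and labels of the k nearest labeled source pairs.
    nn_idx = np.argsort(dists, axis=1)[:, :k]
    nn_dists = np.take_along_axis(dists, nn_idx, axis=1)
    nn_labels = source_labels[nn_idx]

    # Closer neighbors get larger weights (numerically stable softmax).
    logits = -nn_dists / temperature
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    # Weighted fraction of duplicate neighbors, a score in [0, 1].
    return (weights * nn_labels).sum(axis=1)


# Toy data: two duplicate pairs near the origin, two non-duplicates far away.
source_vecs = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
source_labels = [1, 1, 0, 0]
target_vecs = [[5.0, 5.5], [0.0, 0.5]]

scores = knn_duplicate_scores(source_vecs, source_labels, target_vecs, k=2)
ranking = np.argsort(-scores)  # target pairs ordered from most to least likely duplicate
```

In this toy setup the second target pair sits between the two labeled duplicates, so it ranks ahead of the first. Under the same assumptions, a larger `temperature` flattens the weights toward a plain majority vote, while a smaller one lets the single closest neighbor dominate.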
Updated: 2020-11-25