Applying Transfer Learning for Improving Domain-Specific Search Experience Using Query to Question Similarity, arXiv - CS - Information Retrieval


arXiv - CS - Information Retrieval. Pub Date: 2021-01-07, DOI: arxiv-2101.02351
Ankush Chopra, Shruti Agrawal, Sohom Ghosh

Search is one of the most common ways to seek information. However, users are often overloaded with results when they use a search platform to resolve their queries. Increasingly, direct answers to queries are provided as part of the search experience, and the question-answer (QA) retrieval process plays a significant role in enriching that experience. Most off-the-shelf Semantic Textual Similarity models work well for well-formed search queries, but their performance degrades in domain-specific settings where incomplete or grammatically ill-formed queries are prevalent. In this paper, we discuss a framework for calculating similarities between a given input query and a set of predefined questions in order to retrieve the question that matches it best. We have applied it to the financial domain, but the framework is general and can be used with any domain-specific search engine. We use a Siamese network [6] over Long Short-Term Memory (LSTM) [3] models to train a classifier that generates unnormalized and normalized similarity scores for a given pair of questions. In addition, for each question pair we calculate three other similarity scores: the cosine similarity between their average word2vec embeddings [15], the cosine similarity between their sentence embeddings [7] generated using RoBERTa [17], and a customized fuzzy-match score. Finally, we develop a metaclassifier using Support Vector Machines [19] that combines these five scores to detect whether a given pair of questions is similar. We benchmark our model's performance against existing state-of-the-art (SOTA) models on the Quora Question Pairs (QQP) dataset as well as on a dataset specific to the financial domain.
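
The query-to-question matching described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it computes only two of the five scores (cosine similarity over averaged word vectors and a fuzzy-match score), and every name in it is an assumption made for illustration. `TOY_VECTORS` is a hand-made stand-in for trained word2vec embeddings, `difflib.SequenceMatcher` stands in for the paper's customized fuzzy-match score, and an unweighted mean stands in for the trained SVM metaclassifier.

```python
import math
import re
from difflib import SequenceMatcher

# Hypothetical 3-dimensional word vectors; a real system would load
# trained word2vec embeddings instead of this hand-made table.
TOY_VECTORS = {
    "open": [0.9, 0.1, 0.0],
    "account": [0.8, 0.2, 0.1],
    "savings": [0.7, 0.3, 0.1],
    "loan": [0.1, 0.9, 0.2],
    "interest": [0.2, 0.8, 0.3],
    "rate": [0.1, 0.7, 0.4],
}


def average_embedding(text):
    """Average the vectors of known words; unknown words are skipped."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    vectors = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either is missing or zero."""
    if a is None or b is None:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fuzzy_score(a, b):
    """Character-level similarity in [0, 1] (assumed stand-in, see lead-in)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_features(query, question):
    """Two of the five scores from the abstract, as a feature vector."""
    embedding_score = cosine_similarity(
        average_embedding(query), average_embedding(question)
    )
    return [embedding_score, fuzzy_score(query, question)]


def best_match(query, questions):
    """Return the predefined question with the highest combined score.

    The unweighted mean is a placeholder for a trained metaclassifier.
    """
    return max(questions, key=lambda q: sum(pair_features(query, q)) / 2)


if __name__ == "__main__":
    predefined = [
        "How do I open a savings account?",
        "What is the interest rate on a loan?",
    ]
    # An incomplete, misspelled query of the kind the abstract mentions.
    print(best_match("open savings acount", predefined))
```

In a fuller version, `pair_features` would also return the Siamese LSTM scores and the RoBERTa sentence-embedding similarity, and the resulting five-element vector would be passed to a trained classifier rather than averaged.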


Updated: 2021-01-08
