Embedding based learning for collection selection in federated search
Data Technologies and Applications (IF 1.7), Pub Date: 2020-10-28, DOI: 10.1108/dta-01-2019-0005
Adamu Garba, Shah Khalid, Irfan Ullah, Shah Khusro, Diyawu Mumin

Purpose

Search engines face many challenges in crawling the deep web, because its content is often proprietary or dynamically generated. Distributed Information Retrieval (DIR) tries to solve these problems by providing a unified search interface over these databases. Because a DIR system must search across many databases, selecting the specific databases to search for a given user query is challenging. This challenge can be addressed by considering users' past queries, in combination with word embedding techniques, when selecting the collections to search. Combining the two would help the best-performing collection selection method speed up retrieval in DIR solutions.

Design/methodology/approach

The authors propose a collection selection model based on word embeddings, using the Word2Vec approach, that learns the similarity between the current query and past queries. They use the cosine and transformed cosine similarity models to compute the similarities among queries. The experiments are conducted on three standard TREC testbeds created for federated search.
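
The general idea of scoring collections by embedding similarity to past queries can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tiny hand-written vectors in `EMBEDDINGS` stand in for trained Word2Vec embeddings, a query is embedded by averaging its word vectors, and each collection is scored by summing the cosine similarity of the past queries it served well. The names `embed_query`, `rank_collections`, `travel_db`, and `code_db` are hypothetical. The transformed cosine variant is not defined in the abstract, so it is omitted here.

```python
import math
from collections import defaultdict

# Toy 3-dimensional vectors standing in for trained Word2Vec embeddings
# (hypothetical values chosen for illustration only).
EMBEDDINGS = {
    "cheap": [0.9, 0.1, 0.0],
    "flights": [0.8, 0.2, 0.1],
    "airfare": [0.85, 0.15, 0.05],
    "python": [0.0, 0.9, 0.2],
    "tutorial": [0.1, 0.8, 0.3],
}


def embed_query(query):
    """Average the vectors of the query's known words; None if none are known."""
    vectors = [EMBEDDINGS[w] for w in query.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_collections(query, past_queries):
    """Rank collections by summed similarity between `query` and past queries.

    `past_queries` is a list of (query text, collections that served it well).
    This scoring rule is an assumption made for the sketch.
    """
    current = embed_query(query)
    if current is None:
        return []
    scores = defaultdict(float)
    for text, collections in past_queries:
        past = embed_query(text)
        if past is None:
            continue
        similarity = cosine(current, past)
        for name in collections:
            scores[name] += similarity
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


PAST_QUERIES = [
    ("cheap flights", ["travel_db"]),
    ("python tutorial", ["code_db"]),
]
ranking = rank_collections("airfare", PAST_QUERIES)
print(ranking)
```

Note that the query "airfare" shares no word with "cheap flights", so a purely lexical matcher would find no overlap, while the embedding-based score still ranks `travel_db` first.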

Findings

The results show significant improvements over the baseline models.

Originality/value

Although lexical matching models for collection selection using similarity based on past queries exist, to the best of our knowledge, the proposed work is the first to use word embeddings for collection selection by learning from past queries.




Updated: 2020-11-02