Embedding based learning for collection selection in federated search

Adamu Garba (School of Computer and Communication Engineering, Jiangsu University, Zhenjiang, China)
Shah Khalid (School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology, Islamabad, Pakistan) (School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, China)
Irfan Ullah (Department of Computer Science, University of Peshawar, Peshawar, Pakistan)
Shah Khusro (Department of Computer Science, University of Peshawar, Peshawar, Pakistan)
Diyawu Mumin (School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, China) (Computer Science, Tamale Technical University, Tamale, Ghana)

Data Technologies and Applications

ISSN: 2514-9288

Article publication date: 28 October 2020

Issue publication date: 2 November 2020

Abstract

Purpose

Search engines face many challenges in crawling the deep web because of its proprietary nature or dynamic content. Distributed Information Retrieval (DIR) addresses these problems by providing a unified searchable interface to these databases. Because a DIR system must search across many databases, selecting which databases to search for a given user query is challenging. This challenge can be addressed by taking users' past queries into account when selecting collections, in combination with word embedding techniques. Combining the two could yield a better-performing collection selection method and thereby improve the retrieval performance of DIR solutions.

Design/methodology/approach

The authors propose a collection selection model based on Word2Vec word embeddings that learns the similarity between the current query and past queries. They use cosine similarity and a transformed cosine similarity to compute the similarity between queries. Experiments are conducted on three standard TREC testbeds created for federated search.
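The abstract does not spell out how query similarity is turned into a collection ranking, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes: toy hand-written vectors in place of a trained Word2Vec model, a query represented as the mean of its word vectors, and a hypothetical `past_queries` log mapping each past query to the collections that returned relevant results for it. Collections are scored by summing the cosine similarities of the `k` most similar past queries. Only plain cosine similarity is shown; the transformed cosine variant is not defined in the abstract and is omitted. All names (`WORD_VECTORS`, `embed_query`, `rank_collections`, the collection labels) are hypothetical.

```python
import math

# Hypothetical toy vectors standing in for a trained Word2Vec model.
WORD_VECTORS = {
    "heart": [0.9, 0.1, 0.0],
    "cardiac": [0.85, 0.15, 0.05],
    "disease": [0.7, 0.3, 0.1],
    "football": [0.0, 0.2, 0.95],
    "league": [0.05, 0.1, 0.9],
}


def embed_query(query):
    """Represent a query as the mean of its known word vectors (an assumption)."""
    vectors = [WORD_VECTORS[w] for w in query.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None  # no known words, so no embedding
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Plain cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_collections(query, past_queries, k=2):
    """Score collections using the k past queries most similar to `query`.

    `past_queries` (hypothetical) maps a past query string to the
    collections that returned relevant results for it.
    """
    q_vec = embed_query(query)
    if q_vec is None:
        return []
    neighbours = []
    for past, collections in past_queries.items():
        p_vec = embed_query(past)
        if p_vec is not None:
            neighbours.append((cosine(q_vec, p_vec), collections))
    # Keep the k most similar past queries.
    neighbours.sort(key=lambda item: item[0], reverse=True)
    scores = {}
    for sim, collections in neighbours[:k]:
        for c in collections:
            # Each collection accumulates the similarity of every neighbour that used it.
            scores[c] = scores.get(c, 0.0) + sim
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


past_log = {
    "heart disease": ["medline", "pubmed_central"],
    "football league": ["sports_news"],
}
print(rank_collections("cardiac disease", past_log, k=1))
```

In a real system the vectors would come from a Word2Vec model trained on a suitable corpus, and the k-nearest-neighbour aggregation used here is just one simple design choice; weighting, thresholds and normalisation could all differ from what the authors use.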

Findings

The results show significant improvements over the baseline models.

Originality/value

Although lexical matching models for collection selection based on similarity to past queries exist, to the best of our knowledge the proposed work is the first to use word embeddings for collection selection by learning from past queries.

Citation

Garba, A., Khalid, S., Ullah, I., Khusro, S. and Mumin, D. (2020), "Embedding based learning for collection selection in federated search", Data Technologies and Applications, Vol. 54 No. 5, pp. 703-717. https://doi.org/10.1108/DTA-01-2019-0005

Publisher

Emerald Publishing Limited

Copyright © 2020, Emerald Publishing Limited