Federated Search
Foundations and Trends in Information Retrieval (IF 10.4), Pub Date: 2011-03-06, DOI: 10.1561/1500000010
Milad Shokouhi, Luo Si

Federated search (also called federated information retrieval or distributed information retrieval) is a technique for searching multiple text collections simultaneously. Each query is submitted to the subset of collections most likely to return relevant answers, and the results returned by the selected collections are merged into a single list. Federated search is preferred over centralized alternatives in many environments. For example, commercial search engines such as Google cannot easily index uncrawlable hidden-web collections, whereas federated search systems can search the contents of hidden-web collections without crawling them. In enterprise environments, where each organization maintains an independent search engine, federated search techniques can provide parallel search over multiple collections.
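The broker workflow described above (send the query to several independent engines in parallel, then merge their answers into one list) can be sketched in Python. This is a minimal illustration under stated assumptions, not the method of any particular system surveyed here: it assumes collection selection has already happened (the caller passes only the selected engines), models each engine as a hypothetical function returning `(doc_id, score)` pairs on its own scoring scale, and merges by per-collection min-max score normalization, a simple baseline chosen because raw scores from independent engines are usually not comparable. The names `federated_search`, `normalize`, and `SearchFn` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical engine interface: query -> list of (doc_id, score) pairs,
# where each engine uses its own, possibly incomparable, score scale.
SearchFn = Callable[[str], List[Tuple[str, float]]]


def normalize(results: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Min-max normalize one collection's scores into [0, 1]."""
    if not results:
        return []
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores equal: treat every document as equally good.
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (score - lo) / (hi - lo)) for doc, score in results]


def federated_search(
    query: str, engines: Dict[str, SearchFn], k: int = 10
) -> List[Tuple[str, str, float]]:
    """Query the (already selected) engines in parallel; merge into one list."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        per_engine = {name: fut.result() for name, fut in futures.items()}
    # Tag each hit with its source collection and its normalized score.
    merged = [
        (name, doc, score)
        for name, results in per_engine.items()
        for doc, score in normalize(results)
    ]
    # Sort by normalized score; Python's sort is stable, so ties keep input order.
    merged.sort(key=lambda hit: hit[2], reverse=True)
    return merged[:k]


if __name__ == "__main__":
    toy_engines = {
        "news": lambda q: [("n1", 12.0), ("n2", 3.0)],
        "papers": lambda q: [("p1", 0.9), ("p2", 0.4), ("p3", 0.1)],
    }
    for hit in federated_search("federated search", toy_engines, k=3):
        print(hit)
```

With the two toy engines, the top document of each collection is normalized to 1.0, so `n1` and `p1` tie at the head of the merged list. Tying every collection's best hit is a known weakness of plain min-max merging, which is why more careful result-merging strategies are worth studying.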

There are three major challenges in federated search. For each query, the system must select the subset of collections most likely to return relevant documents; this is the collection selection problem. To select suitable collections, a federated search system needs some knowledge about the contents of each collection, which gives rise to the collection representation problem. Finally, the results returned by the selected collections must be merged before being presented to the user; this last step is the result merging problem.
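The first two problems can be made concrete with a small Python sketch: each collection is represented by term document frequencies computed over a handful of documents sampled from it (for example, documents retrieved by sending probe queries to an uncooperative engine), and collections are then ranked for a query using that representation. This is an assumption-laden illustration, not a method taken from the survey: the add-one-smoothed log document-frequency score below was chosen for simplicity and is not CORI or any other published selection algorithm, and the names `CollectionRep`, `build_representation`, and `select_collections` are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class CollectionRep:
    """Hypothetical collection representation built from sampled documents."""
    num_docs: int = 0
    df: Counter = field(default_factory=Counter)  # term -> sampled docs containing it


def build_representation(sampled_docs: Iterable[str]) -> CollectionRep:
    """Collection representation: document frequencies over the sample."""
    rep = CollectionRep()
    for doc in sampled_docs:
        rep.num_docs += 1
        rep.df.update(set(doc.lower().split()))  # count each term once per doc
    return rep


def select_collections(query: str, reps: Dict[str, CollectionRep], k: int = 1) -> List[str]:
    """Collection selection: rank collections by a smoothed log-DF score, keep top k."""
    terms = query.lower().split()

    def score(rep: CollectionRep) -> float:
        # Add-one smoothing: a term absent from the sample lowers the score
        # without driving it to minus infinity.
        return sum(math.log((rep.df[t] + 1) / (rep.num_docs + 2)) for t in terms)

    return sorted(reps, key=lambda name: score(reps[name]), reverse=True)[:k]


if __name__ == "__main__":
    reps = {
        "medicine": build_representation(
            ["heart disease treatment", "cancer drug trial", "heart surgery"]
        ),
        "sports": build_representation(
            ["football match report", "tennis final", "heart rate training"]
        ),
    }
    print(select_collections("heart surgery", reps, k=1))
    print(select_collections("football", reps, k=1))
```

For "heart surgery", the medicine sample contains both terms more often, so it ranks first; for "football", only the sports sample contains the term. A real system would also have to decide how many collections to query and how large a sample is enough, which this sketch leaves as fixed parameters.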

The goal of this work is to provide a comprehensive summary of previous research on the federated search challenges described above.

Updated: 2011-03-06