Operator implementation of Result Set Dependent KWS scoring functions
Information Systems (IF 3.7), Pub Date: 2019-11-18, DOI: 10.1016/j.is.2019.101465
Vinay M.S., Jayant R. Haritsa

A popular approach to hosting Keyword Search Systems (KWS) on relational DBMS platforms is to employ the Candidate Network framework. The quality of a Candidate Network-based search is critically dependent on the scoring function used to rank the relevant answers. In this paper, we first demonstrate, through detailed empirical and conceptual analysis studies, that the Labrador scoring function provides the best user relevance among contemporary Candidate Network scoring functions.

Efficiently incorporating the Labrador function, however, is difficult because of its Result Set Dependent (RSD) characteristic, wherein the distribution of keywords in the query results influences the ranking. To address this RSD challenge, we investigate two mechanisms: (a) a simple wrapper approach that leverages existing RDBMS functionalities through an SQL wrapper; and (b) a more sophisticated operator approach wherein the database engine is augmented with custom operators that perform result ranking in the query execution plan.
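
To make the RSD characteristic concrete, the following minimal Python sketch ranks results with a toy score whose keyword weights are computed from the result set itself. It is not the Labrador function, whose formula the abstract does not give; the IDF-style weight, the function name `rsd_rank`, and the sample rows are all assumptions chosen for illustration. The point it shows is that no result can be scored until the whole result set has been seen, which is what makes a naive implementation costly.

```python
import math
from collections import Counter


def rsd_rank(results, keywords):
    """Rank results with a toy result-set-dependent score (hypothetical).

    results:  list of (result_id, text) pairs, standing in for joined tuples.
    keywords: list of lowercase query keywords.
    """
    tokenized = [(rid, text.lower().split()) for rid, text in results]

    # Pass 1: for each keyword, count how many results contain it.
    result_freq = Counter()
    for _, tokens in tokenized:
        present = set(tokens)
        for kw in keywords:
            if kw in present:
                result_freq[kw] += 1

    # Keyword weights depend on the whole result set (assumed IDF-style form).
    n = len(tokenized)
    weight = {
        kw: math.log(1 + n / result_freq[kw])
        for kw in keywords
        if result_freq[kw] > 0
    }

    # Pass 2: score each result using the set-wide weights.
    scored = []
    for rid, tokens in tokenized:
        counts = Counter(tokens)
        score = sum(weight.get(kw, 0.0) * counts[kw] for kw in keywords)
        scored.append((rid, score))

    # Highest score first; ties keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


rows = [
    ("r1", "keyword search databases"),
    ("r2", "keyword query"),
    ("r3", "keyword ranking search"),
]
ranked = rsd_rank(rows, ["keyword", "search"])

# Adding one more result changes the score of an unrelated existing result.
ranked_more = rsd_rank(rows + [("r4", "search engines")], ["keyword", "search"])
```

In this sketch the score of `r2` differs between `ranked` and `ranked_more` even though `r2` itself is unchanged, because the weight of "keyword" is recomputed from the larger result set.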

The above strategies have been implemented on a PostgreSQL codebase, inclusive of integration with the optimizer for the operator approach. A detailed empirical study over real-world data sets, including DBLP and Wikipedia, indicates that the wrapper approach addresses the RSD efficiency issue to a limited extent only. More encouragingly, the operator approach is extremely successful, delivering processing times that are comparable to, or better than, those of non-RSD implementations. We expect these results to aid in the organic hosting of KWS functionality on database systems.




Updated: 2019-11-18