HySet: A hybrid framework for exact set similarity join using a GPU
Parallel Computing (IF 1.4), Pub Date: 2021-04-30, DOI: 10.1016/j.parco.2021.102790
Christos Bellas, Anastasios Gounaris

Set similarity join is a fundamental operation used in a wide range of applications, such as data mining, data cleaning and entity resolution. Existing methods for set similarity join follow a filter-verification framework: potential candidate pairs are generated in a filtering phase and then pass through a verification phase that outputs the final result. Several kinds of filtering techniques have been proposed, and the techniques also differ in how they couple filtering with verification. However, it has been shown that no globally dominant technique exists; depending on the dataset and query characteristics, each technique has its own strengths and weaknesses. Based on these findings, the main contribution of this work is a hybrid framework for the set similarity join operation on a single GPU-equipped machine. Our framework incorporates a partitioning mechanism that makes appropriate use of both the CPU and the GPU. We present all technical details and, after thorough evaluation, show speedups of up to 3.25x.
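
The filter-verification pattern described in the abstract can be illustrated with a minimal, CPU-only Python sketch. It is not HySet's implementation: it assumes Jaccard similarity, a self-join over one collection, and a standard prefix filter backed by an inverted index. The function names `jaccard` and `similarity_self_join` are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from math import ceil


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def similarity_self_join(records, threshold):
    """Return sorted (i, j) pairs, i < j, with Jaccard(records[i], records[j]) >= threshold.

    Illustrative assumptions: 0 < threshold <= 1, tokens are hashable and
    mutually comparable, and empty records are skipped.
    """
    sets = [set(r) for r in records]

    # Global token order: rare tokens first, so prefixes are selective.
    freq = defaultdict(int)
    for s in sets:
        for tok in s:
            freq[tok] += 1
    ordered = [sorted(s, key=lambda t: (freq[t], t)) for s in sets]

    index = defaultdict(list)  # token -> ids of records whose prefix holds it
    results = []
    for rid, toks in enumerate(ordered):
        if not toks:
            continue
        # Prefix length for Jaccard: |r| - ceil(t * |r|) + 1.
        # The small epsilon guards against float round-up; it can only
        # lengthen the prefix, which keeps the filter safe.
        prefix_len = len(toks) - ceil(threshold * len(toks) - 1e-9) + 1

        # Filtering phase: any earlier record sharing a prefix token is a candidate.
        candidates = set()
        for tok in toks[:prefix_len]:
            candidates.update(index[tok])
            index[tok].append(rid)

        # Verification phase: compute the exact similarity for each candidate.
        for cid in candidates:
            if jaccard(sets[cid], sets[rid]) >= threshold:
                results.append((cid, rid))
    return sorted(results)


if __name__ == "__main__":
    data = [[1, 2, 3, 4], [2, 3, 4, 5], [1, 2, 3, 4], [7, 8, 9]]
    print(similarity_self_join(data, 0.5))
```

In this sketch, filtering and verification are interleaved per record. Other techniques couple the two phases differently, for example by batching candidates before verifying them, which is the kind of design choice the abstract says varies between methods.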




Updated: 2021-05-25