Towards task-based parallelization for entity resolution
SICS Software-Intensive Cyber-Physical Systems Pub Date : 2019-08-26 , DOI: 10.1007/s00450-019-00409-6
Leonardo Gazzarri , Melanie Herschel

Entity resolution (ER) refers to the problem of finding which virtual representations in one or more data sources refer to the same real-world entity. A central question in ER is how to find matching entity representations (so-called duplicates) efficiently and scalably. One general technique for addressing these issues is parallelization. Almost all work on parallel ER focuses on data parallelism; this paper instead focuses on task parallelism for ER. Task parallelism makes it possible to support incremental ER, which computes the solution incrementally by streaming the results of intermediate ER stages as soon as they are computed. This can deliver results in a more timely fashion and can also be useful in a service-oriented setting with a limited time or monetary budget. In summary, this paper presents a framework for task parallelization of ER that supports, in particular, ER of large amounts of semi-structured and heterogeneous data. We also discuss a possible implementation of our framework.
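
To make the idea of task parallelism with streamed intermediate results concrete, the following is a minimal sketch, not the framework presented in the paper. It assumes a two-stage ER pipeline (blocking, then pairwise matching) in which each stage runs as its own thread and passes results downstream through a queue. All names (`blocking_stage`, `matching_stage`, `resolve_stream`), the first-letter blocking key, and the Jaccard threshold are hypothetical choices made for illustration only.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of a stream


def jaccard(a, b):
    """Token-level Jaccard similarity of two strings (illustrative matcher)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def blocking_stage(records, out_q):
    """Stage 1: emit candidate pairs as soon as a record joins a block."""
    blocks = {}
    for rec_id, name in records:
        key = name[:1].lower()  # hypothetical blocking key: first letter
        block = blocks.setdefault(key, [])
        # Pair the new record with every record already in the same block.
        for other in block:
            out_q.put((other, (rec_id, name)))
        block.append((rec_id, name))
    out_q.put(_DONE)


def matching_stage(in_q, out_q, threshold=0.5):
    """Stage 2: compare candidate pairs while stage 1 is still running."""
    while True:
        item = in_q.get()
        if item is _DONE:
            out_q.put(_DONE)
            return
        (id_a, name_a), (id_b, name_b) = item
        if jaccard(name_a, name_b) >= threshold:
            out_q.put((id_a, id_b))


def resolve_stream(records):
    """Yield duplicate pairs incrementally, as soon as they are found."""
    candidates, matches = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=blocking_stage,
                         args=(records, candidates), daemon=True),
        threading.Thread(target=matching_stage,
                         args=(candidates, matches), daemon=True),
    ]
    for worker in workers:
        worker.start()
    while True:
        match = matches.get()
        if match is _DONE:
            break
        yield match
    for worker in workers:
        worker.join()
```

A caller can iterate over `resolve_stream(records)` and stop once a time or cost budget is exhausted, keeping whatever duplicates have been reported so far. In contrast to data parallelism, which would split the records across identical workers, the two stages here are different tasks that overlap in time.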



Updated: 2019-08-26