An Overview of End-to-End Entity Resolution for Big Data
ACM Computing Surveys (IF 16.6), Pub Date: 2020-12-06, DOI: 10.1145/3418896
Vassilis Christophides, Vasilis Efthymiou, Themis Palpanas, George Papadakis, Kostas Stefanidis

One of the most critical tasks for improving data quality and increasing the reliability of data analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to the same real-world entity. Despite several decades of research, ER remains a challenging problem. In this survey, we highlight the novel aspects of resolving Big Data entities when we should satisfy more than one of the Big Data characteristics simultaneously (i.e., Volume and Velocity with Variety). We present the basic concepts, processing steps, and execution strategies that have been proposed by database, semantic Web, and machine learning communities in order to cope with the loose structuredness, extreme diversity, high speed, and large scale of entity descriptions used by real-world applications. We provide an end-to-end view of ER workflows for Big Data, critically review the pros and cons of existing methods, and conclude with the main open research directions.

Updated: 2020-12-06