A Survey on Blocking Technology of Entity Resolution
Journal of Computer Science and Technology (IF 1.2), Pub Date: 2020-07-01, DOI: 10.1007/s11390-020-0350-4
Bo-Han Li, Yi Liu, An-Man Zhang, Wen-Huan Wang, Shuo Wan

Entity resolution (ER) is a significant task in data integration that aims to detect all entity profiles corresponding to the same real-world entity. Because ER has inherently quadratic complexity, blocking was proposed to make it tractable: it offers an approximate solution that clusters similar entity profiles into blocks, so that pairwise comparisons need only be performed inside each block, which reduces the computational cost of ER. This paper presents a comprehensive survey of existing blocking technologies. We summarize and analyze all classic blocking methods, with emphasis on the different block construction and optimization techniques. We find that traditional blocking ER methods, which depend on a fixed schema, may not work in highly heterogeneous information spaces. Using schema information flexibly is therefore of great significance for efficiently processing data with the new characteristics of this era. Machine learning is an important tool for ER, but efficient end-to-end machine learning methods still need to be explored. We also summarize the most promising directions for future work, including real-time blocking ER, incremental blocking ER, and deep learning for ER, among others.
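
To make the idea of blocking concrete, the following is a minimal sketch of key-based blocking, not a method taken from the paper. The sample profiles, the `surname` field, and the three-letter prefix key are all hypothetical choices made for illustration; real blocking schemes use more robust keys.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(profile):
    # Hypothetical key (an assumption for this sketch):
    # the first three letters of the lowercased surname.
    return profile["surname"].lower()[:3]


def build_blocks(profiles):
    # Group profile ids by blocking key; each group is one block.
    blocks = defaultdict(list)
    for pid, profile in profiles.items():
        blocks[blocking_key(profile)].append(pid)
    return blocks


def candidate_pairs(blocks):
    # Only profiles that share a block are ever compared.
    pairs = set()
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


# Made-up sample data, for illustration only.
profiles = {
    1: {"surname": "Smith", "city": "Leeds"},
    2: {"surname": "Smithe", "city": "Leeds"},
    3: {"surname": "Jones", "city": "York"},
    4: {"surname": "Jonas", "city": "York"},
    5: {"surname": "Smyth", "city": "Leeds"},
}

blocks = build_blocks(profiles)
pairs = candidate_pairs(blocks)
all_pairs = len(list(combinations(profiles, 2)))
print(f"{len(pairs)} candidate pairs instead of {all_pairs}")
```

In this toy data set the number of comparisons drops from 10 to 2, but "Smyth" lands in a block of its own and is never compared with "Smith". That trade-off is why blocking is described as an approximate solution.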

Updated: 2020-07-01