Reproducible experiments on Three-Dimensional Entity Resolution with JedAI
Information Systems (IF 3.0), Pub Date: 2021-06-17, DOI: 10.1016/j.is.2021.101830
George Mandilaras, George Papadakis, Luca Gagliardelli, Giovanni Simonini, Emmanouil Thanos, George Giannakopoulos, Sonia Bergamaschi, Themis Palpanas, Manolis Koubarakis, Alicia Lara-Clares, Antonio Fariña

In Papadakis et al. (2020), we presented the latest release of JedAI, an open-source Entity Resolution (ER) system that allows for building a large variety of end-to-end ER pipelines. Through a thorough experimental evaluation, we compared a schema-agnostic ER pipeline based on blocks with another schema-based ER pipeline based on similarity joins. We applied them to 10 established, real-world datasets and assessed them with respect to effectiveness and time efficiency. Special care was taken to juxtapose their scalability, too, using seven established, synthetic datasets. Moreover, we experimentally compared the effectiveness of the batch schema-agnostic ER pipeline with its progressive counterpart. In this companion paper, we describe how to reproduce the entire experimental study that pertains to JedAI’s serial execution through its intuitive user interface. We also explain how to examine the robustness of the parameter configurations we have selected.




Updated: 2021-06-20