Mis-categorized entities detection
The VLDB Journal (IF 4.2), Pub Date: 2021-03-06, DOI: 10.1007/s00778-021-00653-w
Shuang Hao , Nan Tang , Guoliang Li , Jianhua Feng , Ning Wang

Entity categorization, the process of grouping entities into categories, is an important problem with many applications. In practice, however, many entities are mis-categorized, for example in Google Scholar and among Amazon products. In this paper, we study the problem of discovering mis-categorized entities within a given group of categorized entities. This problem is inherently hard: all entities in the same group have already been “well” categorized by state-of-the-art solutions, so it is nontrivial to tell them apart. We propose a novel rule-based framework to solve this problem. It first uses positive rules to compute disjoint partitions of the entities and takes the largest partition, called the pivot partition, to be the correctly categorized one. It then uses negative rules to identify mis-categorized entities in the other partitions, namely those that are dissimilar to the entities in the pivot partition. We describe optimizations for applying these rules and discuss how to generate positive and negative rules. In addition, we propose novel strategies to resolve inconsistent rules. Extensive experiments on real-world datasets demonstrate the effectiveness of our solution.
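The two-phase flow in the abstract (positive rules build partitions, the largest partition becomes the pivot, negative rules flag dissimilar entities outside it) can be illustrated with a small Python sketch. It rests on assumptions of our own, not details from the paper: rules are modeled as boolean predicates over pairs of entities; partitions are the connected components induced by positive-rule matches, computed with union-find; and an entity outside the pivot is flagged only if a single negative rule holds between it and every pivot entity. The function names, the toy product records, and both rules are hypothetical, and the naive pairwise loop omits the rule-application optimizations the authors describe.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Entity = Dict[str, Any]
Rule = Callable[[Entity, Entity], bool]  # hypothetical: a rule is a pairwise predicate


def find(parent: List[int], i: int) -> int:
    # Union-find lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def partition(entities: List[Entity], positive_rules: List[Rule]) -> List[List[int]]:
    """Group entity indices into disjoint partitions: two entities share a
    partition if positive rules link them, directly or transitively."""
    parent = list(range(len(entities)))
    for i in range(len(entities)):
        for j in range(i + 1, len(entities)):
            if any(rule(entities[i], entities[j]) for rule in positive_rules):
                ri, rj = find(parent, i), find(parent, j)
                if ri != rj:
                    parent[rj] = ri
    groups: Dict[int, List[int]] = defaultdict(list)
    for i in range(len(entities)):
        groups[find(parent, i)].append(i)
    return list(groups.values())


def detect_miscategorized(entities: List[Entity], positive_rules: List[Rule],
                          negative_rules: List[Rule]) -> List[Any]:
    parts = partition(entities, positive_rules)
    # The largest partition is taken as the pivot (ties go to the first one found).
    pivot = max(parts, key=len)
    flagged = []
    for part in parts:
        if part is pivot:
            continue
        for i in part:
            # Assumption: flag i if one negative rule holds against every pivot entity.
            if any(all(rule(entities[i], entities[p]) for p in pivot)
                   for rule in negative_rules):
                flagged.append(entities[i]["id"])
    return flagged


# Hypothetical group of entities, all labeled with the category "Laptops".
entities = [
    {"id": 1, "title": "ThinkPad X1", "type": "laptop", "screen": "14in"},
    {"id": 2, "title": "MacBook Air", "type": "laptop", "screen": "13in"},
    {"id": 3, "title": "Dell XPS 13", "type": "laptop", "screen": "13in"},
    {"id": 4, "title": "Laptop sleeve", "type": "accessory", "screen": None},
]
positive_rules: List[Rule] = [lambda a, b: a["type"] == b["type"]]
negative_rules: List[Rule] = [lambda a, b: (a["screen"] is None) != (b["screen"] is None)]

print(detect_miscategorized(entities, positive_rules, negative_rules))  # → [4]
```

Here the positive rule groups the three laptops into the pivot partition, and the negative rule (one record has a screen size, the other does not) separates the sleeve from every pivot entity, so only entity 4 is flagged. Requiring a negative rule to fire against the whole pivot, rather than against any single pivot entity, is a conservative choice made for this sketch.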

Updated: 2021-03-07