APFA: Automated Product Feature Alignment for Duplicate Detection
Expert Systems with Applications (IF 7.5), Pub Date: 2021-02-26, DOI: 10.1016/j.eswa.2021.114759
Nick Valstar, Flavius Frasincar, Gianni Brauwers

To keep up with the growing interest in using Web shops for product comparison, we have developed a method that targets the problem of product duplicate detection. If duplicates can be discovered correctly and quickly, customers can compare products efficiently. We build upon the state-of-the-art Multi-component Similarity Method (MSM) for product duplicate detection by developing an automated pre-processing phase that runs before the similarities between products are calculated. Specifically, in this phase the features of products are aligned between Web shops, using metrics such as the data type, coverage, and diversity of each key, as well as the distribution and measurement units of their corresponding values. With this information, the values of these keys can be employed more meaningfully and efficiently when comparing products. Applying our method to a real-world dataset of 1629 TVs across 4 Web shops, we find that the product similarity phase becomes roughly 3 times faster due to fewer meaningless comparisons, an improved brand analyzer, and a renewed title analyzer. Moreover, in terms of quality of duplicate detection, we significantly outperform MSM with an F1-measure of 0.746 versus 0.525.
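
To make the per-key metrics named in the abstract more concrete, the following is a minimal sketch of how coverage and diversity might be computed for the feature keys of a single Web shop. The abstract does not give the definitions, so the ones used here are assumptions: coverage is taken as the fraction of products that list a key, and diversity as the fraction of distinct values among the values of that key. The function name `key_stats` and the sample data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def key_stats(products):
    """Compute assumed coverage and diversity per feature key.

    `products` is a list of dicts for ONE shop, each mapping a feature
    key to its value string. The definitions are illustrative
    assumptions, not the paper's formulas:
      coverage  = (# products that have the key) / (# products)
      diversity = (# distinct values of the key) / (# values of the key)
    """
    values_by_key = defaultdict(list)
    for product in products:
        for key, value in product.items():
            values_by_key[key].append(value)

    n_products = len(products)
    stats = {}
    for key, values in values_by_key.items():
        stats[key] = {
            "coverage": len(values) / n_products,
            "diversity": len(set(values)) / len(values),
        }
    return stats


# Hypothetical sample data for one shop.
shop_a = [
    {"Brand": "Samsung", "Screen Size": "40 in"},
    {"Brand": "Samsung"},
    {"Brand": "LG", "Screen Size": "55 in"},
    {"Brand": "Sony", "Screen Size": "40 in"},
]

for key, s in key_stats(shop_a).items():
    print(f"{key}: coverage={s['coverage']:.2f}, diversity={s['diversity']:.2f}")
```

Under these assumed definitions, keys from two shops whose coverage, diversity, and data type are similar would be candidates for alignment; how APFA actually combines these signals is described in the paper itself, not in this sketch.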




Updated: 2021-02-26