Model-based exception mining for object-relational data
Data Mining and Knowledge Discovery ( IF 4.8 ) Pub Date : 2020-02-19 , DOI: 10.1007/s10618-020-00677-w
Fatemeh Riahi , Oliver Schulte

This paper develops model-based exception mining and outlier detection for the case of object-relational data. Object-relational data represent a complex heterogeneous network, which comprises objects of different types, links among these objects, also of different types, and attributes of these links. We follow the well-established exceptional model mining (EMM) framework, which has been previously applied for subgroup discovery in propositional data; our novel contribution is to develop EMM for relational data. EMM leverages machine learning models for exception mining: An object is exceptional to the extent that a model learned for the object data differs from a model learned for the general population. In relational data, EMM can therefore be used for detecting individual outlier or exceptional objects. We combine EMM with state-of-the-art statistical-relational model discovery methods for constructing a graphical model (Bayesian network) that compactly represents probabilistic associations in the data. We investigate several outlierness metrics, based on the learned object-relational model, that quantify the extent to which the association pattern of a potential outlier object deviates from that of the whole population. Our method is validated on synthetic data sets and on real-world data sets about soccer and hockey matches, IMDb movies and mutagenic compounds. Compared to baseline methods, the EMM approach achieved the best detection accuracy when combined with a novel outlierness metric. An empirical evaluation on soccer and movie data shows a strong correlation between our novel outlierness metric and success metrics: Individuals that our metric marks out as unusual tend to have unusual success.
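
The core EMM idea stated in the abstract, scoring an object by how far a model fitted to its own data departs from a model fitted to the whole population, can be illustrated with a minimal Python sketch. This is not the authors' method: the abstract does not define the outlierness metrics, and the paper learns a Bayesian network, whereas this sketch assumes a plain smoothed joint distribution as the "model" and KL divergence as a stand-in metric. The function names, the smoothing parameter `alpha`, and the toy soccer events are all hypothetical.

```python
from collections import Counter
from math import log


def joint_distribution(rows, support, alpha=1.0):
    """Laplace-smoothed empirical distribution over attribute tuples.

    Stand-in for a learned model; the paper uses a Bayesian network instead.
    """
    counts = Counter(rows)
    total = len(rows) + alpha * len(support)
    return {value: (counts[value] + alpha) / total for value in support}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined on the same support."""
    return sum(p[v] * log(p[v] / q[v]) for v in p if p[v] > 0)


def outlierness(object_rows, population_rows, alpha=1.0):
    """Assumed metric: divergence of the object model from the population model."""
    support = set(object_rows) | set(population_rows)
    p_object = joint_distribution(object_rows, support, alpha)
    p_population = joint_distribution(population_rows, support, alpha)
    return kl_divergence(p_object, p_population)


# Hypothetical toy data: (shot_type, result) pairs for each player's events.
events = {
    "player_a": [("long", "miss")] * 8 + [("short", "goal")] * 2,
    "player_b": [("long", "miss")] * 7 + [("short", "goal")] * 3,
    "player_c": [("long", "goal")] * 9 + [("short", "miss")] * 1,
}
population = [row for rows in events.values() for row in rows]

scores = {name: outlierness(rows, population) for name, rows in events.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy data, `player_c` has an association pattern between shot type and result that runs against the rest of the population, so it is ranked first. Smoothing keeps the divergence finite when an object never exhibits a value that occurs elsewhere in the population.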

Updated: 2020-02-19