Complexity of rule sets in mining incomplete data using characteristic sets and generalized maximal consistent blocks
Logic Journal of the IGPL (IF 1), Pub Date: 2020-09-18, DOI: 10.1093/jigpal/jzaa041
Patrick G Clark 1 , Cheng Gao 1 , Jerzy W Grzymala-Busse 2 , Teresa Mroczek 3 , Rafal Niemiec 3

In this paper, missing attribute values in incomplete data sets have three possible interpretations: lost values, attribute-concept values and ‘do not care’ conditions. For rule induction, we use two constructs: characteristic sets and generalized maximal consistent blocks. Combining the three interpretations with the two constructs gives six approaches to data mining. In our previous experiments, where the main quality criterion was the error rate evaluated by ten-fold cross validation, no approach was universally the best. We therefore compare the six approaches by the complexity of the rule sets induced from incomplete data sets. We show that the smallest rule sets are induced from data sets with attribute-concept values, while the most complicated rule sets are induced from data sets with lost values. The choice between interpretations of missing attribute values is more important than the choice between characteristic sets and generalized maximal consistent blocks.
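
The abstract does not define characteristic sets, so the following is only a minimal sketch based on the definitions commonly used in the rough-set literature on incomplete data, not the authors' implementation. It assumes the usual notation (`?` for a lost value, `*` for a 'do not care' condition, `-` for an attribute-concept value); the toy table, the attribute names and all function names are hypothetical. Generalized maximal consistent blocks are not covered.

```python
# Assumed symbols for the three interpretations of a missing value.
LOST, DO_NOT_CARE, ATTR_CONCEPT = "?", "*", "-"
MISSING = {LOST, DO_NOT_CARE, ATTR_CONCEPT}


def concept_values(rows, decisions, x, a):
    """Specified values of attribute a among cases with the same decision as x."""
    return {
        r[a]
        for r, d in zip(rows, decisions)
        if d == decisions[x] and r[a] not in MISSING
    }


def blocks(rows, decisions, a):
    """Attribute-value blocks [(a, v)] for every specified value v of a."""
    values = {r[a] for r in rows if r[a] not in MISSING}
    result = {v: set() for v in values}
    for i, r in enumerate(rows):
        if r[a] == LOST:
            continue  # a lost value places the case in no block of a
        if r[a] == DO_NOT_CARE:
            members = values  # the case joins every block of a
        elif r[a] == ATTR_CONCEPT:
            members = concept_values(rows, decisions, i, a)
        else:
            members = {r[a]}
        for v in members:
            result[v].add(i)
    return result


def characteristic_set(rows, decisions, x, attributes):
    """Intersection over all attributes a of the sets K(x, a)."""
    k = set(range(len(rows)))  # start from the universe U
    for a in attributes:
        value = rows[x][a]
        if value in (LOST, DO_NOT_CARE):
            continue  # K(x, a) = U, so the intersection is unchanged
        b = blocks(rows, decisions, a)
        if value == ATTR_CONCEPT:
            vals = concept_values(rows, decisions, x, a)
            if not vals:
                continue  # no specified value in the concept: K(x, a) = U
            k &= set().union(*(b[v] for v in vals))
        else:
            k &= b[value]
    return k


# Hypothetical toy data set: two attributes and one decision per case.
rows = [
    {"temp": "high", "headache": "yes"},
    {"temp": "?", "headache": "yes"},
    {"temp": "*", "headache": "no"},
    {"temp": "normal", "headache": "-"},
    {"temp": "high", "headache": "no"},
]
decisions = ["yes", "yes", "no", "no", "no"]
attributes = ["temp", "headache"]

k_sets = [characteristic_set(rows, decisions, x, attributes) for x in range(len(rows))]
```

In this sketch, the interpretation chosen for a missing value changes both which blocks a case joins and how its own characteristic set is built, which is one plausible reason the interpretation would influence the induced rule sets.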

Updated: 2020-09-18