New uncertainty measurement for categorical data based on fuzzy information structures: An application in attribute reduction
Information Sciences (IF 8.1), Pub Date: 2021-08-29, DOI: 10.1016/j.ins.2021.08.089
Qinli Zhang, Yiying Chen, Gangqiang Zhang, Zhaowen Li, Lijun Chen, Ching-Feng Wen

Categorical data is a significant kind of data in machine learning. Generally, rough set theory (RS-theory) deals with categorical data in the following way. First, an equivalence relation based on the equality of attribute values of categorical data is established. Then, information granules (I-granules) based on equivalence classes are obtained. Finally, information structures (I-structures) consisting of I-granules are formed. However, an equivalence relation is too strict, and there are some limitations in the I-structure of a categorical information system (CIS) that may result in filtering out potentially useful information. This paper investigates fuzzy information structures (FI-structures) and new uncertainty measurements for categorical data from the perspective that “the equality of attribute values is fed back to the attribute set”. First, a fuzzy symmetry relation based on the number of attributes with equal attribute values is established. Then, fuzzy information granules (FI-granules) based on the fuzzy symmetry relation are obtained. Next, FI-structures consisting of FI-granules are formed. Finally, some concepts related to FI-structures in a CIS are given. The set vector is used to denote FI-structures, and the inclusion degree is used to study the dependence between FI-structures. In addition, four new uncertainty measurements based on FI-structures in a CIS are proposed, including fuzzy information granulation (Gf), fuzzy information entropy (Hf), fuzzy rough entropy (Erf) and fuzzy information amount (Ef). Moreover, numerical experiments and statistical tests to evaluate the performance of the proposed new measurements are carried out. The results of the paired t-test show that the performance of the four new measurements based on FI-structures is better than that of the corresponding four measurements based on I-structures. Finally, attribute reduction algorithms based on Gf and Hf are presented, and clustering analysis is conducted on the reduced CIS. The experimental results show that the proposed algorithms are effective and perform well on attribute reduction according to three evaluation indicators of clustering performance.
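
The abstract does not give the formulas, so the following is only a rough sketch of the idea under stated assumptions. It assumes the fuzzy symmetry relation is the fraction of attributes on which two objects agree, and it uses commonly seen forms of information granulation and information entropy from the granular computing literature; the paper's actual definitions of Gf and Hf may differ. The function names `fuzzy_relation`, `granulation` and `entropy` are hypothetical and chosen for illustration.

```python
from math import log2


def fuzzy_relation(table):
    """Assumed relation: R[i][j] = share of attributes on which rows i and j agree."""
    n = len(table)
    m = len(table[0])  # number of attributes
    return [
        [sum(a == b for a, b in zip(table[i], table[j])) / m for j in range(n)]
        for i in range(n)
    ]


def granulation(R):
    """Assumed granulation: mean membership over all ordered pairs of objects."""
    n = len(R)
    return sum(sum(row) for row in R) / (n * n)


def entropy(R):
    """Assumed entropy: -(1/n) * sum_i log2(|granule_i| / n), with |granule_i| the row sum."""
    n = len(R)
    # Each row sum is at least 1 because R[i][i] == 1, so the logarithm is defined.
    return -sum(log2(sum(row) / n) for row in R) / n


# Toy categorical information system: three objects, two attributes.
toy_table = [("a", "x"), ("a", "y"), ("b", "y")]
toy_R = fuzzy_relation(toy_table)
```

In this toy table the first and second objects agree on one of two attributes, so their assumed membership degree is 0.5, whereas a strict equivalence relation would treat them as simply unrelated. That graded similarity is the information the abstract says a crisp I-structure may filter out.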




Updated: 2021-09-09