On rule acquisition methods for data classification in heterogeneous incomplete decision systems
Knowledge-Based Systems (IF 7.2), Pub Date: 2020-01-07, DOI: 10.1016/j.knosys.2020.105472
Zuqiang Meng, Zhongzhi Shi

In the age of big data, much of the data collected is of low quality: it is both heterogeneous and incomplete. This paper refers to such data as heterogeneous incomplete decision systems (HIDSs). Data classification is an important task in machine learning and can uncover valuable knowledge hidden in HIDSs, yet systematic studies of data classification in HIDSs are rarely reported. In particular, there is a lack of adaptive classification methods that handle heterogeneous incomplete data directly, without first discretizing numerical attributes or filling in missing values. This paper proposes a unified representation model for heterogeneous incomplete data, called the parameterized tolerance granulation model (PTGM), and describes the principle of an adaptive granulation method that constructs appropriate PTGMs through difference-based collaborative optimization. Based on PTGMs, a decision logic language is used to describe classifiers, which consist of decision rules satisfying given conditions. Two classification methods are then proposed: a discernibility-function-based method that obtains all optimized rule sets (classifiers), and a heuristic-function-based method that generates one particular optimized rule set. The heuristic-function-based method is an adaptive classification method that handles heterogeneous incomplete data directly. Detailed theoretical analyses establish the correctness and effectiveness of the proposed methods. The experimental results show that the proposed methods are effective and have clear advantages in handling heterogeneous incomplete data directly.
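
The abstract does not give the formal definition of a PTGM, so the following is only a minimal sketch of the general idea behind tolerance granulation of mixed, incomplete data, not the authors' model. It assumes a common rough-set-style tolerance relation: two objects are tolerant if, on every attribute, either value is missing, the categorical values are equal, or the numerical values differ by at most a per-attribute threshold. The names `tolerant`, `tolerance_classes`, `ROWS`, and `THRESHOLDS`, and the threshold values, are hypothetical.

```python
# Sketch only: a generic parameterized tolerance relation, assumed for
# illustration. None marks a missing value. A threshold of None marks a
# categorical attribute; a number marks a numerical attribute.

def tolerant(x, y, thresholds):
    """Return True if objects x and y are indistinguishable under the thresholds."""
    for a, b, eps in zip(x, y, thresholds):
        if a is None or b is None:
            continue  # a missing value is compatible with anything
        if eps is None:
            if a != b:  # categorical attribute: values must match exactly
                return False
        elif abs(a - b) > eps:  # numerical attribute: values must be close
            return False
    return True


def tolerance_classes(rows, thresholds):
    """For each object, collect the indices of all objects tolerant with it."""
    return [
        {j for j, other in enumerate(rows) if tolerant(row, other, thresholds)}
        for row in rows
    ]


# Hypothetical data: one numerical attribute and one categorical attribute.
ROWS = [
    (1.0, "red"),
    (1.2, None),
    (3.0, "red"),
    (None, "blue"),
]
THRESHOLDS = (0.5, None)  # assumed parameters, one per attribute

if __name__ == "__main__":
    for i, granule in enumerate(tolerance_classes(ROWS, THRESHOLDS)):
        print(i, sorted(granule))
```

In this sketch the thresholds play the role of the granulation parameters: a larger numerical threshold yields coarser granules. No discretization or imputation step is needed, which is the property the abstract emphasizes. How the paper actually selects its parameters (its difference-based collaborative optimization) is not reproduced here.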




Updated: 2020-01-07