An approach of improving decision tree classifier using condensed informative data
DECISION (IF 1.5), Pub Date: 2021-01-28, DOI: 10.1007/s40622-020-00265-3
Archana R. Panhalkar, Dharmpal D. Doye

The advancement of new technologies in today’s era produces vast amounts of data. Storing, analyzing and mining knowledge from such data requires large storage space as well as fast execution. Training classifiers on large amounts of data likewise takes more time and space. To avoid wasting time and space, significant information needs to be mined from huge data collections. The decision tree is one of the promising classifiers for mining knowledge from huge data. This paper aims to reduce the data used to construct an efficient decision tree classifier, and presents a method that finds informative data to improve the performance of the decision tree classifier. Two clustering-based methods are proposed for dimensionality reduction and for utilizing knowledge from outliers. The resulting condensed data are applied to the decision tree to achieve high prediction accuracy. The first method is distinctive in that it finds representative instances from clusters using knowledge of their neighboring data. The second method uses supervised clustering to find the number of cluster representatives used for data reduction. While increasing the prediction accuracy of the tree, these methods decrease the size, building time and space required by decision tree classifiers. The two methods are combined into a single supervised and unsupervised approach, Decision Tree based on Cluster Analysis Pre-processing (DTCAP), which finds informative instances in small, medium and large datasets. Experiments are conducted on standard UCI datasets of different sizes. They show that the method, despite its simplicity, reduces the data by up to 50% and produces a high-quality dataset that enhances the performance of the decision tree classifier.
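The abstract only outlines how representative instances are drawn from clusters, so the following is a minimal sketch of generic clustering-based instance reduction in that spirit, not the authors' DTCAP procedure. Everything concrete here is an assumption: plain k-means run separately within each class, a hypothetical `ratio` parameter that sets the number of clusters per class, and the representative taken as the real instance nearest its cluster mean. The condensed `X_red, y_red` could then be passed to any decision tree learner.

```python
import random
from collections import defaultdict

Point = tuple[float, ...]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(members: list[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(members) for col in zip(*members))


def kmeans(points: list[Point], k: int, iters: int = 20, seed: int = 0) -> list[list[Point]]:
    """Plain Lloyd's k-means (an assumed choice); returns only non-empty clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids are k distinct data points
    clusters: list[list[Point]] = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
            clusters[j].append(p)
        # An empty cluster keeps its previous centroid.
        new_centroids = [_mean(m) if m else centroids[j] for j, m in enumerate(clusters)]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return [m for m in clusters if m]


def condense(X: list[Point], y: list[str], ratio: float = 0.5, seed: int = 0):
    """Hypothetical helper: keep one representative instance per cluster, per class.

    The representative is the real member closest to its cluster mean, so the
    condensed set contains only original instances with their original labels.
    """
    by_class: dict[str, list[Point]] = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(x)
    X_red: list[Point] = []
    y_red: list[str] = []
    for label, pts in by_class.items():
        k = max(1, min(len(pts), round(len(pts) * ratio)))
        for members in kmeans(pts, k, seed=seed):
            centre = _mean(members)
            X_red.append(min(members, key=lambda p: _sq_dist(p, centre)))
            y_red.append(label)
    return X_red, y_red


if __name__ == "__main__":
    rng = random.Random(1)
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
    X += [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(50)]
    y = ["a"] * 50 + ["b"] * 50
    X_red, y_red = condense(X, y, ratio=0.5)
    print(f"kept {len(X_red)} of {len(X)} instances "
          f"({1 - len(X_red) / len(X):.0%} reduction)")
```

Choosing a real member rather than the centroid itself keeps the condensed set made of genuine training instances. The `ratio=0.5` default is only an illustrative echo of the up to 50% reduction reported in the abstract; empty clusters can make the actual reduction slightly larger.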




Updated: 2021-03-14