A study on using data clustering for feature extraction to improve the quality of classification
Knowledge and Information Systems ( IF 2.7 ) Pub Date : 2021-05-04 , DOI: 10.1007/s10115-021-01572-6
Maciej Piernik , Tadeusz Morzy

It is widely believed among data science researchers and enthusiasts that clustering can be used to improve classification quality. Although this belief is fairly uncontroversial, it is also very general and therefore causes considerable confusion around the subject. There are many ways of using clustering in classification, and it obviously cannot always improve the quality of predictions, so a question arises: in which scenarios exactly does it help? Since we were unable to find a rigorous study addressing this question, in this paper we try to shed some light on the concept of using clustering for classification. To do so, we first put forward a framework for incorporating clustering as a method of feature extraction for classification. The framework is generic with respect to similarity measures, clustering algorithms, classifiers, and datasets, and serves as a platform to answer ten essential questions regarding the studied subject. Each answer is based on a separate experiment on 16 publicly available datasets, followed by an appropriate statistical analysis. After performing the experiments and analyzing the results separately, we discuss them from a global perspective and form general conclusions about using clustering as feature extraction for classification.
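The abstract does not spell out the framework itself, so the following is only a minimal sketch of one common way to use clustering as feature extraction: cluster the training points, then append each point's distance to every cluster centroid as extra features for a downstream classifier. The choice of k-means, Euclidean distance, the deterministic initialization, and the names `kmeans` and `add_cluster_features` are all assumptions made for illustration, not the authors' method.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means (illustrative; not taken from the paper).

    Initial centroids are evenly spaced points from the input, a
    simplification chosen here so the sketch is deterministic.
    """
    centroids = [list(points[i * len(points) // k]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: euclidean(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def add_cluster_features(points, centroids):
    """Append the distance to each centroid to every original feature vector."""
    return [list(p) + [euclidean(p, c) for c in centroids] for p in points]


# Toy data: two well-separated groups of 2-D points (made up for this sketch).
train = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids = kmeans(train, k=2)
train_ext = add_cluster_features(train, centroids)

# Unseen points are transformed with the centroids learned on the training set.
test_ext = add_cluster_features([(0.1, 0.1)], centroids)
```

Each extended vector holds the original features followed by one distance per cluster, so any classifier can be trained on `train_ext` in place of `train`. Whether this actually improves predictions depends on the dataset, the similarity measure, and the classifier, which is the question the paper studies.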




Updated: 2021-05-04