Clustering algorithm for mixed datasets using density peaks and Self-Organizing Generative Adversarial Networks

Chemometrics and Intelligent Laboratory Systems (IF 3.7), Pub Date: 2020-08-01, DOI: 10.1016/j.chemolab.2020.104070
K. Balaji, K. Lavanya, A. Geetha Mary

Abstract: This paper presents a new Density Peaks and Self-Organizing Generative Adversarial Networks (DP-SO-GAN) method for clustering mixed datasets. Many clustering methods assume that a dataset contains either categorical or numerical attributes. In practice, however, most real-world applications involve a mix of categorical and numerical attributes; in medicine, for example, clustering cardiovascular disease data is an essential task. Clustering such mixed attributes is an important and challenging problem. First, we transform the mixed attributes, encoding categorical attributes with one-hot encoding and numerical attributes with normalization. The transformed features are fed into a Self-Organizing Generative Adversarial Network (SO-GAN) to learn a feature map. Second, we train two kernel networks, the generator and the discriminator, each of which contains only a small number of convolution kernels. Finally, we propose an enhanced density peaks clustering algorithm that computes a similarity measure between data objects in the learned feature representation. On the cardiovascular disease dataset, the method achieves a clustering accuracy of 88.32% with a standard deviation of 0.1, which is higher than that of the other existing algorithms compared. Training on the handwritten digits dataset for 300 epochs takes 3148.26 s. Experimental results on five datasets demonstrate the merits of the proposed method, especially in terms of the stability and efficiency of network training. Compared with classical generative adversarial networks, the proposed method reduces computational complexity, measured in floating-point operations, by around 18%.
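The preprocessing and clustering stages described in the abstract can be illustrated with a short, stdlib-only Python sketch. It is not the authors' DP-SO-GAN: the SO-GAN feature-learning step is omitted, so clustering runs directly on the encoded vectors, and the classic density peaks algorithm of Rodriguez and Laio (2014) stands in for the paper's enhanced variant. Min-max scaling is assumed as the normalization technique and Euclidean distance as the similarity measure; the function names `encode_mixed` and `density_peaks`, the cutoff `d_c`, and the toy patient records are all hypothetical.

```python
import math


def encode_mixed(rows, categorical_cols, numerical_cols):
    """One-hot encode categorical columns and min-max scale numerical columns to [0, 1]."""
    # Sorted vocabulary of each categorical column, so the one-hot layout is deterministic.
    vocab = {c: sorted({row[c] for row in rows}) for c in categorical_cols}
    # Per-column minimum and maximum used for min-max scaling (an assumed normalization).
    col_min = {c: min(float(row[c]) for row in rows) for c in numerical_cols}
    col_max = {c: max(float(row[c]) for row in rows) for c in numerical_cols}
    encoded = []
    for row in rows:
        vec = []
        for c in categorical_cols:
            vec.extend(1.0 if row[c] == v else 0.0 for v in vocab[c])
        for c in numerical_cols:
            span = col_max[c] - col_min[c]
            # A constant column carries no information, so map it to 0.
            vec.append((float(row[c]) - col_min[c]) / span if span > 0 else 0.0)
        encoded.append(vec)
    return encoded


def density_peaks(points, d_c, n_clusters):
    """Classic density peaks clustering (Rodriguez and Laio, 2014) with a cutoff kernel."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho[i]: number of other points closer than the cutoff distance d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Indices by decreasing density; sort stability breaks ties by ascending index.
    order = sorted(range(n), key=lambda i: rho[i], reverse=True)
    delta = [0.0] * n
    nearest_denser = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point gets the largest pairwise distance as its delta.
            delta[i] = max(max(row) for row in dist)
            continue
        # delta[i]: distance to the nearest point that precedes i in density order.
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_denser[i] = j
    # Centres are the points with the largest gamma = rho * delta; the densest
    # point has no denser neighbour, so it is always taken as a centre.
    gamma = [rho[i] * delta[i] for i in range(n)]
    others = sorted(order[1:], key=lambda i: gamma[i], reverse=True)
    centers = [order[0]] + others[: n_clusters - 1]
    labels = [-1] * n
    for label, c in enumerate(centers):
        labels[c] = label
    # In decreasing density order, each non-centre point takes the label of its
    # nearest denser neighbour, which has already been labelled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels


# Hypothetical toy records with two categorical and two numerical attributes.
patients = [
    {"sex": "F", "smoker": "no", "age": 30, "bp": 110},
    {"sex": "F", "smoker": "no", "age": 32, "bp": 112},
    {"sex": "F", "smoker": "no", "age": 31, "bp": 115},
    {"sex": "M", "smoker": "yes", "age": 60, "bp": 150},
    {"sex": "M", "smoker": "yes", "age": 62, "bp": 155},
    {"sex": "M", "smoker": "yes", "age": 61, "bp": 148},
]
features = encode_mixed(patients, ["sex", "smoker"], ["age", "bp"])
labels = density_peaks(features, d_c=0.5, n_clusters=2)
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

Each point's local density rho counts its neighbours within `d_c`, delta is its distance to the nearest denser point, and the points with the largest rho times delta become cluster centres; every other point inherits the label of its nearest denser neighbour. The value `d_c = 0.5` suits this toy data only; Rodriguez and Laio suggest choosing the cutoff so that the average number of neighbours is about 1 to 2% of all points.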

Updated: 2020-08-01
