Correlation and congruence modulo based clustering technique and its application in energy classification
Sustainable Computing: Informatics and Systems (IF 3.8), Pub Date: 2021-05-07, DOI: 10.1016/j.suscom.2021.100561
Muhammad Shaheen , Saif ur Rehman , Fahad Ghaffar

Clustering is an unsupervised classification technique that groups an unlabeled data set into subsets called clusters. The K-means algorithm is a popular clustering algorithm in which the initial cluster centers are chosen at random. Most clustering techniques pick their initial cluster centers randomly, which can compromise the quality of the results they produce. These techniques then recalculate the cluster centers iteratively until convergence is reached, which may again reduce the accuracy of the results. Throughout these iterations, data elements keep switching to neighboring clusters, which may introduce a bias into the clustering results. To address this, a new technique called "Clustering through correlation and congruence modulo (CCCM)" is developed, based on a correlation reward (reinforcement factor) and the congruence modulo operator. In CCCM, the cluster centroids are selected in the first iteration and then kept fixed. They are chosen by arranging all the involved variables in order of importance, which is calculated using Spearman's rank correlation analysis. After these variables are arranged, the congruence modulo operator is used to convert them into equally sized buckets. The correlation values of the elements placed in these buckets are calculated again, and the difference is reinforced by rearranging the buckets. Once the initial cluster centers are selected, the data instances are assigned to clusters as in the conventional K-means algorithm. The new algorithm is tested on energy data from 40 countries, with 16 energy parameters per country collected from online sources over a period of ten years. Compared with the K-means algorithm, the proposed technique produced more accurate clusters in less time.
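The abstract describes the CCCM pipeline only at a high level, so the following Python is a minimal sketch under stated assumptions, not the authors' implementation. It assumes, hypothetically, that a variable's importance is its mean absolute Spearman correlation with the other variables, and that instances are ordered lexicographically by the variables in that importance order. Because the abstract does not say exactly how the congruence modulo operator forms the buckets, the sketch substitutes a plain contiguous split into k near-equal buckets (bucket index ⌊i·k/n⌋ for sorted position i), and it omits the correlation-reward bucket rearrangement step entirely. Each bucket's mean becomes a fixed centroid, and every instance is then assigned to its nearest centroid in a single K-means-style pass, with no recomputation of centers. The names `rank`, `spearman`, `variable_importance`, and `cccm_sketch` are hypothetical.

```python
from math import sqrt


def rank(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def variable_importance(data):
    """Hypothetical importance score: mean |Spearman rho| with every other variable."""
    cols = [list(c) for c in zip(*data)]
    scores = []
    for j, cj in enumerate(cols):
        others = [abs(spearman(cj, ct)) for t, ct in enumerate(cols) if t != j]
        scores.append(sum(others) / len(others) if others else 0.0)
    return scores


def dist2(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cccm_sketch(data, k):
    """Fixed-centroid clustering sketch; returns (labels, centroids, variable order)."""
    n, d = len(data), len(data[0])
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of instances")
    scores = variable_importance(data)
    var_order = sorted(range(d), key=lambda j: -scores[j])
    # Order instances lexicographically by the variables, most important first.
    ranked = sorted(range(n), key=lambda i: tuple(data[i][j] for j in var_order))
    # Stand-in (assumption) for the congruence-modulo step: k contiguous, near-equal buckets.
    buckets = [[] for _ in range(k)]
    for pos, i in enumerate(ranked):
        buckets[pos * k // n].append(data[i])
    # Bucket means become the centroids and are never recomputed.
    centroids = [tuple(sum(col) / len(b) for col in zip(*b)) for b in buckets]
    # Single K-means-style assignment pass to the nearest fixed centroid.
    labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in data]
    return labels, centroids, var_order


if __name__ == "__main__":
    toy = [
        (1.0, 2.0, 5.0), (1.2, 2.1, 3.0), (0.9, 1.9, 4.0),
        (8.0, 9.0, 4.5), (8.2, 9.1, 3.5), (7.9, 8.8, 5.5),
    ]
    labels, centroids, var_order = cccm_sketch(toy, k=2)
    print(var_order, labels)  # → [0, 1, 2] [0, 0, 0, 1, 1, 1]
```

In the toy data the first two variables are perfectly rank-correlated, so they lead the importance order, and the two obvious groups end up in separate buckets. On real energy parameters with very different scales, the variables would likely need normalizing before Euclidean distances become meaningful; that step is also an assumption outside the abstract.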

Updated: 2021-05-17