Analytical review of clustering techniques and proximity measures
Artificial Intelligence Review (IF 12.0), Pub Date: 2020-05-02, DOI: 10.1007/s10462-020-09840-7
Vivek Mehta, Seema Bawa, Jasmeet Singh

One of the most fundamental ways to learn from and understand any type of data is to organize it into meaningful groups (or clusters) and then analyze those groups, a process known as cluster analysis. During grouping, proximity measures play a significant role in deciding how similar two objects are. Moreover, before any learning algorithm is applied to a dataset, several preprocessing aspects must be considered, such as dealing with data sparsity, leveraging correlations among features, and normalizing the scales of different features. In this study, various proximity measures are discussed and analyzed with respect to these aspects. In addition, a theoretical procedure for selecting a proximity measure for clustering is proposed; the same procedure can also guide the design of new proximity measures. Next, clustering algorithms from different categories are reviewed and experimentally compared on various datasets from different domains. The datasets were chosen to span from very low to very high dimensionality. Finally, the effect of using different proximity measures in partitional and hierarchical clustering techniques is analyzed experimentally.
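The abstract does not list which proximity measures the paper analyzes, so the following is only a minimal, hypothetical sketch in plain Python. It implements four common measures (Euclidean, Manhattan, cosine distance, and Pearson-correlation distance) plus per-feature z-score normalization, and shows on made-up data how unnormalized feature scales can change which object counts as "nearest". The helper names (`zscore_columns`, `nearest`) and the toy "income, age" rows are illustrative assumptions, not taken from the paper.

```python
import math
from statistics import mean, pstdev


def euclidean(x, y):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """City-block (L1) distance."""
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine_distance(x, y):
    """1 - cosine similarity; ignores vector magnitude, only direction matters."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (nx * ny)


def pearson_distance(x, y):
    """1 - Pearson correlation: cosine distance of mean-centered vectors.

    Ignores shifts and positive rescaling; undefined for constant vectors.
    """
    mx, my = mean(x), mean(y)
    return cosine_distance([a - mx for a in x], [b - my for b in y])


def zscore_columns(rows):
    """Hypothetical helper: standardize each feature to zero mean, unit variance."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(r, stats)]
            for r in rows]


def nearest(query_idx, rows, dist):
    """Hypothetical helper: index of the row closest to rows[query_idx]."""
    others = [i for i in range(len(rows)) if i != query_idx]
    return min(others, key=lambda i: dist(rows[query_idx], rows[i]))


# Toy data (made up): [annual income, age in years].
rows = [[50000, 25], [50500, 60], [51000, 26], [80000, 40]]

print(nearest(0, rows, euclidean))                  # 1: income scale dominates
print(nearest(0, zscore_columns(rows), euclidean))  # 2: age now counts too
```

On the raw rows, a 500-unit income gap outweighs a 35-year age gap, so row 1 looks closest to row 0; after standardization, row 2 (similar income and nearly the same age) is closest. Cosine and Pearson distances compare the shape of vectors rather than their magnitude, which is one reason cosine distance is commonly used for sparse, high-dimensional data such as text vectors.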

Updated: 2020-05-02