Automatic identification of the number of clusters in hierarchical clustering
Neural Computing and Applications (IF 6), Pub Date: 2021-03-13, DOI: 10.1007/s00521-021-05873-3
Ashutosh Karna, Karina Gibert

Hierarchical clustering is one of the most suitable tools for discovering the true underlying structure of a dataset in unsupervised learning, where the ground truth is unknown and classical machine learning classifiers do not apply. In many real applications it offers a view of the inner structure of the data and is preferred to partitional methods. However, determining the number of clusters in hierarchical clustering requires a human expert to deduce it from the dendrogram, which is a major obstacle to building fully automatic systems such as those required for decision support in Industry 4.0. This research proposes a general criterion for cutting a dendrogram automatically, selected by comparing six original criteria based on the Calinski-Harabasz index. Each criterion is evaluated on 95 real-life dendrograms of different topologies against the number of classes proposed by experts, and a winning criterion is determined. The research is part of a larger project to build an Intelligent Decision Support System that assesses the performance of 3D printers from sensor data in real time, although the proposed criteria can be used in other real applications of hierarchical clustering. The methodology is applied to a real-life dataset from 3D printers, and a comparison of CPU time before and after this modification of the entire clustering method shows a large reduction. It also reduces the dependence on a human expert to provide the number of clusters by inspecting the dendrogram. Such a process allows hierarchical clustering to run in automatic mode in real-life industrial applications, enables continuous monitoring of real 3D printers in production, and helps in building an Intelligent Decision Support System to detect operational modes, anomalies, and other behavioral patterns.
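
To make the idea of an automatic dendrogram cut concrete, here is a minimal sketch in plain Python. It is not the authors' method: the abstract does not spell out the six criteria or the winning one, so this sketch assumes the simplest possible rule, namely cutting at the number of clusters k that maximizes the Calinski-Harabasz index. The Ward-style merging, the function names (`ward_levels`, `calinski_harabasz`, `best_cut`), the `k_max` parameter, and the toy data are all illustrative assumptions.

```python
from itertools import combinations
from math import dist  # requires Python 3.8+


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(dist(p, c) ** 2 for p in points)


def ward_levels(data):
    """Naive Ward-style agglomeration (an assumed linkage choice).

    Returns a dict {k: partition into k clusters}, one per dendrogram level.
    """
    clusters = [[tuple(p)] for p in data]
    levels = {len(clusters): list(clusters)}
    while len(clusters) > 1:
        # Merge the pair whose union adds the least within-cluster SSE.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: sse(clusters[ij[0]] + clusters[ij[1]])
            - sse(clusters[ij[0]]) - sse(clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        levels[len(clusters)] = list(clusters)
    return levels


def calinski_harabasz(partition):
    """CH index: (between-SS / (k - 1)) / (within-SS / (n - k))."""
    n = sum(len(c) for c in partition)
    k = len(partition)
    overall = centroid([p for c in partition for p in c])
    between = sum(len(c) * dist(centroid(c), overall) ** 2 for c in partition)
    within = sum(sse(c) for c in partition)
    if within == 0:  # every cluster is a single repeated point
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def best_cut(data, k_max=10):
    """Assumed rule: pick the k in [2, k_max] with the highest CH index."""
    levels = ward_levels(data)
    k_hi = min(k_max, len(data) - 1)  # CH is undefined for k = 1 and k = n
    scores = {k: calinski_harabasz(levels[k]) for k in range(2, k_hi + 1)}
    return max(scores, key=scores.get), scores


# Toy data: three well-separated 2x2 blobs (purely illustrative).
blob = [(0, 0), (1, 0), (0, 1), (1, 1)]
points = [(x + dx, y + dy) for dx, dy in [(0, 0), (10, 0), (0, 10)] for x, y in blob]
k, scores = best_cut(points)
print(k)  # 3
```

The merging loop recomputes cluster sums of squares from scratch at every step, so it is only suitable for small inputs and needs at least three points. For real datasets one would more likely build the tree with `scipy.cluster.hierarchy.linkage`, cut it with `fcluster`, and score each cut with `sklearn.metrics.calinski_harabasz_score`.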



Updated: 2021-03-15