Better than the best? Answers via model ensemble in density-based clustering
Advances in Data Analysis and Classification (IF 1.4) Pub Date: 2020-10-02, DOI: 10.1007/s11634-020-00423-6
Alessandro Casa, Luca Scrucca, Giovanna Menardi

With the recent growth in data availability and complexity, and the associated outburst of elaborate modelling approaches, model selection tools have become a lifeline, providing objective criteria to navigate this increasingly challenging landscape. In fact, basing predictions and inference on a single model may be limiting if not harmful; ensemble approaches, which combine different models, have been proposed to overcome the selection step, and have proven fruitful especially in the supervised learning framework. Conversely, these approaches have been scantily explored in the unsupervised setting. In this work we focus on the model-based clustering formulation, where a plethora of mixture models, with different numbers of components and parametrizations, is typically estimated. We propose an ensemble clustering approach that circumvents the single-best-model paradigm, while improving the stability and robustness of the partitions. A new density estimator, defined as a convex linear combination of the density estimates in the ensemble, is introduced and exploited for group assignment. As opposed to the standard case, where clusters are typically associated with the components of the selected mixture model, we define partitions by borrowing the modal, or nonparametric, formulation of the clustering problem, where groups are linked to high-density regions. Staying in the density-based realm, we thus show how blending together parametric and nonparametric approaches may be beneficial from a clustering perspective.
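The two ingredients of the proposal can be sketched in code: an ensemble density built as a convex combination of several candidate density estimates, and group assignment by hill climbing toward the modes of that ensemble density. The sketch below is illustrative only, not the authors' implementation: the two univariate mixtures stand in for fitted models with different numbers of components, the combination weights are fixed by hand (the paper derives them from the data, e.g. via model posterior probabilities), and the mode-seeking step is a naive finite-difference ascent.

```python
import math

def gauss(x, mu, sd):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Two hypothetical fitted mixture densities, standing in for estimated
# models with different numbers of components and parametrizations.
def f1(x):  # a 2-component fit
    return 0.5 * gauss(x, 0.0, 1.0) + 0.5 * gauss(x, 5.0, 1.0)

def f2(x):  # a 3-component fit
    return (0.4 * gauss(x, 0.0, 0.8) + 0.4 * gauss(x, 5.0, 0.8)
            + 0.2 * gauss(x, 2.5, 2.0))

# Convex combination weights (illustrative values; nonnegative, summing to 1).
w1, w2 = 0.7, 0.3

def ensemble(x):
    """Ensemble density estimator: convex combination of the estimates."""
    return w1 * f1(x) + w2 * f2(x)

def climb(x0, step=0.01, n_iter=2000, h=1e-4):
    """Naive fixed-step hill climbing on the ensemble density.

    Each point ascends its density gradient (finite differences) until it
    settles near a mode; points reaching the same mode share a cluster.
    """
    x = x0
    for _ in range(n_iter):
        grad = (ensemble(x + h) - ensemble(x - h)) / (2 * h)
        x += step * (1 if grad > 0 else -1 if grad < 0 else 0)
    return x

# Assign a few points to high-density regions: the first two ascend to the
# mode near 0, the last two to the mode near 5.
points = [-1.0, 0.5, 4.0, 6.0]
modes = [climb(p) for p in points]
```

In a faithful implementation, each candidate density would be a mixture model estimated from the data (e.g. by EM over a grid of component counts and covariance parametrizations), and partitions would follow the modal formulation: two observations belong to the same cluster exactly when their ascent paths converge to the same mode of the ensemble density.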




Updated: 2020-10-02