Identifiability of nonparametric mixture models and Bayes optimal clustering
Annals of Statistics (IF 3.2), Pub Date: 2020-08-01, DOI: 10.1214/19-aos1887
Bryon Aragam, Chen Dan, Eric P. Xing, Pradeep Ravikumar

Motivated by problems in data clustering, we establish general conditions under which families of nonparametric mixture models are identifiable by introducing a novel framework for clustering overfitted parametric (i.e. misspecified) mixture models. These conditions generalize existing conditions in the literature, and are flexible enough to include, for example, mixtures of Gaussian mixtures. In contrast to the recent literature on estimating nonparametric mixtures, we allow for general nonparametric mixture components, and instead impose regularity assumptions on the underlying mixing measure. As our primary application, we apply these results to partition-based clustering, generalizing the well-known notion of a Bayes optimal partition from classical model-based clustering to nonparametric settings. Furthermore, this framework is constructive in that it yields a practical algorithm for learning identified mixtures, which is illustrated through several examples. The key conceptual device in the analysis is the convex, metric geometry of probability distributions on metric spaces and its connection to optimal transport and the Wasserstein convergence of mixing measures. The result is a flexible framework for nonparametric clustering with formal consistency guarantees.
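The constructive idea in the abstract (deliberately overfit a parametric mixture, group its components into clusters, then assign points with a Bayes rule) can be illustrated with a minimal sketch. It assumes a one-dimensional Gaussian mixture that has already been overfitted (the weights, means, and standard deviations below are hypothetical illustrative values, not data from the paper), measures distances between components with the closed-form 2-Wasserstein distance between univariate Gaussians, and merges components by single-linkage clustering down to a number of clusters K that is assumed known. These choices and the helper names `w2_gaussian_1d`, `merge_components`, and `bayes_assign` are hypothetical; this is not claimed to be the authors' exact algorithm.

```python
import math
from itertools import combinations


def w2_gaussian_1d(m1, s1, m2, s2):
    """2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2) on the real line."""
    return math.hypot(m1 - m2, s1 - s2)


def merge_components(means, sds, n_clusters):
    """Hypothetical helper: single-linkage merge of mixture components
    (using the W2 distance) until n_clusters groups remain."""
    groups = [[i] for i in range(len(means))]
    while len(groups) > n_clusters:
        best = None
        for a, b in combinations(range(len(groups)), 2):
            # Single linkage: distance between the closest pair of components.
            d = min(w2_gaussian_1d(means[i], sds[i], means[j], sds[j])
                    for i in groups[a] for j in groups[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        groups[a] = groups[a] + groups[b]
        del groups[b]  # safe: combinations yields a < b
    return groups


def normal_pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))


def bayes_assign(x, weights, means, sds, groups):
    """Hypothetical helper: assign x to the merged cluster with the largest
    weighted density, i.e. the Bayes rule for the merged mixture."""
    scores = [sum(weights[i] * normal_pdf(x, means[i], sds[i]) for i in g)
              for g in groups]
    return max(range(len(groups)), key=scores.__getitem__)


# Hypothetical overfitted 6-component mixture: two underlying clusters,
# each approximated by three Gaussian components.
weights = [0.2, 0.15, 0.15, 0.2, 0.15, 0.15]
means = [-5.0, -4.0, -6.0, 5.0, 4.5, 6.0]
sds = [1.0, 0.8, 1.2, 1.0, 0.7, 1.1]
groups = merge_components(means, sds, n_clusters=2)

if __name__ == "__main__":
    print(groups)  # [[0, 1, 2], [3, 4, 5]]
    print(bayes_assign(-4.8, weights, means, sds, groups))  # 0
    print(bayes_assign(5.2, weights, means, sds, groups))   # 1
```

Each merged group defines one cluster density that is itself a Gaussian mixture rather than a single Gaussian, which is the sense in which the resulting clusters are nonparametric. Single linkage is used here only because it is simple; other linkage rules or component distances could be substituted.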

Updated: 2020-08-01