Singularity, misspecification and the convergence rate of EM
Annals of Statistics (IF 3.2), Pub Date: 2020-12-01, DOI: 10.1214/19-aos1924
Raaz Dwivedi, Nhat Ho, Koulik Khamaru, Martin J. Wainwright, Michael I. Jordan, Bin Yu

A line of recent work has characterized the behavior of the EM algorithm in favorable settings in which the population likelihood is locally strongly concave around its maximizing argument. Examples include suitably separated Gaussian mixture models and mixtures of linear regressions. We consider instead over-fitted settings in which the likelihood need not be strongly concave, or, equivalently, in which the Fisher information matrix might be singular. In such settings, it is known that a global maximizer of the MLE based on $n$ samples can have a non-standard $n^{-1/4}$ rate of convergence. How does the EM algorithm behave in such settings? Focusing on the simple setting of a two-component mixture fit to a multivariate Gaussian distribution, we study the behavior of the EM algorithm both when the mixture weights are different (the unbalanced case) and when they are equal (the balanced case). Our analysis reveals a sharp distinction between these cases: in the former, the EM algorithm converges geometrically to a point at Euclidean distance $O((d/n)^{1/2})$ from the true parameter, whereas in the latter, the convergence rate is exponentially slower, and the fixed point has a much lower $O((d/n)^{1/4})$ accuracy. The slower convergence in the balanced over-fitted case arises from the singularity of the Fisher information matrix. Analysis of this singular case requires the introduction of some novel techniques; in particular, we make use of a careful form of localization in the associated empirical process and develop a recursive argument to progressively sharpen the statistical rate.
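As an illustration of the balanced over-fitted case described above (this sketch is not from the paper's code), consider the simplest univariate instance: a symmetric balanced fit $0.5\,N(\theta,1)+0.5\,N(-\theta,1)$ to data drawn from a single standard Gaussian, so the true parameter is $\theta^{*}=0$ and the Fisher information is singular there. For this model the E- and M-steps collapse into a single closed-form update, $\theta_{t+1} = \tfrac{1}{n}\sum_i x_i \tanh(\theta_t x_i)$, making the exponentially slow decay toward zero easy to observe numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# Data from N(0,1): the balanced two-component fit is over-fitted,
# the true parameter is theta* = 0, and Fisher information is singular.
x = rng.standard_normal(n)

def em_step(theta: float, x: np.ndarray) -> float:
    # For the symmetric balanced fit 0.5*N(theta,1) + 0.5*N(-theta,1),
    # the posterior weight of the "+" component is 1/(1 + exp(-2*theta*x)),
    # and the M-step average simplifies to a tanh map.
    return float(np.mean(x * np.tanh(theta * x)))

theta = 1.0  # arbitrary initialization
for _ in range(200):
    theta = em_step(theta, x)

# Near zero the map behaves like theta -> theta - theta**3, so the iterates
# decay only polynomially in t, and the statistical floor scales as n**-0.25
# rather than the usual n**-0.5.
print(theta)
```

The cubic contraction $\theta - \theta^3$ near the singular point is what produces the slow (order $1/\sqrt{t}$) algorithmic decay; in the unbalanced case the map is instead a strict geometric contraction.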

Updated: 2020-12-01