Multiple scaled contaminated normal distribution and its application in clustering
Statistical Modelling (IF 1), Pub Date: 2019-12-09, DOI: 10.1177/1471082x19890935
Antonio Punzo, Cristina Tortora

The multivariate contaminated normal (MCN) distribution represents a simple heavy-tailed generalization of the multivariate normal (MN) distribution to model elliptical contoured scatters in the presence of mild outliers, referred to as "bad" points. The MCN can also automatically detect bad points. The price of these advantages is two additional parameters, both with specific and useful interpretations: proportion of good observations and degree of contamination. However, points may be bad in some dimensions but good in others. The use of an overall proportion of good observations and of an overall degree of contamination is limiting. To overcome this limitation, we propose a multiple scaled contaminated normal (MSCN) distribution with a proportion of good observations and a degree of contamination for each dimension. Once the model is fitted, each observation has a posterior probability of being good with respect to each dimension. Thanks to this probability, we have a method for simultaneous directional robust estimation of the parameters of the MN distribution based on down-weighting and for the automatic directional detection of bad points by means of maximum a posteriori probabilities. The term "directional" is added to specify that the method works separately for each dimension. Mixtures of MSCN distributions are also proposed as an application of the proposed model for robust clustering. An extension of the EM algorithm is used for parameter estimation based on the maximum likelihood approach. Real and simulated data are used to show the usefulness of our mixture with respect to well-established mixtures of symmetric distributions with heavy tails.
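
The difference between an overall and a directional probability of being good can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the usual two-component form of the MCN density, alpha * N(mu, Sigma) + (1 - alpha) * N(mu, eta * Sigma) with eta > 1, and, for the MSCN, it assumes that the "dimensions" are the principal axes of Sigma and that each axis carries its own univariate contaminated normal. The abstract does not spell out either form, and all function and parameter names below are hypothetical.

```python
import numpy as np


def normal_pdf(z, var):
    """Univariate N(0, var) density evaluated at z (vectorized)."""
    return np.exp(-0.5 * z**2 / var) / np.sqrt(2.0 * np.pi * var)


def mvn_pdf(x, mu, sigma):
    """Multivariate normal density N(mu, sigma) at a single point x."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(sigma, diff)  # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(sigma)
    return np.exp(-0.5 * (d * np.log(2.0 * np.pi) + logdet + quad))


def mcn_posterior_good(x, mu, sigma, alpha, eta):
    """Overall posterior probability that x is good under the MCN.

    alpha is the proportion of good observations and eta > 1 is the
    degree of contamination (assumed parameterization).
    """
    good = alpha * mvn_pdf(x, mu, sigma)
    bad = (1.0 - alpha) * mvn_pdf(x, mu, eta * sigma)
    return good / (good + bad)


def mscn_posterior_good(x, mu, sigma, alphas, etas):
    """Per-dimension posterior probabilities of being good.

    Assumption: the dimensions are the principal axes of sigma, and
    axis h has its own proportion alphas[h] and inflation etas[h].
    """
    alphas = np.asarray(alphas, dtype=float)
    etas = np.asarray(etas, dtype=float)
    lam, gamma = np.linalg.eigh(sigma)  # eigenvalues (ascending), eigenvectors
    scores = gamma.T @ (x - mu)         # coordinates along the principal axes
    good = alphas * normal_pdf(scores, lam)
    bad = (1.0 - alphas) * normal_pdf(scores, etas * lam)
    return good / (good + bad)


# Illustrative values only: a point that is typical along the first axis
# and extreme along the second.
mu = np.zeros(2)
sigma = np.diag([1.0, 4.0])
x = np.array([0.1, 12.0])

overall = mcn_posterior_good(x, mu, sigma, alpha=0.9, eta=10.0)
directional = mscn_posterior_good(x, mu, sigma, [0.9, 0.9], [10.0, 10.0])

# Maximum a posteriori rule: flag as bad when the probability is below 0.5.
print("MCN flags the whole point as bad:", overall < 0.5)
print("MSCN flags per dimension:", directional < 0.5)
```

With these illustrative values, the single MCN probability marks the whole observation as bad, while the directional probabilities mark only the second axis as bad. Under the same assumptions, the per-dimension probabilities could also serve as down-weights in the robust estimation of the MN parameters.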

Updated: 2019-12-09