Finite mixture modeling of censored and missing data using the multivariate skew-normal distribution
Advances in Data Analysis and Classification (IF 1.6), Pub Date: 2021-06-17, DOI: 10.1007/s11634-021-00448-5
Francisco H. C. de Alencar, Christian E. Galarza, Larissa A. Matos, Victor H. Lachos

Finite mixture models have been widely used to model and analyze data from heterogeneous populations. Moreover, data of this kind can be missing or subject to upper and/or lower detection limits because of the constraints of experimental apparatuses. Another complication arises when the measurements in each population depart significantly from normality, for example by exhibiting asymmetric behavior. For such data structures, we propose a robust model for censored and/or missing data based on finite mixtures of multivariate skew-normal distributions. This approach allows us to model data with great flexibility, accommodating multimodality and skewness simultaneously, depending on the structure of the mixture components. We develop an analytically simple yet efficient EM-type algorithm for maximum likelihood estimation of the parameters. The algorithm has closed-form expressions at the E-step that rely on formulas for the mean and variance of truncated multivariate skew-normal distributions. Furthermore, a general information-based method for approximating the asymptotic covariance matrix of the estimators is also presented. Results from the analysis of both simulated and real datasets are reported to demonstrate the effectiveness of the proposed method. The proposed algorithm and method are implemented in the new R package CensMFM.
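The EM-type algorithm described above relies on moments of truncated multivariate skew-normal distributions, which is too much for a short example. As a minimal, hypothetical sketch of how censored and missing values enter a skew-normal mixture likelihood, the Python below handles only the univariate (p = 1) case. It assumes Azzalini's parameterization, with density 2/σ·φ(z)·Φ(λz) and z = (y − μ)/σ, and evaluates censored probabilities through the identity F(y) = Φ(z) − 2T(z, λ), where Owen's T function is approximated by Simpson's rule. All function names and parameter values are illustrative assumptions. This is not the CensMFM implementation or the paper's algorithm. It computes only the observed-data log-likelihood and the posterior component-membership probabilities, which are the mixture part of an E-step.

```python
import math

# Hypothetical sketch: univariate (p = 1) skew-normal mixture with censored and
# missing observations. Not the CensMFM implementation; parameterization assumed.


def norm_cdf(x):
    """Standard normal CDF Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def owens_t(h, a, n=1000):
    """Owen's T(h, a) = 1/(2*pi) * int_0^a exp(-h^2 (1 + x^2) / 2) / (1 + x^2) dx,
    approximated by composite Simpson's rule with n (even) sub-intervals."""
    step = a / n  # negative when a < 0, which gives T(h, -a) = -T(h, a)
    total = 0.0
    for i in range(n + 1):
        x = i * step
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * math.exp(-0.5 * h * h * (1.0 + x * x)) / (1.0 + x * x)
    return total * step / 3.0 / (2.0 * math.pi)


def sn_pdf(y, mu, sigma, lam):
    """Skew-normal density 2/sigma * phi(z) * Phi(lam * z), z = (y - mu) / sigma."""
    z = (y - mu) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return 2.0 / sigma * phi * norm_cdf(lam * z)


def sn_cdf(y, mu, sigma, lam):
    """Skew-normal CDF via F(y) = Phi(z) - 2 T(z, lam)."""
    if y == math.inf:
        return 1.0
    if y == -math.inf:
        return 0.0
    z = (y - mu) / sigma
    return norm_cdf(z) - 2.0 * owens_t(z, lam)


def contribution(obs, mu, sigma, lam):
    """Likelihood contribution of one observation under one component.
    obs is a float (fully observed) or a (lower, upper) tuple (censored);
    a missing value is coded here as (-inf, inf) and contributes 1."""
    if isinstance(obs, tuple):
        lower, upper = obs
        return sn_cdf(upper, mu, sigma, lam) - sn_cdf(lower, mu, sigma, lam)
    return sn_pdf(obs, mu, sigma, lam)


def mixture_estep(data, weights, params):
    """Posterior membership probabilities and observed-data log-likelihood."""
    resp, loglik = [], 0.0
    for obs in data:
        terms = [w * contribution(obs, *p) for w, p in zip(weights, params)]
        total = sum(terms)
        loglik += math.log(total)
        resp.append([t / total for t in terms])
    return resp, loglik


# Illustrative values only (not taken from the paper).
weights = [0.6, 0.4]
params = [(0.0, 1.0, 3.0), (5.0, 1.5, -2.0)]  # (mu, sigma, lambda) per component
data = [0.4, 5.2, (-math.inf, -0.5), (6.0, math.inf), (-math.inf, math.inf)]
resp, loglik = mixture_estep(data, weights, params)
for obs, r in zip(data, resp):
    print(obs, [round(v, 3) for v in r])
print("log-likelihood:", round(loglik, 4))
```

In this sketch, a missing value coded as (-inf, inf) contributes probability 1 under every component. Its membership probabilities therefore reduce to the mixing weights, and it adds nothing to the log-likelihood. Left- and right-censored values contribute F(upper) − F(lower) instead of a density.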



Updated: 2021-06-18