

Seemingly unrelated clusterwise linear regression for contaminated data
Statistical Papers (IF 1.2), Pub Date: 2022-08-06, DOI: 10.1007/s00362-022-01344-6
Gabriele Perrone, Gabriele Soffritti

Clusterwise regression is an approach to regression analysis based on finite mixtures, generally employed when sample observations come from a population composed of several unknown sub-populations. When the response is continuous, Gaussian clusterwise linear regression models are usually employed. Such models have recently been robustified with respect to the possible presence of mild outliers in the sub-populations. However, in some fields of research, especially in the modelling of multivariate economic data or data from the social sciences, there may be prior information on the specific covariates to be considered in the linear term used to predict a given response. As a consequence, the covariates may not be the same for all responses. A novel class of multivariate Gaussian linear clusterwise regression models is therefore proposed. This class extends mixture-based regression analysis to multivariate, correlated responses in the presence of mild outliers, and leaves the researcher free to use a different vector of covariates for each response. Details about model identification and maximum likelihood estimation via an expectation-conditional maximisation algorithm are given. The performance of the new models is studied by simulation in comparison with other clusterwise linear regression models. A comparative evaluation of their effectiveness and usefulness is provided through the analysis of a real dataset.
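The abstract gives no formulas, so the following is only an illustrative sketch of the kind of density such a model might use. It assumes the usual contaminated Gaussian form, in which each cluster mixes a "typical" Gaussian with a second Gaussian that shares the same mean but has an inflated covariance to absorb mild outliers, and it lets each response have its own covariate vector. All function names, parameter names, and numeric values are hypothetical; this is not the authors' code, and the estimation step (the expectation-conditional maximisation algorithm) is not shown.

```python
import numpy as np

def gaussian_pdf(y, mean, cov):
    """Multivariate normal density evaluated at the vector y."""
    d = y - mean
    quad = d @ np.linalg.solve(cov, d)          # Mahalanobis term
    _, logdet = np.linalg.slogdet(cov)
    return float(np.exp(-0.5 * (quad + logdet + len(y) * np.log(2 * np.pi))))

def component_mean(x_by_response, intercepts, slopes):
    """Mean vector in which response m is predicted from its own covariates."""
    return np.array([b0 + b @ x
                     for x, b0, b in zip(x_by_response, intercepts, slopes)])

def mixture_density(y, x_by_response, clusters):
    """Mixture of contaminated Gaussian regressions (assumed form, see lead-in)."""
    total = 0.0
    for c in clusters:
        mu = component_mean(x_by_response, c["intercepts"], c["slopes"])
        typical = gaussian_pdf(y, mu, c["cov"])
        outlying = gaussian_pdf(y, mu, c["eta"] * c["cov"])  # eta > 1 inflates
        total += c["pi"] * (c["alpha"] * typical + (1 - c["alpha"]) * outlying)
    return total

# Hypothetical setup: two responses, the first using two covariates and the
# second using one, with two clusters whose weights pi sum to one.
clusters = [
    {"pi": 0.6, "alpha": 0.9, "eta": 10.0,
     "intercepts": [0.5, -1.0],
     "slopes": [np.array([1.0, 1.0]), np.array([2.0])],
     "cov": np.array([[1.0, 0.3], [0.3, 1.0]])},
    {"pi": 0.4, "alpha": 0.95, "eta": 5.0,
     "intercepts": [-2.0, 3.0],
     "slopes": [np.array([0.5, -1.0]), np.array([-1.5])],
     "cov": np.array([[2.0, -0.4], [-0.4, 1.5]])},
]
x_by_response = [np.array([1.0, 2.0]), np.array([3.0])]
density = mixture_density(np.array([3.4, 5.2]), x_by_response, clusters)
print(density)
```

Passing a separate covariate vector per response is what distinguishes this sketch from a standard multivariate regression mixture, where every response would share one design vector.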


Updated: 2022-08-06
