Efficient and robust estimation of regression and scale parameters, with outlier detection
Computational Statistics & Data Analysis (IF 1.5), Pub Date: 2021-03-01, DOI: 10.1016/j.csda.2020.107114
Alain Desgagné

Abstract: Linear regression with normally distributed errors (including particular cases such as ANOVA, Student’s t-test or location–scale inference) is a widely used statistical procedure. In this setting the ordinary least squares estimator has remarkable properties but is very sensitive to outliers. Several robust alternatives have been proposed, but there is still significant room for improvement. An original method of estimation is therefore proposed, which offers high efficiency both in the absence and in the presence of outliers, for the estimation of the regression coefficients as well as the scale parameter. The approach first broadens the normal assumption on the errors to a mixture of the normal and the filtered-log-Pareto (FLP), an original distribution designed to represent the outliers. The expectation–maximization (EM) algorithm is then adapted to this mixture, yielding the N–FLP estimators of the regression coefficients, the scale parameter and the proportion of outliers, along with the probability that each observation is an outlier. The performance of the N–FLP estimators is compared with the best alternatives in an extensive Monte Carlo simulation. It is shown that this method of estimation can also be used for complete robust inference, including confidence intervals, hypothesis testing and model selection.
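The general mechanism (a normal component for the "good" errors, a second component for outliers, and EM producing both parameter estimates and per-observation outlier probabilities) can be illustrated with a minimal sketch. It assumes a simple linear regression y_i = b0 + b1 x_i + e_i with errors from (1 - ω) N(0, σ²) + ω g, and it uses a fixed wide Cauchy density (scale 10) as a hypothetical stand-in for g. This is NOT the FLP distribution and NOT the paper's N–FLP estimator; the function names `weighted_fit`, `wide_cauchy_pdf` and `em_fit` and the starting value ω = 0.1 are illustrative assumptions. Because the stand-in g has no free parameters, the M-step reduces to weighted least squares with weights 1 - τ_i, where τ_i is the posterior probability that observation i is an outlier.

```python
import math


def weighted_fit(x, y, wts):
    """Weighted least squares for y = b0 + b1*x; returns (b0, b1, sigma^2)."""
    sw = sum(wts)
    xbar = sum(w * xi for w, xi in zip(wts, x)) / sw
    ybar = sum(w * yi for w, yi in zip(wts, y)) / sw
    sxx = sum(w * (xi - xbar) ** 2 for w, xi in zip(wts, x))
    sxy = sum(w * (xi - xbar) * (yi - ybar) for w, xi, yi in zip(wts, x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    # Weighted maximum-likelihood estimate of the normal-component variance.
    s2 = sum(w * (yi - b0 - b1 * xi) ** 2 for w, xi, yi in zip(wts, x, y)) / sw
    return b0, b1, s2


def wide_cauchy_pdf(r, scale=10.0):
    """Hypothetical stand-in outlier density (NOT the FLP distribution)."""
    return 1.0 / (math.pi * scale * (1.0 + (r / scale) ** 2))


def em_fit(x, y, outlier_pdf, w_out=0.1, max_iter=500, tol=1e-10):
    """EM for errors ~ (1 - w_out) * N(0, s2) + w_out * outlier_pdf (fixed)."""
    n = len(x)
    b0, b1, s2 = weighted_fit(x, y, [1.0] * n)  # start from ordinary least squares
    tau = [0.0] * n
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: posterior probability tau[i] that observation i is an outlier.
        ll = 0.0
        for i, (xi, yi) in enumerate(zip(x, y)):
            r = yi - b0 - b1 * xi
            normal = (1.0 - w_out) * math.exp(-r * r / (2.0 * s2)) / math.sqrt(2.0 * math.pi * s2)
            outlier = w_out * outlier_pdf(r)
            tau[i] = outlier / (normal + outlier)
            ll += math.log(normal + outlier)
        # Stop once the log-likelihood no longer increases (EM never decreases it).
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step: closed form because outlier_pdf has no free parameters here.
        w_out = sum(tau) / n
        b0, b1, s2 = weighted_fit(x, y, [1.0 - t for t in tau])
    return b0, b1, math.sqrt(s2), w_out, tau


if __name__ == "__main__":
    x = [float(i) for i in range(20)]
    y = [1.0 + 2.0 * xi + 0.1 * (-1) ** i for i, xi in enumerate(x)]
    y[5] += 30.0  # two artificial gross outliers
    y[15] += 30.0
    b0, b1, s, w, tau = em_fit(x, y, wide_cauchy_pdf)
    print(b0, b1, s, w)
    print([i for i, t in enumerate(tau) if t > 0.5])
```

Starting from ordinary least squares, the two contaminated points first pull the line and inflate σ, but their τ_i rise towards 1 over the iterations, so their weights 1 - τ_i fall towards 0 and the fit settles on the clean observations. The paper's FLP component is designed differently, so this sketch only conveys the structure of the EM scheme, not its efficiency properties.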

Updated: 2021-03-01