Model diagnostics in reduced-rank estimation, Statistics and Its Interface



Model diagnostics in reduced-rank estimation
Statistics and Its Interface (IF 0.3), Pub Date: 2016-01-01, DOI: 10.4310/sii.2016.v9.n4.a7
Kun Chen


Reduced-rank methods are very popular in high-dimensional multivariate analysis for conducting simultaneous dimension reduction and model estimation. However, commonly used reduced-rank methods are not robust, as the underlying reduced-rank structure can easily be distorted by only a few outliers. Anomalies are bound to exist in big data problems, and in some applications they may themselves be of primary interest. While naive residual analysis is often inadequate for outlier detection due to potential masking and swamping, robust reduced-rank estimation approaches can be computationally demanding. Under Stein's unbiased risk estimation framework, we propose a set of tools, including leverage scores and generalized information scores, to perform model diagnostics and outlier detection in large-scale reduced-rank estimation. The leverage scores give an exact decomposition of the so-called model degrees of freedom to the observation level, which in turn leads to an exact decomposition of many commonly used information criteria; the resulting observation-level quantities are thus named the information scores of the observations. The proposed information score approach provides a principled way of combining residuals and leverage scores for anomaly detection. Simulation studies confirm that the proposed diagnostic tools work well. A pattern recognition example with handwritten digit images and a time series analysis example with monthly U.S. macroeconomic data further demonstrate the efficacy of the proposed approaches.
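The abstract does not give the estimator's formulas, so the following is only a minimal numerical sketch of the idea, not the paper's method. It assumes classical (identity-weighted) reduced-rank regression, Gaussian errors with known variance sigma2, and defines a hypothetical observation-level leverage h_i = sum_j d(yhat_ij)/d(y_ij), approximated by central finite differences, so that sum_i h_i is the SURE degrees of freedom of the fit. The hypothetical information score s_i = ||y_i - yhat_i||^2 / sigma2 + lam * h_i then decomposes the criterion ||Y - Yhat||_F^2 / sigma2 + lam * df exactly over observations. The names `rrr_fit`, `leverage_scores`, `information_scores`, and the parameter `lam` are illustrative inventions; the paper's generalized information score may be defined differently.

```python
import numpy as np


def rrr_fit(X, Y, rank):
    """Fitted values of classical reduced-rank regression (identity weight).

    Projects the least-squares fit onto the span of its top-`rank`
    right singular vectors: Y_hat = X B_ols V_r V_r^T.
    """
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Y_ols = X @ B_ols
    _, _, Vt = np.linalg.svd(Y_ols, full_matrices=False)
    V_r = Vt[:rank].T  # q x rank basis of the estimated response subspace
    return Y_ols @ V_r @ V_r.T


def leverage_scores(X, Y, rank, eps=1e-5):
    """Hypothetical observation-level leverage h_i = sum_j d(yhat_ij)/d(y_ij).

    The divergence is approximated by central finite differences, so
    sum(h) approximates the SURE degrees of freedom of the fit.
    """
    n, q = Y.shape
    h = np.zeros(n)
    for i in range(n):
        for j in range(q):
            Y_plus, Y_minus = Y.copy(), Y.copy()
            Y_plus[i, j] += eps
            Y_minus[i, j] -= eps
            diff = rrr_fit(X, Y_plus, rank)[i, j] - rrr_fit(X, Y_minus, rank)[i, j]
            h[i] += diff / (2 * eps)
    return h


def information_scores(X, Y, rank, sigma2, lam=2.0):
    """Hypothetical per-observation score: scaled residual + lam * leverage.

    The scores sum to ||Y - Y_hat||_F^2 / sigma2 + lam * df, a Cp/AIC-type
    criterion when lam = 2 and the error variance sigma2 is known.
    """
    fit = rrr_fit(X, Y, rank)
    resid_sq = ((Y - fit) ** 2).sum(axis=1)
    return resid_sq / sigma2 + lam * leverage_scores(X, Y, rank)


# Usage: simulated rank-2 data with one planted outlier in row 0.
rng = np.random.default_rng(0)
n, p, q, r = 40, 5, 4, 2
X = rng.standard_normal((n, p))
B = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))
Y = X @ B + 0.5 * rng.standard_normal((n, q))  # noise sd 0.5, so sigma2 = 0.25
Y[0] += 8.0  # shift every response of observation 0
scores = information_scores(X, Y, r, sigma2=0.25)
print("largest information score at row", int(np.argmax(scores)))
```

When `rank` equals the number of responses q, the fit reduces to ordinary least squares and h_i equals q times the i-th diagonal entry of the hat matrix, which is a useful sanity check. A larger `lam`, such as log(nq), gives a BIC-type weighting. The finite-difference loop costs 2nq refits, so it only suits small illustrative problems; it stands in for the exact expressions the abstract refers to.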


Updated: 2016-01-01
