Influence Diagnostics for High-Dimensional Lasso Regression
Journal of Computational and Graphical Statistics (IF 2.4), Pub Date: 2019-06-11, DOI: 10.1080/10618600.2019.1598869
Bala Rajaratnam¹, Steven Roberts², Doug Sparks¹, Honglin Yu²

Abstract

The increased availability of high-dimensional data, and the appeal of a “sparse” solution, have made penalized likelihood methods commonplace. Arguably the most widely used of these methods is ℓ1 regularization, popularly known as the lasso. When the lasso is applied to high-dimensional data, observations are relatively few; thus, each observation can potentially have a tremendous influence on model selection and inference. A natural question in this context is therefore how to identify and assess influential observations. We address this by extending the framework for assessing estimation influence in traditional linear regression, and demonstrate that it is equally, if not more, relevant for assessing model selection influence in high-dimensional lasso regression. Within this framework, we propose four new “deletion methods” for gauging the influence of an observation on lasso model selection: df-model, df-regpath, df-cvpath, and df-lambda. Asymptotic cut-offs for each measure are developed, even when the number of predictors p → ∞. We illustrate that in high-dimensional settings, individual observations can have a tremendous impact on lasso model selection. We demonstrate that applying our measures can help reveal relationships in high-dimensional real data that may otherwise remain hidden. Supplementary materials for this article are available online.
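The abstract does not say how the deletion measures are computed. As a rough illustration of the general deletion idea only, the sketch below refits a lasso with each observation removed, at one fixed penalty λ, and counts how many predictors enter or leave the selected set. The scoring rule, the helper names (`lasso_cd`, `support`, `deletion_selection_changes`), the no-intercept objective and the fixed λ are all our own assumptions. This is not the authors' definition of df-model (or of df-regpath, df-cvpath or df-lambda), and it implements none of their asymptotic cut-offs. It is written in plain Python so that it runs without dependencies.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=500, tol=1e-10):
    """Fit a lasso by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 with no intercept
    (center X and y first if an intercept is needed). X is a list of n rows,
    each a list of p floats.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - X b; b starts at zero
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column can never be selected
            # (1/n) x_j^T (partial residual that excludes predictor j)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def support(b, eps=1e-8):
    """Indices of the predictors the lasso selects (nonzero coefficients)."""
    return {j for j, bj in enumerate(b) if abs(bj) > eps}


def deletion_selection_changes(X, y, lam):
    """Hypothetical deletion diagnostic: for each observation i, refit the
    lasso without row i at the same lam and count the predictors that enter
    or leave the selected set (size of the symmetric difference)."""
    full = support(lasso_cd(X, y, lam))
    scores = []
    for i in range(len(X)):
        X_minus_i = X[:i] + X[i + 1:]
        y_minus_i = y[:i] + y[i + 1:]
        scores.append(len(full ^ support(lasso_cd(X_minus_i, y_minus_i, lam))))
    return full, scores


if __name__ == "__main__":
    # Small synthetic p > n example: only predictors 0 and 1 carry signal.
    rng = random.Random(0)
    n, p = 10, 20
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0.0, 0.5) for row in X]
    selected, scores = deletion_selection_changes(X, y, lam=0.3)
    print("selected on full data:", sorted(selected))
    print("selection changes per deleted observation:", scores)
```

Here `lam` plays the role of the lasso penalty λ. For real data, an established solver is the practical choice; for example, scikit-learn's `Lasso` with `fit_intercept=False` minimizes the same objective, with `alpha` in place of `lam`. Deciding when a score is large enough to flag an observation is the question the paper's cut-offs address, and this sketch does not attempt it.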

Updated: 2019-06-11