In defense of LASSO
Communications in Statistics - Theory and Methods (IF 0.8), Pub Date: 2020-07-07, DOI: 10.1080/03610926.2020.1788080
Chi Tim Ng, Woojoo Lee, Youngjo Lee

Abstract

Although LASSO has been criticized for selecting too many covariates, this paper illustrates that the bigger model chosen by the LASSO method is suitable for exploratory research that aims to identify all potential causes for further scientific investigation. Up to now, all criticisms have assumed that the covariates are observed without measurement error, which is unlikely to hold in many practical situations. Under measurement error, the meaning of “relevant covariates” can be ambiguous: some covariates without an association with the response can be “potentially relevant”. The crucial point is that “relevant” and “potentially relevant” covariates cannot be distinguished from the observed data in the presence of measurement error. To avoid misinterpretation, both should be included in the model, which means that a bigger model is preferred. To characterize the subset of covariates that should be included, a factor model of the covariates is introduced. Furthermore, new consistency theory is established under conditions weaker than those in Meinshausen and Bühlmann, to cope with situations where the preferred subset is not the same as the true model.
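The measurement-error argument can be made concrete with a small population-level calculation. The sketch below is only an illustration under assumptions that are not taken from the paper: a single common factor f with unit variance, two covariates x1 = f + e1 and x2 = f + e2, a response y = 2*x1 + noise, and independent measurement errors of variance 1 on both observed covariates. The function names `soft_threshold` and `population_lasso`, the penalty value 0.1, and the use of population covariances in place of sample data are all hypothetical choices made for this example.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator used in LASSO coordinate descent."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def population_lasso(sigma, c, lam, n_iter=200):
    """Minimise 0.5*b'Sigma*b - c'b + lam*||b||_1 by cyclic coordinate descent.

    sigma: covariance matrix of the covariates (list of lists)
    c:     covariances between each covariate and the response
    """
    p = len(c)
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of covariate j with the partial residual
            partial = c[j] - sum(sigma[j][k] * b[k] for k in range(p) if k != j)
            b[j] = soft_threshold(partial, lam) / sigma[j][j]
    return b


# Assumed toy setup (not from the paper):
#   Var(f) = Var(e1) = Var(e2) = 1, y = 2*x1 + noise
#   => Var(x1) = Var(x2) = 2, Cov(x1, x2) = 1, Cov(x1, y) = 4, Cov(x2, y) = 2
cov_xy = [4.0, 2.0]
sigma_clean = [[2.0, 1.0], [1.0, 2.0]]

# Independent measurement errors of variance 1 inflate only the diagonal;
# covariances with y and between covariates are unchanged.
sigma_noisy = [[3.0, 1.0], [1.0, 3.0]]

lam = 0.1  # illustrative penalty level
coef_clean = population_lasso(sigma_clean, cov_xy, lam)
coef_noisy = population_lasso(sigma_noisy, cov_xy, lam)

print("error-free covariates:   ", [round(v, 3) for v in coef_clean])
print("with measurement error:  ", [round(v, 3) for v in coef_noisy])
```

In this assumed configuration, x2 has a zero coefficient when the covariates are observed exactly, yet it receives a nonzero coefficient once x1 is observed with error, because x2 then carries information about the true x1 through the shared factor. From the error-contaminated covariances alone, the two kinds of covariate cannot be told apart, which is the sense in which a bigger model can be reasonable.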




Updated: 2020-07-07