Regularity Properties for Sparse Regression.
Communications in Mathematics and Statistics (IF 0.9), Pub Date: 2016-03-14, DOI: 10.1007/s40304-015-0078-6
Edgar Dobriban, Jianqing Fan

Statistical and machine learning theory has developed several conditions ensuring that popular estimators such as the Lasso or the Dantzig selector perform well in high-dimensional sparse regression, including the restricted eigenvalue, compatibility, and \(\ell _q\) sensitivity properties. However, some of the central aspects of these conditions are not well understood. For instance, it is unknown if these conditions can be checked efficiently on any given dataset. This is problematic, because they are at the core of the theory of sparse regression. Here we provide a rigorous proof that these conditions are NP-hard to check. This shows that the conditions are computationally infeasible to verify, and raises some questions about their practical applications. However, by taking an average-case perspective instead of the worst-case view of NP-hardness, we show that a particular condition, \(\ell _q\) sensitivity, has certain desirable properties. This condition is weaker and more general than the others. We show that it holds with high probability in models where the parent population is well behaved, and that it is robust to certain data processing steps. These results are desirable, as they provide guidance about when the condition, and more generally the theory of sparse regression, may be relevant in the analysis of high-dimensional correlated observational data.

Updated: 2016-03-14