Sparse hierarchical regression with polynomials, Machine Learning


Sparse hierarchical regression with polynomials
Machine Learning (IF 4.3), Pub Date: 2020-01-24, DOI: 10.1007/s10994-020-05868-6
Dimitris Bertsimas, Bart Van Parys

We present a novel method for sparse polynomial regression. We seek the degree-$r$ polynomial that depends on at most $k$ inputs, contains at most $\ell$ monomial terms, and minimizes the sum of the squares of its prediction errors. Bach (Advances in neural information processing systems, pp 105–112, 2009) referred to such highly structured sparse regression as sparse hierarchical regression in the context of kernel learning. A hierarchical sparse specification aligns well with modern big data settings, where many inputs are not relevant for prediction and the functional complexity of the regressor needs to be controlled so as to avoid overfitting. We propose an efficient two-step approach to this hierarchical sparse regression problem. First, we discard irrelevant inputs using an extremely fast input ranking heuristic. Second, we take advantage of modern cutting plane methods for integer optimization to solve the remaining reduced hierarchical $(k, \ell)$-sparse problem exactly. The ability of our method to identify all $k$ relevant inputs and all $\ell$ monomial terms is shown empirically to experience a phase transition. Crucially, the same transition also presents itself in our ability to reject all irrelevant features and monomials. In the regime where our method is statistically powerful, its computational complexity is on par with Lasso-based heuristics. Hierarchical sparsity can retain the flexibility of general nonparametric methods such as nearest neighbors or regression trees (CART), without sacrificing much statistical power. The presented work hence fills a void left by the lack of powerful, disciplined nonlinear sparse regression methods in high-dimensional settings. Our method is shown empirically to scale to regression problems with $n \approx 10{,}000$ observations for input dimension $p \approx 1000$.
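To make the $(k, \ell)$-sparse objective concrete, the following is a minimal sketch that solves a toy instance by exhaustive enumeration. It is not the authors' method: the abstract describes an input ranking heuristic followed by cutting plane integer optimization, whereas this sketch simply tries every admissible set of monomials, which is only feasible for very small $p$. The function names (`monomials`, `design`, `brute_force_sparse_poly`) are hypothetical, and the model is assumed to have no intercept and no regularization.

```python
import itertools
import numpy as np

def monomials(inputs, r):
    """All monomials of total degree 1..r over the given input indices.
    A monomial is a tuple of indices with repetition, e.g. (0, 0) is x0^2."""
    out = []
    for d in range(1, r + 1):
        out.extend(itertools.combinations_with_replacement(inputs, d))
    return out

def design(X, monos):
    """Matrix whose columns are the monomials evaluated on the rows of X."""
    return np.column_stack([np.prod(X[:, list(m)], axis=1) for m in monos])

def brute_force_sparse_poly(X, y, r, k, ell):
    """Exhaustive search (illustration only, assumed no intercept):
    among polynomials of degree <= r with at most ell monomials that
    together involve at most k inputs, return the least-squares best.
    Returns (sum of squared errors, chosen monomials, coefficients)."""
    p = X.shape[1]
    candidates = monomials(range(p), r)
    best = (np.inf, None, None)
    for size in range(1, ell + 1):
        for subset in itertools.combinations(candidates, size):
            used_inputs = {i for m in subset for i in m}
            if len(used_inputs) > k:
                continue  # violates the "at most k inputs" constraint
            Z = design(X, subset)
            w, *_ = np.linalg.lstsq(Z, y, rcond=None)
            sse = float(np.sum((y - Z @ w) ** 2))
            if sse < best[0]:
                best = (sse, subset, w)
    return best
```

On noise-free synthetic data generated from a polynomial with two monomials over three inputs, calling `brute_force_sparse_poly(X, y, r=2, k=3, ell=2)` should return those two monomials. The number of candidate subsets grows combinatorially in $p$, $r$, and $\ell$, which is why an exact method at the scale reported in the abstract needs something far more efficient than enumeration.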


Updated: 2020-01-24
