High-Dimensional Cost-constrained Regression Via Nonconvex Optimization
Technometrics (IF 2.5), Pub Date: 2021-05-04, DOI: 10.1080/00401706.2021.1905071
Guan Yu, Haoda Fu, Yufeng Liu

Abstract

Budget constraints have become an important consideration in modern predictive modeling because certain predictors are costly to collect. This motivates us to develop cost-constrained predictive modeling methods. In this article, we study a new high-dimensional cost-constrained linear regression problem: we aim to find the regression model with the smallest expected prediction error among all models satisfying a budget constraint. The nonconvex budget constraint makes this problem NP-hard. To estimate the regression coefficient vector of the cost-constrained regression model, we propose a new discrete first-order continuous optimization method. In particular, our method delivers a sequence of estimates of the regression coefficient vector by solving a sequence of 0-1 knapsack problems. Theoretically, we prove that the sequence of estimates generated by our iterative algorithm converges to a first-order stationary point, which can be a globally optimal solution under some conditions. Furthermore, we study extensions of our method to general statistical learning problems and to problems with groups of variables. Numerical studies using simulated datasets and a real dataset from a diabetes study indicate that our proposed method can solve problems of fairly high dimensions with promising performance.
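
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way an iterative scheme can reduce to a sequence of 0-1 knapsack problems. It is not the authors' implementation. It assumes a least-squares loss, non-negative integer predictor costs, and a projected-gradient update in which the projection onto the budget set keeps the coordinates that maximize the retained squared magnitude subject to the cost limit. The function names `knapsack_select` and `cost_constrained_ls`, the step size `1 / L`, and the stopping rule are all illustrative assumptions.

```python
import numpy as np


def knapsack_select(values, costs, budget):
    """Exact 0-1 knapsack by dynamic programming.

    Assumes non-negative values and non-negative integer costs and budget.
    Returns a boolean mask of the selected items.
    """
    n = len(values)
    # best[j, b] = best total value using the first j items with capacity b
    best = np.zeros((n + 1, budget + 1))
    for j in range(1, n + 1):
        c, v = int(costs[j - 1]), values[j - 1]
        best[j] = best[j - 1]  # default: skip item j
        if c <= budget:
            take = best[j - 1, : budget - c + 1] + v
            best[j, c:] = np.maximum(best[j - 1, c:], take)
    # Backtrack to recover which items were taken.
    chosen = np.zeros(n, dtype=bool)
    b = budget
    for j in range(n, 0, -1):
        if best[j, b] != best[j - 1, b]:
            chosen[j - 1] = True
            b -= int(costs[j - 1])
    return chosen


def cost_constrained_ls(X, y, costs, budget, n_iter=200, tol=1e-10):
    """Projected-gradient sketch for least squares under a cost budget.

    Each iteration takes a gradient step and then projects onto
    {beta : sum of costs over nonzero coordinates <= budget},
    which is a 0-1 knapsack with item values u_j ** 2.
    """
    p = X.shape[1]
    # Lipschitz constant of the gradient of 0.5 * ||y - X beta||^2
    L = np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        u = beta - grad / L
        keep = knapsack_select(u ** 2, costs, budget)
        new_beta = np.where(keep, u, 0.0)  # zero out unselected coordinates
        converged = np.allclose(new_beta, beta, atol=tol)
        beta = new_beta
        if converged:
            break
    return beta
```

Calling `cost_constrained_ls(X, y, costs, budget)` with an `n x p` design matrix `X`, a response vector `y`, and an integer cost per predictor returns a coefficient vector whose nonzero entries have total cost within the budget. The dynamic program is exact but scales with `p * budget`, so real-valued or very large costs would need rescaling or an approximate knapsack solver.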



Last updated: 2021-05-04