Regularization and variable selection in Heckman selection model, Statistical Papers

Statistical Papers (IF 1.2), Pub Date: 2021-06-16, DOI: 10.1007/s00362-021-01246-z
Emmanuel O. Ogundimu

Sample selection arises when the outcome of interest is only partially observed in a study. A common challenge is the requirement for exclusion restrictions: some of the covariates that affect the missingness mechanism must not affect the outcome. The drive to satisfy this requirement often leads to the inclusion of irrelevant variables in the model. A suboptimal solution is to use classical variable selection criteria such as AIC and BIC, or traditional variable selection procedures such as stepwise selection. These methods are unstable when there is limited expert knowledge about which variables to include in the model. To address this, we propose using the adaptive Lasso for simultaneous variable selection and parameter estimation in both the selection and outcome submodels, in the absence of exclusion restrictions. Using the maximum likelihood estimator of the sample selection model, we construct a loss function that, up to a constant, resembles a least squares regression problem, and minimize its penalized version with an efficient algorithm. We show that, with a proper choice of the regularization parameter, the estimator is consistent and possesses the oracle properties. The method is compared with the Lasso and with the adaptively weighted \(L_{1}\)-penalized two-step method. We apply the methods to the well-known Ambulatory Expenditure Data.
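The abstract describes turning the sample-selection likelihood into a least-squares-type loss and then penalizing it with the adaptive Lasso. A second-order Taylor expansion of the negative log-likelihood around the unpenalized MLE \(\hat\theta\) gives \(-\ell(\theta) \approx -\ell(\hat\theta) + \tfrac12(\theta-\hat\theta)^\top A(\theta-\hat\theta)\), with \(A\) the observed information; this is one standard way to obtain such a loss "up to a constant", and the minimal Python sketch below assumes that form. Here \(\hat\theta\) is assumed to stack the selection and outcome coefficients. The function names `adaptive_weights` and `lsa_adaptive_lasso`, the cyclic coordinate-descent solver, and the choice to leave intercepts, \(\sigma\) and \(\rho\) unpenalized are illustrative assumptions, not the paper's exact algorithm.

```python
from typing import List, Optional, Sequence


def soft_threshold(z: float, t: float) -> float:
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def adaptive_weights(theta_hat: Sequence[float], gamma: float = 1.0,
                     penalize: Optional[Sequence[bool]] = None,
                     eps: float = 1e-12) -> List[float]:
    """Adaptive Lasso weights w_j = 1 / |theta_hat_j|**gamma.

    Entries flagged False in `penalize` (hypothetically: intercepts, sigma,
    rho) get weight 0 and are never shrunk; eps avoids division by zero.
    """
    if penalize is None:
        penalize = [True] * len(theta_hat)
    return [1.0 / max(abs(t), eps) ** gamma if p else 0.0
            for t, p in zip(theta_hat, penalize)]


def lsa_adaptive_lasso(theta_hat: Sequence[float],
                       A: Sequence[Sequence[float]],
                       lam: float,
                       weights: Sequence[float],
                       max_iter: int = 1000,
                       tol: float = 1e-10) -> List[float]:
    """Minimize 0.5*(theta - theta_hat)' A (theta - theta_hat)
    + lam * sum_j w_j * |theta_j| by cyclic coordinate descent.

    Assumption: A is symmetric positive definite, e.g. the observed
    information matrix of the selection model at the unpenalized MLE.
    """
    p = len(theta_hat)
    theta = list(theta_hat)  # warm start at the unpenalized MLE
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Contribution of the other coordinates: sum_{k != j} A_jk (theta_k - theta_hat_k)
            cross = sum(A[j][k] * (theta[k] - theta_hat[k])
                        for k in range(p) if k != j)
            z = A[j][j] * theta_hat[j] - cross
            new = soft_threshold(z, lam * weights[j]) / A[j][j]
            max_change = max(max_change, abs(new - theta[j]))
            theta[j] = new
        if max_change < tol:
            break
    return theta


if __name__ == "__main__":
    # Toy input: made-up unpenalized estimates and an identity information matrix.
    theta_hat = [2.0, 0.05, -1.5]
    A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    w = adaptive_weights(theta_hat)
    print(lsa_adaptive_lasso(theta_hat, A, lam=0.1, weights=w))
```

With an identity matrix each coordinate is simply soft-thresholded, so the toy run returns roughly `[1.95, 0.0, -1.433]`: the small estimate 0.05 receives weight 20 and is set exactly to zero, while the large estimates are only slightly shrunk. In practice \(\lambda\) would be chosen over a grid, for example by an information criterion; the abstract does not state which tuning rule the paper uses.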

Updated: 2021-06-17
