Adaptive Bayesian SLOPE: Model Selection With Incomplete Data, Journal of Computational and Graphical Statistics

Adaptive Bayesian SLOPE: Model Selection With Incomplete Data
Journal of Computational and Graphical Statistics (IF 1.4), Pub Date: 2021-10-14, DOI: 10.1080/10618600.2021.1963263
Wei Jiang, Małgorzata Bogdan, Julie Josse, Szymon Majewski, Błażej Miasojedow, Veronika Ročková

Abstract

We consider the problem of variable selection in high-dimensional settings with missing observations among the covariates. To address this relatively understudied problem, we propose a new synergistic procedure, adaptive Bayesian SLOPE with missing values, which combines SLOPE (sorted ℓ₁ regularization) with the spike-and-slab LASSO (SSL) and is accompanied by an efficient stochastic approximation expectation-maximization (SAEM) algorithm to handle missing data. As in SSL, the regression coefficients are regarded as arising from a hierarchical model consisting of two groups: the spike for the inactive coefficients and the slab for the active ones. However, instead of assigning independent spike and slab Laplace priors to each covariate, we deploy a joint SLOPE "spike-and-slab" prior that takes the ordering of coefficient magnitudes into account in order to control false discoveries. We position our approach within a Bayesian framework that allows for simultaneous variable selection and parameter estimation while handling missing data. Through extensive simulations, we demonstrate satisfactory performance in terms of power, false discovery rate (FDR), and estimation bias under a wide range of scenarios, covering both complete data and data with missing values. Finally, we analyze a real dataset of patients from Paris hospitals who suffered severe trauma, where we show competitive performance in predicting platelet levels. Our methodology has been implemented in C++ and wrapped into open-source R programs for public use. Supplemental files for this article are available online.
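
For readers unfamiliar with the sorted ℓ₁ penalty that SLOPE is built on, the following is a minimal sketch of how that penalty is evaluated: the i-th largest coefficient magnitude is multiplied by the i-th largest weight. It illustrates plain SLOPE only, not the adaptive Bayesian prior or the SAEM algorithm described in the abstract. The function names `bh_lambdas` and `sorted_l1_norm` are hypothetical, and the Benjamini-Hochberg-type weight sequence is an assumption (one commonly used choice in the SLOPE literature), not something stated on this page.

```python
from statistics import NormalDist


def bh_lambdas(p, q=0.1):
    """Nonincreasing BH-type weights (assumed choice): Phi^{-1}(1 - i*q / (2p))."""
    normal = NormalDist()  # standard normal, mean 0 and standard deviation 1
    return [normal.inv_cdf(1 - i * q / (2 * p)) for i in range(1, p + 1)]


def sorted_l1_norm(beta, lambdas):
    """Sorted l1 norm: sum over i of lambdas[i] * (i-th largest |beta|)."""
    if len(beta) != len(lambdas):
        raise ValueError("beta and lambdas must have the same length")
    # Sort magnitudes in decreasing order so the largest coefficient
    # is paired with the largest weight.
    magnitudes = sorted((abs(b) for b in beta), reverse=True)
    return sum(lam * mag for lam, mag in zip(lambdas, magnitudes))


if __name__ == "__main__":
    beta = [0.0, -3.0, 1.5, 0.2]
    lambdas = bh_lambdas(len(beta), q=0.1)
    print(sorted_l1_norm(beta, lambdas))
```

Because the weights decrease with rank, larger coefficients are penalized more heavily per unit than smaller ones, and the penalty depends on the coefficients only through their ordered magnitudes. With all weights equal, the expression reduces to the ordinary LASSO penalty.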

Updated: 2021-10-14
