Bayesian criterion-based variable selection
The Journal of the Royal Statistical Society: Series C (Applied Statistics) (IF 1.0), Pub Date: 2021-04-27, DOI: 10.1111/rssc.12488
Arnab Kumar Maity 1 , Sanjib Basu 2 , Santu Ghosh 3

Bayesian approaches to criterion-based selection include the marginal likelihood-based highest posterior model (HPM) and the deviance information criterion (DIC). The DIC is popular in practice because it can often be estimated from sampling-based methods with relative ease and is readily available in various Bayesian software. We find that the sensitivity of DIC-based selection can be high, in the range of 90–100%. However, the rate of correct selection by DIC can be as low as 0–2%. These patterns persist as the sample size increases. We establish that both the marginal likelihood and the DIC asymptotically disfavour under-fitted models, which explains the high sensitivities of both criteria. However, the mis-selection probability of DIC remains bounded below by a positive constant in linear models with g-priors, whereas the mis-selection probability of the marginal likelihood converges to 0 under certain conditions. A consequence of our results is that the DIC not only cannot asymptotically differentiate between the data-generating model and an over-fitted model but, in fact, also cannot asymptotically differentiate between two over-fitted models. We illustrate these results in multiple simulation studies and in a biomarker selection problem on cancer cachexia in non-small cell lung cancer patients. We further study the performance of HPM and DIC in generalized linear models, as practitioners often choose to use the DIC that is readily available in software in such non-conjugate settings.
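
To make the two criteria concrete, the following is a minimal sketch of how the DIC and the log marginal likelihood could be computed in closed form for a Gaussian linear model with a g-prior. It is not the paper's implementation. It assumes, purely for simplicity, that the error variance `sigma2` is known and that the prior is beta ~ N(0, g * sigma2 * (X'X)^{-1}); the function name `dic_and_log_marginal` and the simulated data are hypothetical. Under these assumptions the effective number of parameters is p * g / (1 + g), so the DIC penalty per extra covariate stays bounded, while the marginal likelihood penalty (p / 2) * log(1 + g) grows with g.

```python
import numpy as np


def dic_and_log_marginal(y, X, g, sigma2=1.0):
    """Closed-form DIC and log marginal likelihood for y ~ N(X beta, sigma2 * I)
    with beta ~ N(0, g * sigma2 * (X'X)^{-1}).

    Assumption for illustration only: sigma2 is treated as known.
    Returns (dic, p_d, log_marginal).
    """
    n, p = X.shape
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares estimate
    shrink = g / (1.0 + g)                           # g-prior shrinkage factor
    beta_bar = shrink * beta_hat                     # posterior mean of beta

    # Deviance D(beta) = -2 log p(y | beta), evaluated at the posterior mean.
    resid = y - X @ beta_bar
    dev_at_mean = n * np.log(2 * np.pi * sigma2) + resid @ resid / sigma2

    # Effective number of parameters: posterior mean deviance minus
    # deviance at the posterior mean, which equals p * g / (1 + g) here.
    p_d = p * shrink
    dic = dev_at_mean + 2.0 * p_d

    # Marginally, y ~ N(0, sigma2 * (I + g * H)) with H the hat matrix.
    fitted = X @ beta_hat                            # H y
    log_marginal = (
        -0.5 * n * np.log(2 * np.pi * sigma2)
        - 0.5 * p * np.log1p(g)
        - 0.5 * (y @ y - shrink * (fitted @ fitted)) / sigma2
    )
    return dic, p_d, log_marginal


if __name__ == "__main__":
    # Hypothetical simulated data: only the first two covariates matter.
    rng = np.random.default_rng(1)
    n = 200
    X_full = rng.normal(size=(n, 3))
    y = X_full[:, :2] @ np.array([1.0, -1.0]) + rng.normal(size=n)

    for name, cols in [("data-generating", [0, 1]), ("over-fitted", [0, 1, 2])]:
        dic, p_d, log_ml = dic_and_log_marginal(y, X_full[:, cols], g=float(n))
        print(f"{name}: DIC={dic:.2f}, p_D={p_d:.3f}, log marginal={log_ml:.2f}")
```

Smaller DIC and larger log marginal likelihood are preferred. Repeating the comparison over many simulated data sets is one way to estimate how often each criterion picks the over-fitted model.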

Updated: 2021-04-27