Default priors for the intercept parameter in logistic regressions
Computational Statistics & Data Analysis (IF 1.5), Pub Date: 2019-05-01, DOI: 10.1016/j.csda.2018.10.014
Philip S. Boonstra, Ryan P. Barbaro, Ananda Sen

In logistic regression, separation occurs when a linear combination of predictors perfectly discriminates the binary outcome. Because finite-valued maximum likelihood estimates do not exist under separation, Bayesian regressions with informative shrinkage of the regression coefficients offer a suitable alternative. Classical studies of separation imply that the efficiency of estimating the regression coefficients may also depend on the choice of prior for the intercept, yet relatively little attention has been given to whether and how to shrink the intercept parameter. Alternative prior distributions for the intercept are proposed; these downweight implausibly extreme regions of the parameter space and so yield regression estimates that are less sensitive to separation. Through simulation and the analysis of exemplar datasets, differences across priors are quantified, stratified by established statistics that measure the degree of separation. Relative to diffuse priors, the proposed priors generally yield more efficient estimation of the regression coefficients themselves when the data are nearly separated. They are equally efficient in non-separated datasets, making them suitable for default use. Modest differences were observed in out-of-sample discrimination. These numerical studies also highlight the interplay between the priors for the intercept and for the regression coefficients: findings are more sensitive to the choice of intercept prior when a weakly informative prior is placed on the regression coefficients than when an informative shrinkage prior is used.
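The non-existence of a finite maximum likelihood estimate under separation, and how a prior restores a finite estimate, can be seen numerically. The sketch below is a minimal illustration under stated assumptions, not the authors' method: it uses a single predictor, a hypothetical completely separated toy dataset, plain gradient ascent, and generic independent Normal(0, sd²) priors with an arbitrarily chosen scale of 2.5. It does not implement the intercept priors proposed in the paper, whose exact form the abstract does not give; the separate `sd_intercept` and `sd_slope` arguments (hypothetical names) only let a reader vary the two priors independently.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_lik(b0, b1, xs, ys):
    """Bernoulli log-likelihood of the model logit P(y = 1) = b0 + b1 * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = b0 + b1 * x
        # log(1 + exp(eta)), computed without overflow
        log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += y * eta - log1pexp
    return total

def fit(xs, ys, sd_intercept=None, sd_slope=None, steps=5000, lr=0.1):
    """Gradient ascent on the log-likelihood (no priors) or on the log-posterior
    under independent Normal(0, sd^2) priors (assumed, for illustration only).
    None means a flat (improper) prior on that parameter."""
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            resid = y - sigmoid(b0 + b1 * x)
            g0 += resid
            g1 += resid * x
        # Gradient of a Normal(0, sd^2) log prior is -theta / sd^2
        if sd_intercept is not None:
            g0 -= b0 / sd_intercept ** 2
        if sd_slope is not None:
            g1 -= b1 / sd_slope ** 2
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

# Hypothetical toy data: every x < 0 has y = 0 and every x > 0 has y = 1,
# so the predictor completely separates the outcome.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

for steps in (2000, 20000):
    _, mle_slope = fit(xs, ys, steps=steps)
    _, map_slope = fit(xs, ys, sd_intercept=2.5, sd_slope=2.5, steps=steps)
    print(f"steps={steps}: no prior b1={mle_slope:.3f}, "
          f"Normal(0, 2.5^2) priors b1={map_slope:.3f}")
```

Without priors, the slope estimate keeps growing the longer the optimizer runs, because the log-likelihood increases monotonically toward zero and has no finite maximizer. With the priors, the log-posterior is strictly concave and the estimate settles at a finite posterior mode.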

Updated: 2019-05-01