Bayesian model selection in the M-open setting — Approximate posterior inference and subsampling for efficient large-scale leave-one-out cross-validation via the difference estimator
Journal of Mathematical Psychology (IF 2.2), Pub Date: 2021-02-01, DOI: 10.1016/j.jmp.2020.102474
Riko Kelter

Abstract Comparison of competing statistical models is an essential part of psychological research. From a Bayesian perspective, various approaches to model comparison and selection have been proposed in the literature. However, the applicability of these approaches depends on the assumptions made about the model space M. Moreover, traditional methods such as leave-one-out cross-validation (LOO-CV), which estimate the expected log predictive density (ELPD) of a model to investigate how the model generalises out-of-sample, quickly become computationally inefficient as the sample size grows. Here, a tutorial on Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO-CV), a computationally more efficient alternative, is provided. It is shown how Bayesian model selection can be scaled efficiently to big data via PSIS-LOO-CV in combination with approximate posterior inference and probability-proportional-to-size subsampling. First, several views of the model space and the Bayesian model comparison methods available in each are discussed. The Bayesian logistic regression model is then used as a running example to show how to apply the method in practice and to demonstrate that it yields ELPD estimates of similar accuracy to exact LOO-CV and information criteria. Subsequently, the power-law and exponential-law models relating reaction times to practice are used to demonstrate the approach with more complex models. Guidance is provided on how to compare competing models based on the ELPD estimates and how to conduct posterior predictive checks to safeguard against overconfidence in one of the models under consideration. The intended audience is researchers who practice mathematical modelling and model comparison, possibly with large datasets, and who are well acquainted with Bayesian statistics.
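To make the subsampling idea concrete, the following is a minimal, self-contained Python sketch of the difference estimator for a Bayesian logistic regression. It is not the paper's implementation: the posterior is approximated with a simple Laplace approximation, the per-point LOO terms use plain (unsmoothed) importance sampling rather than full PSIS, the subsample is drawn by simple random sampling rather than a probability-proportional-to-size design, and the data and all variable names are invented for illustration.

# Hedged sketch: subsampled LOO-CV via the difference estimator for a
# Bayesian logistic regression. Simulated data; illustrative names only.
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp, expit

rng = np.random.default_rng(1)

# Simulated data for a logistic regression
n, p = 5000, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([-0.5, 1.0, -1.5])
y = rng.binomial(1, expit(X @ beta_true))

# Approximate posterior inference: Laplace approximation at the MAP
def neg_log_post(beta):
    eta = X @ beta
    # Bernoulli log-likelihood plus a weak N(0, 10^2) prior
    return -(y @ eta - np.logaddexp(0.0, eta).sum()) + (beta @ beta) / 200.0

fit = minimize(neg_log_post, np.zeros(p), method="BFGS")
cov = fit.hess_inv                    # approximate posterior covariance
S = 2000                              # number of posterior draws
draws = rng.multivariate_normal(fit.x, cov, size=S)

# Pointwise log-likelihood matrix, shape (S, n)
eta = draws @ X.T
loglik = y * eta - np.logaddexp(0.0, eta)

# Step 1: cheap ELPD approximation pi_tilde for ALL n points.
# Here: plug-in log predictive density at the posterior mean.
eta_hat = X @ draws.mean(axis=0)
pi_tilde = y * eta_hat - np.logaddexp(0.0, eta_hat)

# Step 2: importance-sampling LOO for a small subsample only.
# Weights r_s = 1/p(y_j | theta_s); a production implementation would
# Pareto-smooth these weights (PSIS).
def is_loo(j):
    return np.log(S) - logsumexp(-loglik[:, j])

m = 100                               # subsample size << n
idx = rng.choice(n, size=m, replace=False)   # simple random sampling
pi_sub = np.array([is_loo(j) for j in idx])

# Step 3: difference estimator
# elpd_hat = sum_i pi_tilde_i + (n/m) * sum_{j in sample} (pi_j - pi_tilde_j)
elpd_hat = pi_tilde.sum() + (n / m) * (pi_sub - pi_tilde[idx]).sum()

# Full IS-LOO over all n points, as a reference (feasible here, not at scale)
elpd_full = sum(is_loo(j) for j in range(n))
print(f"subsampled difference estimator: {elpd_hat:.1f}")
print(f"full IS-LOO (reference):         {elpd_full:.1f}")

The saving comes from running the expensive per-observation LOO computation only on the m subsampled points, while the cheap plug-in approximation pi_tilde covers all n observations; the difference estimator then corrects the approximation's bias using the subsample.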

Updated: 2021-02-01