Bayesian model selection for high-dimensional Ising models, with applications to educational data, Computational Statistics & Data Analysis



Computational Statistics & Data Analysis (IF 1.5), Pub Date: 2021-07-27, DOI: 10.1016/j.csda.2021.107325
Jaewoo Park^{1,2}, Ick Hoon Jin^{1,2}, Michael Schweinberger^{3}


Doubly-intractable posterior distributions arise in many applications of statistics concerned with discrete and dependent data, including physics, spatial statistics, machine learning, the social sciences, and other fields. A specific example is psychometrics, which has adapted high-dimensional Ising models from machine learning, with a view to studying the interactions among binary item responses in educational assessments. To estimate high-dimensional Ising models from educational assessment data, $\ell_1$-penalized nodewise logistic regressions have been used. Theoretical results in high-dimensional statistics show that $\ell_1$-penalized nodewise logistic regressions can recover the true interaction structure with high probability, provided that certain assumptions are satisfied. Those assumptions are hard to verify in practice and may be violated, and quantifying the uncertainty about the estimated interaction structure and parameter estimators is challenging. We propose a Bayesian approach that helps quantify the uncertainty about the interaction structure and parameters without requiring strong assumptions, and can be applied to Ising models with thousands of parameters. We demonstrate the advantages of the proposed Bayesian approach compared with $\ell_1$-penalized nodewise logistic regressions by simulation studies and applications to small and large educational data sets with up to 2,485 parameters. Among other things, the simulation studies suggest that the Bayesian approach is more robust against model misspecification due to omitted covariates than $\ell_1$-penalized nodewise logistic regressions.
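To make the terms in the abstract concrete, the following minimal Python sketch illustrates why the Ising likelihood is hard to evaluate while each nodewise conditional is an ordinary logistic function. It assumes a common parameterization for 0/1 item responses, with one main effect per item (`alpha`) and one interaction per item pair (`gamma`); this parameterization and all function names are assumptions for illustration and are not taken from the paper. Under the same assumption, a model with 70 items would have 70 + 70·69/2 = 2,485 parameters, which matches the largest count quoted above.

```python
import itertools
import math


def num_parameters(p):
    """Count parameters: p main effects plus one interaction per item pair."""
    return p + p * (p - 1) // 2


def log_potential(x, alpha, gamma):
    """Unnormalized log-probability of a 0/1 response vector x.

    gamma is read as an upper-triangular table: gamma[i][j] with i < j.
    """
    p = len(x)
    value = sum(alpha[i] * x[i] for i in range(p))
    value += sum(gamma[i][j] * x[i] * x[j]
                 for i in range(p) for j in range(i + 1, p))
    return value


def normalizing_constant(alpha, gamma):
    """Brute-force sum over all 2**p response patterns.

    Feasible only for tiny p; this sum is what makes the likelihood,
    and hence the posterior, expensive in high dimensions.
    """
    p = len(alpha)
    return sum(math.exp(log_potential(x, alpha, gamma))
               for x in itertools.product((0, 1), repeat=p))


def conditional_prob(i, x, alpha, gamma):
    """P(x_i = 1 | all other responses).

    The normalizing constant cancels, leaving a logistic function of the
    other items; nodewise logistic regression exploits this form.
    """
    p = len(x)
    eta = alpha[i] + sum(gamma[min(i, j)][max(i, j)] * x[j]
                         for j in range(p) if j != i)
    return 1.0 / (1.0 + math.exp(-eta))


# Toy example with three items (values are arbitrary, for illustration only).
alpha = [0.2, -0.1, 0.3]
gamma = [[0.0, 0.5, 0.0],
         [0.0, 0.0, -0.4],
         [0.0, 0.0, 0.0]]
Z = normalizing_constant(alpha, gamma)   # sums 2**3 = 8 terms
prob_item0 = conditional_prob(0, (0, 1, 1), alpha, gamma)
```

The brute-force sum has 2^p terms, so it is already out of reach for a few dozen items, whereas `conditional_prob` costs only O(p) per item. The sketch does not implement the penalized estimator or the Bayesian procedure described in the abstract.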


Updated: 2021-07-30
