Bayesian nonparametric multiway regression for clustered binomial data
Stat (IF 0.7), Pub Date: 2021-03-24, DOI: 10.1002/sta4.378
Eric F Lock, Dipankar Bandyopadhyay

We introduce a Bayesian nonparametric regression model for data with multiway (tensor) structure, motivated by an application to periodontal disease (PD) data. Our outcome is the number of diseased sites measured over four different tooth types for each subject, with subject-specific covariates available as predictors. The outcomes are not well characterized by simple parametric models, so we use a nonparametric approach with a binomial likelihood in which the latent probabilities are drawn from a mixture with an arbitrary number of components, analogous to a Dirichlet process. We use a flexible probit stick-breaking formulation for the component weights that allows for covariate dependence and clustering structure in the outcomes. The parameter space for this model is large and multiway: patients × tooth types × covariates × components. We use low-rank assumptions to reduce its effective dimensionality and to account for the multiway structure. We illustrate how this can improve performance and simplify interpretation while still providing sufficient flexibility. We describe a general and efficient Gibbs sampling algorithm for posterior computation. The resulting fit to the PD data outperforms competitors and is interpretable and well calibrated. An interactive visual of the predictive model is available at https://ericfrazerlock.com/toothdata/ToothDisplay.html, and the code is available on GitHub at https://github.com/lockEF/NonparametricMultiway.
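
To make the probit stick-breaking idea concrete, the sketch below shows one generic way covariate-dependent mixture weights can feed a binomial outcome. It is an illustration under stated assumptions, not the authors' model or code: the function names, the linear form of the scores, and all numeric values are hypothetical, and the multiway (tooth type) structure, the low-rank constraints, and the Gibbs sampler are left out.

```python
import math
import random


def probit(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def stick_breaking_weights(scores):
    """Turn H-1 real-valued scores into H mixture weights that sum to 1.

    Component h takes the fraction probit(scores[h]) of whatever stick
    length is left after the earlier components; the final component
    takes the remainder.
    """
    weights = []
    remaining = 1.0
    for s in scores:
        v = probit(s)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def linear_scores(x, betas):
    """Assumed covariate dependence: score_h is the dot product beta_h . x."""
    return [sum(b * xi for b, xi in zip(beta, x)) for beta in betas]


def sample_diseased_sites(x, betas, component_probs, n_sites, rng):
    """Draw one binomial count from the covariate-dependent mixture."""
    weights = stick_breaking_weights(linear_scores(x, betas))
    # Pick a mixture component, then count successes over n_sites trials.
    h = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    p = component_probs[h]
    return sum(rng.random() < p for _ in range(n_sites))


# Made-up inputs: an intercept plus one covariate, three components.
rng = random.Random(0)
x = [1.0, 0.5]
betas = [[0.0, 1.0], [-0.5, 0.2]]   # two score vectors give three weights
component_probs = [0.05, 0.4, 0.9]  # latent disease probability per component
weights = stick_breaking_weights(linear_scores(x, betas))
count = sample_diseased_sites(x, betas, component_probs, n_sites=24, rng=rng)
```

Because each weight is a probit-transformed score times the remaining stick length, the weights are positive and sum to one for any real-valued scores, which is what lets covariates shift probability mass between components.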

Updated: 2021-03-24