Bayesian Distance Weighted Discrimination, Journal of Computational and Graphical Statistics


Bayesian Distance Weighted Discrimination
Journal of Computational and Graphical Statistics (IF 1.4), Pub Date: 2022-05-26, DOI: 10.1080/10618600.2022.2069778
Eric F Lock


Abstract

Distance weighted discrimination (DWD) is a linear discrimination method that is particularly well-suited for classification tasks with high-dimensional data. The DWD coefficients minimize an intuitive objective function, which can be solved efficiently using state-of-the-art optimization techniques. However, DWD has not yet been cast into a model-based framework for statistical inference. In this article we show that DWD identifies the mode of a proper Bayesian posterior distribution that results from a particular link function for the class probabilities and a shrinkage-inducing proper prior distribution on the coefficients. We describe a relatively efficient Markov chain Monte Carlo (MCMC) algorithm to simulate from the true posterior under this Bayesian framework. We show that the posterior is asymptotically normal and derive the mean and covariance matrix of its limiting distribution. Through several simulation studies and an application to breast cancer genomics, we demonstrate how the Bayesian approach to DWD can be used to (a) compute well-calibrated posterior class probabilities, (b) assess uncertainty in the DWD coefficients and resulting sample scores, (c) improve power via semisupervised analysis when not all class labels are available, and (d) automatically determine a penalty tuning parameter within the model-based framework. R code to perform Bayesian DWD is available at https://github.com/lockEF/BayesianDWD. Supplementary materials for this article are available online.
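For readers unfamiliar with the kind of objective the abstract refers to, the following is a minimal Python sketch of a penalized DWD-style fit. It is not the article's method or its R code: the loss used here is the smooth "generalized DWD" form with exponent 1 from the wider DWD literature, the ridge-type penalty weight `lam` is an assumed tuning parameter, plain gradient descent stands in for the specialized solvers the abstract alludes to, and all function names and the toy data are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))

def dwd_loss(u):
    """Assumed loss: linear for small margins u, reciprocal decay for large ones."""
    return 1.0 - u if u <= 0.5 else 1.0 / (4.0 * u)

def dwd_loss_grad(u):
    """Derivative of dwd_loss; both branches equal -1 at u = 0.5."""
    return -1.0 if u <= 0.5 else -1.0 / (4.0 * u * u)

def objective(w, b, X, y, lam):
    """Mean loss over the margins y_i * (x_i . w + b) plus a ridge penalty on w."""
    margins = [yi * (dot(w, xi) + b) for xi, yi in zip(X, y)]
    return sum(dwd_loss(u) for u in margins) / len(y) + 0.5 * lam * dot(w, w)

def fit(X, y, lam=0.1, step=0.05, iters=2000):
    """Plain gradient descent on the objective (illustrative, not efficient)."""
    n, p = len(y), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w = [lam * wj for wj in w]  # gradient of the penalty term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            g = dwd_loss_grad(yi * (dot(w, xi) + b)) * yi / n
            for j in range(p):
                grad_w[j] += g * xi[j]
            grad_b += g
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b

# Toy two-class data with labels coded as +1 / -1 (made up for illustration).
X = [[1.0, 0.5], [1.5, 1.0], [-1.0, -0.5], [-1.5, -1.0]]
y = [1, 1, -1, -1]
w, b = fit(X, y)
scores = [dot(w, xi) + b for xi in X]  # sign of each score gives the predicted class
```

The sample scores computed at the end are point estimates only. The abstract's contribution is to treat such a fit as a posterior mode, so that uncertainty in the coefficients and scores can be assessed; that step is not reproduced in this sketch.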


Updated: 2022-05-26
