Adaptive covariate acquisition for minimizing total cost of classification, Machine Learning



Adaptive covariate acquisition for minimizing total cost of classification
Machine Learning (IF 4.3), Pub Date: 2021-04-18, DOI: 10.1007/s10994-021-05958-z
Daniel Andrade, Yuzuru Okajima

In some applications, acquiring covariates comes at a non-negligible cost. For example, in the medical domain, measuring glucose tolerance in order to classify whether a patient has diabetes can be expensive. Assuming that the user can specify the cost of each covariate and the cost of misclassification, our goal is to minimize the (expected) total cost of classification, i.e., the cost of misclassification plus the cost of the acquired covariates. We formalize this optimization goal using the (conditional) Bayes risk and describe the optimal solution using a recursive procedure. Since the procedure is computationally infeasible, we introduce two assumptions: (1) the optimal classifier can be represented by a generalized additive model, and (2) the optimal sets of covariates are limited to a sequence of sets of increasing size. We show that under these two assumptions, a computationally efficient solution exists. Furthermore, on several medical datasets, we show that the proposed method achieves the lowest total cost in most situations when compared to various previous methods. Finally, we weaken the requirement that the user specify all misclassification costs by allowing the user to specify the minimally acceptable recall (target recall) instead. Our experiments confirm that the proposed method achieves the target recall while minimizing the false discovery rate and the covariate acquisition costs better than previous methods.
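To make the notion of total cost concrete, the following is a minimal sketch of a single step of the kind of trade-off the abstract describes: classify now and pay the Bayes risk, or pay for one more covariate and classify afterwards. It is not the authors' algorithm. The function names, the binary-label setting, and the assumption that the possible posteriors after acquisition are known as a small discrete set are all hypothetical choices made for illustration.

```python
from typing import Sequence, Tuple


def bayes_risk(p_pos: float, cost_fn: float, cost_fp: float) -> float:
    """Expected misclassification cost of the best decision made right now.

    p_pos   : current posterior probability of the positive class
    cost_fn : cost of predicting negative when the truth is positive
    cost_fp : cost of predicting positive when the truth is negative
    """
    risk_if_predict_neg = p_pos * cost_fn
    risk_if_predict_pos = (1.0 - p_pos) * cost_fp
    return min(risk_if_predict_neg, risk_if_predict_pos)


def expected_cost_if_acquired(
    outcomes: Sequence[Tuple[float, float]],
    covariate_cost: float,
    cost_fn: float,
    cost_fp: float,
) -> float:
    """Covariate cost plus the expected Bayes risk after observing it.

    outcomes is a hypothetical list of (probability of outcome,
    posterior of the positive class given that outcome).
    """
    expected_risk = sum(w * bayes_risk(p, cost_fn, cost_fp) for w, p in outcomes)
    return covariate_cost + expected_risk


def should_acquire(
    p_now: float,
    outcomes: Sequence[Tuple[float, float]],
    covariate_cost: float,
    cost_fn: float,
    cost_fp: float,
) -> bool:
    """Acquire only if doing so lowers the expected total cost."""
    cost_stop = bayes_risk(p_now, cost_fn, cost_fp)
    cost_go = expected_cost_if_acquired(outcomes, covariate_cost, cost_fn, cost_fp)
    return cost_go < cost_stop


# Illustrative numbers only: a false negative is ten times as costly
# as a false positive, and the covariate has two possible outcomes.
outcomes = [(0.5, 0.05), (0.5, 0.55)]
cheap = should_acquire(0.3, outcomes, covariate_cost=0.1, cost_fn=10.0, cost_fp=1.0)
pricey = should_acquire(0.3, outcomes, covariate_cost=0.5, cost_fn=10.0, cost_fp=1.0)
```

With these made-up numbers, the cheap covariate is worth acquiring and the expensive one is not. The full problem described in the abstract repeats this comparison recursively over all remaining covariates, which is what makes the exact solution computationally infeasible without further assumptions.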


Updated: 2021-04-18
