Is a Classification Procedure Good Enough?—A Goodness-of-Fit Assessment Tool for Classification Learning
Journal of the American Statistical Association (IF 3.7), Pub Date: 2021-11-17, DOI: 10.1080/01621459.2021.1979010
Jiawei Zhang, Jie Ding, Yuhong Yang

Abstract

In recent years, many nontraditional classification methods, such as random forests, boosting, and neural networks, have been widely used in applications. Their performance is typically measured in terms of classification accuracy. While the classification error rate and the like are important, they do not address a fundamental question: Is the classification method underfitted? To the best of our knowledge, there is no existing method that can assess the goodness of fit of a general classification procedure. Indeed, the lack of a parametric assumption makes it challenging to construct proper tests. To overcome this difficulty, we propose a methodology called BAGofT that splits the data into a training set and a validation set. First, the classification procedure to be assessed is applied to the training set, which is also used to adaptively find a data grouping that reveals the most severe regions of underfitting. Then, based on this grouping, we calculate a test statistic by comparing the estimated success probabilities and the actual observed responses from the validation set. The data splitting guarantees that the size of the test is controlled under the null hypothesis, and the power of the test goes to one as the sample size increases under the alternative hypothesis. For testing parametric classification models, the BAGofT has a broader scope than existing methods since it is not restricted to specific parametric models (e.g., logistic regression). Extensive simulation studies show the utility of the BAGofT when assessing general classification procedures and its strengths over some existing methods when testing parametric classification models. Supplementary materials for this article are available online.
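The split-and-compare idea in the abstract can be sketched in a few lines. The code below is NOT the authors' BAGofT implementation; it is a minimal illustration under simplifying assumptions: groups are formed by quantiles of the predicted success probability on the validation set (the paper instead learns the grouping adaptively on the training set to target the worst underfitting regions), and a chi-square-type statistic compares predicted and observed success counts per group. All function and variable names here are illustrative.

```python
import numpy as np

def split_gof_statistic(probs, y, n_groups=5):
    """Chi-square-type goodness-of-fit statistic on a validation set.

    probs : predicted success probabilities on the validation set
    y     : observed 0/1 responses on the validation set

    For each group, compare the observed number of successes with the
    number predicted by the model, standardized by the binomial variance.
    """
    order = np.argsort(probs)                      # sort by predicted probability
    groups = np.array_split(order, n_groups)       # quantile-style grouping
    stat = 0.0
    for g in groups:
        expected = probs[g].sum()                  # predicted successes in group
        observed = y[g].sum()                      # observed successes in group
        var = (probs[g] * (1.0 - probs[g])).sum()  # binomial variance of the count
        if var > 0:
            stat += (observed - expected) ** 2 / var
    return stat  # roughly chi-square with n_groups df under the null

# Toy check: data generated from a logistic model, scored with the
# true probabilities (a "well-fitted" model), so the statistic should
# be moderate rather than diverging.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
p_true = 1.0 / (1.0 + np.exp(-x))
y = rng.binomial(1, p_true)
stat = split_gof_statistic(p_true, y)
```

In the actual BAGofT, the key difference is that the grouping is chosen adaptively on the training half so that the test concentrates power where underfitting is most severe, and data splitting keeps that adaptivity from inflating the test size.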




Updated: 2021-11-17