Item response theory as a feature selection and interpretation tool in the context of machine learning

Medical & Biological Engineering & Computing (IF 3.2), Pub Date: 2021-02-03, DOI: 10.1007/s11517-020-02301-x
Adrienne S Kline (1, 2, 3), Theresa J B Kline (4), Joon Lee (3, 5, 6)

Optimizing the number and utility of features used in a classification analysis has been the subject of many research studies. Most current models use end-classifications as part of the feature reduction process, which makes the methodology circular. The approach demonstrated in the present research uses item response theory (IRT) to select features independently of the end-classification results, avoiding the biased accuracies that this circularity engenders. Dichotomous and polytomous IRT models were used to analyze 30 histological breast cancer features from 569 patients in the Wisconsin Diagnostic Breast Cancer data set. Based on their characteristics, three features were selected for use in a machine learning classifier. For comparison, two machine learning-based feature selection protocols, recursive feature elimination (RFE) and ridge regression, were also run, and the three features selected by each were used in the subsequent classifier. Classification results showed that all three selection processes performed comparably. Because the IRT protocol is not biased by the classification outcome and reports the specific characteristics that make each feature useful, it helps clarify which attributes of features make them suitable for use in a machine learning context.
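To illustrate how IRT can characterize a feature without reference to the classification outcome, the following is a minimal sketch that assumes a two-parameter logistic (2PL) model, one common dichotomous IRT model. The abstract does not state which specific IRT models or selection criteria the authors used, so the model choice, the function names, the item parameters, and the ranking rule (total item information over a grid of latent trait values) are all illustrative assumptions and not the paper's method.

```python
import math

def p_positive(theta: float, a: float, b: float) -> float:
    """2PL item response function: probability of a positive response
    at latent trait level theta, with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_positive(theta, a, b)
    return a * a * p * (1.0 - p)

def rank_by_information(items: dict, thetas: list) -> list:
    """Rank items by information summed over a grid of theta values."""
    totals = {
        name: sum(item_information(t, a, b) for t in thetas)
        for name, (a, b) in items.items()
    }
    return sorted(totals, key=totals.get, reverse=True)

# Hypothetical (discrimination, difficulty) pairs; NOT values from the paper.
items = {
    "feature_A": (2.0, 0.0),
    "feature_B": (0.5, 0.0),
    "feature_C": (1.5, 1.0),
}

# Latent trait grid from -3 to 3 in steps of 0.5.
thetas = [x / 2 for x in range(-6, 7)]

ranking = rank_by_information(items, thetas)
print(ranking)
```

In this sketch, items with higher discrimination carry more information about the latent trait and rank higher, which is one way feature characteristics could guide selection without consulting the class labels.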

Graphical abstract

Updated: 2021-02-03
