An Efficient and Effective Model to Handle Missing Data in Classification, BioMed Research International


An Efficient and Effective Model to Handle Missing Data in Classification
BioMed Research International (IF 2.6), Pub Date: 2020-11-25, DOI: 10.1155/2020/8810143
Kamran Mehrabani-Zeinabad₁, Marziyeh Doostfatemeh₁, Seyyed Mohammad Taghi Ayatollahi₁


Missing data is one of the most important causes of reduced classification accuracy. Many real datasets suffer from missing values, especially in the medical sciences. Imputation is a common way to deal with incomplete datasets. Various imputation methods can be applied, and the choice of the best method depends on dataset conditions such as sample size, percentage of missing values, and missingness mechanism. A better solution is therefore to classify incomplete datasets without imputation and without any loss of information. The structure of the “Bayesian additive regression trees” (BART) model has been improved with the “Missingness Incorporated in Attributes” (MIA) approach to address its inefficiency in handling missing values. The implementation of MIA within BART is named “BART.m”. Because the abilities of BART.m in classifying incomplete datasets had not been investigated, this simulation-based study aimed to provide such a resource. The results indicate that BART.m can be used even for datasets with 90 percent missing values and, more importantly, that it identifies irrelevant variables and removes them on its own. BART.m outperforms common models for classification with incomplete data in terms of accuracy and computational time. Based on these properties, BART.m is a high-accuracy model for classifying incomplete datasets that avoids any assumptions and preprocessing steps.
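
The abstract does not spell out how “Missingness Incorporated in Attributes” works. As it is usually described for decision trees, MIA lets each split choose among three kinds of rule: send missing values left with a threshold, send them right with a threshold, or split on missingness itself. The following is a minimal single-feature sketch of that idea in Python, not the BART.m implementation: the function names, the use of Gini impurity as the split criterion, and the encoding of missing values as NaN are all assumptions made for illustration.

```python
import math

def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def weighted_gini(left, right):
    """Size-weighted impurity of the two child nodes."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n

def best_mia_split(x, y):
    """Return (impurity, rule) for the best MIA-style split of one feature.

    x: non-empty list of floats, float('nan') marks a missing value (assumption)
    y: list of 0/1 class labels, same length as x
    """
    observed = sorted({v for v in x if not math.isnan(v)})
    candidates = []
    # Options 1 and 2: threshold split, missing values routed left or right.
    for c in observed[:-1]:
        for missing_left in (True, False):
            left, right = [], []
            for v, label in zip(x, y):
                go_left = missing_left if math.isnan(v) else v <= c
                (left if go_left else right).append(label)
            side = "left" if missing_left else "right"
            rule = f"x <= {c}, missing go {side}"
            candidates.append((weighted_gini(left, right), rule))
    # Option 3: split on missingness itself (missing vs. observed).
    left = [label for v, label in zip(x, y) if math.isnan(v)]
    right = [label for v, label in zip(x, y) if not math.isnan(v)]
    candidates.append((weighted_gini(left, right), "x is missing"))
    return min(candidates)
```

Because missingness is handled inside the split rule, no value is imputed and no row is dropped, which is the property the abstract emphasizes. For example, `best_mia_split([1.0, 2.0, float("nan"), float("nan")], [0, 0, 1, 1])` selects the "x is missing" rule, since missingness alone separates the two classes in that toy input.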


Updated: 2020-11-25
