A novel multi-objective genetic algorithm approach to address class imbalance for disease diagnosis
International Journal of Information Technology, Pub Date: 2020-05-20, DOI: 10.1007/s41870-020-00471-3
Anju Jain, Saroj Ratnoo, Dinesh Kumar

Class imbalance has adverse consequences for classification in disease diagnosis. Classifiers trained on imbalanced data predict most negative-class examples (instances with a non-diseased label) correctly but fail to make correct predictions for positive-class examples (instances with a diseased label). Since misclassification costs can be very high in a sensitive field like disease diagnosis, addressing class imbalance is of utmost importance. A number of sampling techniques have been applied to balance data. However, these techniques reduce the overall accuracy of classifier models, because there is a trade-off between the sensitivity and specificity of such classifiers. This paper first proposes a GA-based undersampling technique with a weighted fitness function to determine the trade-off between sensitivity and specificity, followed by a multi-objective genetic algorithm (MOGA) approach to address the class imbalance problem for disease diagnosis. Determining the trade-off between sensitivity and specificity manually is an arduous task. The MOGA approach takes the two extreme training samples from the Pareto optimal solutions, one optimally tuned with respect to sensitivity and the other optimally tuned with respect to specificity on validation data. Two decision tree classification models are built from these two training sets. The models are named the sensitivity prioritized model (SEPM) and the specificity prioritized model (SPPM), respectively. These models are combined to make predictions on the test data. The results obtained through extensive experimentation confirm that the proposed multi-objective scheme makes correct predictions on the minority class (SE) without compromising the correct prediction rate on the majority class (SP).
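
The two quantities traded off in the abstract, and the idea of picking the two extreme Pareto optimal solutions, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the convex-combination form of `weighted_fitness`, the weight `w`, and all function names are assumptions made for illustration, and the abstract does not state how SEPM and SPPM are combined, so that step is left out.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (SE, SP) for binary labels: 1 = diseased, 0 = non-diseased."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    se = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    return se, sp


def weighted_fitness(se: float, sp: float, w: float = 0.5) -> float:
    """Assumed single-objective fitness: w * SE + (1 - w) * SP.

    The abstract only says the fitness is "weighted"; this exact form
    and the default w are hypothetical.
    """
    return w * se + (1.0 - w) * sp


def pareto_front(points: List[Tuple[float, float]]) -> List[int]:
    """Indices of non-dominated (SE, SP) points; both objectives are maximised."""
    front = []
    for i, (se_i, sp_i) in enumerate(points):
        dominated = any(
            se_j >= se_i and sp_j >= sp_i and (se_j > se_i or sp_j > sp_i)
            for j, (se_j, sp_j) in enumerate(points) if j != i
        )
        if not dominated:
            front.append(i)
    return front


def extreme_solutions(points: List[Tuple[float, float]]) -> Tuple[int, int]:
    """Return (index with the highest SE, index with the highest SP) on the front."""
    front = pareto_front(points)
    best_se = max(front, key=lambda i: (points[i][0], points[i][1]))
    best_sp = max(front, key=lambda i: (points[i][1], points[i][0]))
    return best_se, best_sp
```

Here each `(SE, SP)` point would stand for one candidate undersampled training set scored on validation data. The two indices returned by `extreme_solutions` correspond to the training sets from which a sensitivity prioritized and a specificity prioritized model would be built.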




Updated: 2020-05-20