Accurate Classification of COVID-19 Based on Incomplete Heterogeneous Data using a KNN Variant Algorithm, Arabian Journal for Science and Engineering

Arabian Journal for Science and Engineering (IF 2.6), Pub Date: 2021-03-04, DOI: 10.1007/s13369-020-05212-z
Ahmed Hamed, Ahmed Sobhy, Hamed Nassar

Great efforts are now underway to control coronavirus disease 2019 (COVID-19). Millions of people are being medically examined, and their data keep piling up awaiting classification. The data are typically both incomplete and heterogeneous, which hampers classical classification algorithms. Some researchers have recently modified the popular KNN algorithm as a solution, handling incompleteness by imputation and heterogeneity by converting categorical data into numbers. In this article, we introduce a novel KNN variant (KNNV) algorithm that, as thorough experimental work demonstrates, provides better results. We employ rough set theoretic techniques to handle both incompleteness and heterogeneity, as well as to find an ideal value for K. The KNNV algorithm takes an incomplete, heterogeneous dataset containing people's medical records and identifies the cases with COVID-19. In the process, we use two popular distance metrics, Euclidean and Mahalanobis, to widen the operational scope. The KNNV algorithm is implemented and tested on a real dataset from the Italian Society of Medical and Interventional Radiology. The experimental results show that it can classify COVID-19 cases efficiently and accurately. It is also compared to three KNN derivatives, and the comparison results show that it greatly outperforms all of them in terms of four metrics: precision, recall, accuracy, and F-score. The algorithm given in this article can be easily applied to classify other diseases. Moreover, its methodology can be further extended to general classification tasks outside the medical field.
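
For orientation only, the sketch below shows a plain KNN classifier that can switch between the two distance metrics named above, together with the four evaluation metrics computed for one positive class. It is not the authors' KNNV algorithm. It has no rough-set handling of missing or categorical values, it uses a fixed, hand-picked K = 3 instead of a derived one, and it assumes complete, purely numeric records. The function names (`knn_predict`, `scores`) and the toy records and labels are hypothetical. The last test point is labeled positive so that one error shows up in the metrics.

```python
import numpy as np
from collections import Counter


def euclidean(a, b, vi=None):
    """Straight-line distance between two numeric vectors (vi is ignored)."""
    d = a - b
    return float(np.sqrt(d @ d))


def mahalanobis(a, b, vi):
    """Mahalanobis distance; vi is the inverse covariance matrix of the features."""
    d = a - b
    return float(np.sqrt(d @ vi @ d))


def knn_predict(X_train, y_train, x, k=3, metric=euclidean, vi=None):
    """Plain KNN (hypothetical helper): majority label among the k rows closest to x."""
    dists = [metric(row, x, vi) for row in X_train]
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


def scores(y_true, y_pred, positive=1):
    """Precision, recall, accuracy and F-score, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, accuracy, f_score


# Hypothetical, complete, numeric toy records (two features each); 1 = positive case.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                    [6.0, 5.0], [7.0, 7.0], [6.5, 6.0]])
y_train = [0, 0, 0, 1, 1, 1]
X_test = np.array([[1.2, 1.5], [6.8, 6.2], [2.0, 2.0], [6.0, 6.0], [4.0, 3.5]])
y_test = [0, 1, 0, 1, 1]

# Euclidean distance with a fixed K = 3 (not tuned).
preds = [knn_predict(X_train, y_train, x, k=3) for x in X_test]
print(preds)                                         # [0, 1, 0, 1, 0]
print([round(s, 3) for s in scores(y_test, preds)])  # [1.0, 0.667, 0.8, 0.8]

# Mahalanobis distance, with the inverse covariance estimated from the training rows.
vi = np.linalg.inv(np.cov(X_train, rowvar=False))
preds_m = [knn_predict(X_train, y_train, x, k=3, metric=mahalanobis, vi=vi)
           for x in X_test]
print(preds_m, scores(y_test, preds_m))
```

Mahalanobis distance depends on an invertible covariance estimate. With few records, or with redundant or collinear features, `np.cov` can be singular, and `np.linalg.inv` may then raise `LinAlgError`. `np.linalg.pinv` is a common fallback in that case.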
