Balancing Assisted Reproductive Technology Dataset for Improving the Efficiency of Incremental Classifiers and Feature Selection Techniques
Journal of Circuits, Systems and Computers (IF 0.9) Pub Date: 2020-09-24, DOI: 10.1142/s0218126621300075
A. Suruliandi 1 , K. Ranjini 1 , S. P. Raja 2

Assisted Reproductive Technology (ART) is a set of medical procedures primarily used to address infertility. The success rate of ART is very low because it is affected by a large number of variables. Machine learning techniques are now applied to predict ART outcomes and to find strategies for improving the success rate. For this purpose, determining the best-performing classifier for ART is very important. Previously, some classifiers have been applied to ART with static data. In reality, however, the datasets are dynamic in nature and require a dynamic setup, which can be achieved with the help of incremental classifiers. Owing to the low success rate, the ART dataset contains few records with positive results, which makes the dataset imbalanced. This research work first finds the best evaluation metric for classification on an imbalanced dataset and then balances the dataset using three different balancing techniques: undersampling, oversampling and the Synthetic Minority Oversampling Technique (SMOTE). It then applies five different incremental classifiers, namely Stochastic Gradient Descent (SGD), Stochastic Primal Estimated sub-GrAdient SOlver for Support vector machine (SPegasos), Naïve Bayes Updatable, Instance Based (IBk) and Averaged One Dependence Estimators (A1DE) Updatable, and finds the best balancing technique and the most suitable classifier for ART outcome prediction. The results show that, for an imbalanced dataset, Receiver Operating Characteristics (ROC) Area may be taken as a metric instead of accuracy. SMOTE is found to be the best method for balancing the ART dataset, and the IB1 classifier performs well on the balanced data, with a high prediction rate of 92.3 for ROC. Finally, various feature selection methods are applied to the top three best-performing classifiers, and a suitable feature selection method for each classifier is identified.
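The two ideas at the core of the abstract, that accuracy can mislead on an imbalanced dataset while ROC area does not, and that SMOTE balances a dataset by interpolating between minority-class records, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy data, the function names (`accuracy`, `roc_auc`, `smote`) and the parameters (`k`, `n_new`, `seed`) are assumptions made purely for illustration, and the SMOTE routine is a simplified version of the technique.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """ROC area: probability that a random positive outscores a random negative.
    Tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def smote(minority, n_new, k=3, seed=0):
    """Simplified SMOTE: create n_new synthetic minority points, each placed on
    the segment between a random minority point and one of its k nearest
    minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Other minority points, sorted by squared distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical imbalanced labels: 95 negative outcomes, 5 positive outcomes.
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts "negative" and gives every record the same score.
always_negative = [0] * 100
constant_scores = [0.0] * 100

acc = accuracy(y_true, always_negative)  # 0.95, looks good but is uninformative
auc = roc_auc(y_true, constant_scores)   # 0.5, reveals no discriminative power

# Hypothetical two-feature minority records, oversampled to match the majority.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5), (1.8, 2.1)]
new_points = smote(minority, n_new=90)
balanced_positive_count = len(minority) + len(new_points)  # 95

print(acc, auc, balanced_positive_count)
```

In this toy setting the always-negative classifier reaches 95% accuracy while its ROC area stays at chance level, which is the kind of gap that motivates preferring ROC area on imbalanced data. Random undersampling and plain oversampling, the other two balancing techniques named in the abstract, would respectively drop majority records or duplicate minority records instead of synthesising new ones.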

Updated: 2020-09-24