SDPSO: Spark Distributed PSO-based approach for feature selection and cancer disease prognosis
Journal of Big Data (IF 8.6), Pub Date: 2021-01-13, DOI: 10.1186/s40537-021-00409-x
Khawla Tadist, Fatiha Mrabti, Nikola S. Nikolov, Azeddine Zahi, Said Najah

The curse of dimensionality is one of the most critical issues hindering progress in many fields, and in bioinformatics in particular. Countering it requires a combined solution. Among the techniques that have proved effective, scaling-based dimensionality reduction techniques are the most prevalent. To ensure improved performance and productivity, horizontal scaling is combined with computational techniques based on Particle Swarm Optimization (PSO). Optimization algorithms are an interesting alternative to traditional feature selection methods, being both efficient and relatively easy to scale. PSO is an iterative search algorithm that has achieved excellent results on feature selection problems. This paper proposes a composite Spark-distributed approach to feature selection for cancer prognosis, called Spark Distributed Particle Swarm Optimization (SDPSO), which combines an integrative feature selection algorithm based on Binary Particle Swarm Optimization (BPSO) with the PSO algorithm. The effectiveness of the proposed approach is demonstrated on five benchmark genomic datasets and in a comparative study against four state-of-the-art methods. Compared with these four methods, the proposed approach yields the best average results, with purity ranging from 0.78 to 0.97 and F-measure ranging from 0.75 to 0.96.
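
The abstract does not give the update rules, so the following is only a rough, single-machine illustration of how a binary PSO can search over feature subsets. It uses the common sigmoid-transfer variant of binary PSO and is not the authors' SDPSO implementation: it involves no Spark, and every name (binary_pso, toy_fitness, RELEVANT) and every parameter value is an assumption made for the sketch. In a real setting the fitness function would be some quality measure computed on the selected columns of a dataset; here it is a toy score that rewards three hypothetical "relevant" features and penalizes subset size.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Maximise fitness(mask) over 0/1 feature masks (sigmoid-transfer binary PSO)."""
    rng = random.Random(seed)
    # Random initial masks (positions) and velocities, one per particle.
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(n_particles)]
    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    # Global best is the best personal best seen so far.
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull towards personal and global bests.
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp to keep the sigmoid unsaturated
                vel[i][d] = v
                # The sigmoid of the velocity is the probability that the bit is 1.
                prob_one = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical toy problem: features 0, 3 and 7 are the informative ones.
RELEVANT = {0, 3, 7}


def toy_fitness(mask):
    """Reward selected relevant features, penalise every selected feature."""
    hits = sum(1 for i in RELEVANT if mask[i])
    return hits - 0.1 * sum(mask)


best_mask, best_fit = binary_pso(toy_fitness, n_features=10)
print("selected features:", [i for i, bit in enumerate(best_mask) if bit])
print("fitness:", best_fit)
```

Because fitness is passed in as a plain callable, the same loop could in principle be reused with a classifier- or clustering-based score; distributing the per-particle fitness evaluations across a cluster is the part a Spark-based approach would address and is deliberately left out here.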




Updated: 2021-01-13