Analysis of high-dimensional genomic data using MapReduce based probabilistic neural network.
Computer Methods and Programs in Biomedicine (IF 6.1), Pub Date: 2020-06-27, DOI: 10.1016/j.cmpb.2020.105625
Santos Kumar Baliarsingh, Swati Vipsita, Amir H Gandomi, Abhijeet Panda, Sambit Bakshi, Somula Ramasubbareddy

Background: Genomic data have grown rapidly in size over the last decade, and conventional data analysis techniques cannot process data at this scale. Processing high-dimensional datasets efficiently therefore requires new parallel methods.

Methods: This work presents a novel distributed method based on MapReduce (MR). The proposed algorithm consists of an MR-based Fisher score (mrFScore), an MR-based ReliefF (mrReliefF), and an MR-based probabilistic neural network (mrPNN) tuned with a weighted chaotic grey wolf optimization technique (WCGWO). mrFScore and mrReliefF are introduced for feature selection (FS), and mrPNN is implemented as the classifier for microarray data. The prediction ability of a PNN depends strongly on the choice of its smoothing parameter (σ), so the novel WCGWO algorithm is used to select the optimal value of σ.
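
To make the role of σ concrete, the following is a minimal single-machine sketch of a textbook PNN with Gaussian kernels. It is not the paper's mrPNN: the function name `pnn_predict` and the toy data are assumptions made for illustration, and σ is simply passed in by hand rather than being selected by WCGWO.

```python
import math

def pnn_predict(train_x, train_y, x, sigma):
    """Classify x with a basic PNN (Gaussian Parzen windows per class).

    sigma is the smoothing parameter. Here the caller supplies it;
    the abstract describes tuning it with WCGWO instead.
    """
    sums = {}    # per-class sum of kernel activations
    counts = {}  # per-class number of training patterns
    for xi, yi in zip(train_x, train_y):
        dist_sq = sum((a - b) ** 2 for a, b in zip(xi, x))
        kernel = math.exp(-dist_sq / (2.0 * sigma ** 2))
        sums[yi] = sums.get(yi, 0.0) + kernel
        counts[yi] = counts.get(yi, 0) + 1
    # Average activation per class; the class with the largest value wins.
    scores = {c: sums[c] / counts[c] for c in sums}
    return max(scores, key=scores.get), scores

# Toy data, made up for illustration: two well-separated classes.
train_x = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
train_y = ["A", "A", "B", "B"]
query = [0.2, 0.1]

label, scores = pnn_predict(train_x, train_y, query, sigma=0.3)
# A very large sigma flattens the kernels, so the class scores become
# almost indistinguishable.
_, flat_scores = pnn_predict(train_x, train_y, query, sigma=5.0)
```

With a small σ the class scores are sharply separated, while a very large σ makes them nearly equal. This sensitivity is why the choice of σ matters for prediction.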

Results: The algorithms were implemented on the Hadoop framework. The proposed model was tested on three large microarray datasets and one small one, and compared against existing FS and classification techniques. The results suggest that WCGWO-mrPNN can outperform the other methods for high-dimensional microarray classification.
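
As a rough illustration of how a feature score can be phrased as map and reduce steps, the sketch below computes the standard Fisher score (between-class scatter divided by within-class scatter, per feature) in plain Python. It imitates the MapReduce pattern in a single process and does not use Hadoop. The abstract does not give the exact formulation of mrFScore, so the function names, the key layout, and the toy samples are all assumptions.

```python
from collections import defaultdict

def map_sample(label, features):
    """Map step: emit ((feature_index, label), (count, sum, sum_of_squares))."""
    for j, value in enumerate(features):
        yield (j, label), (1, value, value * value)

def reduce_stats(pairs):
    """Reduce step: add up the partial statistics that share a key."""
    totals = defaultdict(lambda: [0, 0.0, 0.0])
    for key, (n, s, sq) in pairs:
        t = totals[key]
        t[0] += n
        t[1] += s
        t[2] += sq
    return totals

def fisher_scores(totals):
    """Combine per-(feature, class) statistics into one score per feature."""
    by_feature = defaultdict(list)
    for (j, _label), (n, s, sq) in totals.items():
        by_feature[j].append((n, s, sq))
    result = {}
    for j, stats in by_feature.items():
        n_all = sum(n for n, _, _ in stats)
        mean_all = sum(s for _, s, _ in stats) / n_all
        between = 0.0
        within = 0.0
        for n, s, sq in stats:
            mean_c = s / n
            var_c = sq / n - mean_c ** 2  # population variance within class
            between += n * (mean_c - mean_all) ** 2
            within += n * var_c
        result[j] = between / within if within > 0 else float("inf")
    return result

# Toy samples, made up for illustration: feature 0 separates the two
# classes, feature 1 does not.
samples = [
    ("tumor", [5.0, 1.0]),
    ("tumor", [6.0, 3.0]),
    ("normal", [1.0, 1.0]),
    ("normal", [2.0, 3.0]),
]
pairs = [kv for label, feats in samples for kv in map_sample(label, feats)]
feature_scores = fisher_scores(reduce_stats(pairs))
```

Because each mapper only emits counts, sums, and sums of squares, the partial results can be added in any order. That additivity is what makes this kind of score a natural fit for a distributed setting.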

Conclusion: The effectiveness of the proposed methods was compared with that of existing schemes. Experimental results indicate that the proposed scheme is accurate and robust, and it is therefore considered a reliable framework for microarray data analysis.

Significance: The method promotes parallel programming on a Hadoop cluster for the analysis of large-scale genomics data, particularly when the dataset is high-dimensional.



Updated: 2020-06-27