Multi-view feature selection for identifying gene markers: a diversified biological data driven approach
BMC Bioinformatics (IF 2.9), Pub Date: 2020-12-30, DOI: 10.1186/s12859-020-03810-0
Sudipta Acharya 1, Laizhong Cui 1, Yi Pan 2

In recent years, the use of multiple genomic and proteomic sources has become very popular among researchers investigating challenging bioinformatics problems. One such problem is feature (gene) selection: identifying relevant and non-redundant marker genes from high-dimensional gene expression data sets. In that context, an efficient feature selection algorithm that exploits knowledge from multiple biological resources may be an effective way to understand the spectrum of cancer or other diseases, with applications in epidemiology for a particular population. In this article, we formulate feature selection and marker gene detection as a multi-view multi-objective clustering problem. To that end, we propose an Unsupervised Multi-View Multi-Objective clustering-based gene selection approach called UMVMO-select. Three important biological data resources (gene ontology, protein interaction data, and protein sequence), along with gene expression values, are used together to design two different views. UMVMO-select aims to reduce the gene space with little or no loss of sample classification efficiency, and it determines relevant and non-redundant gene markers from three cancer gene expression benchmark data sets. A thorough comparative analysis has been performed against five clustering methods and nine existing feature selection methods with respect to several internal and external validity metrics. The results obtained indicate the superiority of the proposed method. The reported results are also validated through a biological significance test and heatmap plotting.
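
The general idea of clustering-based gene selection over two views can be illustrated with a small sketch. This is not UMVMO-select: the abstract does not give the algorithm's objectives or search procedure, so the sketch below swaps in a much simpler technique, namely k-medoids on a weighted sum of two per-view distance matrices, and keeps each cluster's medoid as its representative gene. All names here (`select_marker_genes`, `view_a`, `view_b`, `weight`) are hypothetical, and the single combined distance replaces the paper's multi-objective formulation.

```python
import math


def distance_matrix(rows):
    """Pairwise Euclidean distances between gene vectors, scaled to [0, 1]."""
    n = len(rows)
    d = [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    top = max(max(r) for r in d) or 1.0  # guard: all rows identical
    return [[v / top for v in r] for r in d]


def combine_views(d_a, d_b, weight=0.5):
    """Weighted sum of two distance matrices (assumption: one scalar weight)."""
    n = len(d_a)
    return [[weight * d_a[i][j] + (1 - weight) * d_b[i][j] for j in range(n)]
            for i in range(n)]


def assign(d, medoids):
    """Label every gene with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: d[i][medoids[c]])
            for i in range(len(d))]


def k_medoids(d, k, max_iter=100):
    """Plain k-medoids with deterministic farthest-first initialisation."""
    medoids = [0]
    while len(medoids) < k:
        # Next seed: the gene farthest from every seed chosen so far.
        medoids.append(max(range(len(d)),
                           key=lambda i: min(d[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = assign(d, medoids)
        updated = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster empties
                updated.append(medoids[c])
                continue
            # New medoid: member with the smallest total distance to the rest.
            updated.append(min(members,
                               key=lambda i: sum(d[i][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(d, medoids)


def select_marker_genes(view_a, view_b, k, weight=0.5):
    """Return k representative gene indices and the cluster label of each gene.

    view_a, view_b: one feature vector per gene in each view (hypothetical
    inputs, e.g. expression profiles and knowledge-derived features).
    """
    d = combine_views(distance_matrix(view_a), distance_matrix(view_b), weight)
    return k_medoids(d, k)
```

Called as `select_marker_genes(view_a, view_b, k=3)`, the function returns the indices of three medoid genes and a cluster label for every gene. Keeping one gene per cluster is what makes the selected set non-redundant in this simplified setting; how many clusters to use and how to weight the views are left as free parameters here.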

Updated: 2020-12-30