Gaussian Embedding for Large-scale Gene Set Analysis.
Nature Machine Intelligence (IF 23.8), Pub Date: 2020-06-15, DOI: 10.1038/s42256-020-0193-2
Sheng Wang 1 , Emily R Flynn 2 , Russ B Altman 1, 2, 3

Gene sets, including protein complexes and signalling pathways, have proliferated greatly, in large part as a result of high-throughput biological data. Leveraging gene sets to gain insight into biological discovery requires computational methods for converting them into a useful form for available machine learning models. Here, we study the problem of embedding gene sets as compact features that are compatible with available machine learning codes. We present Set2Gaussian, a novel network-based gene set embedding approach, which represents each gene set as a multivariate Gaussian distribution rather than a single point in the low-dimensional space, according to the proximity of these genes in a protein–protein interaction network. We demonstrate that Set2Gaussian improves gene set member identification, accurately stratifies tumours, and finds concise gene sets for gene set enrichment analysis. We further show how Set2Gaussian allows us to identify a clinical prognostic and predictive subnetwork around neurofilament medium in sarcoma, which we validate in independent cohorts.
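
To illustrate why a distribution can carry more information than a single point, the following is a minimal sketch, not the Set2Gaussian training procedure (which the abstract does not detail). It assumes genes already have low-dimensional coordinates (here random toy data rather than embeddings derived from a protein–protein interaction network), and it fits a diagonal-covariance Gaussian to each set by simple moment matching. The function names `fit_diag_gaussian` and `log_density` are hypothetical.

```python
import numpy as np

def fit_diag_gaussian(points, eps=1e-6):
    """Moment-match a diagonal Gaussian to member coordinates (illustrative only)."""
    mean = points.mean(axis=0)
    var = points.var(axis=0) + eps  # eps keeps the variance strictly positive
    return mean, var

def log_density(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance."""
    diff = x - mean
    return -0.5 * (np.sum(np.log(2.0 * np.pi * var)) + np.sum(diff * diff / var, axis=-1))

# Toy data (assumption): two gene sets with the same centre but different spread.
rng = np.random.default_rng(0)
tight_set = rng.normal(loc=0.0, scale=0.1, size=(50, 2))  # members close together
broad_set = rng.normal(loc=0.0, scale=1.0, size=(50, 2))  # members spread out

tight_mean, tight_var = fit_diag_gaussian(tight_set)
broad_mean, broad_var = fit_diag_gaussian(broad_set)

# A candidate gene at the same distance from both set centres.
query = np.array([0.8, 0.8])
tight_score = log_density(query, tight_mean, tight_var)
broad_score = log_density(query, broad_mean, broad_var)
print(f"tight set: {tight_score:.2f}, broad set: {broad_score:.2f}")
```

Both toy sets share roughly the same centre, so a point representation alone would treat the candidate as about equally close to each; the variance term is what makes the candidate far more plausible under the broad set than under the tight one.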



Updated: 2020-06-15