RVAgene: Generative modeling of gene expression time series data
bioRxiv - Systems Biology · Pub Date: 2020-11-11 · DOI: 10.1101/2020.11.10.375436
Raktim Mitra, Adam L. MacLean

Methods to model dynamic changes in gene expression at a genome-wide level are not currently sufficient for large (temporally rich or single-cell) datasets. Variational autoencoders offer a means to characterize large datasets and have been used effectively to characterize features of single-cell datasets. Here we extend these methods for use with gene expression time series data. We present RVAgene: a recurrent variational autoencoder to model gene expression dynamics. RVAgene learns to accurately and efficiently reconstruct temporal gene profiles. It also learns a low-dimensional representation of the data via a recurrent encoder network; this representation can be used for biological feature discovery, and new gene expression data can be generated by sampling from the latent space. We test RVAgene on simulated and real biological datasets, including embryonic stem cell differentiation and kidney injury response dynamics. In all cases, RVAgene accurately reconstructed complex temporal gene expression profiles. Via cross-validation, we show that a low-error latent space representation can be learnt using only a fraction of the data. Through clustering and gene ontology term enrichment analysis on the latent space, we demonstrate the potential of RVAgene for unsupervised discovery. In particular, RVAgene identifies new programs of shared gene regulation of Lox family genes in response to kidney injury.
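The abstract describes the model only at a high level. As a rough illustration of the general recurrent-VAE idea it names (a recurrent encoder mapping a time series to a latent Gaussian, a reparameterized latent sample, a recurrent decoder that reconstructs the series, and sampling from the prior to generate new profiles), here is a minimal, untrained forward-pass sketch in plain Python. Everything in it is an assumption for illustration: the class name `TinyRecurrentVAE`, the layer sizes, the tanh cells, and the squared-error reconstruction term are hypothetical and are not taken from the RVAgene implementation. Training (gradient descent on the loss) is omitted.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Random weight matrix stored as a list of rows."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vecs):
    """Element-wise sum of equal-length vectors."""
    return [sum(vals) for vals in zip(*vecs)]


class TinyRecurrentVAE:
    """Hypothetical minimal recurrent VAE for one gene's expression time course.

    Illustrative assumptions only; not RVAgene's architecture or code.
    """

    def __init__(self, hidden=8, latent=2, seed=0):
        rng = random.Random(seed)
        self.hidden, self.latent = hidden, latent
        # Encoder RNN: h_t = tanh(W_x x_t + W_h h_{t-1})
        self.enc_wx = make_matrix(hidden, 1, rng)
        self.enc_wh = make_matrix(hidden, hidden, rng)
        # Heads mapping the final encoder state to q(z | x) = N(mu, exp(logvar))
        self.w_mu = make_matrix(latent, hidden, rng)
        self.w_logvar = make_matrix(latent, hidden, rng)
        # Decoder: initial state from z, then an RNN emitting one value per step
        self.dec_wz = make_matrix(hidden, latent, rng)
        self.dec_wh = make_matrix(hidden, hidden, rng)
        self.w_out = make_matrix(1, hidden, rng)

    def encode(self, series):
        h = [0.0] * self.hidden
        for x in series:
            h = [math.tanh(v) for v in add(matvec(self.enc_wx, [x]), matvec(self.enc_wh, h))]
        return matvec(self.w_mu, h), matvec(self.w_logvar, h)

    def reparameterize(self, mu, logvar, rng):
        # z = mu + sigma * eps with eps ~ N(0, I), so sampling stays differentiable in mu, sigma
        return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

    def decode(self, z, steps):
        h = [math.tanh(v) for v in matvec(self.dec_wz, z)]
        out = []
        for _ in range(steps):
            h = [math.tanh(v) for v in matvec(self.dec_wh, h)]
            out.append(matvec(self.w_out, h)[0])
        return out

    def loss(self, series, rng):
        """Negative ELBO: squared-error reconstruction term + KL(q(z|x) || N(0, I))."""
        mu, logvar = self.encode(series)
        z = self.reparameterize(mu, logvar, rng)
        recon = self.decode(z, len(series))
        rec_err = sum((a - b) ** 2 for a, b in zip(series, recon))
        kl = -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))
        return rec_err + kl, recon

    def generate(self, steps, rng):
        """Sample z from the N(0, I) prior and decode it into a new profile."""
        z = [rng.gauss(0.0, 1.0) for _ in range(self.latent)]
        return self.decode(z, steps)


# Usage on a toy (made-up) expression time course with six time points
rng = random.Random(1)
model = TinyRecurrentVAE()
profile = [0.0, 0.4, 0.9, 1.0, 0.7, 0.3]
neg_elbo, recon = model.loss(profile, rng)
mu, _ = model.encode(profile)  # latent coordinates, e.g. as input to clustering
new_profile = model.generate(len(profile), rng)
```

The latent means `mu` play the role of the low-dimensional representation the abstract refers to: in this sketch, clustering would operate on one such vector per gene, and `generate` shows how new profiles can be drawn by decoding samples from the prior.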

Updated: 2020-11-12