Transfer learning enables predictions in network biology
Nature ( IF 64.8 ) Pub Date : 2023-05-31 , DOI: 10.1038/s41586-023-06139-9
Christina V Theodoris 1, 2, 3, 4 , Ling Xiao 2, 5 , Anant Chopra 6 , Mark D Chaffin 2 , Zeina R Al Sayed 2 , Matthew C Hill 2, 5 , Helene Mantineo 2, 5 , Elizabeth M Brydon 6 , Zexian Zeng 1, 7 , X Shirley Liu 1, 7, 8 , Patrick T Ellinor 2, 5
Mapping gene networks requires large amounts of transcriptomic data to learn the connections between genes, which impedes discoveries in settings with limited data, including rare diseases and diseases affecting clinically inaccessible tissues. Recently, transfer learning has revolutionized fields such as natural language understanding[1,2] and computer vision[3] by leveraging deep learning models pretrained on large-scale general datasets that can then be fine-tuned towards a vast array of downstream tasks with limited task-specific data. Here, we developed a context-aware, attention-based deep learning model, Geneformer, pretrained on a large-scale corpus of about 30 million single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology. During pretraining, Geneformer gained a fundamental understanding of network dynamics, encoding network hierarchy in the attention weights of the model in a completely self-supervised manner. Fine-tuning towards a diverse panel of downstream tasks relevant to chromatin and network dynamics using limited task-specific data demonstrated that Geneformer consistently boosted predictive accuracy. Applied to disease modelling with limited patient data, Geneformer identified candidate therapeutic targets for cardiomyopathy. Overall, Geneformer represents a pretrained deep learning model from which fine-tuning towards a broad range of downstream applications can be pursued to accelerate discovery of key network regulators and candidate therapeutic targets.
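The transfer-learning pattern the abstract describes, in which representations learned during large-scale pretraining are frozen and only a small task head is fine-tuned on limited task-specific data, can be illustrated with a minimal sketch. Everything below is a hypothetical toy (random "pretrained" projection, synthetic two-class data, illustrative dimensions), not Geneformer's actual architecture or training setup:

```python
# Minimal sketch of the pretrain-then-fine-tune pattern, assuming:
# - `pretrained_embed` stands in for a large frozen pretrained model
# - a tiny synthetic binary-classification task plays the role of a
#   downstream task with limited labeled data.
import math
import random

random.seed(0)
DIM = 8  # toy feature/embedding size

# Fixed random projection standing in for frozen pretrained weights.
PROJ = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]

def pretrained_embed(x):
    # In the real setting this would be a pretrained transformer;
    # here it is simply a frozen linear map.
    return [sum(w * xi for w, xi in zip(row, x)) for row in PROJ]

# Limited task-specific data: 10 samples per class, class means 0 and 1.
data = [([random.gauss(label, 1) for _ in range(DIM)], label)
        for label in (0, 1) for _ in range(10)]

# Fine-tune only a small linear head on top of the frozen embeddings.
head = [0.0] * DIM
bias = 0.0
lr = 0.1
for epoch in range(200):
    for x, y in data:
        z = pretrained_embed(x)  # frozen: no update flows into PROJ
        logit = sum(w * zi for w, zi in zip(head, z)) + bias
        p = 1.0 / (1.0 + math.exp(-logit))
        g = p - y  # gradient of the logistic loss w.r.t. the logit
        head = [w - lr * g * zi for w, zi in zip(head, z)]
        bias -= lr * g

# Training accuracy of the fine-tuned head.
correct = 0
for x, y in data:
    z = pretrained_embed(x)
    logit = sum(w * zi for w, zi in zip(head, z)) + bias
    correct += int((logit > 0) == (y == 1))
accuracy = correct / len(data)
```

Because only the small head is trained, the number of learned parameters stays tiny relative to the pretrained model, which is what makes fine-tuning feasible with limited task-specific data.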




Updated: 2023-06-01