Disease variant prediction with deep generative models of evolutionary data, Nature


Disease variant prediction with deep generative models of evolutionary data
Nature (IF 50.5), Pub Date: 2021-10-27, DOI: 10.1038/s41586-021-04043-8
Jonathan Frazer^{1}, Pascal Notin^{2}, Mafalda Dias^{1}, Aidan Gomez^{2}, Joseph K Min^{1}, Kelly Brock^{1}, Yarin Gal^{2}, Debora S Marks^{1,3}


Quantifying the pathogenicity of protein variants in human disease-related genes would have a marked effect on clinical decisions, yet the overwhelming majority (over 98%) of these variants still have unknown consequences^{1,2,3}. In principle, computational methods could support the large-scale interpretation of genetic variants. However, state-of-the-art methods^{4,5,6,7,8,9,10} have relied on training machine learning models on known disease labels. As these labels are sparse, biased and of variable quality, the resulting models have been considered insufficiently reliable^{11}. Here we propose an approach that leverages deep generative models to predict variant pathogenicity without relying on labels. By modelling the distribution of sequence variation across organisms, we implicitly capture constraints on the protein sequences that maintain fitness. Our model EVE (evolutionary model of variant effect) not only outperforms computational approaches that rely on labelled data but also performs on par with, if not better than, predictions from high-throughput experiments, which are increasingly used as evidence for variant classification^{12,13,14,15,16}. We predict the pathogenicity of more than 36 million variants across 3,219 disease genes and provide evidence for the classification of more than 256,000 variants of unknown significance. Our work suggests that models of evolutionary information can provide valuable independent evidence for variant interpretation that will be widely useful in research and clinical settings.
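
The label-free idea in the abstract, scoring a variant by how well it fits the distribution of sequences seen across organisms, can be illustrated with a much simpler model than the one the paper describes. The sketch below is not EVE: it swaps the deep generative model for a site-independent frequency model, and the toy alignment, the pseudocount, and the function names `site_frequencies` and `variant_score` are all assumptions made up for illustration. It only shows the general shape of the computation, a log-likelihood ratio between the mutant and the wild-type residue, with no disease labels involved.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_frequencies(msa, pseudocount=1.0):
    """Estimate per-column amino acid frequencies from an alignment.

    Additive smoothing (the pseudocount) keeps unseen residues from
    getting probability zero. Gap characters are ignored.
    """
    length = len(msa[0])
    freqs = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in msa if seq[pos] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        freqs.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return freqs


def variant_score(freqs, wild_type, pos, mutant_aa):
    """Log-likelihood ratio of mutant vs. wild-type residue at one position.

    More negative values mean the substitution is rarer in the alignment,
    which this toy model reads as less tolerated.
    """
    return math.log(freqs[pos][mutant_aa]) - math.log(freqs[pos][wild_type[pos]])


# Hypothetical five-sequence alignment; real alignments hold thousands of sequences.
toy_msa = ["MKVLA", "MKILA", "MRVLA", "MKVLG", "MKVLA"]
wild_type = "MKVLA"

freqs = site_frequencies(toy_msa)
conserved = variant_score(freqs, wild_type, 0, "W")  # position 0 is invariant (M)
variable = variant_score(freqs, wild_type, 2, "I")   # I is observed at position 2

print(f"M1W score: {conserved:.3f}")
print(f"V3I score: {variable:.3f}")
```

In this toy setting the substitution at the invariant position scores lower than the one at the variable position. A site-independent model cannot capture dependencies between positions, which is one reason a richer generative model would be preferred in practice.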


Updated: 2021-10-27
