Learning the molecular grammar of protein condensates from sequence determinants and embeddings [Biophysics and Computational Biology]
Proceedings of the National Academy of Sciences of the United States of America (IF 9.412) Pub Date: 2021-04-13, DOI: 10.1073/pnas.2019053118
Kadi L. Saar, Alexey S. Morgunov, Runzhang Qi, William E. Arter, Georg Krainer, Alpha A. Lee, Tuomas P. J. Knowles

Intracellular phase separation of proteins into biomolecular condensates is increasingly recognized as a process with a key role in cellular compartmentalization and regulation. Different hypotheses about the parameters that determine the tendency of proteins to form condensates have been proposed, with some of them probed experimentally through the use of constructs generated by sequence alterations. To broaden the scope of these observations, we established an in silico strategy for understanding on a global level the associations between protein sequence and phase behavior and further constructed machine-learning models for predicting protein liquid–liquid phase separation (LLPS). Our analysis highlighted that LLPS-prone proteins are more disordered, less hydrophobic, and of lower Shannon entropy than sequences in the Protein Data Bank or the Swiss-Prot database and that they show a fine balance in their relative content of polar and hydrophobic residues. To further learn in a hypothesis-free manner the sequence features underpinning LLPS, we trained a neural network-based language model and found that a classifier constructed on such embeddings learned the underlying principles of phase behavior at a comparable accuracy to a classifier that used knowledge-based features. By combining knowledge-based features with unsupervised embeddings, we generated an integrated model that distinguished LLPS-prone sequences both from structured proteins and from unstructured proteins with a lower LLPS propensity and further identified such sequences from the human proteome at a high accuracy. These results provide a platform rooted in molecular principles for understanding protein phase behavior. The predictor, termed DeePhase, is accessible from https://deephase.ch.cam.ac.uk/.
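The sequence descriptors named in the abstract (Shannon entropy, hydrophobic content, polar content) can be illustrated with a short sketch. This is not the authors' feature pipeline or the DeePhase implementation: the residue groupings `HYDROPHOBIC` and `POLAR`, the function names, and the example sequences are assumptions chosen only for illustration, and the abstract does not specify how these quantities were computed.

```python
import math
from collections import Counter

# Assumption: illustrative residue groupings, NOT the classification used in the paper.
HYDROPHOBIC = set("AILMFVW")
POLAR = set("STNQ")


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (in bits) of the amino-acid composition of a sequence."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    n = len(seq)
    counts = Counter(seq)
    # H = -sum(p_i * log2(p_i)) over the residues that actually occur
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def residue_fraction(seq: str, residues: set) -> float:
    """Fraction of positions in seq whose residue belongs to the given set."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    return sum(aa in residues for aa in seq) / len(seq)


if __name__ == "__main__":
    # Hypothetical toy sequences: a repetitive, low-complexity one and a more varied one.
    for name, seq in [("low_complexity", "GSGSGSGSQQGSGS"), ("varied", "MKVLAWEDRTHPFC")]:
        print(
            name,
            round(shannon_entropy(seq), 3),
            round(residue_fraction(seq, HYDROPHOBIC), 3),
            round(residue_fraction(seq, POLAR), 3),
        )
```

A repetitive sequence built from few residue types yields a lower composition entropy than a varied one, which is the sense in which the abstract describes LLPS-prone proteins as having lower Shannon entropy.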




Updated: 2021-04-08