Learning the language of viral evolution and escape
Science (IF 56.9) Pub Date: 2021-01-14, DOI: 10.1126/science.abd7331
Brian Hie (1, 2), Ellen D. Zhong (1, 3), Bonnie Berger (1, 4), Bryan Bryson (2, 5)

Natural language predicts viral escape

Viral mutations that evade neutralizing antibodies, an occurrence known as viral escape, can occur and may impede the development of vaccines. To predict which mutations may lead to viral escape, Hie et al. used a machine learning technique for natural language processing with two components: grammar (or syntax) and meaning (or semantics) (see the Perspective by Kim and Przytycka). Three different unsupervised language models were constructed for influenza A hemagglutinin, HIV-1 envelope glycoprotein, and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike glycoprotein. Semantic landscapes for these viruses predicted viral escape mutations that produce sequences that are syntactically and/or grammatically correct but effectively different in semantics and thus able to evade the immune system.

Science, this issue p. 284; see also p. 233

Language models of influenza hemagglutinin, HIV Env, and SARS-CoV-2 spike viral protein sequences can accurately predict viral escape patterns.

The ability for viruses to mutate and evade the human immune system and cause infection, called viral escape, remains an obstacle to antiviral and vaccine development. Understanding the complex rules that govern escape could inform therapeutic design. We modeled viral escape with machine learning algorithms originally developed for human natural language. We identified escape mutations as those that preserve viral infectivity but cause a virus to look different to the immune system, akin to word changes that preserve a sentence's grammaticality but change its meaning. With this approach, language models of influenza hemagglutinin, HIV-1 envelope glycoprotein (HIV Env), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Spike viral proteins can accurately predict structural escape patterns using sequence data alone. Our study represents a promising conceptual bridge between natural language and viral evolution.
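The abstract defines an escape mutation by two criteria at once: the mutated sequence stays "grammatical" (plausible to the language model, a stand-in for preserved infectivity) while its "meaning" shifts (its representation moves away from the original). The toy sketch below illustrates one way such a two-criterion ranking could be computed. It is not the authors' code: the embeddings and probabilities are hypothetical numbers standing in for language-model outputs, and the L1 distance, the rank-sum combination, the `beta` weight, and every function and mutation name are assumptions chosen for illustration.

```python
"""Toy sketch: rank candidate mutations that are both 'grammatical' and
'semantically changed'. All inputs are hypothetical placeholders, not
data or code from the paper."""
from typing import Dict, List, Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Semantic change, assumed here to be the L1 distance between embeddings."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ranks(values: List[float]) -> List[int]:
    """Ascending ranks: the smallest value gets rank 0 (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def rank_escape_candidates(
    wildtype_embedding: Sequence[float],
    mutant_embeddings: Dict[str, Sequence[float]],
    mutant_probabilities: Dict[str, float],
    beta: float = 1.0,  # hypothetical weight on grammaticality
) -> List[str]:
    """Order mutations from most to least escape-like under the toy score."""
    names = list(mutant_embeddings)
    semantic = [l1_distance(wildtype_embedding, mutant_embeddings[n]) for n in names]
    grammar = [mutant_probabilities[n] for n in names]
    sem_rank = ranks(semantic)  # larger shift in "meaning" -> higher rank
    gram_rank = ranks(grammar)  # more plausible sequence -> higher rank
    score = {n: sem_rank[i] + beta * gram_rank[i] for i, n in enumerate(names)}
    return sorted(names, key=lambda n: score[n], reverse=True)


if __name__ == "__main__":
    # Hypothetical 2-D embeddings and model probabilities for three mutations.
    wildtype = [0.0, 0.0]
    embeddings = {"A1G": [0.1, 0.0], "K2E": [2.0, 2.0], "L3P": [1.0, 1.0]}
    probabilities = {"A1G": 0.6, "K2E": 0.5, "L3P": 0.01}
    print(rank_escape_candidates(wildtype, embeddings, probabilities))
    # prints ['K2E', 'A1G', 'L3P']
```

With these toy inputs, K2E ranks first because it is both fairly plausible and far from the wild-type embedding, while L3P ranks last because the (hypothetical) model finds it very unlikely. Combining ranks rather than raw values sidesteps the two scores being on different scales.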

Updated: 2021-01-14