BERTology Meets Biology: Interpreting Attention in Protein Language Models
arXiv - CS - Computation and Language. Pub Date: 2020-06-26. DOI: arXiv:2006.15222
Jesse Vig, Ali Madani, Lav R. Varshney, Caiming Xiong, Richard Socher, Nazneen Fatema Rajani

Transformer architectures have proven to learn useful representations for protein classification and generation tasks. However, these representations present challenges in interpretability. Through the lens of attention, we analyze the inner workings of the Transformer and explore how the model discerns structural and functional properties of proteins. We show that attention (1) captures the folding structure of proteins, connecting amino acids that are far apart in the underlying sequence, but spatially close in the three-dimensional structure, (2) targets binding sites, a key functional component of proteins, and (3) focuses on progressively more complex biophysical properties with increasing layer depth. We also present a three-dimensional visualization of the interaction between attention and protein structure. Our findings align with known biological processes and provide a tool to aid discovery in protein engineering and synthetic biology. The code for visualization and analysis is available at https://github.com/salesforce/provis.
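
The contact-map finding lends itself to a simple quantitative check. The sketch below is illustrative and not taken from the provis repository: the function name and the random arrays are hypothetical stand-ins for real model attention weights and a PDB-derived contact map. It estimates how much of one attention head's total weight falls on residue pairs that are spatially close in the folded structure.

import numpy as np

def attention_mass_on_contacts(attn: np.ndarray, contacts: np.ndarray) -> float:
    """Proportion of a head's total attention weight that lands on
    residue pairs that are in contact in the 3D structure.

    attn     -- (L, L) attention matrix for one head (rows sum to 1)
    contacts -- (L, L) boolean contact map from the folded structure
    """
    return float((attn * contacts).sum() / attn.sum())

# Synthetic stand-ins for real data (illustrative only).
rng = np.random.default_rng(0)
L = 50                                    # sequence length
attn = rng.random((L, L))
attn /= attn.sum(axis=1, keepdims=True)   # row-normalize, like softmax output
contacts = rng.random((L, L)) < 0.05      # ~5% of residue pairs in contact

print(f"attention mass on contact pairs: {attention_mass_on_contacts(attn, contacts):.3f}")

On random inputs this statistic stays near the 5% contact density of the synthetic map; the paper's claim is that particular heads score well above such a chance baseline on real protein structures.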

Updated: 2020-07-15