Mi3-GPU: MCMC-based inverse Ising inference on GPUs for protein covariation analysis
Computer Physics Communications (IF 7.2), Pub Date: 2021-03-01, DOI: 10.1016/j.cpc.2020.107312
Allan Haldane, Ronald M Levy

Abstract

Inverse Ising inference is a method for inferring the coupling parameters of a Potts/Ising model from observed site covariation. It has found important applications in protein physics for detecting interactions between residues in protein families. We introduce Mi3-GPU (“mee-three”, for MCMC Inverse Ising Inference), software that solves the inverse Ising problem for protein-sequence datasets with few analytic approximations, by parallel Markov-Chain Monte Carlo (MCMC) sampling on GPUs. We also provide tools for analysis and preparation of protein-family Multiple Sequence Alignments (MSAs) to account for finite-sampling issues, which are a major source of error or bias in inverse Ising inference. Our method is “generative” in the sense that the inferred model can be used to generate synthetic MSAs whose mutational statistics (marginals) can be verified to match the dataset MSA statistics up to the limits imposed by finite sampling. Our GPU implementation enables the construction of models that reproduce the covariation patterns of the observed MSA with a precision that is not possible with more approximate methods. The main components of our method are a GPU-optimized algorithm that greatly accelerates MCMC sampling, combined with a multi-step Quasi-Newton parameter-update scheme using a “Zwanzig reweighting” technique. We demonstrate the ability of this software to produce generative models on typical protein family datasets for sequence lengths L ∼ 300 with 21 residue types, with tens of millions of inferred parameters, in short running times.

Program summary

Program Title: Mi3-GPU
Program Files doi: http://dx.doi.org/10.17632/ftbcfy2p35.1
Licensing provisions: GPLv3
Programming languages: Python3, OpenCL, C
Nature of problem: Mi3-GPU solves the inverse Ising problem for application in protein covariation analysis. The goal is to infer “coupling” parameters between positions in a Multiple Sequence Alignment of a protein family, with many applications including protein-contact prediction and fitness prediction.
Solution method: Mi3-GPU solves the inverse Ising problem with few approximations using Markov-Chain Monte Carlo methods with Quasi-Newton optimization on GPUs. This problem has previously been approached by more approximate methods using analytic approximations, including “Message Passing”, “Susceptibility Propagation”, “mean-field” methods, pseudolikelihood approximations, and cluster expansion. The software uses GPUs to accelerate MCMC sampling and a histogram reweighting technique to accelerate parameter optimization.
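The two ingredients named in the solution method, MCMC sampling of a Potts model and reweighting of the sampled sequences, can be illustrated with a minimal CPU-only sketch. This is not the Mi3-GPU code or its API. The function names, the nested-list parameter layout (`h[i][a]` for fields, `J[i][j][a][b]` for couplings), and the energy convention E(s) = -Σ h - Σ J are all assumptions made for illustration, and the Quasi-Newton update itself is omitted.

```python
import math
import random

def energy(seq, h, J):
    """Potts energy E(s) = -sum_i h[i][s_i] - sum_{i<j} J[i][j][s_i][s_j] (assumed convention)."""
    L = len(seq)
    E = 0.0
    for i in range(L):
        E -= h[i][seq[i]]
        for j in range(i + 1, L):
            E -= J[i][j][seq[i]][seq[j]]
    return E

def metropolis_chain(h, J, n_sweeps, rng):
    """Run one Metropolis chain from a random start and return its final sequence."""
    L, q = len(h), len(h[0])
    seq = [rng.randrange(q) for _ in range(L)]
    E = energy(seq, h, J)
    for _ in range(n_sweeps):
        for i in range(L):
            old = seq[i]
            seq[i] = rng.randrange(q)  # symmetric single-site proposal
            E_new = energy(seq, h, J)
            # Accept with probability min(1, exp(-(E_new - E))).
            if E_new <= E or rng.random() < math.exp(E - E_new):
                E = E_new
            else:
                seq[i] = old  # reject: restore the previous residue
    return seq

def sample_msa(h, J, n_walkers, n_sweeps, seed=0):
    """Generate a synthetic MSA from independent chains (run in parallel on a GPU in practice)."""
    rng = random.Random(seed)
    return [metropolis_chain(h, J, n_sweeps, rng) for _ in range(n_walkers)]

def pair_marginal(msa, i, j, q, weights=None):
    """Weighted frequency table f_ij(a, b) for columns i and j of the MSA."""
    if weights is None:
        weights = [1.0] * len(msa)
    total = sum(weights)
    f = [[0.0] * q for _ in range(q)]
    for seq, w in zip(msa, weights):
        f[seq[i]][seq[j]] += w / total
    return f

def zwanzig_weights(msa, h, J, h_new, J_new):
    """Weights w_n proportional to exp(-(E_new(s_n) - E_old(s_n))) for sequences sampled under (h, J)."""
    dE = [energy(s, h_new, J_new) - energy(s, h, J) for s in msa]
    shift = min(dE)  # subtract the minimum so the largest weight is exactly 1
    return [math.exp(-(d - shift)) for d in dE]
```

Passing the output of `zwanzig_weights` as the `weights` argument of `pair_marginal` estimates the marginals of a trial parameter set without drawing new samples, which is what allows several parameter updates per round of sampling. The estimate degrades as the trial parameters move far from the sampled ones, because a few sequences then dominate the weights.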

Updated: 2021-03-01