Boltzmann Machine Learning and Regularization Methods for Inferring Evolutionary Fields and Couplings From a Multiple Sequence Alignment
IEEE/ACM Transactions on Computational Biology and Bioinformatics ( IF 3.6 ) Pub Date : 2020-05-08 , DOI: 10.1109/tcbb.2020.2993232
Sanzo Miyazawa

The inverse Potts problem, which infers a Boltzmann distribution for homologous protein sequences from their single-site and pairwise amino acid frequencies, has recently attracted a great deal of attention in studies of protein structure and evolution. We study regularization and learning methods, and how to tune regularization parameters to correctly infer interactions in Boltzmann machine learning. With $L_2$ regularization for fields, group $L_1$ regularization for couplings is shown to be very effective for sparse couplings in comparison with $L_2$ and $L_1$. Two regularization parameters are tuned to yield equal values for the sample and ensemble averages of evolutionary energy. Both averages change smoothly and converge, but their learning profiles differ greatly between learning methods. The Adam method is modified to make the step size proportional to the gradient for sparse couplings and to use a soft-thresholding function for group $L_1$. By first inferring interactions from protein sequences and then from Monte Carlo samples, it is shown that the fields and couplings can be well recovered, but that recovering the pairwise correlations in the resolution of a total energy is harder for the natural proteins than for the protein-like sequences. The selective temperature for folding/structural constraints in protein evolution is also estimated.
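
The abstract mentions a soft-thresholding function for group $L_1$ but does not give its form. As a rough illustration only, the sketch below shows the standard group soft-thresholding (proximal) operator, which shrinks each site-pair coupling block by its Frobenius norm and sets weak blocks exactly to zero. The function name, the `(L, L, q, q)` array layout, and the `threshold` parameter are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def group_soft_threshold(J, threshold):
    """Group soft-thresholding of coupling blocks (hypothetical helper).

    J has shape (L, L, q, q): J[i, j] is the q x q coupling block for
    sites i and j (an assumed layout). Each block is shrunk toward zero
    by `threshold` in Frobenius norm; blocks whose norm is at most
    `threshold` become exactly zero.
    """
    # Frobenius norm of every (i, j) block, kept broadcastable against J.
    norms = np.sqrt((J ** 2).sum(axis=(2, 3), keepdims=True))
    # Shrinkage factor; the inner maximum avoids division by zero.
    scale = np.maximum(0.0, 1.0 - threshold / np.maximum(norms, 1e-12))
    return J * scale
```

In a proximal-gradient style update, `threshold` would typically be the step size multiplied by the group $L_1$ regularization parameter; how the paper combines this with its modified Adam step is not specified in the abstract.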

Last updated: 2020-05-08