Implementation of a Bayesian secondary structure estimation method for the SESCA circular dichroism analysis package
Computer Physics Communications (IF 6.3), Pub Date: 2021-05-06, DOI: 10.1016/j.cpc.2021.108022
Gabor Nagy, Helmut Grubmüller

Circular dichroism spectroscopy is a structural biology technique frequently applied to determine the secondary structure composition of soluble proteins. Our recently introduced computational analysis package SESCA aids the interpretation of protein circular dichroism spectra and enables the validation of structural models proposed for the protein. To further these aims, we present the implementation and characterization of a new Bayesian secondary structure estimation method in SESCA, termed SESCA_bayes. SESCA_bayes samples possible secondary structures using a Monte Carlo scheme, driven by the likelihood of estimated scaling errors and non-secondary-structure (non-SS) contributions of the measured spectrum. SESCA_bayes provides an estimated secondary structure composition together with separate uncertainties for the fraction of residues in each secondary structure class. It also assists efficient model validation by providing a posterior secondary structure probability distribution based on the measured spectrum. Our study indicates that SESCA_bayes estimates the secondary structure composition with a significantly smaller uncertainty than its predecessor, SESCA_deconv, which is based on spectrum deconvolution. In our analysis, the two methods have comparable mean accuracy, but SESCA_bayes provides more accurate estimates for circular dichroism spectra that contain considerable non-SS contributions.
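
As a rough illustration of the Monte Carlo idea, the sketch below runs a generic random-walk Metropolis sampler over secondary structure fractions and reports a mean and standard deviation per class. It is not the SESCA_bayes implementation. The three-class basis spectra, the Gaussian likelihood with an arbitrary noise level `sigma`, and the helper names `predict`, `log_likelihood`, `metropolis` and `summarize` are hypothetical. Scaling errors and non-SS contributions are deliberately left out here.

```python
import math
import random

# Hypothetical toy basis spectra (illustrative values, not a real SESCA basis
# set): one spectrum per secondary-structure class at four wavelengths.
BASIS = {
    "helix": [60.0, -10.0, -30.0, -35.0],
    "sheet": [25.0, -5.0, -15.0, -10.0],
    "other": [-15.0, -20.0, 0.0, -2.0],
}
CLASSES = list(BASIS)
N_POINTS = len(BASIS[CLASSES[0]])


def predict(fractions):
    """Predicted spectrum: basis spectra weighted by the SS fractions."""
    return [sum(f * BASIS[c][i] for c, f in zip(CLASSES, fractions))
            for i in range(N_POINTS)]


def log_likelihood(fractions, measured, sigma=0.5):
    """Assumed Gaussian likelihood of the measured spectrum (sigma is arbitrary)."""
    pred = predict(fractions)
    return -sum((p - m) ** 2 for p, m in zip(pred, measured)) / (2 * sigma ** 2)


def metropolis(measured, n_steps=20000, step=0.05, seed=0):
    """Random-walk Metropolis over SS fractions with a flat prior on the simplex."""
    rng = random.Random(seed)
    current = [1.0 / len(CLASSES)] * len(CLASSES)  # start from equal fractions
    current_logp = log_likelihood(current, measured)
    samples = []
    for _ in range(n_steps):
        # Symmetric move: shift +/-delta of fraction between two classes,
        # which keeps the fractions summing to one.
        i, j = rng.sample(range(len(CLASSES)), 2)
        delta = rng.uniform(-step, step)
        candidate = list(current)
        candidate[i] -= delta
        candidate[j] += delta
        if min(candidate) >= 0.0:  # outside the simplex the prior is zero: reject
            cand_logp = log_likelihood(candidate, measured)
            if rng.random() < math.exp(min(0.0, cand_logp - current_logp)):
                current, current_logp = candidate, cand_logp
        samples.append(current)
    return samples


def summarize(samples, burn_in=2000):
    """Posterior mean and standard deviation of each class fraction."""
    kept = samples[burn_in:]
    stats = {}
    for k, c in enumerate(CLASSES):
        vals = [s[k] for s in kept]
        mean = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        stats[c] = (mean, sd)
    return stats


measured = predict([0.5, 0.2, 0.3])  # synthetic, noise-free "measurement"
for cls, (mean, sd) in summarize(metropolis(measured)).items():
    print(f"{cls}: {mean:.2f} +/- {sd:.2f}")
```

Each proposal transfers fraction between two classes, so every sample stays non-negative and sums to one. The spread of the kept samples plays the role of the per-class uncertainty.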

Program summary

Program Title: SESCA_bayes

CPC Library link to program files: https://doi.org/10.17632/5nnsbn6ync.1

Developer's repository link: https://www.mpibpc.mpg.de/sesca

Licensing provisions: GPLv3

Programming language: Python

Nature of problem: The circular dichroism spectrum of a protein is strongly correlated with its secondary structure composition. However, determining the secondary structure from a spectrum is hindered by non-secondary structure contributions and by scaling errors due to the uncertainty of the protein concentration. If not properly taken into account, these experimental factors can cause considerable errors when conventional secondary structure estimation methods are used. Because these errors combine with errors of the proposed structural model in a non-additive fashion, it is difficult to assess how much uncertainty the experimental factors introduce into model validation approaches based on circular dichroism spectra.
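
To make the validation problem concrete, the following toy calculation is a sketch under stated assumptions. It predicts a spectrum as a fraction-weighted sum of basis spectra. It then compares that prediction with a "measurement" distorted by a scaling factor and a constant non-SS offset. The basis values (arbitrary units), the linear mixing model and the function names are illustrative assumptions, not SESCA's basis sets or API.

```python
import math

# Hypothetical toy basis spectra (illustrative values, not a real SESCA basis
# set): one spectrum per secondary-structure (SS) class at four wavelengths.
WAVELENGTHS = [190, 200, 210, 220]  # nm
BASIS = {
    "helix": [60.0, -10.0, -30.0, -35.0],
    "sheet": [25.0, -5.0, -15.0, -10.0],
    "other": [-15.0, -20.0, 0.0, -2.0],
}


def predict_spectrum(fractions):
    """Predicted CD spectrum as a fraction-weighted sum of basis spectra."""
    return [sum(fractions[c] * BASIS[c][i] for c in BASIS)
            for i in range(len(WAVELENGTHS))]


def measured_spectrum(fractions, scale, non_ss):
    """Toy measurement: concentration scaling error plus a flat non-SS offset."""
    return [scale * x + non_ss for x in predict_spectrum(fractions)]


def rmsd(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


true_ss = {"helix": 0.5, "sheet": 0.2, "other": 0.3}
model = predict_spectrum(true_ss)            # spectrum of the *correct* model
clean = measured_spectrum(true_ss, 1.0, 0.0)
distorted = measured_spectrum(true_ss, 0.8, 2.0)
print(rmsd(model, clean))      # -> 0.0
print(rmsd(model, distorted))  # same correct model, now a clearly nonzero deviation
```

Even for the correct structure, the deviation grows once scaling and non-SS terms are present. A large deviation alone therefore does not tell whether the model or the experiment is at fault.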

Solution method: For a given measured circular dichroism spectrum, the SESCA_bayes algorithm applies Bayesian statistics to account for scaling errors and non-secondary structure contributions and to determine the conditional secondary structure probability distribution. This approach relies on fast spectrum predictions based on empirical basis spectrum sets, and on joint probability distribution maps for scaling factors and non-secondary structure contributions. Because SESCA_bayes estimates the most probable secondary structure composition from a probability-weighted sample distribution, it avoids the fitting errors typical of conventional spectrum deconvolution methods. It also estimates the uncertainty of circular-dichroism-based model validation more accurately than the previous methods of the SESCA analysis package.
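
Below is a minimal sketch of a probability-weighted estimate. It assumes a uniform prior over compositions and a Gaussian prior on the scaling factor, evaluated on a grid. SESCA_bayes uses empirically derived distribution maps instead, and those are not reproduced here. Each random composition is weighted by its likelihood marginalized over the scaling factor. The weighted mean and standard deviation are then reported, with no curve fitting involved. All names, priors and numbers are hypothetical.

```python
import math
import random

# Hypothetical toy basis spectra (illustrative only, not a SESCA basis set).
BASIS = {
    "helix": [60.0, -10.0, -30.0, -35.0],
    "sheet": [25.0, -5.0, -15.0, -10.0],
    "other": [-15.0, -20.0, 0.0, -2.0],
}
CLASSES = list(BASIS)
N_POINTS = len(BASIS[CLASSES[0]])
# Assumed scaling-factor prior: a Gaussian around 1.0 evaluated on a grid.
# SESCA_bayes instead uses empirically derived probability maps.
SCALES = [0.80 + 0.05 * i for i in range(9)]  # 0.80 ... 1.20
SCALE_SD = 0.1


def predict(fractions):
    return [sum(f * BASIS[c][i] for c, f in zip(CLASSES, fractions))
            for i in range(N_POINTS)]


def sample_simplex(k, rng):
    """Uniform draw of k fractions summing to one (gaps between sorted uniforms)."""
    cuts = sorted(rng.random() for _ in range(k - 1))
    edges = [0.0] + cuts + [1.0]
    return [edges[i + 1] - edges[i] for i in range(k)]


def log_weight(fractions, measured, sigma=1.0):
    """Log-likelihood marginalized over the scale grid (log-sum-exp for stability)."""
    pred = predict(fractions)
    terms = []
    for s in SCALES:
        log_prior = -((s - 1.0) ** 2) / (2 * SCALE_SD ** 2)
        sq = sum((s * p - m) ** 2 for p, m in zip(pred, measured))
        terms.append(log_prior - sq / (2 * sigma ** 2))
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def weighted_estimate(measured, n_draws=20000, seed=1):
    """Probability-weighted mean and SD of each SS fraction."""
    rng = random.Random(seed)
    draws = [sample_simplex(len(CLASSES), rng) for _ in range(n_draws)]
    logw = [log_weight(d, measured) for d in draws]
    top = max(logw)
    weights = [math.exp(lw - top) for lw in logw]  # rescaled to avoid underflow
    total = sum(weights)
    stats = {}
    for k, c in enumerate(CLASSES):
        mean = sum(w * d[k] for w, d in zip(weights, draws)) / total
        var = sum(w * (d[k] - mean) ** 2 for w, d in zip(weights, draws)) / total
        stats[c] = (mean, math.sqrt(var))
    return stats


# Synthetic spectrum with a 10% concentration (scaling) error.
measured = [1.1 * x for x in predict([0.5, 0.2, 0.3])]
for cls, (mean, sd) in weighted_estimate(measured).items():
    print(f"{cls}: {mean:.2f} +/- {sd:.2f}")
```

The scaling factor is integrated out rather than fixed. Compositions that fit the spectrum only under a different concentration therefore still contribute. This widens the reported uncertainty instead of biasing a single fitted value without warning.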

Updated: 2021-05-25