Link Your Sites (LYS) Scripts: Automated Search of Protein Structures and Mapping of Sites Under Positive Selection Detected by PAML
Evolutionary Biology ( IF 1.9 ) Pub Date : 2020-06-30 , DOI: 10.1007/s11692-020-09507-9
Lys Sanz Moreta , Rute R. da Fonseca

The visualization of the molecular context of an amino acid mutation in a protein structure is crucial for assessing its functional impact and understanding its evolutionary implications. Currently, searches for fast-evolving amino acid positions using codon substitution models such as those implemented in PAML (Yang and Nielsen in Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol 17(1):32–43, 2000; Zhang et al. in Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol 22(12):2472–2479, 2005) are run on almost complete proteomes. They generate large numbers of candidate proteins, which makes the analysis of individual protein structures and models very time-consuming. Here we present the package Link Your Sites (LYS), which can be used to reduce the number of analysed targets to those for which structural information can be retrieved. LYS consists of two Python wrapper scripts. The first one (i) mines the RCSB database (Berman et al. in The protein data bank. Nucleic Acids Res 28(1):235–242, 2000) using the BLAST alignment tool to find the best-matching homologous sequences, (ii) fetches their domain positions using PROSITE (Hamelryck and Manderick in Pdb file parser and structure class implemented in python. Bioinformatics 19(17):2308–2310, 2003; Sigrist et al. in Prosite: a documented database using patterns and profiles as motif descriptors. Brief Bioinf 3(3):265–274, 2002; Sigrist et al. in New and continuing developments at prosite. Nucleic Acids Res 41(D1):D344–D347, 2012), (iii) parses the PAML output, extracting the positions of fast-evolving sites and transforming them into the coordinate system of the protein structure, and (iv) outputs one file per gene listing the correspondence between positions in the input sequence and in the homologous structure.
The second script produces publication-ready figures that highlight the positively selected sites mapped onto regions known to have functional relevance.
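Step (iii), the transformation of site positions into the coordinate system of the structure, can be illustrated with a minimal sketch. It is not taken from the LYS source code: the function name `map_sites_to_structure` and its parameters are hypothetical, and it assumes a gapped pairwise alignment (for example, one BLAST hit) is already available as two equal-length strings that use `-` for gaps.

```python
from typing import Dict, Iterable, Optional


def map_sites_to_structure(
    aligned_query: str,
    aligned_subject: str,
    sites: Iterable[int],
    query_start: int = 1,
    subject_start: int = 1,
) -> Dict[int, Optional[int]]:
    """Map 1-based query positions to 1-based subject (structure) positions.

    Hypothetical helper, not part of LYS. `query_start` and `subject_start`
    are the positions of the first aligned residue in each full sequence.
    A site aligned to a gap, or lying outside the alignment, maps to None.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")

    query_to_subject: Dict[int, int] = {}
    q_pos = query_start - 1
    s_pos = subject_start - 1
    for q_char, s_char in zip(aligned_query, aligned_subject):
        q_is_residue = q_char != "-"
        s_is_residue = s_char != "-"
        # Advance each counter only when its sequence has a residue here.
        if q_is_residue:
            q_pos += 1
        if s_is_residue:
            s_pos += 1
        # Record the pair only when both columns hold a residue.
        if q_is_residue and s_is_residue:
            query_to_subject[q_pos] = s_pos

    return {site: query_to_subject.get(site) for site in sites}


# Query residue 3 (T) aligns with subject residue 4; query residue 4 (A)
# faces a gap in the subject, so it has no structural counterpart.
print(map_sites_to_structure("MK-TAYIA", "MKQT-YIA", sites=[3, 4, 5]))
```

For this toy alignment the result is `{3: 4, 4: None, 5: 5}`. The subject positions are indices into the aligned structure sequence; residue numbers stored in a PDB file may carry an additional offset or insertion codes, which this sketch does not handle.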
  • Motivation: Automating the search for protein structures to assess the functional impact of sites found to be under positive selection by codeml, implemented in PAML (Yang and Nielsen in Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol 17(1):32–43, 2000). Building publication-quality figures highlighting the sites on a protein structure model that are within and outside functional domains. LYS reduces the workload associated with selecting proteins for which the functional impact of substitutions can be assessed using a protein structure. This is especially relevant when analyzing almost complete proteomes, as is the case in large comparative genomic studies.
  • Software: LYS scripts are executed from the command line. They automatically search for homologous proteins in the RCSB database (Nielsen in Molecular signatures of natural selection. Annu Rev Genet 39:197–218, 2005), determine the functional domain locations, correlate them with the positions identified by the M8 model (Yang and Nielsen in Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol 17(1):32–43, 2000), and output a data frame that can be used as input to PyMOL (Schrodinger in The pymol molecular graphics system. Version 1 in 2010) to generate a visualization of the results.
  • Availability: The LYS scripts are easy to install and use, and are available at https://github.com/LysSanzMoreta/LYS_Automatic_Search.
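The path from PAML output to a PyMOL visualization described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the LYS implementation: the helper names `parse_selected_sites` and `pymol_selection` are hypothetical, and the parser assumes site lines of the form `position residue probability` (optionally followed by `*` or `**`), as they typically appear in the site tables of a codeml report. The selection string uses PyMOL's `chain ... and resi a+b+c` syntax.

```python
import re
from typing import Iterable, List, Tuple

# Assumed line layout: "<position> <residue> <probability>[*|**] ..."
SITE_LINE = re.compile(r"^\s*(\d+)\s+([A-Za-z*-])\s+([01]\.\d+)\*{0,2}")


def parse_selected_sites(
    report_text: str, min_probability: float = 0.95
) -> List[Tuple[int, str, float]]:
    """Return (position, residue, probability) for sites at or above the cutoff.

    Hypothetical helper; lines that do not match the assumed layout are skipped.
    """
    sites = []
    for line in report_text.splitlines():
        match = SITE_LINE.match(line)
        if match is None:
            continue
        position = int(match.group(1))
        residue = match.group(2)
        probability = float(match.group(3))
        if probability >= min_probability:
            sites.append((position, residue, probability))
    return sites


def pymol_selection(positions: Iterable[int], chain: str = "A") -> str:
    """Build a PyMOL selection expression for the given residue numbers."""
    unique_positions = sorted(set(positions))
    if not unique_positions:
        raise ValueError("at least one position is required")
    residues = "+".join(str(p) for p in unique_positions)
    return f"chain {chain} and resi {residues}"


# Made-up example lines, only to show the expected shape of the input.
example_report = """
            Pr(w>1)     post mean +- SE for w

     3 K      0.573         1.523 +- 0.731
    24 S      0.991**       2.482 +- 0.312
    57 L      0.962*        2.301 +- 0.540
"""

selected = parse_selected_sites(example_report)
print(pymol_selection([position for position, _, _ in selected]))
```

With the made-up lines above, the selection is `chain A and resi 24+57`. In practice the positions would first have to be converted from the input sequence to the numbering of the homologous structure before being passed to PyMOL.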


Updated: 2020-06-30