Filling in Missing Amino Acid Residues in PDB Files
Published: 2019-12-31

1. I-TASSER  http://zhanglab.ccmb.med.umich.edu/I-TASSER/

2. MODELLER 9.20  https://salilab.org/modeller/

3. Schrödinger (commercial software)

4. Biodesigner http://www.pirx.com/biodesigner/download.html 

General features

  Multiple file format support (PDB, Hyperchem, Alchemy, Insight, Sybyl). Automatic recognition of the file type.

   Native compressed file format BIO.

   Real-time monitoring of various molecular properties, including distances, angles (bond, torsional and improper), and coordinate and distance RMS deviations.

Graphical features

  Various styles of visualization (wire, cylinders, VdW spheres, ball-and-stick, polygon, ribbon, strands, tube, string, cartoon). Several rendering styles can be combined in a single picture.

  Output to bitmap and PostScript formats. Generating raytraced images using external software (POV-Ray).

  Animation of simulation trajectories. Consecutive frames can be interpolated so that high-frequency molecular motions are filtered out.

Molecular builder

  Atom builder allows adding and modifying single atoms and bonds.

  Protein chain constructor can build and alter short polypeptides.

Computational engine

  Superpositioning algorithm

  Structural alignment of protein chains.

  Protein chain refinement algorithm. Uses a set of heuristic rules to reconstruct and refine the polypeptide chain.

Protein sequence processor

  Aligning multiple sequences.

  Editing multiple sequence alignments.

   Support for numerous sequence file formats (FASTA, BLAST, PIR, SEQ).

   Various coloring modes, including amino acid identity, similarity, sequence block conservation and amino acid physical properties.

  Export to PostScript and HTML file formats for publications and web applications.

  Full integration with molecular viewer window. Any change of the sequence/alignment can be immediately visualized.

5. GalaxyFill  from https://mp.weixin.qq.com/s/Wg0dKCEje87Rddns_oKW1Q

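GalaxyFill is a tool for filling in missing parts of protein structures. I find it quite powerful: it can be used, for example, to extend a protein chain or even to build fusion proteins. GalaxyFill requires 64-bit Linux. Note that it cannot fill missing segments in the middle of a chain.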
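Because of that limitation, it helps to know in advance whether a structure has internal chain breaks. The sketch below is a minimal Python check that assumes a standard fixed-column PDB file. It flags consecutive Cα atoms in the same chain that are more than 4.2 Å apart. Bonded neighbours sit about 3.8 Å apart, so the 4.2 Å cutoff is a rule-of-thumb assumption. `ca_atoms` and `internal_breaks` are hypothetical helpers and are not part of GalaxyFill. Gaps at the chain termini cannot be seen this way.

```python
import math


def ca_atoms(pdb_lines):
    """Yield (chain, resseq, (x, y, z)) for C-alpha ATOM records of the first model."""
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # only look at the first model of multi-model files
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        if line[16] not in (" ", "A"):
            continue  # keep only the first alternate location
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        yield line[21], int(line[22:26]), xyz


def internal_breaks(pdb_lines, cutoff=4.2):
    """Return (chain, residue_before, residue_after) for every internal chain break.

    A break is assumed wherever two consecutive C-alpha atoms of the same chain
    are farther apart than `cutoff` angstroms (bonded neighbours are ~3.8 A apart).
    """
    breaks = []
    previous = None
    for chain, resseq, xyz in ca_atoms(pdb_lines):
        if previous is not None and previous[0] == chain:
            if math.dist(previous[2], xyz) > cutoff:
                breaks.append((chain, previous[1], resseq))
        previous = (chain, resseq, xyz)
    return breaks
```

For example, `internal_breaks(open("input.pdb").read().splitlines())` returns an empty list when every chain is continuous between its first and last observed residues.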

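Installation

  1. Download the GalaxyFill project:

      wget https://github.com/seoklab/GalaxyFill/archive/master.zip

  2. Unzip it and rename the directory (a GitHub master.zip archive unpacks to GalaxyFill-master):

      unzip master.zip
      mv GalaxyFill-master/ GalaxyFill

  3. Set the environment variable. Open ~/.bashrc, add the following lines (adjust the path to your own installation directory), then reload it:

      gedit ~/.bashrc

      # GalaxyFill
      export GALAXY_HOME=/home/kangsgo/install/GalaxyFill

      source ~/.bashrc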

Usage

The command syntax is:

    $GALAXY_HOME/bin/GalaxyFill [-h] [-p INPUT PDB File] [-s INPUT FASTA File]

Input arguments and options:

    -p or --pdb   : Input protein structure file in PDB format (mandatory)
    -s or --seq   : Input protein sequence file in FASTA format (mandatory)
    -o or --out   : Output protein structure file name (optional, default=${Input PDB prefix}_fill.pdb)
    -t or --title : Running title for GalaxyFill (optional, default=${Input PDB prefix})
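To script several runs, the options listed above can be assembled into a command line from Python. This sketch assumes `GALAXY_HOME` is exported as in the installation step. It only passes the documented `-p`, `-s`, `-o` and `-t` flags. `galaxyfill_command` and `run_galaxyfill` are hypothetical helper names. The output name is written out explicitly following the documented `${Input PDB prefix}_fill.pdb` pattern, so you always know where the result lands.

```python
import os
import subprocess
from pathlib import Path


def galaxyfill_command(pdb, fasta, out=None, title=None, galaxy_home=None):
    """Assemble a GalaxyFill command line from the documented -p/-s/-o/-t options."""
    # Fall back to the GALAXY_HOME variable exported in ~/.bashrc (KeyError if unset).
    home = Path(galaxy_home or os.environ["GALAXY_HOME"])
    # Mirror the documented default output name: <input PDB prefix>_fill.pdb
    out = out or f"{Path(pdb).stem}_fill.pdb"
    cmd = [str(home / "bin" / "GalaxyFill"), "-p", str(pdb), "-s", str(fasta), "-o", str(out)]
    if title:
        cmd += ["-t", title]
    return cmd


def run_galaxyfill(pdb, fasta, **options):
    """Run GalaxyFill and return the output file name; raises CalledProcessError on failure."""
    cmd = galaxyfill_command(pdb, fasta, **options)
    subprocess.run(cmd, check=True)
    return cmd[cmd.index("-o") + 1]
```

For example, `run_galaxyfill("1qln.pdb", "1qln.fasta")` would run GalaxyFill on those two files and return `"1qln_fill.pdb"`.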


6. Chimera  from https://mp.weixin.qq.com/s/PF1v8mIw0Oe6RMw_qDveXA

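Chimera fills missing protein structure mainly by calling MODELLER. It can fill both terminal and internal missing segments, so personally I find it more capable than GalaxyFill, introduced above.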

We use PDB 1QLN as an example. 1QLN is T7 RNA polymerase and also contains nucleic acid. First download the file to look at its contents:

    wget https://files.rcsb.org/download/1QLN.pdb

The MISSING RESIDUES section (REMARK 465) shows that the gaps are mainly at the N-terminus and in the loop spanning residues 56-71.
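The MISSING RESIDUES list can also be read programmatically. This minimal Python sketch assumes the standard REMARK 465 layout: residue name, chain ID, then residue number with an optional insertion code. It only picks up three-letter residue names, so missing nucleotides are ignored. `missing_residues` and `contiguous_segments` are hypothetical helpers.

```python
def missing_residues(pdb_lines):
    """Return (chain, resseq, resname) for each amino acid listed under REMARK 465."""
    missing = []
    for line in pdb_lines:
        if not line.startswith("REMARK 465"):
            continue
        fields = line[10:].split()
        if len(fields) < 3:
            continue
        resname, chain, seq = fields[-3:]
        # Drop a trailing insertion code such as the "A" in "100A".
        number = seq.rstrip("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
        # Header and free-text lines of REMARK 465 fail these checks and are skipped.
        is_residue = (len(resname) == 3 and resname.isalpha() and resname.isupper()
                      and len(chain) == 1 and number.lstrip("-").isdigit())
        if is_residue:
            missing.append((chain, int(number), resname))
    return missing


def contiguous_segments(missing):
    """Merge consecutive residue numbers of the same chain into (chain, start, end)."""
    segments = []
    for chain, number, _ in missing:
        last = segments[-1] if segments else None
        # Same number again (insertion code) or the next number: the segment continues.
        if last and last[0] == chain and number in (last[2], last[2] + 1):
            segments[-1] = (chain, last[1], number)
        else:
            segments.append((chain, number, number))
    return segments
```

For example, `contiguous_segments(missing_residues(open("1QLN.pdb").read().splitlines()))` gives `(chain, start, end)` ranges. You can compare these with the N-terminal and 56-71 gaps mentioned above.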

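Here we fill in that loop. If the missing loop consists of residues you designed yourself, you also need to edit the SEQRES records accordingly, as shown in the figure below:

The SEQRES records hold the complete sequence, including the missing residues. You can create or edit them to control exactly what gets filled in. For entries downloaded from the PDB, there is normally no need to modify or add anything unless you have special requirements.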
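Because SEQRES holds the complete sequence, it is also a convenient source for the FASTA file that GalaxyFill's `-s` option expects. This sketch assumes fixed-column SEQRES records. Residues outside the 20 standard amino acids (for example MSE, or nucleotides) are written as `X`; check or edit these by hand before use. `seqres_sequences` and `to_fasta` are hypothetical helpers.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_sequences(pdb_lines):
    """Return {chain_id: one-letter sequence} built from SEQRES records."""
    chains = {}
    for line in pdb_lines:
        if line.startswith("SEQRES"):
            chain = line[11]              # column 12: chain identifier
            residues = line[19:].split()  # columns 20 onward: residue names
            chains.setdefault(chain, []).extend(THREE_TO_ONE.get(r, "X") for r in residues)
    return {chain: "".join(seq) for chain, seq in chains.items()}


def to_fasta(name, sequence, width=60):
    """Format one sequence as a FASTA record wrapped at `width` characters."""
    body = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([f">{name}", *body]) + "\n"
```

For chain A of a file, `to_fasta("my_protein", seqres_sequences(lines)["A"])` produces text you can save as the `-s` input.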

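Open UCSF Chimera, choose Favorites -> Command Line, and in the Command box fetch the structure and delete the nucleic acids and other non-protein parts:

    # open PDB 1qln
    open 1qln
    # delete nucleic acids, solvent, etc.
    delete ~protein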
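If you prefer to strip the non-protein parts before opening the file, a rough stand-in for `delete ~protein` is to keep only ATOM records of the 20 standard amino acids. This is a sketch and not Chimera's own definition of "protein". For example, it also drops selenomethionine (MSE), which is usually stored as HETATM. `protein_only` is a hypothetical helper.

```python
STANDARD_AA = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def protein_only(pdb_lines):
    """Keep ATOM records of standard amino acids, plus TER lines that close a kept chain."""
    kept = []
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "ATOM" and line[17:20] in STANDARD_AA:
            kept.append(line)
        elif record == "TER" and kept and kept[-1].startswith("ATOM"):
            kept.append(line)  # TER lines of chains that were removed entirely are dropped
    kept.append("END")
    return kept
```

The filtered lines can be written back with `Path("1qln_protein.pdb").write_text("\n".join(protein_only(lines)) + "\n")` (using `pathlib.Path`) and opened in Chimera instead.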

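Then choose Tools -> Structure Editing -> Model/Refine Loops. Two dialogs appear. One shows the complete sequence, with the positions of the missing residues outlined in red boxes.

The other dialog holds the MODELLER settings:

The settings are as follows: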

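Model/remodel region

- active region: the active region selected in the sequence window

- Chimera selection: the region currently selected in Chimera

- non-terminal missing structure: missing segments that are not at the chain termini, found by comparing the coordinates with SEQRES

- all missing structure: all missing segments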

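Allow this many residues adjacent to missing regions to move (default 1): the number of flanking residues allowed to move so that the filled segment fits. I recommend keeping the default.

Number of models to generate: the default is 5. Personally I find 1 model enough, since it can be refined later.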

Loop modeling protocol

- standard (default)

- DOPE: more accurate but slower; it may return no models, or fewer than requested

- DOPE-HR: similar to DOPE, with somewhat lower accuracy

Run MODELLER using

- web service (default): requires a MODELLER license key; academic users can enter MODELIRANJE

- local installation: requires the path to MODELLER; mine is usr/local/mode9.19

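Click OK and the job runs in the background; mine took about 15 minutes. When it finishes, the model is loaded directly into the session and you can save it. To check on or stop the job, open the Task Panel via the icon in the lower-right corner.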