amber prepin file format
Published: 2019-09-21

AMBER Prep file formats

    from http://ambermd.org/doc/prep.html
    1.  PREP
    
    
    Usage:
    
          prep [-O] -i input -o output -p params
    
    
    -O   Overwrite output files if they exist.
    
    _______________________________________________________________
    
    
         NOTE: Leap replaces Prep, Link, Edit and Parm with a  much
    simpler, single program.
    
         The  purpose  of  this module is to add  new  residues  to
    the  standard  AMBER residue database, create new databases, or
    to  create  new residues as individual LINK-readable files.  It
    is not necessary to run PREP if all residues needed for a simu-
    lation  are  already  present in the standard  AMBER  database,
    described in the LINK documentation.  A residue  is  the  basic
    molecular  unit  of  the AMBER simulation package.  It is typi-
    cally an amino acid or nucleic acid unit, but could be a  pros-
    thetic group, a small molecule, or a single ion.
    
         Tree  Structure:  The geometry of the residue is described
    by a "tree" structure  to enable the LINK  module  to  success-
    fully connect it to a larger structure.  The atoms in a residue
    are classified into eight topological types: "Main", "Side",
    "Branch", "3", "4", "5", "6", and "End".  They are denoted
    as M, S, B, 3, 4, 5, 6, and E, respectively.
    
         Main atoms describe the principal   "path"   through   the
    residue,   starting  at  the connection to the previous residue
    and ending at the connection to the next  residue.   The   LINK
    module   will   connect  the last main atom of a residue to the
    first main atom of the next residue  in   the   molecule.    If
    there   is  only  one residue in a molecule, the main atoms are
    typically the longest continuous  non-intersecting  chain.  The
    main   type  atoms   can have  1, 2, 3, or 4 atoms connected to
    them.
    
         Any atom that is not a main atom is described  by  one  of
    the   other topological types: "E", "S", "B", "3", "4", "5", or
    "6".  An "E" atom has only one connection to other atoms,  thus
    is  a "dead end"  for  any branch from any other atom type.  An
    "S" atom must have a total of two connections to  other  atoms,
    a   "B" atom  must have a total of three connections, and a "3"
    atom actually has a total of four connections; the same applies
    for "4", "5", and "6".  The topological types described here can
    only describe acyclic systems.  In order to describe the topol-
    ogy  of  cyclic systems, explicit loop closing bonds are speci-
    fied using the LOOP  command  described  below.   Loop  closing
    bonds  are  not  counted as connections when  assigning  M,  E,
    S, B, 3, 4, 5, 6 topological types.  If an atom has  more  than
    four  connections, it is not defined in the present tree struc-
    ture.
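The connection-count rule for non-main atoms can be sketched in Python. This is only an illustration and not part of PREP: the name `side_tree_type` is hypothetical, the count passed in must already exclude loop-closing bonds, and the types "4", "5", and "6" are left out because the text above does not spell out their connection counts.

```python
# Hypothetical helper, not part of PREP: map the number of tree connections
# of a non-main atom to its topological type, as stated in the text above.
NON_MAIN_TYPES = {1: "E", 2: "S", 3: "B", 4: "3"}


def side_tree_type(n_connections):
    """Return the tree symbol for a non-main atom.

    n_connections counts bonds that belong to the tree only; loop-closing
    bonds (declared later with LOOP) must not be included.
    """
    if n_connections not in NON_MAIN_TYPES:
        raise ValueError(
            f"this sketch does not cover {n_connections} connections"
        )
    return NON_MAIN_TYPES[n_connections]
```

In the phenylalanine input at the end of this document, CG is bonded to CB, CD1, and CD2, but CG-CD2 is declared as a loop-closing bond, so CG counts two connections and is typed "S".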
    
         Dummy atoms: PREP requires that three dummy atoms  precede
    the  actual  atoms of the residue.  These atoms are simply used
    to define the space axes for  the  residue.   The  three  dummy
    atoms  must be given the topological type "M", and they must be
    assigned a force field atom type that defines them   as   dummy
    atoms.   The   symbol   "DU"   is  recommended to be consistent
    with the standard database.   It  is  necessary  to  have   the
    three  initial   dummy   atoms   whether  internal or cartesian
    coordinates are given as input.
    
         It is important for the proper functioning  of  the   EDIT
    module   that  dummy  atoms be left in the first residue of the
    system, but that they be removed in  any  subsequent  residues.
    Therefore  you  should specify the "NOMIT" flag for any initial
    residue, and the "OMIT" flag for all others.  In  typical   use
    of   AMBER,  peptide systems are either started with the acetyl
    residue ACE, which carries the dummy  atoms   with  it  in  the
    standard  database,  or  they  are  started  with  a charged N-
    terminal  residue,  which  also  has  dummy  atoms.   Likewise,
    nucleic acid systems are generally started with the HB residue,
    which also has dummy  atoms.    The  other   residues  in   the
    database   have  had  their  dummy atoms stripped at the end of
    PREP through the use of the "OMIT" option.
    
         In the examples below, topological  types   are   assigned
    and   the  atoms  are numbered in correct tree structure order.
    An actual PREP input file appears at the end  of   this   docu-
    ment.
    
    
    Example 1:
    
    
    
                M(1)--M(2)  \
                      |      | <--- 3 dummy atoms to define space axes
                      M(3)  /
                      |
             Res ---- M(4)-- M(5)-- M(7)-- M(11)-- Res
              n-1            |      |      |        n+1
                      ^      |      |      |
                      |      E(6)   S(8)   B(12)-- S(14)-- E(15)
                first real          |      |
                   atom             |      |
                            E(10)-- S(9)   E(13)
    
    
    Example 2:
    
    
                M(1)--M(2)  \
                      |      | <--- 3 dummy atoms to define space axes
                      M(3)  /
                      |
             Res ---- M(4)-- M(6)-- M(13)-- M(23)--- Res
              n-1     |      |      |       |         n+1
                      |      |      |       |
                      E(5)   S(7)   S(14)   E(24)
                             |      |
                             |      |
              E(10)-- B(9)-- S(8)   S(15)   E(18)
                      |             |       |
                      |             |       |
                      S(11)         S(16)-- 3(17)-- S(21)-- E(22)
                      |                     |
                      |                     |
                      E(12)                 S(19)-- E(20)
    
    
         Note  on  Tree  Ordering: The tree structure begins at the
    first dummy atom, and traverses the main chain until  a  branch
    point  (node)  is  reached.  That branch is traversed until its
    end or until the next node is reached.  When you come to a node
    with  more  than  one  branch (topological type "B" or "3"), it
    doesn't matter which branch is traversed first as long  as  you
    return to the next higher node when an end is reached.
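The ordering rule can be sketched as a depth-first walk in which every side branch of a main atom is finished before the next main atom is visited. The function `tree_order` and the dictionary layout are assumptions made for this illustration; PREP itself does not expose such a routine. The data reproduce Example 1.

```python
# Example 1 as a tree: each atom number maps to the atoms that hang off it.
EXAMPLE_1_CHILDREN = {
    1: [2], 2: [3], 3: [4],   # three dummy atoms, then the first real atom
    4: [5],
    5: [6, 7],
    7: [8, 11],
    8: [9],
    9: [10],
    11: [12],
    12: [13, 14],
    14: [15],
}
EXAMPLE_1_MAIN = [1, 2, 3, 4, 5, 7, 11]  # the "M" atoms, in chain order


def tree_order(children, main_chain):
    """Return atoms in tree order: side branches first, then the next main atom."""
    main = set(main_chain)
    order = []

    def walk(atom):
        # Follow a side branch to its end before returning to the node above.
        order.append(atom)
        for child in children.get(atom, []):
            walk(child)

    for atom in main_chain:
        order.append(atom)
        for child in children.get(atom, []):
            if child not in main:
                walk(child)
    return order


print(tree_order(EXAMPLE_1_CHILDREN, EXAMPLE_1_MAIN))
```

Listing the branches of a node in a different order changes the numbering but still gives a valid tree, which matches the remark that it does not matter which branch is traversed first.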
    
         PREP  input  files  for  standard peptide and nucleic acid
    residues are typically maintained in several  large  files  for
    generation  of  the  standard database  for  the  LINK  module.
    Note  that it is not necessary to run the  PREP  module  unless
    non-standard  residues  are  needed.  Non-standard residue data
    may be output as individual files or appended to  the  standard
    database if desired.
    
         The LINK module is currently dimensioned to handle a maxi-
    mum of 150 atoms per residue.
    
         Note that smaller, neutral residues are  most  appropriate
    unless  an  infinite cutoff is desired, because the first atoms
    in each residue are used in applying the cutoff. The larger the
    residue, the more unbalanced the cutoff, i.e. the greater the
    difference between head-to-head and tail-to-tail orientations.
    
         This module was originally written by P. K. Weiner at UCSF
    and  overhauled by U. C. Singh in 1984. The data base structure
    was completely modified.  Prep was revised for Rev A by  George
    Seibel in 1989.
    
    Input  description: This section describes the residue(s) input
    file  which is  read through unit 5.  The input is free  format
    and it is assumed that the different fields are separated by at
    least one space (including character  fields).   The  character
    variables  are  always  left  justified.   If a character field
    contains more than four characters, the rest are ignored; if it
    contains fewer, it is padded with blanks.  Since blanks separate
    fields, a sign must immediately precede its number (write "-0.52",
    not "- 0.52").
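A minimal Python sketch of these free-format rules follows. The helper names `split_fields` and `normalize_char_field` are hypothetical and only mirror the description above; they are not taken from the PREP source.

```python
def split_fields(card):
    """Split a free-format card into fields on runs of blanks."""
    return card.split()


def normalize_char_field(text):
    """Keep at most four characters, left justified and padded with blanks."""
    return text[:4].ljust(4)


# A detached sign becomes a field of its own, which is why the sign
# has to sit directly in front of the number.
print(split_fields("1.3350 -0.5200"))
print(split_fields("1.3350 - 0.5200"))
print(repr(normalize_char_field("DUMMY")), repr(normalize_char_field("N")))
```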
    
    
         ------------------------------------------------------------------------
    
            - 1 -       CONTROL FOR DATA BASE GENERATION
    
                  The data base is a direct access file containing the
                  standard residues and a directory of their names.  It
                  is named DB4.DAT in the version 4 AMBER distribution,
                  and is found in the DAT directory.  The LINK module
                  will search this file for a residue before searching
                  the external files for it.  The LINK module can only
                  access one database per run.  Thus if any user supplied
                  residues are needed, they can be accessed by LINK as
                  individual files.  The data base can also be appended
                  with user supplied residues if desired.
    
                  IDBGEN , IREST , ITYPF
    
                      FORMAT(3I)
    
            IDBGEN      Flag for data base generation
             = 0  No database generation.  Output will be individual files.
                  This is the standard procedure if you want to create a
                  single small molecule.
             = 1  A new data base will be generated or the existing database
                  will be appended.
    
            IREST       Flag for the type of generation (assuming IDBGEN = 1)
             = 0  New data base
             = 1  Appending an existing data base
    
            ITYPF       Force field type code (used in LINK stage)
                  Ignored if IDBGEN = 0.  The following codes are used in
                  the standard database:
             = 1  United atom model
             = 2  All atom model
           = 100  United atom charged N-terminal amino acid residues
           = 101  United atom charged C-terminal amino acid residues
           = 200  All atom charged N-terminal amino acid residues
           = 201  All atom charged C-terminal amino acid residues
    
                  Note:  This variable allows you to have several different
                  models for the same residue name stored in one database.
                  These models could differ in topology, charge, or other
                  factors.  The charged terminal residues are selected
                  internally by LINK if the IFTPRO flag is set.  The
                  database can hold up to 510 residues.
    
         ------------------------------------------------------------------------
    
            - 2 -       NAMDBF
    
                      FORMAT(A80)
    
            NAMDBF      Name of the data base file (maximum 80 characters)
                  If no data base is being generated, leave a BLANK CARD.
    
         ------------------------------------------------------------------------
    
            - 3 -      TITLE
    
                      FORMAT(20A4)
    
            TITLE      Descriptive header for the residue
    
         ------------------------------------------------------------------------
    
            - 4 -      NAMF
    
                     FORMAT(A80)
    
            NAMF       Name of the output file if an individual residue file is
                 being generated.  If database is being generated or
                 appended this card IS read but ignored.
    
         ------------------------------------------------------------------------
    
            - 5 -      NAMRES , INTX , KFORM
    
                     FORMAT(2A,I)
    
            NAMRES     A unique name for the residue of maximum 4 characters
    
            INTX       Flag for the type of coordinates to be saved for the
                 LINK module
          'INT'  internal coordinates will be output (preferable)
          'XYZ'  cartesian coordinates will be output
    
            KFORM      Format of output for individual residue files
            = 0  formatted output (recommended for debugging)
            = 1  binary output
    
         ------------------------------------------------------------------------
    
            - 6 -      IFIXC , IOMIT , ISYMDU , IPOS
    
                     FORMAT(4A)
    
            IFIXC      Flag for the type of input geometry of the residue(s)
    
          'CORRECT' The geometry is input as internal coordinates with
                    correct order according to the tree structure.
                    NOTE: the tree structure types ('M', 'S', etc) and order
                    must be defined correctly: NA(I), NB(I), and NC(I) on card
                    8 are always ignored.
          'CHANGE'  It is input as cartesian coordinates or part cartesian
                    and part internal.  Cartesians should precede internals
                    to ensure that the resulting coordinates are correct.
                    Coordinates need not be in correct order, since each
                    is labeled with its atom number. NOTE: NA(I), NB(I), and
                    NC(I) on card 8 must be omitted for cartesian coordinates
                    with this option.
    
            IOMIT      Flag for the omission of dummy atoms
    
          'OMIT'    dummy atoms will be deleted after generating all the
                    information (this is used for all but the first residue
                    in the system)
          'NOMIT'   they will not be deleted (dummy atoms are retained for
                    the first residue of the system.  others are omitted)
    
            ISYMDU     Symbol for the dummy atoms.  The symbol must be
                 unique.  It is preferable to use 'DU'.
    
            IPOS       Flag for the position of dummy atoms to be deleted
    
          'ALL'     all the dummy atoms will be deleted
          'BEG'     only the beginning dummy atoms will be deleted
    
         ------------------------------------------------------------------------
    
            - 7 -      CUT
    
                     FORMAT(F)
    
            CUT        The cutoff distance for loop closing bonds which
                 cannot be defined by the tree structure.  Any pair of
                 atoms within this distance is assumed to be bonded.
                 We recommend that CUT be set to 0.0 and explicit loop
                 closing bonds be defined below.
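Cards 1 to 7 can be read with a few lines of Python. This is a sketch under stated assumptions: `PrepHeader`, `parse_header`, and `SAMPLE_CARDS` are names invented here, each card is assumed to occupy exactly one line, and keywords are compared as whole words rather than by their first four characters. The sample lines follow the card layout described above for an individual residue file (IDBGEN = 0, blank card 2).

```python
from dataclasses import dataclass


@dataclass
class PrepHeader:
    idbgen: int
    irest: int
    itypf: int
    namdbf: str
    title: str
    namf: str
    namres: str
    intx: str
    kform: int
    ifixc: str
    iomit: str
    isymdu: str
    ipos: str
    cut: float


def parse_header(cards):
    """Parse cards 1-7 from a list of seven input lines (hypothetical helper)."""
    idbgen, irest, itypf = (int(x) for x in cards[0].split())
    namres, intx, kform = cards[4].split()
    ifixc, iomit, isymdu, ipos = cards[5].split()
    # Check keyword fields against the values listed for cards 5 and 6.
    allowed = [
        ("INTX", intx, ("INT", "XYZ")),
        ("IFIXC", ifixc, ("CORRECT", "CHANGE")),
        ("IOMIT", iomit, ("OMIT", "NOMIT")),
        ("IPOS", ipos, ("ALL", "BEG")),
    ]
    for name, value, choices in allowed:
        if value not in choices:
            raise ValueError(f"{name} must be one of {choices}, got {value!r}")
    return PrepHeader(
        idbgen, irest, itypf,
        cards[1].strip(), cards[2].strip(), cards[3].strip(),
        namres[:4], intx, int(kform),
        ifixc, iomit, isymdu, ipos,
        float(cards[6]),
    )


SAMPLE_CARDS = [
    "    0    0    1",
    "",
    "PHENYLALANINE PREP INPUT EXAMPLE (title)",
    "PHE",
    "PHE  INT    1",
    "CORRECT  OMIT DU   BEG",
    "  0.0",
]
print(parse_header(SAMPLE_CARDS))
```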
    
         ------------------------------------------------------------------------
    
            - 8 -      I , IGRAPH(I) , ISYMBL(I) , ITREE(I) , NA(I) , NB(I) ,
                 NC(I) , R(I) , THETA(I) , PHI(I) , CHG(I) , I = 1, NATOM
    
                     FORMAT(I,3A,3I,4F)
    
            I          The actual number of the atom in the tree.
    
                 If IFIXC .eq. 'CHANGE' then this number is important
                 since the corresponding coordinates are stored at that
                 location.  If IFIXC .eq. 'CORRECT' then atoms are in
                 the correct order according to the tree structure.
    
            NOTE:  PREP always expects three dummy atoms for the beginning.
    
            IGRAPH(I)  A unique atom name for the atom I. If coordinates are
                 read in at the EDIT stage, this name will be used for
                 matching atoms.  Maximum 4 characters.
    
            ISYMBL(I)  A symbol for the atom I which defines its force field
                 atom type and is used in the module PARM for assigning
                 the force field parameters.
    
            ITREE(I)   The topological type (tree symbol) for atom I
                 (M, S, B, E, 3, 4, 5, or 6)
    
            NA(I)      The atom number to which atom I is connected.
                 Read but ignored for internal coordinates; If cartesian
                 coordinates are used, this must be omitted.
    
            NB(I)      The atom number to which atom I makes an angle along
                 with NA(I).
                 Read but ignored for internal coordinates; If cartesian
                 coordinates are used, this must be omitted.
    
            NC(I)      The atom number to which atom I makes a dihedral along
                 with NA(I) and NB(I).
                 Read but ignored for internal coordinates; If cartesian
                 coordinates are used, this must be omitted.
    
            R(I)       If IFIXC .eq. 'CORRECT' then this is the bond length
                 between atoms I and NA(I)
                 If IFIXC .eq. 'CHANGE' then this is the X coordinate
                 of atom I
    
            THETA(I)   If IFIXC .eq. 'CORRECT' then it is the bond angle
                 between atom NB(I), NA(I) and I
                 If IFIXC .eq. 'CHANGE' then it is the Y coordinate of
                 atom I
    
            PHI(I)     If IFIXC .eq. 'CORRECT' then it is the dihedral angle
                 between NC(I), NB(I), NA(I) and I
                 If IFIXC .eq. 'CHANGE' then it is the Z coordinate of
                 atom I
    
            CHG(I)     The partial atomic charge on atom I
    
            This section is terminated by one BLANK CARD if IFIXC = 'CORRECT'.
            This section is terminated by TWO BLANK CARDS if IFIXC = 'CHANGE'.
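One way to read a single card 8 line is sketched below in Python. The names `AtomCard` and `parse_atom_card` are hypothetical. The sketch assumes that an 11-field line carries NA, NB, and NC with internal coordinates, and that an 8-field line is a cartesian entry with NA, NB, and NC omitted, as described for 'CHANGE'.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AtomCard:
    index: int
    igraph: str          # atom name
    isymbl: str          # force field atom type
    itree: str           # topological type
    na: Optional[int]
    nb: Optional[int]
    nc: Optional[int]
    r: float             # bond length, or X for cartesian input
    theta: float         # bond angle, or Y for cartesian input
    phi: float           # dihedral angle, or Z for cartesian input
    chg: float           # partial atomic charge


def parse_atom_card(card):
    """Parse one atom line; 11 fields = internal, 8 fields = cartesian."""
    f = card.split()
    if len(f) == 11:
        na, nb, nc = int(f[4]), int(f[5]), int(f[6])
        values = f[7:]
    elif len(f) == 8:
        na = nb = nc = None  # omitted for cartesian coordinates
        values = f[4:]
    else:
        raise ValueError(f"expected 8 or 11 fields, got {len(f)}")
    r, theta, phi, chg = (float(x) for x in values)
    return AtomCard(int(f[0]), f[1][:4], f[2], f[3], na, nb, nc,
                    r, theta, phi, chg)


print(parse_atom_card(
    "4 N      N     M    3    2    1   1.3350  116.6000  180.0000 -0.5200"
))
```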
    
         ------------------------------------------------------------------------
    
            - 9 -      IOPR
    
                     FORMAT(A4)
    
            IOPR       Flag to read additional information about the residue.
                 There are four options available.  The order in which
                 they are specified is not important.  Format is keyword
                 on its own line, followed by data on succeeding lines,
                 terminated by a BLANK CARD.
    
             'CHARGE'  Control to read additional partial atomic charges.
                 These will override charges specified above in section 8.
                 The charges are read in format(5F) for the non-dummy
                 atoms.  A BLANK CARD terminates this section.   It is
                 less error-prone to specify charges as in section 8.
    
              'LOOP'   Control to read explicit loop closing bonds (in
                 addition to the loops generated based on the cutoff
                 criterion).  If this option is used it is preferable
                 to set the cutoff criterion to zero.  The loop closing
                 atoms are read in format(2A) as their atom (IGRAPH) names.
                 A BLANK CARD terminates this section.
    
           'IMPROPER'  Control for reading the improper torsion angles.  A
                 proper torsion I - J - K - L has I bonded to J bonded
                 to K bonded to L.  An IMPROPER torsion is any torsion in
                 which this is not the case.  Improper torsions are used to
                 keep the asymmetric centers from racemizing in the united
                 atom model where all the C-H hydrogens are omitted.  They
                 can also be used to enforce planarity.  The normal case is:
    
                                    J
                                    |
                                    K
                                   / \
                                  I   L
    
                             Improper I-J-K-L
    
                 where the central atom (K) is the third atom in the improper
                 and the order of the other three is determined alphabetically
                 by atom type and if types are the same by atom number.
                 The improper torsions should be defined in such a way that
                 the proper torsions are not duplicated.  The atoms making the
                 improper torsions are read as their atom (IGRAPH) names.
                 '-M' can be used in place of an atom name to indicate the
                 last main chain atom in the previous residue, and '+M' for
                 the first main chain atom in the next residue. NOTE: -M and
                 +M cannot be used in the 4th position ('L') owing to internal
                 data representation limitations.  A BLANK CARD terminates
                 this section.
    
              'DONE'   Control to exit from this section.
    
    
           NOTE: If extra blank cards are found between different options they
           are ignored.  Control will exit only when the 'DONE' option is
           found.  If it is desired to process another residue place the
           appropriate information after the 'DONE' card.
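The keyword-and-blank-card layout of this section can be read with a small loop. The Python sketch below is an illustration only: `parse_iopr` is a hypothetical name, keywords are matched as whole words, and the check on '-M' and '+M' simply enforces the restriction on the 4th position noted above.

```python
KEYWORDS = ("CHARGE", "LOOP", "IMPROPER")


def parse_iopr(lines):
    """Collect CHARGE, LOOP and IMPROPER data until the DONE card."""
    sections = {key: [] for key in KEYWORDS}
    cards = iter(lines)
    for card in cards:
        keyword = card.strip()
        if not keyword:
            continue  # extra blank cards between options are ignored
        if keyword == "DONE":
            return sections
        if keyword not in sections:
            raise ValueError(f"unknown option {keyword!r}")
        for data in cards:  # data lines run until the next BLANK CARD
            fields = data.split()
            if not fields:
                break
            if keyword == "CHARGE":
                sections[keyword].extend(float(x) for x in fields)
                continue
            if (keyword == "IMPROPER" and len(fields) == 4
                    and fields[3] in ("-M", "+M")):
                raise ValueError("-M and +M cannot be the 4th atom")
            sections[keyword].append(tuple(fields))
    raise ValueError("no DONE card found")


SAMPLE_OPTIONS = [
    "",
    "IMPROPER",
    "-M  CA  N   HN",
    "CA  +M  C   O",
    "CB  CA  N   C",
    "",
    "LOOP",
    "CG  CD2",
    "",
    "DONE",
]
print(parse_iopr(SAMPLE_OPTIONS))
```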
    
         ------------------------------------------------------------------------
    
            -10 -      KSTOP
    
                     FORMAT(A4)
    
            KSTOP      Control to exit from the program
    
          'STOP' Exit from the program.  It has to be placed immediately
                 following the 'DONE' card.
    
                 The program can never make a graceful exit if this card
                 is missing since it is working inside an infinite loop.
    
    
    _______________________________________________________________
    
             Example input for phenylalanine (united atom)
    
    
    
                              Res n-1          O
                                 \            /
                                  N----CA----C
                                 /     |      \
                               HN      CB     Res n+1
                                       |
                                       CG
                                     /    \
                                   CD1    CD2
                                    |      |
                                   CE1    CE2
                                     \    /
                                       CZ
    
    
    
    
             0    0    1
    
         PHENYLALANINE PREP INPUT EXAMPLE (title)
         PHE
         PHE  INT    1
         CORRECT  OMIT DU   BEG
           0.0
             1 DUMM   DU    M    0   -1   -2   0.0000    0.0000    0.0000  0.000
             2 DUMM   DU    M    1    0   -1   1.4490    0.0000    0.0000  0.000
             3 DUMM   DU    M    2    1    0   1.5220  111.1000    0.0000  0.000
             4 N      N     M    3    2    1   1.3350  116.6000  180.0000 -0.5200
             5 HN     H     E    4    3    2   1.0100  119.8000    0.0000  0.2480
             6 CA     CH    M    4    3    2   1.4490  121.9000  180.0000  0.2140
             7 CB     C2    S    6    4    3   1.5250  111.1000   60.0000  0.0380
             8 CG     CA    S    7    6    4   1.5100  115.0000  180.0000  0.0110
             9 CD1    CD    S    8    7    6   1.4000  120.0000  180.0000 -0.0110
            10 CE1    CD    S    9    8    7   1.4000  120.0000  180.0000  0.0040
            11 CZ     CD    S   10    9    8   1.4000  120.0000    0.0000 -0.0030
            12 CE2    CD    S   11   10    9   1.4000  120.0000    0.0000  0.0040
            13 CD2    CD    E   12   11   10   1.4000  120.0000    0.0000 -0.0110
            14 C      C     M    6    4    3   1.5220  111.1000  180.0000  0.5260
            15 O      O     E   14    6    4   1.2290  120.5000    0.0000 -0.5000
    
         IMPROPER
         -M  CA  N   HN
         CA  +M  C   O
         CB  CA  N   C
    
         LOOP
         CG  CD2
    
         DONE
         STOP
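To see what the internal coordinates in this example describe, they can be converted to cartesian coordinates with the standard bond/angle/dihedral placement construction. The Python sketch below is not PREP's own code. It assumes the NA, NB, and NC columns as printed (PREP ignores them with 'CORRECT' and derives connectivity from the tree), and it assumes a convention for the dummy atoms: atom 1 at the origin, atom 2 on the x axis, atom 3 in the xy plane. All function names are hypothetical.

```python
import math

# I, IGRAPH, NA, NB, NC, R, THETA, PHI from the phenylalanine example
ATOMS = [
    (1, "DUMM", 0, -1, -2, 0.000, 0.0, 0.0),
    (2, "DUMM", 1, 0, -1, 1.449, 0.0, 0.0),
    (3, "DUMM", 2, 1, 0, 1.522, 111.1, 0.0),
    (4, "N", 3, 2, 1, 1.335, 116.6, 180.0),
    (5, "HN", 4, 3, 2, 1.010, 119.8, 0.0),
    (6, "CA", 4, 3, 2, 1.449, 121.9, 180.0),
    (7, "CB", 6, 4, 3, 1.525, 111.1, 60.0),
    (8, "CG", 7, 6, 4, 1.510, 115.0, 180.0),
    (9, "CD1", 8, 7, 6, 1.400, 120.0, 180.0),
    (10, "CE1", 9, 8, 7, 1.400, 120.0, 180.0),
    (11, "CZ", 10, 9, 8, 1.400, 120.0, 0.0),
    (12, "CE2", 11, 10, 9, 1.400, 120.0, 0.0),
    (13, "CD2", 12, 11, 10, 1.400, 120.0, 0.0),
    (14, "C", 6, 4, 3, 1.522, 111.1, 180.0),
    (15, "O", 14, 6, 4, 1.229, 120.5, 0.0),
]


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def unit(u):
    length = math.sqrt(sum(x * x for x in u))
    return tuple(x / length for x in u)


def place(a, b, c, r, theta, phi):
    """Place an atom at distance r from a, angle theta with b, dihedral phi with c."""
    t, p = math.radians(theta), math.radians(phi)
    bc = unit(sub(a, b))               # direction of the b -> a bond
    n = unit(cross(sub(b, c), bc))     # normal of the c-b-a plane
    m = cross(n, bc)                   # completes the local frame
    d = (-r * math.cos(t),
         r * math.sin(t) * math.cos(p),
         r * math.sin(t) * math.sin(p))
    return tuple(a[i] + d[0] * bc[i] + d[1] * m[i] + d[2] * n[i]
                 for i in range(3))


def to_cartesian(atoms):
    """Return {atom number: (x, y, z)} for a list of internal-coordinate rows."""
    xyz = {}
    for i, _name, na, nb, nc, r, theta, phi in atoms:
        if i == 1:
            xyz[i] = (0.0, 0.0, 0.0)
        elif i == 2:
            xyz[i] = (r, 0.0, 0.0)
        elif i == 3:
            # Assumed convention: bonded to atom 2, angle measured to atom 1.
            t = math.radians(theta)
            xyz[i] = (xyz[2][0] - r * math.cos(t), r * math.sin(t), 0.0)
        else:
            xyz[i] = place(xyz[na], xyz[nb], xyz[nc], r, theta, phi)
    return xyz


xyz = to_cartesian(ATOMS)
# Length of the loop-closing bond CG-CD2 declared in the LOOP section.
print(round(math.dist(xyz[8], xyz[13]), 4))
```

The ring atoms are given 1.40 bond lengths, 120 degree angles, and planar dihedrals, so they form a regular hexagon, and the CG-CD2 distance printed above should equal the ring bond length even though that bond is not part of the tree.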