AMBER Prep file formats

from http://ambermd.org/doc/prep.html

1. PREP

Usage:

prep [-O] -i input -o output -p params

-O Overwrite output files if they exist.

_______________________________________________________________

NOTE: Leap replaces Prep, Link, Edit and Parm with a much
simpler, single program.

The purpose of this module is to add new residues to
the standard AMBER residue database, create new databases, or
to create new residues as individual LINK-readable files. It
is not necessary to run PREP if all residues needed for a
simulation are already present in the standard AMBER database,
described in the LINK documentation. A residue is the basic
molecular unit of the AMBER simulation package. It is typically
an amino acid or nucleic acid unit, but could be a prosthetic
group, a small molecule, or a single ion.

Tree Structure: The geometry of the residue is described
by a "tree" structure to enable the LINK module to successfully
connect it to a larger structure. The atoms in a residue
are classified into eight topological types: "Main", "Side",
"Branch", "3", "4", "5", "6" and "End". They are denoted
as M, S, B, 3, 4, 5, 6 and E respectively.

Main atoms describe the principal "path" through the
residue, starting at the connection to the previous residue
and ending at the connection to the next residue. The LINK
module will connect the last main atom of a residue to the
first main atom of the next residue in the molecule. If
there is only one residue in a molecule, the main atoms are
typically the longest continuous non-intersecting chain. The
main type atoms can have 1, 2, 3, or 4 atoms connected to
them.

Any atom that is not a main atom is described by one of
the other topological types: "E", "S", "B", "3", "4", "5", or
"6". An "E" atom has only one connection to other atoms, thus
is a "dead end" for any branch from any other atom type. An
"S" atom must have a total of two connections to other atoms,
a "B" atom must have a total of three connections, and a "3"
atom actually has a total of four connections; the same applies
for "4", "5" and "6". The topological types described here can
only describe acyclic systems. In order to describe the topology
of cyclic systems, explicit loop closing bonds are specified
using the LOOP command described below. Loop closing
bonds are not counted as connections when assigning M, E,
S, B, 3, 4, 5, 6 topological types. If an atom has more than
four connections, it is not defined in the present tree
structure.
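
The connection-count rule for non-main atoms can be sketched in a few
lines of Python. This is only an illustration, not part of PREP: the
function name tree_type is hypothetical, and the sketch covers only the
counts the text spells out (E, S, B, 3). Loop closing bonds must be
excluded from the count before calling it.

```python
def tree_type(n_connections: int, is_main: bool = False) -> str:
    """Return a tree symbol from the number of tree connections.

    Loop closing bonds are assumed to be excluded from n_connections.
    Only the counts stated explicitly in the text are handled.
    """
    if not 1 <= n_connections <= 4:
        raise ValueError("atoms need 1 to 4 tree connections")
    if is_main:
        # Main atoms may have 1, 2, 3, or 4 connections.
        return "M"
    return {1: "E", 2: "S", 3: "B", 4: "3"}[n_connections]


# Phenylalanine side chain: CB bonds to CA and CG, so it is "S".
# CD2 bonds to CE2 in the tree (its bond to CG is a loop closing bond).
print(tree_type(2), tree_type(1))
```

An atom with five connections raises ValueError here, mirroring the
statement that such atoms are not defined in the tree structure.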

Dummy atoms: PREP requires that three dummy atoms precede
the actual atoms of the residue. These atoms are simply used
to define the space axes for the residue. The three dummy
atoms must be given the topological type "M", and they must be
assigned a force field atom type that defines them as dummy
atoms. The symbol "DU" is recommended to be consistent
with the standard database. It is necessary to have the
three initial dummy atoms whether internal or cartesian
coordinates are given as input.

It is important for the proper functioning of the EDIT
module that dummy atoms be left in the first residue of the
system, but that they be removed in any subsequent residues.
Therefore you should specify the "NOMIT" flag for any initial
residue, and the "OMIT" flag for all others. In typical use
of AMBER, peptide systems are either started with the acetyl
residue ACE, which carries the dummy atoms with it in the
standard database, or they are started with a charged N-
terminal residue, which also has dummy atoms. Likewise,
nucleic acid systems are generally started with the HB residue,
which also has dummy atoms. The other residues in the
database have had their dummy atoms stripped at the end of
PREP through the use of the "OMIT" option.

In the examples below, topological types are assigned
and the atoms are numbered in correct tree structure order.
An actual PREP input file appears at the end of this
document.

Example 1:

       M(1)--M(2)   \
              |      |  <--- 3 dummy atoms to define space axes
             M(3)   /
              |
    Res ---- M(4)-- M(5)-- M(7)-- M(11)-- Res
    n-1              |      |      |      n+1
              ^      |      |      |
              |     E(6)   S(8)   B(12)-- S(14)-- E(15)
         first real         |      |
            atom            |      |
                   E(10)-- S(9)   E(13)

Example 2:

Note on Tree Ordering: The tree structure begins at the
first dummy atom, and traverses the main chain until a branch
point (node) is reached. That branch is traversed until its
end or until the next node is reached. When you come to a node
with more than one branch (topological type "B" or "3"), it
doesn't matter which branch is traversed first as long as you
return to the next higher node when an end is reached.
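
One way to read the ordering rule is as a depth-first (preorder) walk of
the tree. The sketch below checks a list of parent atom numbers against
that reading. It is an illustration under that assumption, not PREP's
own algorithm, and the name is_tree_ordered is hypothetical.

```python
def is_tree_ordered(parents):
    """Check that atoms are listed in depth-first tree order.

    parents[i] is the number (1-based) of the atom that atom i+1 is
    attached to; 0 marks the first dummy atom, which has no parent.
    """
    path = []  # atoms on the branch currently being traversed
    for number, parent in enumerate(parents, start=1):
        if parent == 0:
            if path:
                return False  # only the very first atom may be a root
        else:
            # Back up to the node this atom hangs from.
            while path and path[-1] != parent:
                path.pop()
            if not path:
                return False  # parent is not on the current branch
        path.append(number)
    return True


# NA column of the phenylalanine example at the end of this document.
phe = [0, 1, 2, 3, 4, 4, 6, 7, 8, 9, 10, 11, 12, 6, 14]
print(is_tree_ordered(phe))
```

Once a branch has been left, no later atom may attach to it, which is
why a parent list such as 0, 1, 1, 2 fails the check.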

PREP input files for standard peptide and nucleic acid
residues are typically maintained in several large files for
generation of the standard database for the LINK module.
Note that it is not necessary to run the PREP module unless
non-standard residues are needed. Non-standard residue data
may be output as individual files or appended to the standard
database if desired.

The LINK module is currently dimensioned to handle a maximum
of 150 atoms per residue.

Note that smaller, neutral residues are most appropriate
unless an infinite cutoff is desired, because the first atoms
in each residue are used in applying the cutoff. The larger the
residue, the more unbalanced the cutoff, i.e. the greater the
difference between head-to-head and tail-to-tail orientations.

This module was originally written by P. K. Weiner at UCSF
and overhauled by U. C. Singh in 1984. The data base structure
was completely modified. Prep was revised for Rev A by George
Seibel in 1989.

Input description: This section describes the residue(s) input
file which is read through unit 5. The input is free format
and it is assumed that the different fields are separated by at
least one space (including character fields). The character
variables are always left justified. If a character field
contains more than four characters, the rest are ignored. If it
contains fewer, it is padded with blanks. Since blanks
separate fields, a sign must immediately precede its
number.
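
The free-format rules can be mimicked with a short Python sketch. The
helper names split_card and char_field are hypothetical, and the sketch
only illustrates the stated rules (blank-separated fields, four-character
left-justified names); it is not how PREP itself reads unit 5.

```python
def split_card(line: str) -> list:
    """Split a free-format card into fields on runs of blanks."""
    return line.split()


def char_field(token: str, width: int = 4) -> str:
    """Truncate or blank-pad a character field to a fixed width."""
    return token[:width].ljust(width)


print(repr(char_field("DUMMY")))        # extra characters are dropped
print(repr(char_field("N")))            # short names are blank-padded
print(split_card("1.3350 -0.5200"))     # sign attached: two fields
print(split_card("1.3350 - 0.5200"))    # detached sign: three fields
```

The last line shows why a sign must sit directly against its number: a
blank after the sign turns it into a separate field.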

------------------------------------------------------------------------

- 1 - CONTROL FOR DATA BASE GENERATION

The data base is a direct access file containing the
standard residues and a directory of their names. It
is named DB4.DAT in the version 4 AMBER distribution,
and is found in the DAT directory. The LINK module
will search this file for a residue before searching
the external files for it. The LINK module can only
access one database per run. Thus if any user supplied
residues are needed, they can be accessed by LINK as
individual files. The data base can also be appended
with user supplied residues if desired.

IDBGEN , IREST , ITYPF

FORMAT(3I)

IDBGEN Flag for data base generation
= 0 No database generation. Output will be individual files.
This is the standard procedure if you want to create a
single small molecule.
= 1 A new data base will be generated or the existing database
will be appended.

IREST Flag for the type of generation (assuming IDBGEN = 1)
= 0 New data base
= 1 Appending an existing data base

ITYPF Force field type code (used in LINK stage).
Ignored if IDBGEN = 0. The following codes are used in
the standard database:
= 1 United atom model
= 2 All atom model
= 100 United atom charged N-terminal amino acid residues
= 101 United atom charged C-terminal amino acid residues
= 200 All atom charged N-terminal amino acid residues
= 201 All atom charged C-terminal amino acid residues

Note: This variable allows you to have several different
models for the same residue name stored in one database.
These models could differ in topology, charge, or other
factors. The charged terminal residues are selected
internally by LINK if the IFTPRO flag is set. The
database can hold up to 510 residues.
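
As a rough illustration, the first card could be decoded as follows.
The function name parse_control_card and the returned dictionary layout
are hypothetical. The ITYPF table lists only the codes named above;
whether other values are accepted is not stated, so unknown codes are
passed through unchanged.

```python
ITYPF_CODES = {
    1: "United atom model",
    2: "All atom model",
    100: "United atom charged N-terminal amino acid residues",
    101: "United atom charged C-terminal amino acid residues",
    200: "All atom charged N-terminal amino acid residues",
    201: "All atom charged C-terminal amino acid residues",
}


def parse_control_card(line: str) -> dict:
    """Decode card 1 (IDBGEN, IREST, ITYPF) from a free-format line."""
    fields = line.split()
    if len(fields) != 3:
        raise ValueError("card 1 needs exactly three integers")
    idbgen, irest, itypf = (int(f) for f in fields)
    if idbgen not in (0, 1) or irest not in (0, 1):
        raise ValueError("IDBGEN and IREST must be 0 or 1")
    if idbgen == 0:
        mode = "individual files"      # IREST and ITYPF are not used
    elif irest == 0:
        mode = "new database"
    else:
        mode = "append to database"
    return {"IDBGEN": idbgen, "IREST": irest, "ITYPF": itypf,
            "mode": mode, "model": ITYPF_CODES.get(itypf)}


print(parse_control_card("0 0 1")["mode"])
```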

------------------------------------------------------------------------

- 2 - NAMDBF

FORMAT(A80)

NAMDBF Name of the data base file (maximum 80 characters).
If no data base is being generated, leave a BLANK CARD.

------------------------------------------------------------------------

- 3 - TITLE

FORMAT(20A4)

TITLE Descriptive header for the residue

------------------------------------------------------------------------

- 4 - NAMF

FORMAT(A80)

NAMF Name of the output file if an individual residue file is
being generated. If database is being generated or
appended this card IS read but ignored.

------------------------------------------------------------------------

- 5 - NAMRES , INTX , KFORM

FORMAT(2A,I)

NAMRES A unique name for the residue of maximum 4 characters

INTX Flag for the type of coordinates to be saved for the
LINK module
'INT' internal coordinates will be output (preferable)
'XYZ' cartesian coordinates will be output

KFORM Format of output for individual residue files
= 0 formatted output (recommended for debugging)
= 1 binary output

------------------------------------------------------------------------

- 6 - IFIXC , IOMIT , ISYMDU , IPOS

FORMAT(4A)

IFIXC Flag for the type of input geometry of the residue(s)

'CORRECT' The geometry is input as internal coordinates with
correct order according to the tree structure.
NOTE: the tree structure types ('M', 'S', etc) and order
must be defined correctly: NA(I), NB(I), and NC(I) on card
8 are always ignored.
'CHANGE' It is input as cartesian coordinates or part cartesian
and part internal. Cartesians should precede internals
to ensure that the resulting coordinates are correct.
Coordinates need not be in correct order, since each
is labeled with its atom number. NOTE: NA(I), NB(I), and
NC(I) on card 8 must be omitted for cartesian coordinates
with this option.

IOMIT Flag for the omission of dummy atoms

'OMIT' dummy atoms will be deleted after generating all the
information (this is used for all but the first residue
in the system)
'NOMIT' dummy atoms will not be deleted (this is used only for
the first residue of the system)

ISYMDU Symbol for the dummy atoms. The symbol must be
unique. It is preferable to use 'DU' for it

IPOS Flag for the position of dummy atoms to be deleted

'ALL' all the dummy atoms will be deleted
'BEG' only the beginning dummy atoms will be deleted

------------------------------------------------------------------------

- 7 - CUT

FORMAT(F)

CUT The cutoff distance for loop closing bonds which
cannot be defined by the tree structure. Any pair of
atoms within this distance is assumed to be bonded.
We recommend that CUT be set to 0.0 and explicit loop
closing bonds be defined below.
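
The cutoff criterion amounts to a pairwise distance test. The sketch
below is an assumption-laden illustration: the function name
bonds_within_cutoff is hypothetical, pairs already bonded in the tree
are skipped, and a strict "less than" comparison is used so that
CUT = 0.0 never produces a bond. How PREP compares distances internally
is not stated here.

```python
import math
from itertools import combinations


def bonds_within_cutoff(coords, tree_bonds, cut):
    """Return atom-name pairs closer than cut that are not tree bonds.

    coords maps atom name -> (x, y, z); tree_bonds is a set of
    frozenset({name_a, name_b}) for bonds already in the tree.
    """
    found = []
    for (a, pa), (b, pb) in combinations(sorted(coords.items()), 2):
        if frozenset((a, b)) in tree_bonds:
            continue
        if math.dist(pa, pb) < cut:
            found.append((a, b))
    return found


# Made-up three-membered ring: A-B and B-C are tree bonds, A-C closes it.
coords = {"A": (0.0, 0.0, 0.0), "B": (1.5, 0.0, 0.0), "C": (0.75, 1.3, 0.0)}
tree = {frozenset(("A", "B")), frozenset(("B", "C"))}
print(bonds_within_cutoff(coords, tree, 1.6))
print(bonds_within_cutoff(coords, tree, 0.0))
```

A generous cutoff can also pick up non-bonded atoms that merely sit close
together, which is one reason to prefer explicit loop closing bonds.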

------------------------------------------------------------------------

- 8 - I , IGRAPH(I) , ISYMBL(I) , ITREE(I) , NA(I) , NB(I) ,
NC(I) , R(I) , THETA(I) , PHI(I) , CHRG(I) , I = 1, NATOM

FORMAT(I,3A,3I,4F)

I The actual number of the atom in the tree.

If IFIXC .eq. 'CHANGE' then this number is important
since the corresponding coordinates are stored at that
location. If IFIXC .eq. 'CORRECT' then atoms are in
the correct order according to the tree structure.

NOTE: PREP always expects three dummy atoms for the beginning.

IGRAPH(I) A unique atom name for the atom I. If coordinates are
read in at the EDIT stage, this name will be used for
matching atoms. Maximum 4 characters.

ISYMBL(I) A symbol for the atom I which defines its force field
atom type and is used in the module PARM for assigning
the force field parameters.

ITREE(I) The topological type (tree symbol) for atom I
(M, S, B, E, 3, 4, 5, or 6)

NA(I) The atom number to which atom I is connected.
Read but ignored for internal coordinates; If cartesian
coordinates are used, this must be omitted.

NB(I) The atom number to which atom I makes an angle along
with NA(I).
Read but ignored for internal coordinates; If cartesian
coordinates are used, this must be omitted.

NC(I) The atom number to which atom I makes a dihedral along
with NA(I) and NB(I).
Read but ignored for internal coordinates; If cartesian
coordinates are used, this must be omitted.

R(I) If IFIXC .eq. 'CORRECT' then this is the bond length
between atoms I and NA(I)
If IFIXC .eq. 'CHANGE' then this is the X coordinate
of atom I

THETA(I) If IFIXC .eq. 'CORRECT' then it is the bond angle
between atom NB(I), NA(I) and I
If IFIXC .eq. 'CHANGE' then it is the Y coordinate of
atom I

PHI(I) If IFIXC .eq. 'CORRECT' then it is the dihedral angle
between NC(I), NB(I), NA(I) and I
If IFIXC .eq. 'CHANGE' then it is the Z coordinate of
atom I

CHRG(I) The partial atomic charge on atom I

This section is terminated by one BLANK CARD if IFIXC = 'CORRECT'.
This section is terminated by TWO BLANK CARDS if IFIXC = 'CHANGE'.
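
A card of this section, in its internal-coordinate layout with NA, NB
and NC present, could be read into a small record as sketched below. The
class and function names are hypothetical, and the sketch does not
handle the cartesian layout in which NA, NB and NC are omitted.

```python
from dataclasses import dataclass


@dataclass
class AtomCard:
    number: int     # I
    igraph: str     # unique atom name
    isymbl: str     # force field atom type
    itree: str      # tree symbol (M, S, B, E, ...)
    na: int
    nb: int
    nc: int
    r: float        # bond length to NA
    theta: float    # angle NB-NA-I
    phi: float      # dihedral NC-NB-NA-I
    chrg: float     # partial atomic charge


def parse_atom_card(line: str) -> AtomCard:
    """Parse one free-format atom card with all eleven fields present."""
    f = line.split()
    if len(f) != 11:
        raise ValueError("expected 11 fields, got %d" % len(f))
    return AtomCard(int(f[0]), f[1], f[2], f[3],
                    int(f[4]), int(f[5]), int(f[6]),
                    float(f[7]), float(f[8]), float(f[9]), float(f[10]))


atom = parse_atom_card("4 N N M 3 2 1 1.3350 116.6000 180.0000 -0.5200")
print(atom.igraph, atom.itree, atom.r, atom.chrg)
```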

------------------------------------------------------------------------

- 9 - IOPR

FORMAT(A4)

IOPR Flag to read additional information about the residue.
There are four options available. The order in which
they are specified is not important. Format is keyword
on its own line, followed by data on succeeding lines,
terminated by a BLANK CARD.

'CHARGE' Control to read additional partial atomic charges.
These will override charges specified above in section 8.
The charges are read in format(5F) for the non-dummy
atoms. A BLANK CARD terminates this section. It is
less error-prone to specify charges as in section 8.

'LOOP' Control to read explicit loop closing bonds (in
addition to the loops generated based on the cutoff
criterion). If this option is used it is preferable
to set the cutoff criterion to zero. The loop closing
atoms are read in format(2A) as their atom (IGRAPH) names.
A BLANK CARD terminates this section.

'IMPROPER' Control for reading the improper torsion angles. A
proper torsion I - J - K - L has I bonded to J bonded
to K bonded to L. An IMPROPER torsion is any torsion in
which this is not the case. Improper torsions are used to
keep the asymmetric centers from racemizing in the united
atom model where all the C-H hydrogens are omitted. They
can also be used to enforce planarity. The normal case is:

  J
  |
  K
 / \
I   L

Improper I-J-K-L

where the central atom (K) is the third atom in the improper
and the order of the other three is determined alphabetically
by atom type and if types are the same by atom number.
The improper torsions should be defined in such a way that
the proper torsions are not duplicated. The atoms making the
improper torsions are read as their atom (IGRAPH) names.
'-M' can be used in place of an atom name to indicate the
last main chain atom in the previous residue, and '+M' for
the first main chain atom in the next residue. NOTE: -M and
+M cannot be used in the 4th position ('L') owing to internal
data representation limitations. A BLANK CARD terminates
this section.

'DONE' Control to exit from this section.

NOTE: If extra blank cards are found between different options they
are ignored. Control will exit only when the 'DONE' option is
found. If it is desired to process another residue place the
appropriate information after the 'DONE' card.
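
The keyword-then-data layout of this section can be sketched as a small
reader. This is an illustration only: the function name parse_iopr is
hypothetical, and matching keywords on their first four characters is an
assumption drawn from the FORMAT(A4) specification above. The check on
the 4th improper position follows the NOTE about -M and +M.

```python
KEYWORDS = {"CHAR": "CHARGE", "LOOP": "LOOP", "IMPR": "IMPROPER"}


def parse_iopr(lines):
    """Collect CHARGE, LOOP and IMPROPER data up to the DONE card."""
    result = {"CHARGE": [], "LOOP": [], "IMPROPER": []}
    cards = iter(lines)
    for card in cards:
        key = card.strip()[:4]
        if not key:
            continue                    # extra blank cards are ignored
        if key == "DONE":
            return result
        if key not in KEYWORDS:
            raise ValueError("unknown option: %r" % card)
        name = KEYWORDS[key]
        for data in cards:              # data cards until a blank card
            fields = data.split()
            if not fields:
                break
            if name == "CHARGE":
                result[name].extend(float(x) for x in fields)
            else:
                if name == "IMPROPER" and fields[-1] in ("-M", "+M"):
                    raise ValueError("-M/+M not allowed in 4th position")
                result[name].append(tuple(fields))
    raise ValueError("DONE card not found")


cards = ["", "IMPROPER", "-M CA N HN", "CA +M C O", "CB CA N C", "",
         "LOOP", "CG CD2", "", "DONE", "STOP"]
print(parse_iopr(cards))
```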

------------------------------------------------------------------------

-10 - KSTOP

FORMAT(A4)

KSTOP Control to exit from the program

'STOP' Exit from the program. It has to be placed immediately
following the 'DONE' card.

The program can never make a graceful exit if this card
is missing since it is working inside an infinite loop.

_______________________________________________________________

Example input for phenylalanine (united atom)

Res n-1              O
       \            /
        N----CA----C
       /     |      \
     HN      CB      Res n+1
             |
             CG
            /  \
         CD1    CD2
          |      |
         CE1    CE2
            \  /
             CZ

0 0 1

PHENYLALANINE PREP INPUT EXAMPLE (title)
PHE
PHE INT 1
CORRECT OMIT DU BEG
0.0
1 DUMM DU M 0 -1 -2 0.0000 0.0000 0.0000 0.000
2 DUMM DU M 1 0 -1 1.4490 0.0000 0.0000 0.000
3 DUMM DU M 2 1 0 1.5220 111.1000 0.0000 0.000
4 N N M 3 2 1 1.3350 116.6000 180.0000 -0.5200
5 HN H E 4 3 2 1.0100 119.8000 0.0000 0.2480
6 CA CH M 4 3 2 1.4490 121.9000 180.0000 0.2140
7 CB C2 S 6 4 3 1.5250 111.1000 60.0000 0.0380
8 CG CA S 7 6 4 1.5100 115.0000 180.0000 0.0110
9 CD1 CD S 8 7 6 1.4000 120.0000 180.0000 -0.0110
10 CE1 CD S 9 8 7 1.4000 120.0000 180.0000 0.0040
11 CZ CD S 10 9 8 1.4000 120.0000 0.0000 -0.0030
12 CE2 CD S 11 10 9 1.4000 120.0000 0.0000 0.0040
13 CD2 CD E 12 11 10 1.4000 120.0000 0.0000 -0.0110
14 C C M 6 4 3 1.5220 111.1000 180.0000 0.5260
15 O O E 14 6 4 1.2290 120.5000 0.0000 -0.5000

IMPROPER
-M CA N HN
CA +M C O
CB CA N C

LOOP
CG CD2

DONE
STOP
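
A quick sanity check on an input like the one above is to sum the charge
column. The sketch below copies the atom cards from the example and adds
up the last field of each line. The function name net_charge is
hypothetical, and nothing in PREP requires this check.

```python
EXAMPLE_ATOMS = """\
1 DUMM DU M 0 -1 -2 0.0000 0.0000 0.0000 0.000
2 DUMM DU M 1 0 -1 1.4490 0.0000 0.0000 0.000
3 DUMM DU M 2 1 0 1.5220 111.1000 0.0000 0.000
4 N N M 3 2 1 1.3350 116.6000 180.0000 -0.5200
5 HN H E 4 3 2 1.0100 119.8000 0.0000 0.2480
6 CA CH M 4 3 2 1.4490 121.9000 180.0000 0.2140
7 CB C2 S 6 4 3 1.5250 111.1000 60.0000 0.0380
8 CG CA S 7 6 4 1.5100 115.0000 180.0000 0.0110
9 CD1 CD S 8 7 6 1.4000 120.0000 180.0000 -0.0110
10 CE1 CD S 9 8 7 1.4000 120.0000 180.0000 0.0040
11 CZ CD S 10 9 8 1.4000 120.0000 0.0000 -0.0030
12 CE2 CD S 11 10 9 1.4000 120.0000 0.0000 0.0040
13 CD2 CD E 12 11 10 1.4000 120.0000 0.0000 -0.0110
14 C C M 6 4 3 1.5220 111.1000 180.0000 0.5260
15 O O E 14 6 4 1.2290 120.5000 0.0000 -0.5000
"""


def net_charge(atom_cards: str) -> float:
    """Sum the last field (the partial charge) of every non-blank card."""
    return sum(float(line.split()[-1])
               for line in atom_cards.splitlines() if line.strip())


# The united atom phenylalanine charges cancel to within rounding.
print(abs(net_charge(EXAMPLE_ATOMS)) < 1e-9)
```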