This file contains the singleton energies ("self energies") and pairwise rotamer energies for a protein
structure that is to be used in the multistate protein design procedure
performed by SPRINT.
The rotamer energies file is in the FastInf format for probabilistic graphical models.
There are two options for creating energies files for use with SPRINT (and fastInf):
- Write an input file that contains the energies, in the format detailed below.
- Create the energies file from inside your program. For an example of how this is done, see modelTest.cpp in the fastInf package. You must ensure that all output conforms to the format described below.
In either case, the file contains a standard factor graph representation of a graphical model describing the interactions within a protein structure. See the example below.
The energies file consists of 4 relevant sections:
-
Variables
This section contains a list of the positions to be designed and the total
number of rotamers at each position. A position is identified by its chain ID
and residue number, where the chain ID MUST be delimited on BOTH sides by an
underscore ("_"). For example, the design position at chain A, residue number
165, is denoted by "_A_165".
Thus, the following line indicates that chain A, residue number 165, has
100 rotamers in total (summed over all of its designed amino acids):
_A_165 100
Note that the position name may also carry optional information prepended to it, e.g., "design_A_165".
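The naming rule above can be checked mechanically. The following is a minimal sketch, not part of SPRINT or fastInf: the helper name `parse_variable_line` and the regular expression are hypothetical, and it assumes a single-character chain ID and an integer residue number (insertion codes are not handled).

```python
import re

# Hypothetical pattern: optional prefix, then _<chain>_<residue number>.
# Assumption: the chain ID is a single non-underscore character.
POSITION_RE = re.compile(r"^(?P<prefix>.*)_(?P<chain>[^_\s])_(?P<resnum>-?\d+)$")

def parse_variable_line(line):
    """Split a "Variables" line into (prefix, chain, residue number, rotamer count)."""
    name, count = line.split()
    match = POSITION_RE.match(name)
    if match is None:
        raise ValueError(f"position name lacks the _<chain>_<residue> form: {name!r}")
    return match["prefix"], match["chain"], int(match["resnum"]), int(count)

plain = parse_variable_line("_A_165 100")
prefixed = parse_variable_line("design_A_165 100")
```

Here `plain` carries an empty prefix, while `prefixed` keeps "design" as the optional information in front of the position.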
-
Cliques
This section contains a list of the subsets for which singleton and pairwise
energies exist. Each line consists of 5 fields:
- A name for the subset of positions (can be anything).
- The number of positions in the set.
- The indices of the positions in the set, where indices start from 0 and refer to the order in which the positions were listed in the "Variables" section above.
- The number of "neighbors" of this set (i.e., for a singleton set, this is the number of pairwise sets in which it participates; for a pairwise set, this is 2).
- The indices of the "neighbors" of this set, where indices start from 0 and follow the order in which the sets are given in this section ("Cliques").
Thus, the following lines indicate that set 0 contains a single variable (index 100)
and is connected to one other set (set 2). Set 1 also contains a single variable
(index 200) and is connected to one other set (set 2). Set 2 consists of variables 100
and 200 and is connected to 2 sets (sets 0 and 1):
cliq0 1 100 1 2
cliq1 1 200 1 2
cliq2 2 100 200 2 0 1
NOTE: In the "Cliques" section, the 5 fields MUST be separated by TABS (the '\t' character), as in the examples shown here.
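The neighbor fields can be derived from the list of interacting position pairs rather than written by hand. The sketch below is one possible way to do this, under stated assumptions: the function name `build_clique_lines` and the "cliqN" naming are hypothetical, singleton sets are listed first (so set i holds position i), and every item on a line is separated by a tab, including multiple indices within one field.

```python
def build_clique_lines(num_positions, pairs):
    """Return tab-separated "Cliques" lines: singleton sets first, then pairwise sets."""
    # Assumption: pairwise set k receives the index num_positions + k.
    neighbors = {i: [] for i in range(num_positions)}
    for k, (a, b) in enumerate(pairs):
        neighbors[a].append(num_positions + k)
        neighbors[b].append(num_positions + k)

    lines = []
    for i in range(num_positions):
        # name, set size, position index, neighbor count, neighbor indices
        fields = [f"cliq{i}", 1, i, len(neighbors[i]), *neighbors[i]]
        lines.append("\t".join(map(str, fields)))
    for k, (a, b) in enumerate(pairs):
        # A pairwise set neighbors the two singleton sets of its positions.
        fields = [f"cliq{num_positions + k}", 2, a, b, 2, a, b]
        lines.append("\t".join(map(str, fields)))
    return lines

clique_lines = build_clique_lines(3, [(0, 1), (1, 2)])
```

With three positions and the pairs (0, 1) and (1, 2), this yields the same five "Cliques" lines as the complete example file in this document.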
-
Measures
This section contains the rotamer energies calculated for each position and
each pair of positions in contact. Each line holds one such "matrix" of
rotamer energies and must contain the following 4 fields:
- A name for the energies matrix (can be anything).
- The number of positions that the energies describe (1 for singleton energies, 2 for pairwise rotamer-rotamer energies).
- The total number of rotamers at each of the positions that this matrix describes.
- The actual rotamer energies for this matrix, listed in the order in which the rotamer assignment advances like a number counter with one digit per position, e.g., 00 01 10 11 for two positions with 2 rotamers each.
Thus, the following two lines indicate that energy matrix 0 describes a single
variable with 3 rotamers, and matrix 1 describes two variables with 3 rotamers
and 2 rotamers, respectively:
matrix0 1 3 -3.5328 -4.5901 -3.6388
matrix1 2 3 2 0.091058 0.066163 -0.066163 -0.91101 -0.39133 0.066163
NOTE: In the "Measures" section, the 4 fields MUST be separated by TABS (the '\t' character), as in the examples shown here.
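To look up one energy in a flattened matrix, the rotamer assignment has to be converted into an offset. The following sketch shows one way to do that; `parse_measure_line` and `flat_index` are hypothetical names, and the code ASSUMES that the counter ordering means the last listed position advances fastest (00, 01, 10, 11 read left to right). If a file uses the opposite convention, the assignment and counts would need to be reversed.

```python
from math import prod

def parse_measure_line(line):
    """Parse a "Measures" line into (name, rotamer counts, flat energy list)."""
    tokens = line.split()
    name, num_positions = tokens[0], int(tokens[1])
    counts = [int(t) for t in tokens[2:2 + num_positions]]
    energies = [float(t) for t in tokens[2 + num_positions:]]
    # One energy is expected per joint rotamer assignment.
    if len(energies) != prod(counts):
        raise ValueError(f"{name}: expected {prod(counts)} energies, got {len(energies)}")
    return name, counts, energies

def flat_index(assignment, counts):
    """Offset of a rotamer assignment; ASSUMES the last position advances fastest."""
    index = 0
    for rotamer, count in zip(assignment, counts):
        index = index * count + rotamer
    return index

name, counts, energies = parse_measure_line(
    "matrix1\t2\t3\t2\t0.091058\t0.066163\t-0.066163\t-0.91101\t-0.39133\t0.066163"
)
pair_energy = energies[flat_index((1, 0), counts)]
```

The length check catches the most common mistake, a matrix whose number of values does not equal the product of the rotamer counts.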
-
CliqueToMeasure
This section maps the indices of the subsets of positions ("Cliques") to the
indices of their respective energy matrices ("Measures"). For example, the
following lines indicate that set 0 is mapped to matrix 0, set 1 to matrix 1,
etc.:
0 0
1 1
2 2
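A mapping is only meaningful if each matrix has the rotamer counts of the positions in the set it is attached to. The sketch below checks this on already-parsed data; the function name `find_count_mismatches` and the in-memory layout are hypothetical, and it assumes that a matrix lists its rotamer counts in the same order as the set lists its positions.

```python
def find_count_mismatches(var_counts, clique_vars, measure_counts, mapping):
    """Return (set index, matrix index, expected counts) for every inconsistent pair."""
    mismatches = []
    for clique_idx, measure_idx in mapping:
        # Rotamer counts implied by the "Variables" section for this set.
        expected = [var_counts[v] for v in clique_vars[clique_idx]]
        if measure_counts[measure_idx] != expected:
            mismatches.append((clique_idx, measure_idx, expected))
    return mismatches

# Data taken from the complete example file in this document.
var_counts = [2, 3, 2]                             # rotamers per position
clique_vars = [[0], [1], [2], [0, 1], [1, 2]]      # positions in each set
measure_counts = [[2], [3], [2], [2, 3], [3, 2]]   # counts listed by each matrix
mapping = [(i, i) for i in range(5)]               # CliqueToMeasure pairs

problems = find_count_mismatches(var_counts, clique_vars, measure_counts, mapping)
```

For the example file, `problems` is empty; swapping two entries of the mapping would report the affected sets.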
Each section is terminated by a line containing "@End", and the title of each
section starts with "@" as well (e.g., "@Variables").
NOTE: In the "Cliques" and "Measures" sections, the fields MUST be separated by TABS (the '\t' character), as in the examples shown here.
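The "@Title" and "@End" delimiters make the file easy to split before parsing each section. The following is a minimal sketch of such a splitter; `split_sections` is a hypothetical helper, not a fastInf function, and it assumes that blank lines can be ignored and that sections are never nested.

```python
def split_sections(text):
    """Return {section title: [lines]} for each "@Title ... @End" block."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()  # inner tabs between fields are preserved
        if not line:
            continue
        if line == "@End":
            if current is None:
                raise ValueError("@End without an open section")
            current = None
        elif line.startswith("@"):
            if current is not None:
                raise ValueError(f"section {current!r} is missing its @End")
            current = line[1:]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    if current is not None:
        raise ValueError(f"section {current!r} is missing its @End")
    return sections

sample = "@Variables\n_A_165\t2\n_A_200\t3\n@End\n@CliqueToMeasure\n0\t0\n@End\n"
sections = split_sections(sample)
```

Each list in `sections` can then be handed to a section-specific parser, and a missing "@End" is reported instead of being silently accepted.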
A simple valid example is as follows:
@Variables
_A_165 2
_A_200 3
_B_10 2
@End
@Cliques
cliq0 1 0 1 3
cliq1 1 1 2 3 4
cliq2 1 2 1 4
cliq3 2 0 1 2 0 1
cliq4 2 1 2 2 1 2
@End
@Measures
matrix0 1 2 -3 -4
matrix1 1 3 -1 -4 -6
matrix2 1 2 -4 3
matrix3 2 2 3 0 1 -1 2 -3 -4
matrix4 2 3 2 1 2 0 -1 -5 -2
@End
@CliqueToMeasure
0 0
1 1
2 2
3 3
4 4
@End
This example corresponds to the following energies between the rotamers of the protein positions: