|Rosetta 3.2 Release Manual|
The match applications are maintained by David Baker's lab. Send questions to email@example.com
mini/test/integration/tests/match/inputs/6cpa/6cpa_xtal.cst) and enzdes (
CST::BEGIN
  TEMPLATE::   ATOM_MAP: 1 atom_name: C6 O4 O2
  TEMPLATE::   ATOM_MAP: 1 residue3: D2N

  TEMPLATE::   ATOM_MAP: 2 atom_type: Nhis,
  TEMPLATE::   ATOM_MAP: 2 residue1: H

  CONSTRAINT:: distanceAB:    2.00   0.30 100.00    1.00   0
  CONSTRAINT:: angle_A:     105.10   6.00 100.00  360.00   1
  CONSTRAINT:: angle_B:     116.90   5.00  50.00  360.00   1
  CONSTRAINT:: torsion_A:   105.00  10.00  50.00  360.00   2
  CONSTRAINT:: torsion_B:   180.00  10.00  25.00  180.00   4
  CONSTRAINT:: torsion_AB:    0.00  45.00   0.00  180.00   5
CST::END
The information in this block defines constraints between three atoms on residue 1 and three atoms on residue 2. Up to six parameters can be specified (one distance, two angles, and three dihedrals).
The records indicate the following:
'CST::BEGIN' and 'CST::END' indicate the beginning and end of the definition block for this catalytic interaction.
The 'TEMPLATE:: ATOM_MAP:' records:
These indicate what atoms are constrained and what type of residue they are in. The number in column 3 of these records indicates which catalytic residue the record relates to. It has to be either 1 or 2.
The 'atom_name' tag specifies exactly which three atoms of the residue are to be constrained. It has to be followed by the names of three atoms that are part of the catalytic residue. In the above example, for catalytic residue 1, atom 1 is C6, atom 2 is O4, and atom 3 is O2.
The 'atom_type' tag is an alternative to the 'atom_name' tag that allows a more flexible definition of the constrained atoms. It has to be followed by the Rosetta atom type of the first constrained atom of the residue. When this tag is used, Rosetta sets the second constrained atom to the base atom of the first constrained atom, and the third constrained atom to the base atom of the second. (Note: the base atom of each atom is defined in the ICOOR records of the .params file for that residue type.)
There are two advantages to using the 'atom_type' tag. First, it allows different residue types to be constrained with the same file, for example when a catalytic hydrogen bond is to be constrained but the user doesn't care whether it is mediated by a SER-OH or a THR-OH. Second, if a catalytic residue contains more than one atom of the same type (as in the case of ASP or GLU) and it doesn't matter which of these atoms mediates the constrained interaction, using this tag will cause Rosetta to evaluate the constraint for each of these atoms separately and pick the one with the lowest score, i.e. the ambiguity of the constraint is resolved automatically.
The 'residue1' or 'residue3' tag specifies what type of residue is constrained. 'residue3' needs to be followed by the 3-letter abbreviation of the residue name. 'residue1' needs to be followed by the 1-letter abbreviation of the residue name. As a convenience, if several similar residue types can fulfill the constraint (e.g. ASP or GLU), the 'residue1' tag can be followed by a string of the 1-letter codes of all allowed residues (e.g. ED for ASP/GLU, or ST for SER/THR).
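The expansion of the 'residue1' and 'residue3' tags can be pictured with a short Python sketch. This is only an illustration of the rule described above, not Rosetta code; the function name allowed_residues and the lookup table are assumptions made for the example.

```python
# Illustrative only: how a 'residue1' / 'residue3' value maps to residue names.
# The helper name and table are hypothetical, not part of Rosetta.
ONE_TO_THREE = {
    "A": "ALA", "C": "CYS", "D": "ASP", "E": "GLU", "F": "PHE",
    "G": "GLY", "H": "HIS", "I": "ILE", "K": "LYS", "L": "LEU",
    "M": "MET", "N": "ASN", "P": "PRO", "Q": "GLN", "R": "ARG",
    "S": "SER", "T": "THR", "V": "VAL", "W": "TRP", "Y": "TYR",
}


def allowed_residues(tag, value):
    """Expand the value of a residue tag into a list of 3-letter names."""
    if tag == "residue3":
        # A single residue given by its 3-letter name (may be a ligand).
        return [value]
    if tag == "residue1":
        # One or more 1-letter codes, each naming an allowed residue type.
        return [ONE_TO_THREE[code] for code in value]
    raise ValueError(f"unknown residue tag: {tag}")


print(allowed_residues("residue1", "ED"))   # ASP or GLU may fulfill the constraint
print(allowed_residues("residue3", "D2N"))  # the ligand from the example block
```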
The 'CONSTRAINT::' records:
These records specify the actual value and strength of the constraint applied between the two residues specified in the block. Each record consists of one string followed by 4 (optionally 5) numbers. The string can have the following allowed values:
  'distanceAB' is the distance Res1:Atom1 - Res2:Atom1, i.e. the distance between atom 1 of residue 1 and atom 1 of residue 2.
  'angle_A' is the angle Res1:Atom2 - Res1:Atom1 - Res2:Atom1
  'angle_B' is the angle Res1:Atom1 - Res2:Atom1 - Res2:Atom2
  'torsion_A' is the dihedral Res1:Atom3 - Res1:Atom2 - Res1:Atom1 - Res2:Atom1
  'torsion_AB' is the dihedral Res1:Atom2 - Res1:Atom1 - Res2:Atom1 - Res2:Atom2
  'torsion_B' is the dihedral Res1:Atom1 - Res2:Atom1 - Res2:Atom2 - Res2:Atom3
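To make the six definitions concrete, the following Python sketch measures them from the coordinates of the three constrained atoms on each residue. It is a minimal illustration using standard vector geometry, not the Rosetta implementation; all function names are assumptions, and the dihedral is reported in the range (-180, 180], which may differ from the convention used inside Rosetta.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def distance(p, q):
    return _norm(_sub(p, q))


def angle(p, q, r):
    """Angle p-q-r in degrees, with q at the vertex."""
    u, v = _sub(p, q), _sub(r, q)
    cos_angle = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def dihedral(p, q, r, s):
    """Dihedral p-q-r-s in degrees, in the range (-180, 180]."""
    b1, b2, b3 = _sub(q, p), _sub(r, q), _sub(s, r)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    b2_unit = tuple(c / _norm(b2) for c in b2)
    m1 = _cross(b2_unit, n1)
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))


def six_parameters(res1, res2):
    """res1 and res2 are (atom1, atom2, atom3) coordinate triples."""
    a1, a2, a3 = res1
    b1, b2, b3 = res2
    return {
        "distanceAB": distance(a1, b1),
        "angle_A": angle(a2, a1, b1),
        "angle_B": angle(a1, b1, b2),
        "torsion_A": dihedral(a3, a2, a1, b1),
        "torsion_AB": dihedral(a2, a1, b1, b2),
        "torsion_B": dihedral(a1, b1, b2, b3),
    }


# Made-up coordinates, chosen only so the geometry is easy to check by hand.
res1 = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0))
res2 = ((0.0, 0.0, 2.0), (0.0, 1.0, 2.0), (1.0, 1.0, 2.0))
for name, value in six_parameters(res1, res2).items():
    print(f"{name:11s} {value:8.2f}")
```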
Each of these strings is followed by 4 (optionally 5) columns of numbers: x0, xtol, k, covalent/periodicity, and number of samples. The 1st column specifies the optimum value x0 of the respective parameter. The 2nd column specifies the allowed tolerance xtol around that value. The 3rd column specifies the force constant k, i.e. the strength of this particular parameter. If x is the value of the constrained parameter, the score penalty applied will be:
  0                          if |x - x0| < xtol
  k * ( |x - x0| - xtol )    otherwise
This 3rd column is only relevant for enzdes, and the number in it is not used by the matcher.
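The penalty described above is a flat-bottomed function that grows linearly outside the tolerance. A minimal Python sketch of that formula follows; the function name cst_penalty is an assumption, and the sketch deliberately ignores periodicity, so for angles and dihedrals it only applies near a single minimum.

```python
def cst_penalty(x, x0, xtol, k):
    """Score penalty for a constrained parameter with value x.

    Zero inside the tolerance window, linear in the excess deviation outside.
    Periodicity is not handled in this sketch.
    """
    deviation = abs(x - x0)
    if deviation < xtol:
        return 0.0
    return k * (deviation - xtol)


# distanceAB from the example block: x0 = 2.00, xtol = 0.30, k = 100.00
print(cst_penalty(2.10, 2.00, 0.30, 100.00))  # inside the tolerance
print(cst_penalty(2.50, 2.00, 0.30, 100.00))  # 0.2 beyond the tolerance
```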
The 4th column has a special meaning in case of the distanceAB parameter. It specifies whether the constrained interaction is covalent or not. 1 means covalent, 0 means non-covalent. If the constraint is specified as covalent, Rosetta will not evaluate the vdW term between Res1:Atom1 and Res2:Atom1 and their [1,3] neighbors.
For the other 5 parameters, the 4th column specifies the periodicity (per) of the constraint. For example, if x0 is 120 and per is 360, the constraint function will have its only minimum at 120 degrees. If x0 is 120 and per is 180, the constraint function will have two minima, one at 120 degrees and one at 300 degrees. If x0 is 120 and per is 120, the constraint function will have 3 minima, at 120, 240, and 360 degrees.
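The positions of the minima follow directly from x0 and the periodicity. The Python sketch below reproduces the three cases just described; the function name is an assumption, and the sketch assumes the periodicity divides 360 evenly. Values are reported in the range [0, 360), so the minimum quoted above as 360 degrees appears as 0.

```python
def constraint_minima(x0, per):
    """Distinct minima (degrees, in [0, 360)) of a periodic constraint."""
    # Assumption: 'per' divides 360 evenly, as in all examples in the text.
    count = max(1, round(360.0 / per))
    return sorted({round((x0 + i * per) % 360.0, 6) for i in range(count)})


print(constraint_minima(120.0, 360.0))
print(constraint_minima(120.0, 180.0))
print(constraint_minima(120.0, 120.0))
```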
The 5th column is optional and specifies how many samples the matcher, if using the classic matching algorithm (see the matcher documentation), will place on each side of x0, out to x0 +- xtol. The numbers in this column are not used by enzdes. In the above example, for angle_A, the matcher will sample the values 99.10, 105.10, and 111.10 degrees. For torsion_A, it will sample 95, 100, 105, 110, and 115 degrees. Generally, if the value in this column is n, the matcher will sample 2n+1 points for the respective parameter: x0 itself plus n evenly spaced points on either side. Note that the number of samples is also influenced by the periodicity, since the matcher will sample around every minimum. For example, for torsion_AB in the block above, there will be 2*5+1 = 11 samples for every minimum, and since there are two minima (0 and 180 degrees), there will be a total of 22 samples for this parameter, at the following values: 0, 9, 18, 27, 36, 45, 135, 144, 153, 162, 171, 180, 189, 198, 207, 216, 225, 315, 324, 333, 342, 351.
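The sampling rule can be checked with a short Python sketch that enumerates the sampled values for an angle or dihedral parameter. This illustrates the rule as described here and is not the matcher's code; the function name is an assumption, the periodicity is assumed to divide 360 evenly, and overlapping windows around neighboring minima are not deduplicated.

```python
def matcher_samples(x0, xtol, per, n):
    """Values (degrees, in [0, 360)) sampled for one angle/dihedral parameter.

    Around each minimum, 2n+1 evenly spaced points from x0 - xtol to
    x0 + xtol are generated.
    """
    minima = max(1, round(360.0 / per))
    values = []
    for m in range(minima):
        center = x0 + m * per
        for i in range(-n, n + 1):
            offset = xtol * i / n if n else 0.0
            values.append(round((center + offset) % 360.0, 4))
    return sorted(values)


# torsion_AB from the example block: 0.00 45.00 0.00 180.00 5
samples = matcher_samples(0.0, 45.0, 180.0, 5)
print(len(samples), samples)
```

For distanceAB the 4th column is the covalent flag rather than a periodicity, so this helper should not be applied to that record.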
When determining how many different values to sample for each parameter, it is important to remember that the number of different ligand placements attempted for every protein rotamer built is equal to the product of the numbers of samples for the 6 parameters. For example, in the above block there is one sample for distanceAB, 3 samples for angle_A, 3 samples for angle_B, 5 samples for torsion_A, 18 samples for torsion_B, and 22 samples for torsion_AB, meaning that for every protein rotamer, the matcher will attempt to place the ligand in a total of 1*3*3*5*18*22 = 17820 different conformations.
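Because this product grows quickly, it can be useful to compute it before starting a run. The following Python sketch reads the CONSTRAINT:: records of the example block and reproduces the count above. It is a rough estimate script, not part of Rosetta; the parsing is deliberately minimal, and treating a missing 5th column as n = 0 (a single sample) is an assumption.

```python
import math

CST_BLOCK = """\
CONSTRAINT:: distanceAB: 2.00 0.30 100.00 1 0
CONSTRAINT:: angle_A: 105.10 6.00 100.00 360.00 1
CONSTRAINT:: angle_B: 116.90 5.00 50.00 360.00 1
CONSTRAINT:: torsion_A: 105.00 10.00 50.00 360.00 2
CONSTRAINT:: torsion_B: 180.00 10.00 25.00 180.00 4
CONSTRAINT:: torsion_AB: 0.00 45.00 0.00 180.00 5
"""


def samples_per_parameter(block):
    """Map each constrained parameter to its number of matcher samples."""
    counts = {}
    for line in block.splitlines():
        fields = line.split()
        if not fields or fields[0] != "CONSTRAINT::":
            continue
        name = fields[1].rstrip(":")
        numbers = [float(f) for f in fields[2:]]
        # Assumption: an omitted 5th column means n = 0, i.e. only x0 is sampled.
        n = int(numbers[4]) if len(numbers) > 4 else 0
        per_minimum = 2 * n + 1
        if name == "distanceAB":
            # 4th column is the covalent flag here, so there is one minimum.
            counts[name] = per_minimum
        else:
            minima = max(1, round(360.0 / numbers[3]))
            counts[name] = per_minimum * minima
    return counts


counts = samples_per_parameter(CST_BLOCK)
total = math.prod(counts.values())
print(counts)
print("ligand placements per protein rotamer:", total)
```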
ALGORITHM_INFO:: match
  SECONDARY_MATCH: DOWNSTREAM
ALGORITHM_INFO::END

ALGORITHM_INFO:: match
  SECONDARY_MATCH: UPSTREAM_CST 2
ALGORITHM_INFO::END
2. Sampling level of the protein rotamers
Generally, the protein rotamer sampling level in the matcher is determined by the values of the -packing:ex1 etc. command line options. It is, however, possible to override these for individual constraint blocks by adding the following lines to a constraint block:
ALGORITHM_INFO:: match
  CHI_STRATEGY:: CHI 1 EX_FOUR_HALF_STEP_STDDEVS
  CHI_STRATEGY:: CHI 2 EX_ONE_STDDEV
  IGNORE_UPSTREAM_PROTON_CHI
ALGORITHM_INFO::END
3. Modifying match positions according to structural features.
It is possible to modify the scaffold positions that are considered for an upstream residue through instructions in the cstfile block of this residue. This can come in very handy when matching against a large number of scaffolds.
The properties that can be used to discriminate against certain match positions at the moment are secondary structure, bfactors, and number of 10A neighbors.
Example: adding the following to a block in a cstfile

ALGORITHM_INFO:: match_positions
  bfactor absolute <value>
ALGORITHM_INFO::END

means that, for the constraint whose cstfile block contains this information, only those positions in the posfile where the C-alpha atom has a bfactor of less than value will be matched.
The following other options are possible at the moment:
  bfactor relative <value>
      only positions with a relative bfactor of less than value are allowed (value has to be 0 < value < 1), where relative means the C-alpha bfactor divided by the biggest C-alpha bfactor observed in the pdb
  ss ss_char H
      only positions that are in a helix are allowed (to get sheet/loop, replace H by E/L)
  ss ss_motif helix_nterm
      only positions at the N-terminus of a helix will be matched
  num_neighbors min_neighbors <minval> max_neighbors <maxval>
      only positions that have between minval and maxval neighbors will be matched
  num_neighbors max_neighbors <val>
      only positions that have less than val neighbors will be matched
  num_neighbors min_neighbors <val>
      the counterpart of the previous option: only positions that have more than val neighbors will be matched
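The position filters listed above can be pictured as simple predicates on per-position data. The Python sketch below is purely illustrative and is not how the matcher stores or evaluates positions: the Position record, the function name, and the example values are all hypothetical, and whether each threshold is inclusive or exclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Position:
    """Hypothetical per-position data for one scaffold residue."""
    seqpos: int
    ss: str            # 'H' (helix), 'E' (sheet) or 'L' (loop)
    ca_bfactor: float  # C-alpha bfactor
    neighbors: int     # number of 10A neighbors


def filter_positions(positions, ss_char=None, max_relative_bfactor=None,
                     min_neighbors=None, max_neighbors=None):
    """Return the seqpos of every position that passes all given filters."""
    if not positions:
        return []
    # Relative bfactor = C-alpha bfactor / largest C-alpha bfactor in the pdb.
    largest = max(p.ca_bfactor for p in positions)
    kept = []
    for p in positions:
        if ss_char is not None and p.ss != ss_char:
            continue
        if (max_relative_bfactor is not None
                and p.ca_bfactor / largest >= max_relative_bfactor):
            continue
        if min_neighbors is not None and p.neighbors < min_neighbors:
            continue
        if max_neighbors is not None and p.neighbors > max_neighbors:
            continue
        kept.append(p.seqpos)
    return kept


# Made-up scaffold positions for demonstration.
scaffold = [
    Position(10, "H", 20.0, 12),
    Position(11, "H", 45.0, 12),  # too flexible
    Position(12, "E", 10.0, 12),  # not in a helix
    Position(13, "H", 15.0, 3),   # too exposed
    Position(14, "H", 15.0, 25),  # too buried
]
print(filter_positions(scaffold, ss_char="H", max_relative_bfactor=0.5,
                       min_neighbors=7, max_neighbors=20))
```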
All of these options can be combined. For example, if you add the following to a block in your cstfile

ALGORITHM_INFO:: match_positions
  ss ss_motif helix_nterm
  bfactor relative 0.4
  num_neighbors min_neighbors 7 max_neighbors 20
ALGORITHM_INFO::END

only positions in the posfile that are at the N-terminus of a helix, are relatively rigid, and are neither too exposed nor too buried will be matched. The code for these exclusions resides in