Rosetta 3.4
Docking Protocol (RosettaDock)
Author:
Brian Weitzner (brian.weitzner@jhu.edu), Monica Berrondo (mberron1@jhu.edu), Krishna Kilambi (kkpraneeth@jhu.edu), Robin Thottungal (raugust1@jhu.edu), Sidhartha Chaudhury (sidc@jhu.edu), Chu Wang (chuwang@gmail.com), Jeffrey Gray (jgray@jhu.edu)

Metadata

Last edited 7/18/2011. Corresponding PI Jeffrey Gray (jgray@jhu.edu).

Code and Demos

To run docking, type the following at the command line:

[path to executable]/docking_protocol.[platform|linux/mac][compile|gcc/ixx]release -database [path to database] @options

Note: these demos generate only one decoy. To generate a large number of decoys, add -nstruct N (where N is the number of decoys to build) to the list of flags.
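The command line above can also be assembled programmatically. The following is a minimal Python sketch, not part of Rosetta: the function name build_docking_command and the example paths are hypothetical, and only the -database flag, the -nstruct flag, and the @options convention are taken from this page.

```python
def build_docking_command(executable, database, options_file, nstruct=1):
    """Assemble the docking command line as an argument list.

    All arguments are supplied by the caller; nothing is checked
    against a real Rosetta installation.
    """
    if nstruct < 1:
        raise ValueError("nstruct must be at least 1")
    command = [executable, "-database", database, "@" + options_file]
    # A single decoy is built unless more are requested with -nstruct N.
    if nstruct > 1:
        command += ["-nstruct", str(nstruct)]
    return command


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_docking_command(
        "bin/docking_protocol.linuxgccrelease", "rosetta_database", "options",
        nstruct=1000,
    )
    print(" ".join(cmd))
```

Returning a list rather than a single string keeps paths that contain spaces intact when the result is passed to a process launcher.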

References

We recommend the following articles for further studies of RosettaDock methodology and applications:

Purpose

Determine the structure of protein-protein complexes by using rigid body perturbations of the protein chains.

Algorithm

The following description has been adapted from Chaudhury, et al., 2011, PLoS One:

RosettaDock is a Monte Carlo (MC) based multi-scale docking algorithm that incorporates both a low-resolution, centroid-mode, coarse-grain stage and a high-resolution, all-atom refinement stage that optimizes both rigid-body orientation and side-chain conformation. The algorithm roughly follows the biophysical theory of an encounter complex formation followed by a transition to a bound state. Typically the algorithm starts from either a random initial orientation of the two partners (global docking), or an initial orientation that is randomly perturbed from a user-defined starting pose (local perturbation). From there, the partner proteins are represented coarsely, where side chains are replaced by a single unified pseudo-atom, or centroid. During this phase, a 500-step Monte Carlo search is made with adaptive rotational and translational steps adjusted dynamically to achieve an acceptance rate of 25%. The ScoreFunction used in this stage primarily consists of a ‘bump’ term, a contact term, and docking-specific statistical residue environment and residue-residue pair-wise potentials.
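The idea of a Monte Carlo search whose step size is tuned toward a 25% acceptance rate can be illustrated on a toy problem. The sketch below is not RosettaDock code: the one-dimensional score function, the temperature kT, the adjustment factor of 1.1, and the choice to retune the step after every batch of 50 moves are all assumptions made for illustration.

```python
import math
import random


def metropolis_accept(delta_score, kT, rng):
    """Metropolis test: always accept downhill moves; accept uphill
    moves with probability exp(-delta_score / kT)."""
    if delta_score <= 0.0:
        return True
    return rng.random() < math.exp(-delta_score / kT)


def adapt_step(step_size, acceptance_rate, target=0.25, factor=1.1):
    """Grow the step when too many moves are accepted, shrink it when too few."""
    if acceptance_rate > target:
        return step_size * factor
    if acceptance_rate < target:
        return step_size / factor
    return step_size


def toy_search(score, x0, n_outer=10, n_inner=50, step=1.0, kT=1.0, seed=0):
    """Run n_outer * n_inner Monte Carlo steps on a 1-D toy score function,
    retuning the step size after each batch. Returns the lowest-scoring x seen."""
    rng = random.Random(seed)
    x, best = x0, x0
    for _ in range(n_outer):
        accepted = 0
        for _ in range(n_inner):
            trial = x + rng.gauss(0.0, step)
            if metropolis_accept(score(trial) - score(x), kT, rng):
                x = trial
                accepted += 1
                if score(x) < score(best):
                    best = x
        step = adapt_step(step, accepted / n_inner)
    return best
```

As in the description above, the lowest-scoring state visited is returned, rather than the final state of the trajectory.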

Once the centroid-mode stage is complete, the lowest energy structure accessed during that stage is selected for high-resolution refinement. During high-resolution refinement, centroid pseudo-atoms are replaced with the side-chain atoms at their initial unbound conformations. Then 50 Monte Carlo+Minimization (MCM) steps are made in which:

  1. The rigid-body position is perturbed by a random direction and magnitude specified by a Gaussian distribution around 0.1 Å and 3.0°.
  2. The rigid-body orientation is energy-minimized.
  3. The side-chain conformations are optimized with RotamerTrials, followed by a test of the Metropolis criterion.

Every eight steps, an additional combinatorial side-chain optimization is carried out using the full side-chain packing algorithm, followed by an additional Metropolis criterion check. To reduce the time devoted to the computationally expensive energy-minimization for unproductive rigid-body moves, minimization is skipped if a rigid-body move results in a change in score of greater than +15. The all-atom score function used in this stage primarily consists of van der Waals attractive and repulsive terms, a solvation term, an explicit hydrogen bonding term, a statistical residue-residue pair-wise interaction term, an internal side-chain conformational energy term, and an electrostatic term.
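The control flow of the refinement stage described above can be sketched as follows. This is an illustrative outline only: the function names are hypothetical rather than Rosetta API, and placing the full repack on steps 8, 16, 24 and so on is an assumption about what "every eight steps" means.

```python
def should_minimize(score_before, score_after_move, skip_threshold=15.0):
    """Return False when the rigid-body move raised the score by more
    than the threshold, so the costly minimization can be skipped."""
    return (score_after_move - score_before) <= skip_threshold


def mcm_plan(n_steps=50, repack_every=8):
    """Return, for each 1-based MCM step, the side-chain optimization planned.

    Every step uses rotamer trials; every `repack_every`-th step also
    gets a full combinatorial repack.
    """
    plan = []
    for step in range(1, n_steps + 1):
        if step % repack_every == 0:
            plan.append((step, "rotamer_trials+full_repack"))
        else:
            plan.append((step, "rotamer_trials"))
    return plan
```

Under these assumptions, a 50-step run includes six full repacks (steps 8 through 48).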

For particular targets, a variety of RosettaDock sampling strategies are often used to improve the chance of achieving an accurate structure prediction. If no prior structural or biochemical information is known about the protein interaction of interest, global docking is used to randomize the initial docking poses. From there, score filters and clustering are used to identify clusters of acceptable low-energy structures for further docking and refinement. In most cases, there is some known information about the complex, either in the form of related protein complexes or in biochemical or bioinformatics data which identify probable regions of interaction on the protein partners. In these cases users manually arrange the starting docking pose to a configuration that is compatible with the information and carry out a local docking perturbation. Additionally, users can set distance-based filters that bias sampling towards those docking poses that are compatible with specified constraints.

Modes

Input Files

The only required input file is a prepacked PDB file containing two proteins with different chain IDs. Starting structures must be prepacked because the side chains are only packed at the interface during docking. Running the docking prepack protocol ensures that the side chains outside of the docking interface have a low-energy conformation, which provides a better reference state for scoring. For more information on prepacking, see the Docking Prepack protocol documentation.

Note: The following flags should be given to every docking simulation: -ex1 -ex2aro.
If you are using a starting structure with more than two polypeptide chains, you should include the -partners flag. If this flag is omitted, docking will dock the first two polypeptide chains in the structure.

Options

Flag Description Type

Basic protocol options

-partners [P1_P2] Defines docking partners by chain ID for multichain docking. For example, "-partners LH_A" moves chain A around the dimer of chains L and H. String
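To make the [P1_P2] syntax concrete, the sketch below splits a -partners value into its two chain groups. It is a hypothetical helper written for this page, not the parser Rosetta uses, and it assumes single-character chain IDs.

```python
def parse_partners(spec):
    """Split a -partners value such as "LH_A" into two lists of chain IDs."""
    parts = spec.split("_")
    if len(parts) != 2 or not all(parts):
        raise ValueError(
            "expected two chain groups separated by one underscore: %r" % spec
        )
    first, second = (list(group) for group in parts)
    overlap = set(first) & set(second)
    if overlap:
        raise ValueError("chains listed on both sides: %s" % sorted(overlap))
    return first, second
```

For "LH_A", the first group is chains L and H and the second group is chain A, matching the example in the flag description.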

Starting Perturbation Flags

-randomize1 Randomize the orientation of the first docking partner. (Only works with 2 partner docking). Boolean
-randomize2 Randomize the orientation of the second docking partner. (Only works with 2 partner docking). Boolean
-spin Spin the second docking partner around the axis running from the center of mass of the first partner to the second partner. Boolean
-dock_pert [T] [R] To create a starting structure from the input structure, randomly perturb the input structure using a Gaussian for translation and rotation with standard deviations [T] and [R]. Recommended usage is "-dock_pert 3 8". RealVector
-uniform_trans [R] Uniform random repositioning of the second partner about the first partner within a sphere of the given radius, [R]. Real
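The two kinds of random displacement described by -dock_pert and -uniform_trans can be pictured with a small sampling sketch. It is not Rosetta code: drawing one independent Gaussian value per axis is an assumption about how the standard deviations are applied, and the function names are hypothetical.

```python
import math
import random


def sample_dock_pert(trans_sd, rot_sd, rng):
    """Draw one Gaussian translation (Angstroms) and one Gaussian
    rotation (degrees) per axis, each centered on zero."""
    translation = [rng.gauss(0.0, trans_sd) for _ in range(3)]
    rotation = [rng.gauss(0.0, rot_sd) for _ in range(3)]
    return translation, rotation


def sample_uniform_trans(radius, rng):
    """Draw a point uniformly inside a sphere of the given radius
    by rejection sampling from the enclosing cube."""
    while True:
        point = [rng.uniform(-radius, radius) for _ in range(3)]
        if math.sqrt(sum(c * c for c in point)) <= radius:
            return point
```

With the recommended "-dock_pert 3 8", the call would be sample_dock_pert(3.0, 8.0, rng), so most sampled displacements stay within a few Angstroms and a few degrees of the input pose.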

Packing Flags

-norepack1 Do not repack the sidechains on the first docking partner. (Only works with 2 partner docking). Boolean
-norepack2 Do not repack the sidechains on the second docking partner. Boolean
-sc_min Perform extra side chain minimization steps during packing steps. Boolean

Full Protocol Flags

Default mode of docking. No additional flags necessary.

Low Resolution Docking Only Flags

-low_res_protocol_only Only run the low resolution part of the protocol (skips all high resolution steps and only outputs low resolution structure). Boolean

High Resolution Docking Only Flags

-docking_local_refine Refine the docking position in high resolution only (skips all low resolution steps of the protocol). Uses small perturbations of the positions, no large moves. Boolean

Local High Resolution Minimization Flags

-docking_local_refine Refine the docking position in high resolution only (skips all low resolution steps of the protocol). Uses small perturbations of the positions, no large moves. Boolean
-dock_min Does a single round of minimization in high resolution, skipping the mcm protocol. Boolean

Relevant common Rosetta Flags

-s [S] OR -silent [S] Specify the file name of the starting structure, [S] (in:file:s for PDB format, in:file:silent for silent file format). String
-native [S] Specify the file name of the native structure, [S], against which to compare in RMSD calculations. If a native file is not passed in, all calculations are done using the starting structure as native. String
-nstruct [I] Specify the number of decoys, [I], to generate. Integer
-database [P] The path to the Rosetta database (e.g. ~/rosetta_database). String
-use_input_sc Use accepted rotamers from the input structure between Monte Carlo+Minimization (MCM) cycles. Unlike the -unboundrot flag from Rosetta++, not all rotamers from the input structure are added to the rotamer library each time; only those accepted at the end of each round are kept, and the remaining conformations are lost. Boolean
-ex1/-ex1aro -ex2/-ex2aro -ex3 -ex4 Adding extra side-chain rotamers (highly recommended). The -ex1 and -ex2aro flags were used in our own tests, and therefore are recommended as default values. Boolean/Integer
-constraints:cst_file [S] Specify the name of the constraint file, [S]. String

Expert Flags

-dock_mcm_trans_magnitude [T] The magnitude of the translational perturbation during MCM steps in docking, in Angstroms. Defaults to 0.7 Å. Real
-dock_mcm_rot_magnitude [R] The magnitude of the rotational perturbation during MCM steps in docking, in degrees. Defaults to 5.0°. Real
-docking_centroid_outer_cycles [C] Number of cycles to use in the outer loop of the docking low resolution protocol. Defaults to 10. Integer
-docking_centroid_inner_cycles [C] Number of cycles to use in the inner loop of the docking low resolution protocol. Defaults to 50. Integer
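When experimenting with the expert flags, it can help to write out only the values that differ from the defaults listed above. The sketch below is a hypothetical helper, not part of Rosetta; it assumes an options file takes one "-flag value" entry per line. The product of the two cycle defaults, 10 x 50 = 500, appears to correspond to the 500-step low-resolution search described under Algorithm.

```python
# Defaults as listed in the Expert Flags section of this page.
EXPERT_DEFAULTS = {
    "dock_mcm_trans_magnitude": 0.7,
    "dock_mcm_rot_magnitude": 5.0,
    "docking_centroid_outer_cycles": 10,
    "docking_centroid_inner_cycles": 50,
}


def render_expert_flags(overrides):
    """Return options-file lines for the expert flags whose requested
    value differs from the listed default."""
    lines = []
    for name, value in overrides.items():
        if name not in EXPERT_DEFAULTS:
            raise KeyError("not an expert flag listed on this page: " + name)
        if value != EXPERT_DEFAULTS[name]:
            lines.append("-%s %s" % (name, value))
    return lines
```

For example, requesting a 3.0 degree rotational magnitude while leaving the outer cycle count at 10 yields the single line "-dock_mcm_rot_magnitude 3.0".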

Tips

This will add to the ScoreFunction a FLAT_HARMONIC potential with the parameters 0 1 5 (recommended; see the constraint file documentation for more on constraint files), applied to the distance between the CA of residue 4 (PDB numbering) on chain A and the closest CA on chain D.
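The shape of a flat-bottomed harmonic potential can be sketched as below. This is an illustration under stated assumptions, not Rosetta's implementation: it assumes the three parameters "0 1 5" are, in order, the center x0, the width sd, and the tolerance tol, and that the penalty is zero within tol of x0 and grows quadratically beyond it.

```python
def flat_harmonic(x, x0, sd, tol):
    """Flat-bottomed harmonic penalty: zero within tol of x0,
    quadratic in the excess distance (scaled by sd) beyond that."""
    excess = abs(x - x0) - tol
    if excess <= 0.0:
        return 0.0
    return (excess / sd) ** 2
```

Under this reading, parameters 0 1 5 leave any CA-CA distance up to 5 Å unpenalized and penalize larger distances quadratically, which biases sampling toward poses where the two residues are in contact.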

Expected Outputs

One PDB file for each candidate docked model generated, and one scorefile for each run summarizing all generated models.

Post Processing

Sort the scorefile by score using the command-line sort function. For global docking simulations, generate at least 10,000 decoys, and ideally 100,000. Sort by score (pay attention to I_sc and total!) and cluster the top 200 decoys by pairwise RMSD. Since global docking in Rosetta 3 has not been thoroughly tested, we do not have scripts available to automate this process. We recommend using the scripts mentioned in our Rosetta++ tutorial. Some scripts may require some modification.
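The sorting step can also be done in a few lines of Python. The sketch below is not one of the Rosetta++ scripts; it assumes a scorefile layout in which every data line starts with "SCORE:", the first such line names the columns, and the columns include total_score, I_sc, and description. Adjust the column names to match the actual file.

```python
def read_scores(lines):
    """Parse 'SCORE:' lines into dicts keyed by the header's column names."""
    header, rows = None, []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SCORE:":
            continue
        if header is None:
            header = fields[1:]  # the first SCORE: line names the columns
            continue
        if len(fields) - 1 != len(header):
            continue  # skip rows that do not match the header
        rows.append(dict(zip(header, fields[1:])))
    return rows


def top_decoys(rows, key="total_score", n=200):
    """Return the n lowest-scoring rows for the given numeric column."""
    return sorted(rows, key=lambda row: float(row[key]))[:n]
```

Calling top_decoys(rows, key="I_sc") ranks by interface score instead of total score; comparing the two rankings is one way to act on the advice to watch both columns before clustering.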

New things since last release
