Rosetta 3.2 Release Manual

Documentation for the FlexPepDock application

Author:
Barak Raveh, Nir London, Ora Schueler-Furman

Metadata

Last updated April 13, 2010; PI: Ora Schueler-Furman (oraf@ekmd.huji.ac.il).

Code and Demo

References

The main reference for the FlexPepDock protocol includes additional scientific background, in-depth technical details about the protocol, and large-scale assessment of the protocol performance over a large dataset of peptide-protein complexes:

Raveh B*, London N* and Schueler-Furman O
Sub-angstrom Modeling of Complexes between Flexible Peptides and Globular Proteins
Proteins, 2010 (in press, DOI 10.1002/prot.22716)

Purpose

A wide range of regulatory processes in the cell are mediated by flexible peptides that fold upon binding to globular proteins. The FlexPepDock protocol is designed to create high-resolution models of complexes between flexible peptides and globular proteins, starting from approximate, coarse-grain models. The protocol iteratively optimizes the peptide backbone and its rigid-body orientation relative to the receptor protein, in addition to on-the-fly side-chain optimization, and was benchmarked over a large dataset of peptide-protein interactions.

Algorithm

The input to the protocol is an initial coarse model of the peptide-protein complex (approximate backbone coordinates for the peptide in the receptor binding site). Initial side-chain coordinates (such as the crystallographic side-chains of an unbound receptor) can optionally be provided as part of the input model. The first step of the protocol pre-packs the input structure to remove internal clashes in the protein monomer and the peptide (see prepack mode below). In the second step (see peptide docking mode below), the protocol optimizes the peptide backbone and its rigid-body orientation relative to the receptor protein, in addition to on-the-fly side-chain optimization. This step is repeated k times, and includes an optional low-resolution (centroid) mode pre-optimization. The output models are then ranked by the user based on their energy score. The protocol is described in detail in the Methods section of Raveh, London et al., Proteins 2010 (see References). Note that this protocol is not meant for ab-initio peptide docking: an initial starting structure must be provided. For more information, see the tips below about correct usage of FlexPepDock.

Modes

Input Files

FlexPepDock requires the following inputs:

Options

Note that the -flexpep_prepack, -rbMCM/torsionsMCM and the -flexPepDockingMinimizeOnly flags denote different modes of functionality, and are therefore mutually exclusive (-rbMCM and -torsionsMCM can be used together, but not with the other flags).
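The exclusivity rule above can be checked before launching a job. The following is a minimal sketch in Python, not part of Rosetta; the helper names (`selected_modes`, `check_mode_flags`) are hypothetical, and only the flag names come from this manual.

```python
# Mode flags taken from the FlexPepDock options described in this manual.
PREPACK = "-flexpep_prepack"
MINIMIZE_ONLY = "-flexPepDockingMinimizeOnly"
DOCKING = {"-rbMCM", "-torsionsMCM"}  # these two may be combined with each other


def selected_modes(flags):
    """Return the set of distinct modes requested by a list of flags."""
    flags = set(flags)
    modes = set()
    if PREPACK in flags:
        modes.add("prepack")
    if MINIMIZE_ONLY in flags:
        modes.add("minimize_only")
    if flags & DOCKING:
        modes.add("docking")
    return modes


def check_mode_flags(flags):
    """Raise ValueError if the flags request more than one mode."""
    modes = selected_modes(flags)
    if len(modes) > 1:
        raise ValueError(
            "mutually exclusive modes requested: " + ", ".join(sorted(modes))
        )
    return modes
```

Calling `check_mode_flags(["-rbMCM", "-torsionsMCM", "-ex1"])` passes, since both docking flags belong to the same mode, whereas combining `-flexpep_prepack` with `-rbMCM` raises an error.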

I. Basic FlexPepDock flags:

Flag | Description | Type | Default
--- | --- | --- | ---
-receptor_chain | Chain ID of the receptor protein | String | First chain in input
-peptide_chain | Chain ID of the peptide | String | Second chain in input
-rbMCM | Perform rigid-body MCM in the main loop of the protocol | Boolean | false
-torsionsMCM | Perform MCM small/shear moves on the peptide backbone in the main loop of the protocol | Boolean | false
-lowres_preoptimize | Perform a preliminary round of centroid-mode optimization before going into high resolution. See more details below. | Boolean | false
-flexpep_prepack | Prepacking optimizes the side-chains of each monomer separately (without any docking). | Boolean | false
-flexpep_score_only | Read in a complex, score it and output interface statistics | Boolean | false
-flexPepDockingMinimizeOnly | Perform only a short minimization on the input complex | Boolean | false
-ref_startstruct | Alternative start structure for scoring statistics, instead of the original start structure (useful as a reference for rescoring previous runs with the -flexpep_score_only flag). | File | N/A
-peptide_anchor | Set the peptide anchor residue manually. It is recommended to override the default value only if one strongly suspects that the critical region for peptide binding is extremely remote from its center of mass. | Integer | Residue nearest to the peptide center of mass
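To illustrate the default of -peptide_anchor (the residue nearest to the peptide center of mass), here is a minimal Python sketch. It is not the Rosetta implementation: it assumes one representative atom per residue (for example C-alpha) and an unweighted centroid in place of a true center of mass, and the function name `default_anchor` and the 1-based numbering within the peptide are hypothetical.

```python
import math


def default_anchor(ca_coords):
    """Return the 1-based index of the residue closest to the centroid.

    ca_coords: list of (x, y, z) tuples, one per peptide residue.
    Assumption: an unweighted centroid of one atom per residue stands in
    for the peptide center of mass.
    """
    n = len(ca_coords)
    if n == 0:
        raise ValueError("peptide has no residues")
    centroid = tuple(sum(c[i] for c in ca_coords) / n for i in range(3))
    distances = [math.dist(c, centroid) for c in ca_coords]
    return distances.index(min(distances)) + 1
```

For a straight five-residue peptide this picks the middle residue, which is why overriding the default only makes sense when the binding region is known to lie far from the peptide center.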

II. Relevant Common Rosetta flags

More information on common Rosetta flags can be found in the relevant Rosetta manual pages. In particular, flags related to the job distributor (jd2), scoring function, constraint files and packing resfiles are identical to those in any other Rosetta protocol.

Flag | Description
--- | ---
-in:file:s or -in:file:silent | Specify the starting structure (-in:file:s for PDB format, -in:file:silent for silent file format).
-in:file:silent_struct_type, -out:file:silent_struct_type | Format of the silent file to be read in/written out. For silent output, use the binary file type since other types may not support ideal form.
-native | Specify the native structure against which to compare in RMSD calculations. This is a required flag. When the native is not known, use the starting structure as native.
-nstruct | Number of decoys to create in the simulation
-unboundrot | Add the rotamers of the specified structure to the rotamer library (usually used to include rotamers of the unbound monomer)
-use_input_sc | Pass accepted rotamers from the input structure between Monte-Carlo with Minimization (MCM) cycles. Unlike the -unboundrot flag, not all rotamers from the input structure are added to the rotamer library each time, but only those accepted at the end of each round; the remaining conformations are lost.
-ex1/-ex1aro -ex2/-ex2aro -ex3 -ex4 | Add extra side-chain rotamers (highly recommended). The -ex1 and -ex2aro flags were used in our own tests, and are therefore recommended as default values.
-database | The Rosetta database

III. Expert flags

	
Flag | Description | Type | Default
--- | --- | --- | ---
-rep_ramp_cycles | The number of outer cycles for the protocol. In each cycle, the repulsive energy of Rosetta is gradually ramped up and the attractive energy is ramped down, before inner cycles of Monte-Carlo with Minimization (MCM) are applied. | Integer | 10
-mcm_cycles | Number of inner cycles for both rigid-body and torsion-angle Monte-Carlo with Minimization (MCM) procedures. | Integer | 8
-smove_angle_range | Defines the perturbation size of small/shear moves. | Real | 6.0
-extend_peptide | Start the protocol with the peptide in an extended conformation (neglect the original peptide conformation; extend from the anchor residue). | Boolean | false
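The relationship between -rep_ramp_cycles and -mcm_cycles can be pictured as a nested loop. The sketch below is only an illustration of that structure in Python: the linear ramp and the specific weight values are assumptions made for the example, since the actual schedule used by FlexPepDock is not given in this manual, and the function name `ramp_schedule` is hypothetical.

```python
def ramp_schedule(rep_ramp_cycles=10, mcm_cycles=8):
    """Yield (outer, inner, rep_weight, atr_weight) for every MCM step.

    Assumption: weights change linearly from one outer cycle to the next.
    The numbers are illustrative only, not the values Rosetta uses.
    """
    for outer in range(1, rep_ramp_cycles + 1):
        frac = outer / rep_ramp_cycles
        rep_weight = frac        # repulsion ramps up towards 1.0
        atr_weight = 2.0 - frac  # attraction ramps down towards 1.0
        for inner in range(1, mcm_cycles + 1):
            # In the real protocol, one MCM step would be applied here.
            yield outer, inner, rep_weight, atr_weight
```

With the default values of 10 outer and 8 inner cycles, this enumerates 80 MCM steps per decoy, which gives a rough sense of how the two expert flags scale the run time.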

Tips

  • A typical run in three steps:
    1. Pre-pack your initial complex:

       FlexPepDocking.{ext} \
         -database ${mini_db} -s input.pdb -native native.pdb -flexpep_prepack \
         -ex1 -ex2aro [-unboundrot unbound.pdb]

    2. Generate 100 (or more) decoys with the -lowres_preoptimize flag and an additional 100 (or more) decoys without it, in two separate runs (the low-resolution run can be skipped if you are in a hurry):

       FlexPepDocking.{ext} \
         -database ${mini_db} -s start.pdb -native native.pdb \
         -out:file:silent decoys.silent -out:file:silent_struct_type binary \
         -rbMCM -torsionsMCM -ex1 -ex2aro -use_input_sc \
         -nstruct 100 -unboundrot unbound.pdb [ -lowres_preoptimize ]

    3. Open the output score files of both runs (score.sc by default), sort them by decoy score (second column), and choose the top-scoring (lowest-energy) decoys as candidate models.

  • Always pre-pack:
    Unless you know what you are doing, always pre-pack the input structure (using the pre-packing mode) before running the peptide docking protocol. Our docking protocol focuses on the interface between the peptide and the receptor. However, we rank the structures based on their overall energy. Therefore, it is important to create a uniform energetic background in non-interface regions. The main cause of energetic differences between decoys is non-optimal side-chain rotamers in these regions. Therefore, pre-packing the side-chains of each monomer before docking is highly recommended, and may significantly affect the final decoy ranking.
  • Decoy Selection:
    In order to get good results, it is recommended to generate a large number of decoys (at least 200, optimally 2000). The selection of decoys should be made based on their score. While selecting the single top-scoring decoy may suffice in some cases, it is recommended to inspect the top-5 or top-10 scoring decoys. In particular, this set of models makes it possible to identify hot-spot and motif residues as those with particularly strong sub-angstrom structural convergence, compared to more variable side-chain conformations at other positions.
  • Low-resolution pre-optimization
    The -lowres_preoptimize flag can be used to add a preliminary centroid-mode optimization step before performing full-atom, high-resolution docking. As a rule of thumb, it is recommended to use this flag when the initial starting structure is of lower quality (roughly more than 3 Å peptide backbone RMSD), so that sampling an extended range makes sense. In theory, this flag can also be specified independently (without the -rbMCM or -torsionsMCM flags). In this case, only low-resolution sampling followed by side-chain repacking will be performed. This mode of operation was not tested.
  • The unbound rotamers flag:
    In many cases, the unbound receptor (or peptide) may contain side-chain conformations that are more similar to the final bound structure than those in the rotamer library. In order to save this useful information, it is possible to specify a structure whose side-chain conformations will be appended to the rotamer library during prepacking or docking, and may improve the chances of getting a low-scoring near-native result. This option was originally developed for the RosettaDock protocol.
  • Extra rotamer flags:
    It is highly recommended to use the Rosetta extra rotamer flags, which increase the number of rotamers used for prepacking. We used the -ex1 and -ex2aro flags in our own runs; feel free to experiment with other flags if you know what you are doing, and otherwise stick to -ex1 and -ex2aro.
  • When you should / should not use FlexPepDock
    FlexPepDock is not intended for fully blind docking. It is intended for obtaining high-resolution peptide models given a coarse-grain starting structure that is somewhat close to the native solution (about 5 Å backbone RMSD from the native peptide, although in some cases the protocol works well for starting structures with up to 12 Å bb-RMSD from the native). The initial structure can be obtained from homologues, from known experimental or computational information about the correct binding site, etc. In many cases, it may be useful to use a constraint file to force the peptide to reach the vicinity of a known binding site.
    It is also assumed that the secondary structure of the peptide in the initial coarse-grain structure is approximately identical to the native. While the protocol is designed to allow substantial peptide backbone flexibility, it is not designed to switch between secondary structures (from strand to helix conformation, etc.). An initial secondary structure can be assigned based on prior information (homologue structures, etc.), from experimental information (CD experiments, etc.) or from complementary computational predictions (e.g. conformational sampling and ab-initio folding).
  • Typical running time:
    In our tests, producing 200 models typically takes 10 CPU hours (approximately 3 minutes per decoy). A substantial speedup is obtained by running parallel processes using the appropriate job-distributor flags.
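The three-step run described in the tips can be scripted. The sketch below assembles the argument lists in Python using only flags that appear in this manual; the function names are hypothetical, and the executable name and database path are left as parameters because they depend on the local build (the `{ext}` placeholder above).

```python
def prepack_command(exe, database, input_pdb, native_pdb, unbound_pdb=None):
    """Argument list for the pre-packing run (step 1 of the typical run)."""
    cmd = [exe, "-database", database, "-s", input_pdb,
           "-native", native_pdb, "-flexpep_prepack", "-ex1", "-ex2aro"]
    if unbound_pdb:
        cmd += ["-unboundrot", unbound_pdb]
    return cmd


def docking_command(exe, database, start_pdb, native_pdb, silent_out,
                    nstruct=100, unbound_pdb=None, lowres=False):
    """Argument list for one docking run (step 2 of the typical run)."""
    cmd = [exe, "-database", database, "-s", start_pdb,
           "-native", native_pdb,
           "-out:file:silent", silent_out,
           "-out:file:silent_struct_type", "binary",
           "-rbMCM", "-torsionsMCM", "-ex1", "-ex2aro", "-use_input_sc",
           "-nstruct", str(nstruct)]
    if unbound_pdb:
        cmd += ["-unboundrot", unbound_pdb]
    if lowres:
        # Optional centroid-mode pre-optimization before full-atom docking.
        cmd += ["-lowres_preoptimize"]
    return cmd
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Calling `docking_command` twice, once with `lowres=True` and once without, and with different silent file names, reproduces the two separate runs recommended above.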

Expected Outputs

The output of a FlexPepDock run is a score file (score.sc by default) and k decoy structures (as specified by the -nstruct flag and the other common Rosetta input and output flags). The score of each decoy is the second column of the score file. Decoy selection should be made based on this column.
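Decoy selection from one or more score files can be sketched as follows. This Python example assumes a whitespace-delimited score file in which the second column holds the numeric decoy score, the last column holds the decoy name, and lower (more negative) Rosetta energies are better; the `SCORE:` line layout in the usage note is an assumption, and the function names are hypothetical.

```python
def read_scores(lines):
    """Parse score-file lines into (score, decoy_name) pairs.

    Assumptions: columns are whitespace-delimited, the score is in the
    second column and the decoy name is in the last column. Lines whose
    second column is not numeric (such as a header) are skipped.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            score = float(fields[1])
        except ValueError:
            continue  # header or comment line
        rows.append((score, fields[-1]))
    return rows


def best_decoys(score_sources, top_n=10):
    """Merge several score sources and return the top_n lowest-energy decoys."""
    rows = []
    for lines in score_sources:
        rows.extend(read_scores(lines))
    return sorted(rows)[:top_n]
```

Each source can be an open file object or a list of lines, so the score files of the runs with and without -lowres_preoptimize can be pooled, for example `best_decoys([open("run1/score.sc"), open("run2/score.sc")])`, where the two paths are placeholders.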

Interpretation of FlexPepDock-specific score terms: (for the common Rosetta scoring terms, please also see the relevant manual page).
	
Term | Description
--- | ---
total_score* | Total score of the complex
I_bsa | Buried surface area of the interface
I_hb | Number of hydrogen bonds across the interface
I_pack | Packing statistics of the interface
I_sc | Interface score (sum over energy contributed by interface residues of both partners)
pep_sc | Peptide score (sum over energy contributed by the peptide to the total score; consists of the internal peptide energy and the interface energy)
I_unsat | Number of buried unsatisfied HB donors and acceptors at the interface
rms (ALL/BB/CA) | RMSD between output decoy and the native structure, over all peptide (heavy/backbone/C-alpha) atoms
rms (ALL/BB/CA)_if | RMSD between output decoy and the native structure, over all peptide interface (heavy/backbone/C-alpha) atoms
startRMS(all/bb/ca) | RMSD between start and native structures, over all peptide (heavy/backbone/C-alpha) atoms

*For all interface terms, the interface residues are defined as those whose C-beta atoms (C-alpha for glycines) are up to 8 Å away from any corresponding atom in the partner protein.
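The interface definition in the footnote can be written out as a short Python sketch. The data layout (each chain as a list of residue name and atom-coordinate dictionary) and the function names are assumptions made for this example; only the C-beta/C-alpha rule and the 8 Å cutoff come from the text.

```python
import math

CUTOFF = 8.0  # Angstroms, from the interface definition in this manual


def representative_atom(residue):
    """Return C-beta coordinates, or C-alpha coordinates for glycine."""
    name, atoms = residue
    return atoms["CA"] if name == "GLY" else atoms["CB"]


def interface_residues(receptor, peptide, cutoff=CUTOFF):
    """Return (receptor_indices, peptide_indices) of interface residues.

    Each chain is a list of (residue_name, {atom_name: (x, y, z)}).
    Indices are 0-based positions within each list.
    """
    rec_pts = [representative_atom(r) for r in receptor]
    pep_pts = [representative_atom(r) for r in peptide]
    rec_idx = [i for i, p in enumerate(rec_pts)
               if any(math.dist(p, q) <= cutoff for q in pep_pts)]
    pep_idx = [j for j, q in enumerate(pep_pts)
               if any(math.dist(q, p) <= cutoff for p in rec_pts)]
    return rec_idx, pep_idx
```

A residue counts as interface as soon as a single representative atom of the partner lies within the cutoff, so the definition is symmetric between receptor and peptide.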

Post Processing

Except for decoy selection by total score (see Outputs section), no special post-processing steps are needed. However, advanced users may optionally use Rosetta cluster_commands for assessing whether top-scoring models converge to a consensus solution. As a general rule, we saw in Raveh et al. that interface side-chains that point towards the receptor, in particular those of hot-spot residues and of known binding motif residues, tend to converge spatially better than side-chains of other residues (see Figure 4 in Raveh et al.). That said, clustering is an optional step, and is not considered an integral part of the FlexPepDock protocol as described and tested in Raveh et al.
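As a lightweight alternative to full clustering, the spatial convergence of individual atoms across the top-scoring models can be estimated directly. The Python sketch below is not a Rosetta tool: it assumes that all models are already in a common frame (for example, superimposed on the receptor) and list the same atoms in the same order, and the function name `positional_spread` is hypothetical.

```python
import math


def positional_spread(models):
    """Return, per atom, the RMS deviation from its mean position.

    models: list of models, each a list of (x, y, z) tuples for the same
    atoms in the same order. Assumption: all models share one frame.
    """
    n_models = len(models)
    n_atoms = len(models[0])
    spread = []
    for a in range(n_atoms):
        mean = tuple(sum(m[a][i] for m in models) / n_models for i in range(3))
        msd = sum(math.dist(m[a], mean) ** 2 for m in models) / n_models
        spread.append(math.sqrt(msd))
    return spread
```

Under these assumptions, atoms whose spread stays below about 1 Å across the top-5 or top-10 models would be candidates for the well-converged hot-spot or motif positions discussed above.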


© Copyright Rosetta Commons Member Institutions. For more information, see http://www.rosettacommons.org.