- Steven Lewis email@example.com
Last edited 3/6/12. Code by Steven Lewis. Corresponding PI Brian Kuhlman firstname.lastname@example.org.
Code and Demos
The code is at rosetta/rosetta_source/src/apps/public/scenarios/FloppyTail/; there's an integration test and demo at rosetta/rosetta_source/test/integration/tests/FloppyTail/. Note that the integration test is vastly under-cycled relative to a useful run: the number of cycles it uses is enough to show some remodeling, but not enough to get anywhere useful. To run that demo, go to that directory and run
[path to executable] -database [path to database] @options
- Kleiger G, Saha A, Lewis S, Kuhlman B, Deshaies RJ. Rapid E2-E3 assembly and disassembly enable processive ubiquitylation of cullin-RING ubiquitin ligase substrates. Cell. 2009 Nov 25;139(5):957-68. PubMed PMID: 19945379.
You may want to look at the online supplemental info for that paper for a different presentation of how the code works.
- Crawley SW, Gharaei MS, Ye Q, Yang Y, Raveh B, London N, Schueler-Furman O, Jia Z, Côté GP. Autophosphorylation activates Dictyostelium myosin II heavy chain kinase A by providing a ligand for an allosteric binding site in the alpha-kinase domain. J Biol Chem. 2011 Jan 28;286(4):2607-16. Epub 2010 Nov 11.
This paper uses FloppyTail but is not related to development.
This code was written for a relatively singular application. The system in question was a protein with a long (dozens of residues) flexible tail, which was not seen in crystal structures. Biochemical evidence suggested a particular binding site for the tail on a known binding partner (the two binding partners also had a known binding interface separate from this tail). The code was intended to model the reach of the long flexible tail and determine whether the hypothesized binding site was plausible.
The protocol is more useful for testing hypotheses about possible conformations, and exploring accessible conformation space, than for finding "the one true binding mode". If your tail is truly that flexible it might not have a "one true binding mode."
The algorithm is fairly simple: small/shear/fragment moves in centroid mode to collapse the tail into some sort of folded conformation from an initially straight-out-into-space extended conformation, and small/shear moves with repacking to refine its position. This is conceptually similar to how abinitio folding works, although it is not refined for that purpose (and does not contain temperature scheduling, etc).
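The two-phase Monte Carlo idea can be illustrated with a toy sketch. This is not FloppyTail's code: `toy_score`, `run_phase`, the step sizes, and the list-of-angles representation are hypothetical stand-ins for Rosetta's score functions, movers, and poses. The temperatures and cycle counts reuse the production values quoted in the options list; everything else is an assumption made for illustration.

```python
import math
import random


def metropolis_accept(old_score, new_score, temperature, rng):
    """Metropolis criterion: always accept downhill, sometimes accept uphill."""
    if new_score <= old_score:
        return True
    return rng.random() < math.exp(-(new_score - old_score) / temperature)


def toy_score(angles):
    # Hypothetical energy: rewards torsions near -60 degrees. Not a Rosetta score.
    return sum((a + 60.0) ** 2 for a in angles) / 1000.0


def run_phase(angles, cycles, temperature, max_step, rng):
    """One phase: perturb one random torsion per cycle, keep the best state seen."""
    current = list(angles)
    current_score = toy_score(current)
    best, best_score = list(current), current_score
    for _ in range(cycles):
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-max_step, max_step)
        trial_score = toy_score(trial)
        if metropolis_accept(current_score, trial_score, temperature, rng):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best, best_score


rng = random.Random(0)
extended = [180.0] * 10  # tail starts straight out into space
# Coarse phase (stands in for centroid mode): larger perturbations.
coarse, coarse_score = run_phase(extended, 5000, 0.8, 20.0, rng)
# Refinement phase (stands in for fullatom refinement): smaller perturbations.
refined, refined_score = run_phase(coarse, 3000, 0.8, 5.0, rng)
```

Because each phase returns the best state it visited, the refined score can never be worse than the coarse one in this sketch; the real protocol additionally repacks side chains during refinement, which the toy omits.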
The code is compatible with constraints (passed in via the command line). Early modeling proceeded using constraints and some small hacks to help guide models to the hypothesized tail-binding site. Ultimately this was not necessary for the original system, but the code retains the ability to use constraints. Your mileage may vary. UPDATE: Constraints were originally supported only during the centroid phase; the code is now compatible with constraints in both phases.
This code is NOT intended to do "half-abinitio" where you know half a structure and want to fold the other half. Although it is modeled on abinitio, it is only tested on a truly floppy tail, and I have no idea if it is able to fold compact structures. It is resolutely not supported for that purpose.
See test/integration/tests/FloppyTail/ for example usage. Basically all you need is an input structure.
- The code does not tolerate imperfections in the input PDB. Get rid of your heteroatoms, 0-occupancy regions, multiply-defined atoms, and waters beforehand.
- The code does not add your extension for you. You need to add starting coordinates (however meaningless) for the flexible tail. I had it pointing straight out into space (as it is in the demo).
- See the fragment file input format documentation (Fragments Directory) for an explanation of how to make fragments.
- See the constraint file input format documentation (Constraint File Instructions) for an explanation of constraint files.
This code was intended for a single purpose, but it may work if you have a similarly flexible tail. It can also model internal flexible regions between domains.
- You are likely to need a large number of trajectories if you have a long tail. The protocol gets trapped in bad conformations fairly easily even with the Metropolis criterion. Production runs for publication completed slightly less than 30000 trajectories in 1000 processor-days on 2.3 gigahertz processors. I'm sure this can be enormously optimized if you want to. (Additionally, an error in the code has been corrected since publication; the fix decreases runtime by 10-25%.)
- Constraints are a GREAT way to bias your modeling to test if a hypothesized conformation is possible.
- Do not use the -publication flag unless you are doing the demo or repeating the publication. It activates an extra function for system-specific analysis (essentially, postprocessing scripts are embedded inside the executable). These will crash if run against inputs other than the expected E2/E3/UBQ system.
- This code is compatible with silent-file input, but you have to work around the PDB-numbering based inputs assuming numbering from 1 and chain-lettering from A. (Just use a PDB!)
- When modeling a terminal flexible region (a tail), in the refinement phase, the protocol can be directed to model a shorter portion of the tail for part of refinement mode with the short_tail_xxx options. The reasoning is that the tail may be too close to a binding partner in the structure fresh from centroid mode, even with repacking (centroids tend to be a bit too small for this sort of docking). By only remodeling the tip of the tail in the first part of refinement, you can relax clashes without swinging the tail back out into space.
- The short_tail_xxx, constraint, and pair_off options were not used for production runs with this code. They were used for early experiments, controls, debugging, etc. They still work.
- Fragments are supported, but were not found to be necessary. You should probably be running this on a sequence which has little secondary structure anyway, so the fragments won't be too useful. In the publication case, use of fragments resulted in scattered short helices in the results (clearly raw fragments) without affecting any of the important metrics.
- Do a secondary structure prediction on your flexible tail ahead of time. If it has no SS preferences, use FloppyTail. If it has a few predicted helices, use FloppyTail with fragments. If it has a lot of secondary structure, use abinitio.
- If you have a single chain (one protein) and the flexible region is closer to the N-terminus than the C-terminus, use the C_root option to make computation faster. It reroots the fold tree on the C-terminus, obviating a lot of useless recalculation of coordinates for the rigid part of the protein (if the N part is moving).
- You must use the C_root option (or pass a flexible_start_resnum of zero) when using an N-terminal tail. Otherwise, the first residue of the tail will have no motion in phi (undefined at an N-terminus), and the residue will remain fixed in space in a probably undesired fashion.
- If you are modeling a complex, ensure that the flexible region has either the highest or lowest residue numbers possible: if it is at the N-terminus, put its subcomponent first in the PDB file, or last if it is at the C-terminus. Use the C_root option if your flexible region has the lower residue numbers, and do not use it if it has the higher ones. If you have more than two components in your complex, and are modeling a linker between them, then the linker's chain must come between the groups of internally rigid parts in the input PDB file.
- FloppyTail now fully supports modeling internal linker regions (to do domain assembly). It detects interfaces between the internally rigid portions and packs accordingly.
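To turn the production figures above into a planning estimate, here is a back-of-the-envelope sketch. It only restates the quoted numbers (roughly 30000 trajectories in 1000 processor-days, with a later 10-25% speedup); the helper name `processor_days_needed` is hypothetical, and per-trajectory cost on your system will depend on tail length, cycle counts, and hardware.

```python
def minutes_per_trajectory(processor_days, trajectories):
    """Average processor-minutes spent on one trajectory."""
    return processor_days * 24 * 60 / trajectories


def processor_days_needed(n_trajectories, minutes_each):
    """Processor-days required for a planned number of trajectories."""
    return n_trajectories * minutes_each / (24 * 60)


# Publication run: about 30000 trajectories in 1000 processor-days.
published = minutes_per_trajectory(1000, 30000)

# The later fix is quoted as a 10-25% runtime decrease.
after_fix_slow = published * (1 - 0.10)
after_fix_fast = published * (1 - 0.25)

# Hypothetical plan: 5000 trajectories at the published per-trajectory cost.
plan = processor_days_needed(5000, published)
```

Under these figures a trajectory costs about 48 processor-minutes at publication speed, or roughly 36 to 43 minutes after the fix. Since the publication run completed slightly fewer than 30000 trajectories, the true per-trajectory cost was marginally higher.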
FloppyTail supports three types of options: general Rosetta options (packing, etc.), generic protocol options like "how many cycles" borrowed from the (unreleased) AnchoredDesign application, and FloppyTail-specific options.
- flexible_start_resnum - integer - this is the start of the flexible tail in PDB numbering.
- flexible_stop_resnum - integer - this is the end of the flexible region, in PDB numbering. Passing 0 or not using this option means the entire chain after flexible_start_resnum.
- flexible_chain - string - the first character of this string is interpreted as the PDB chain for the flexible region; any other characters are ignored.
- shear_on - real - In centroid mode, shear moves are completely nonproductive early on when the tail is still largely extended. This value gives the fraction of centroid cycles when shear moves will be allowed (introduced into the moveset of the RandomMover choosing perturbation moves). For example, passing 0.333 means that for the first third of centroid mode, shear moves will be disallowed.
- short_tail_fraction - real - Fraction of the tail used in the short tail fraction of refinement mode. 0.1 would mean the last tenth of the tail is flexible. Not compatible with non-terminal flexible regions.
- short_tail_off - real - Fraction of refinement cycles dedicated to refining only the short part of the tail. 0.33 means the first third of refinement cycles will be with the shorter flexible region.
- pair_off - boolean - If true, disable the electrostatic Epair (pair and fa_pair) terms. Used for a control experiment, not for general use.
- publication - boolean - If true, output system-specific results used in the demo and publication. Use FALSE for any other purpose; this boolean activates code including hardcoded references to particular residues and will cause either a crash or silly behavior on systems other than the demo/publication.
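The fractional options above translate into concrete cycle counts and residue ranges. The sketch below shows that arithmetic; the function names are hypothetical, the rounding is an assumption (the application may truncate instead), and `short_tail_window` assumes a C-terminal tail whose tip is the last flexible residue.

```python
def shear_start_cycle(perturb_cycles, shear_on):
    """Centroid cycle at which shear moves become allowed (rounding assumed)."""
    return round(perturb_cycles * shear_on)


def short_tail_window(flexible_start, flexible_stop, short_tail_fraction):
    """Residue range of the short tail, assuming the tip is flexible_stop."""
    tail_length = flexible_stop - flexible_start + 1
    short_length = max(1, round(tail_length * short_tail_fraction))
    return flexible_stop - short_length + 1, flexible_stop


def short_tail_cycles(refine_cycles, short_tail_off):
    """Number of initial refinement cycles that use the short tail only."""
    return round(refine_cycles * short_tail_off)


# shear_on 0.333 with 5000 centroid cycles: shear disallowed for ~1665 cycles.
shear_cycle = shear_start_cycle(5000, 0.333)

# Hypothetical 50-residue tail (residues 101-150) with short_tail_fraction 0.1.
window = short_tail_window(101, 150, 0.1)

# short_tail_off 0.33 with 3000 refinement cycles.
short_cycles = short_tail_cycles(3000, 0.33)
```

For these inputs the sketch gives shear moves starting at cycle 1665, a short tail covering residues 146 to 150, and 990 short-tail refinement cycles.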
AnchoredDesign options (borrowed for simplicity, not tied to AnchoredDesign in any other way); all are in the AnchoredDesign namespace
- AnchoredDesign::perturb_temp - real - Monte Carlo temperature for perturb phase (0.8 used for production)
- AnchoredDesign::perturb_cycles - unsigned integer - number of perturb phase cycles (5000 used for production)
- AnchoredDesign::perturb_show - boolean - if true, outputs centroid poses after perturbation
- AnchoredDesign::debug - boolean - if true, outputs poses for each Monte Carlo cycle
- AnchoredDesign::refine_temp - real - Monte Carlo temperature for refine phase (0.8 used for production)
- AnchoredDesign::refine_cycles - unsigned integer - number of refine phase cycles (3000 used for production)
- AnchoredDesign::refine_repack_cycles - unsigned integer - Perform a repack/minimize every N cycles of refine mode (30 used for production)
General options: All packing namespace options loaded by the PackerTask are respected. jd2 namespace options are respected. Anything very low-level, like the database paths, is respected.
- packing::resfile - string - resfile if you want one (see Resfile syntax and conventions)
- packing::repack_only - boolean - Tells the code not to perform design. Design is performed by default because PackerTasks behave that way.
- in::file::frag3 - string - fragments if you've got them (see Fragments Directory)
- run::min_type - string - minimizer type (see Minimization overview and concepts). dfpmin_armijo_nonmonotone used for production.
- nstruct - integer - number of structures to generate
- constraints::cst_file - string - constraint file for the centroid phase (see Constraint File Instructions)
- constraints::cst_weight - real - constraints weight (centroid phase)
- constraints::cst_fa_file - string - constraint file for the fullatom phase (see Constraint File Instructions)
- constraints::cst_fa_weight - real - constraints weight (fullatom phase)
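As an illustration of how the options listed above might be collected into an options file, here is a small sketch that formats a dictionary as one flag per line. Several things here are assumptions: the `FloppyTail::` prefix on the application-specific options, the input file name, the residue number and chain, and the choice to write a true boolean as a bare flag. The integration test's own options file is the authoritative example.

```python
def format_flags(options):
    """Render {name: value} as flag lines; True becomes a bare flag, False is skipped."""
    lines = []
    for name, value in options.items():
        if value is True:
            lines.append(f"-{name}")
        elif value is False or value is None:
            continue
        else:
            lines.append(f"-{name} {value}")
    return "\n".join(lines) + "\n"


production_like = {
    "s": "input_with_extended_tail.pdb",       # hypothetical input file
    "FloppyTail::flexible_start_resnum": 180,  # hypothetical; namespace assumed
    "FloppyTail::flexible_chain": "C",         # hypothetical; namespace assumed
    "FloppyTail::shear_on": 0.333,
    "FloppyTail::publication": False,          # leave off outside the demo
    "AnchoredDesign::perturb_temp": 0.8,
    "AnchoredDesign::perturb_cycles": 5000,
    "AnchoredDesign::refine_temp": 0.8,
    "AnchoredDesign::refine_cycles": 3000,
    "AnchoredDesign::refine_repack_cycles": 30,
    "packing::repack_only": True,
    "run::min_type": "dfpmin_armijo_nonmonotone",
    "nstruct": 1,
}

flags_text = format_flags(production_like)
```

Writing `flags_text` to a file named `options` would let it be passed on the command line as `@options`, in the same way as the demo.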
Multiple flexible linkers mode
For release 3.4, FloppyTail supports multiple flexible linkers. To use these, you have to write your own MoveMap file to tell FloppyTail what is flexible, and pass it in via the flag in:file:movemap. The formatting is described in the header to the function core::kinematics::MoveMap::init_from_file (probably at core/kinematics/MoveMap.hh). Briefly, do this:
to define flexible regions running from residues 20 to 30 and 54 to 67. This is in internal Rosetta residue numbering (from 1), not PDB numbering.
To use this feature you must also NOT use the following flags, because the movemap handles these data, so the program is set up to ignore inputs from these flags if passing a movemap.
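A minimal sketch of generating a MoveMap file for the two regions mentioned above (residues 20 to 30 and 54 to 67, in Rosetta numbering). The `RESIDUE <start> <stop> BBCHI` line syntax is an assumption based on the format described with core::kinematics::MoveMap::init_from_file; confirm it against the header in your Rosetta version. The function name and output file name are hypothetical.

```python
def movemap_lines(flexible_regions, setting="BBCHI"):
    """One line per flexible region, in internal Rosetta numbering (from 1)."""
    lines = []
    for start, stop in flexible_regions:
        if start < 1 or stop < start:
            raise ValueError(f"bad region: {start}-{stop}")
        # Assumed syntax: RESIDUE <start> <stop> <setting>
        lines.append(f"RESIDUE {start} {stop} {setting}")
    return lines


movemap_text = "\n".join(movemap_lines([(20, 30), (54, 67)])) + "\n"

# Hypothetical file name; pass it with the in:file:movemap flag.
with open("floppy.movemap", "w") as handle:
    handle.write(movemap_text)
```

The range check matters because the file uses internal numbering from 1, so PDB residue numbers must be converted before being written out.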
You'll be using this application to model mostly unstructured regions. You should not put a lot of stock in any individual model. This is not the sort of application where you'll run it 10 times and then take the best-scoring result as an accurate guess for the actual protein structure.
In general you should pick some metric predicted by the model (if you read the paper, you'll see that it was a distance between two residues later found to be chemically crosslinkable). You can then mine the model population to see what this metric looks like in the top-scoring fraction of models. The extra_analysis functionality will facilitate this. I suggest histograms.
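The population-mining step can be sketched as follows: keep the best-scoring fraction of models, then histogram the chosen metric. The data are invented for illustration, the function names are hypothetical, and the sketch assumes that lower scores are better and that you have already extracted one (score, metric) pair per model, for example an inter-residue distance in Angstroms.

```python
import math


def top_fraction(models, fraction):
    """Return the best-scoring fraction of (score, metric) pairs (lowest score first)."""
    ranked = sorted(models, key=lambda pair: pair[0])
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


def histogram(values, bin_width):
    """Count values per bin, keyed by each bin's left edge."""
    counts = {}
    for value in values:
        left_edge = math.floor(value / bin_width) * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))


# Invented (score, distance) pairs standing in for a real model population.
models = [
    (-310.2, 12.5), (-305.0, 14.1), (-299.8, 33.0), (-312.7, 11.9),
    (-280.4, 41.2), (-308.9, 13.3), (-295.1, 27.6), (-301.3, 18.4),
    (-290.0, 36.9), (-306.6, 12.8),
]

best = top_fraction(models, 0.5)
distance_histogram = histogram([metric for _, metric in best], 2.0)

for left_edge, count in distance_histogram.items():
    print(f"{left_edge:5.1f}-{left_edge + 2.0:5.1f}  {'#' * count}")
```

Comparing this histogram with the one for the whole population shows whether the best-scoring models favor a particular range of the metric, which is more informative than inspecting any single model.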
Changes since last release
For 3.2, there was a major under-the-hood change which decreases runtime, scaling favorably for very long tails. For the publication case it decreases runtime 10-25%.
For 3.3, the publication flag was added for simplicity. The C_root flag was added to speed computation on non-C-terminal tails. Constraints now work in fullatom mode. Full support for domain assembly (internal linkers) was added.
For 3.4, I added the ability to specify a custom MoveMap, which also allows for multiple rigid and flexible regions.