- Steven Lewis email@example.com
Last edited 5/10/10. Code by Steven Lewis. Corresponding PI Brian Kuhlman firstname.lastname@example.org
The code is at
; there's an integration test+demo at
. Note that the integration test is vastly under-cycled: the number of cycles it runs should be enough to show some remodeling, but not nearly enough to produce useful models. To run the demo, go to that directory and run
[path to executable] -database [path to database] @options
- Kleiger G, Saha A, Lewis S, Kuhlman B, Deshaies RJ. Rapid E2-E3 assembly and disassembly enable processive ubiquitylation of cullin-RING ubiquitin ligase substrates. Cell. 2009 Nov 25;139(5):957-68. PubMed PMID: 19945379.
You may want to look at the online supplemental info for that paper for a different presentation of how the code works.
This code was written for a relatively singular application. The system in question was a protein with a long (dozens of residues) flexible tail, which was not seen in crystal structures. Biochemical evidence suggested a particular binding site for the tail on a known binding partner (the two binding partners also had a known binding interface separate from this tail). The code was intended to model the reach of the long flexible tail and determine whether the hypothesized binding site was plausible.
The protocol is more useful for testing hypotheses about possible conformations, and exploring accessible conformation space, than for finding "the one true binding mode". If your tail is truly that flexible it might not have a "one true binding mode."
The algorithm is fairly simple: small/shear/fragment moves in centroid mode to collapse the tail into some sort of folded conformation from an initially straight-out-into-space extended conformation, and small/shear moves with repacking to refine its position. This is conceptually similar to how abinitio folding works, although it is not refined for that purpose (and does not contain temperature scheduling, etc).
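The accept/reject step behind this kind of move-based search can be illustrated with a minimal Python sketch of the Metropolis criterion. This is a toy illustration, not the Rosetta implementation: `metropolis_accept`, `run_toy_trajectory`, and the score and move functions passed to them are hypothetical names invented for this example.

```python
import math
import random

def metropolis_accept(old_score, new_score, temperature, rng=random):
    """Toy Metropolis criterion: decide whether to keep a trial move."""
    delta = new_score - old_score
    if delta <= 0:
        return True  # downhill or neutral moves are always accepted
    # Uphill moves are accepted with probability exp(-delta / temperature)
    return rng.random() < math.exp(-delta / temperature)

def run_toy_trajectory(score_fn, propose_fn, start, cycles, temperature, seed=0):
    """Run a fixed-temperature Monte Carlo loop and return the best state seen."""
    rng = random.Random(seed)
    current, current_score = start, score_fn(start)
    best, best_score = current, current_score
    for _ in range(cycles):
        trial = propose_fn(current, rng)      # stand-in for a small/shear/fragment move
        trial_score = score_fn(trial)
        if metropolis_accept(current_score, trial_score, temperature, rng):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score
```

Here the temperature stays fixed for the whole loop, which matches the remark above that the protocol has no temperature scheduling.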
The code is compatible with constraints during the centroid phase (passed in via commandline). Early modeling proceeded using constraints and some small hacks to help guide models to the hypothesized tail-binding site. Ultimately this was not necessary for the original system, but the code retains the ability to use constraints, etc. Your mileage may vary.
This code can model internally flexible regions between domains, although that is not its intended purpose. It may not do a good job of detecting appropriate interfaces for repacking in this latter case.
This code is NOT intended to do "half-abinitio" where you know half a structure and want to fold the other half. Although it is modeled on abinitio, it is only tested on a truly floppy tail, and I have no idea if it is able to fold compact structures. It is resolutely not supported for that purpose.
See test/integration/tests/FloppyTail/ for example usage. Basically all you need is an input structure.
- The code does not tolerate imperfections in the input PDB. Get rid of your heteroatoms, 0-occupancy regions, multiply-defined atoms, and waters beforehand.
- The code does not add your extension for you. You need to add starting coordinates (however meaningless) for the flexible tail. I had it pointing straight out into space (as it is in the demo).
- See the fragment file input format documentation (Fragments Directory) for an explanation of how to make fragments.
- See the constraint file input format documentation (Constraint File Instructions) for an explanation of constraint files.
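The PDB cleanup described in the first point above can be sketched in Python. This is a minimal sketch, not a tool that ships with the code: `clean_pdb_lines` is a hypothetical helper, it assumes standard fixed-column PDB records, and it resolves multiply-defined atoms by simply keeping the first copy it sees.

```python
def clean_pdb_lines(lines):
    """Filter PDB lines down to unambiguous ATOM records.

    Drops HETATM records (heteroatoms and waters), zero-occupancy atoms,
    and every copy after the first of a multiply-defined atom.
    TER and END records are passed through unchanged.
    """
    kept = []
    seen = set()
    for line in lines:
        if line.startswith(("TER", "END")):
            kept.append(line)
            continue
        if not line.startswith("ATOM"):
            continue  # HETATM, remarks, and other records are discarded
        if line[17:20] == "HOH":
            continue  # water written as an ATOM record
        occupancy = line[54:60].strip()  # PDB columns 55-60
        if occupancy and float(occupancy) == 0.0:
            continue
        # Atom name, chain ID, residue number plus insertion code
        key = (line[12:16], line[21], line[22:27])
        if key in seen:
            continue  # a later alternate location of an atom already kept
        seen.add(key)
        # Blank the alternate-location indicator (PDB column 17)
        kept.append(line[:16] + " " + line[17:])
    return kept
```

Typical use would be to read the file with `open(path).read().splitlines()`, pass the lines through the function, and write the result to a new file. Check the output by eye before starting a large run.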
This code was intended for a single purpose, but it may work if you have a similarly flexible tail. It can also model internal flexible regions between domains.
- You are likely to need a large number of trajectories if you have a long tail. The protocol gets trapped in bad conformations fairly easily, even with the Metropolis criterion. Production runs for publication completed slightly fewer than 30000 trajectories in 1000 processor-days on 2.3 GHz processors. I'm sure this could be optimized enormously if you wanted to. (Additionally, an error in the code has been corrected since publication, which decreases runtime by 10-25%.)
- Constraints are only used in centroid mode in this protocol. Feel free to change that if you want. Some extra constraint reporting is commented out in the code; uncomment it if you want to use constraints. Constraints are a GREAT way to bias your modeling to test whether a hypothesized conformation is possible.
- FloppyTail.cc includes extra_analysis.hh in the same folder. This code is riddled with system-based assumptions and magic numbers (in other words it directly prints data specific to the original system, like the distance between two residues of interest). You'll want to turn off the function create_extra_output at line 392 in FloppyTail, or else replace extra_analysis with your own code. The code ships as used for the final publication.
- This code is compatible with silent-file input, but you have to work around the PDB-numbering based inputs assuming numbering from 1 and chain-lettering from A. (Just use a PDB!)
- When modeling a terminal flexible region (a tail), in the refinement phase, the protocol can be directed to model a shorter portion of the tail for part of refinement mode with the short_tail_xxx options. The reasoning is that the tail may be too close to a binding partner in the structure fresh from centroid mode, even with repacking (centroids tend to be a bit too small for this sort of docking). By only remodeling the tip of the tail in the first part of refinement, you can relax clashes without swinging the tail back out into space.
- The short_tail_xxx, constraint, and pair_off options were not used for production runs with this code. They were used for early experiments, controls, debugging, etc. They still work.
- Fragments are supported, but were not found to be necessary. You should probably be running this on a sequence which has little secondary structure anyway, so the fragments won't be too useful. In the publication case, use of fragments resulted in scattered short helices in the results (clearly raw fragments) without affecting any of the important metrics.
- Do a secondary structure prediction on your flexible tail ahead of time. If it has no SS preferences, use FloppyTail. If it has a few predicted helices, use FloppyTail with fragments. If it has a lot of secondary structure, use abinitio.
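To budget compute time, the throughput figures quoted in the first point above can be turned into a rough per-trajectory cost. The sketch below only does that arithmetic. The function names are made up for this example, and the result applies to the publication system and hardware, not to your tail.

```python
def minutes_per_trajectory(processor_days, trajectories):
    """Average wall-clock minutes one processor spends on one trajectory."""
    return processor_days * 24 * 60 / trajectories

def processor_days_needed(trajectories, minutes_each):
    """Processor-days required for a given number of trajectories."""
    return trajectories * minutes_each / (24 * 60)

# Publication figures: about 30000 trajectories in 1000 processor-days
per_trajectory = minutes_per_trajectory(1000, 30000)
print(per_trajectory)  # 48.0

# The later fix is reported to cut runtime by 10-25%
fastest = per_trajectory * (1 - 0.25)
slowest = per_trajectory * (1 - 0.10)
```

So a production-length trajectory cost roughly 48 processor-minutes at publication time, and somewhat less with the corrected code.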
FloppyTail supports three types of options: general minirosetta options (packing, etc.), generic protocol options like "how many cycles" borrowed from the (unreleased) AnchoredDesign application, and FloppyTail specific options.
- flexible_start_resnum - integer - this is the start of the flexible tail in PDB numbering.
- flexible_stop_resnum - integer - this is the end of the flexible region, in PDB numbering. Passing 0 or not using this option means the entire chain after flexible_start_resnum.
- flexible_chain - string - the first character of this string is interpreted as the PDB chain for the flexible region; any other characters are ignored.
- shear_on - real - In centroid mode, shear moves are completely nonproductive early on when the tail is still largely extended. This value gives the fraction of centroid cycles when shear moves will be allowed (introduced into the moveset of the RandomMover choosing perturbation moves). For example, passing 0.333 means that for the first third of centroid mode, shear moves will be disallowed.
- short_tail_fraction - real - Fraction of the tail that remains flexible during the short-tail portion of refinement mode. 0.1 would mean the last tenth of the tail is flexible. Not compatible with non-terminal flexible regions.
- short_tail_off - real - Fraction of refinement cycles dedicated to refining only the short part of the tail. 0.33 means the first third of refinement cycles will be with the shorter flexible region.
- pair_off - boolean - If true, disable the electrostatic Epair (pair and fa_pair) terms. Used for a control experiment, not for general use.
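The fraction-valued options above (shear_on, short_tail_off) are easier to reason about as cycle counts. The sketch below converts them using the production cycle numbers quoted in this document. `switch_cycle` is a hypothetical helper, and rounding to the nearest cycle is an assumption: the real code may round differently.

```python
def switch_cycle(fraction, total_cycles):
    """Cycle at which a fraction-controlled switch would flip (nearest cycle)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return round(fraction * total_cycles)

perturb_cycles = 5000        # production value for AnchoredDesign::perturb_cycles
refine_cycles = 3000         # production value for AnchoredDesign::refine_cycles
refine_repack_cycles = 30    # production value for AnchoredDesign::refine_repack_cycles

# shear_on 0.333: shear moves join the moveset after about a third of centroid mode
shear_starts_at = switch_cycle(0.333, perturb_cycles)

# short_tail_off 0.33: the full tail becomes flexible after about a third of refinement
full_tail_starts_at = switch_cycle(0.33, refine_cycles)

# One repack/minimize every refine_repack_cycles cycles of refinement
repack_steps = refine_cycles // refine_repack_cycles

print(shear_starts_at, full_tail_starts_at, repack_steps)  # 1665 990 100
```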
AnchoredDesign options (borrowed for simplicity, not tied to AnchoredDesign in any other way); all are in the AnchoredDesign namespace
- AnchoredDesign::perturb_temp - real - Monte Carlo temperature for perturb phase (0.8 used for production)
- AnchoredDesign::perturb_cycles - unsigned integer - number of perturb phase cycles (5000 used for production)
- AnchoredDesign::perturb_show - boolean - if true, outputs centroid poses after perturbation
- AnchoredDesign::debug - boolean - if true, outputs poses for each Monte Carlo cycle
- AnchoredDesign::refine_temp - real - Monte Carlo temperature for refine phase (0.8 used for production)
- AnchoredDesign::refine_cycles - unsigned integer - number of refine phase cycles (3000 used for production)
- AnchoredDesign::refine_repack_cycles - unsigned integer - Perform a repack/minimize every N cycles of refine mode (30 used for production)
General options: All packing namespace options loaded by the PackerTask are respected. jd2 namespace options are respected. Anything very low-level, like the database paths, is respected.
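As a rough illustration of how these options fit together, the Python sketch below writes out the text of an options file using the production values quoted above. Several things here are assumptions: `format_options` is a hypothetical helper, the one-flag-per-line layout and the "true"/"false" spelling should be checked against the demo's own options file, the option names are copied exactly as listed here (your Rosetta version may want a namespace prefix on the FloppyTail-specific ones), and the residue number and chain are made-up placeholders. Input structure and other general options are left out.

```python
def format_options(options):
    """Render option names and values as one '-name value' flag per line."""
    lines = []
    for name, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # assumed boolean spelling
        lines.append(f"-{name} {value}")
    return "\n".join(lines) + "\n"

production_like = {
    "flexible_start_resnum": 180,   # hypothetical placeholder, PDB numbering
    "flexible_chain": "C",          # hypothetical placeholder
    "shear_on": 0.333,              # example value from the option description
    "AnchoredDesign::perturb_temp": 0.8,
    "AnchoredDesign::perturb_cycles": 5000,
    "AnchoredDesign::refine_temp": 0.8,
    "AnchoredDesign::refine_cycles": 3000,
    "AnchoredDesign::refine_repack_cycles": 30,
}

options_text = format_options(production_like)
print(options_text)
```

Writing `options_text` to a file and passing that file with @ on the command line mirrors how the demo is run.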
You'll be using this application to model mostly unstructured regions. You should not put a lot of stock in any individual model. This is not the sort of application where you'll run it 10 times and then take the best-scoring result as an accurate guess for the actual protein structure.
In general you should pick some metric predicted by the model (if you read the paper, you'll see that it was a distance between two residues later found to be chemically crosslinkable). You can then mine the model population to see what this metric looks like in the top-scoring fraction of models. The extra_analysis functionality will facilitate this. I suggest histograms.
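A minimal Python sketch of that kind of population analysis follows. It assumes you have already collected a (score, metric) pair for each model, for example total score and the distance between your two residues of interest. `top_fraction` and `histogram` are hypothetical helpers written for this example, not part of extra_analysis, and lower scores are treated as better.

```python
import math

def top_fraction(models, fraction):
    """Return the best-scoring fraction of (score, metric) pairs (lowest scores)."""
    ranked = sorted(models, key=lambda model: model[0])
    n_keep = max(1, int(len(ranked) * fraction))
    return ranked[:n_keep]

def histogram(values, bin_width):
    """Count values per bin; keys are the lower edges of the bins."""
    counts = {}
    for value in values:
        bin_start = math.floor(value / bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return dict(sorted(counts.items()))

# Made-up example data: (score, distance in Angstroms)
models = [(-100.0, 12.0), (-90.0, 14.0), (-80.0, 31.0), (-10.0, 55.0)]
best = top_fraction(models, 0.5)
distance_counts = histogram([metric for _, metric in best], 5.0)
print(distance_counts)  # {10.0: 2}
```

Comparing the histogram for the top-scoring fraction against the histogram for the whole population shows whether good scores actually favor the conformation you are testing.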
There was a major under-the-hood change which decreases runtime, scaling favorably for very long tails. For the publication case it decreases runtime 10-25%.