In a two domain protein, I randomize the phi and psi angles of the interdomain linker and then energy minimize, considering only repulsive VDW. Oddly, about one third of the time, the minimization step increases the energy. Possibly the minimization function is considering only the residues that are allowed to move, but that that score function is considering the entire system. If this is what is going on, it is not what I expect or want, I'd like the minimization function to consider the entire system. Below is the python script. The input pdb can be obtained from http://gene.bio.jhu.edu/input.pdb. Residues 161-168 are the linker region of the protein.
from rosetta import *
full_pose = pose_from_pdb("input.pdb")
scorefxnvdw = ScoreFunction()
scorefxnvdw.set_weight(fa_rep, 1.0) #VDW
#set movemap to adjust only linker region in full protein pose
mm = MoveMap()
#set minimization mover to shift domains
minmover = MinMover(mm, scorefxnvdw, 'dfpmin', 10, True)
for i in range (161, 168):
full_pose.set_phi(i, random.uniform(-180, 180))
full_pose.set_psi(i, random.uniform(-180, 180))
print "Energy before minimization", scorefxnvdw(full_pose)
print "Energy after minimization", scorefxnvdw(full_pose)
Hm. That is definitely odd behavior from the minimizer. I wonder if it is because you are in a very bad region of the energy function. That is, if the random phi/psi selection creates badly overlapping clashes where your two domains largely overlap (which you would see by a very high fa_rep score, as well as in the structure), then it may be hard for the minimizer to resolve, due to the very high energy and gradients. Because the minimizer is sampling discrete points in the domain (in your case phi-psi of residues 161-168), it may get confused because the energy function is not smooth enough.
If this is the case, a workaround might be to reject configurations before minimization if they are above a certain fa_rep threshold which might indicate unreasonable levels of clashes.
This is indeed a nasty, unexpected behavior in the minimizer. I like Jeff's guess that you are in such an awful region of conformational space to start with that everything is bad, but it's still surprising. What do before and after actually look like? If you start with protein eclipsed into protein, then you're going to end in the same wrong state.
If you want to solve this domain assembly problem generally, the FloppyTail executeable can take a stab at it. Domain assembly (er, well, envelope prediction) will be fully supported in 3.3 and it sorta-works in 3.2.
With respect to the first reply: I have observed minimization failures from structures in which the "before energy" was as low as 1408. (Less than 1/2000 of the randomized structures possess energies less than 2000).
With respect to the second reply: the before, and after structures from failed minimizations usually (it could be always) have one domain interpenetrating the other. (I had originally hoped to use minimization to eliminate the interpenetration and leave the domains in contact. Now I realize that I'd need to restrict attention to those randomized structures in which any domain interpenetration distance is less than an atomic radius, which I'd hope would merely be those structures in which the energy is less than some value. The minimization failure mentioned in the previous paragraph suggests that this may not work however.
If the minimization function becomes confused, shouldn't it at least return the same energy and structure as it was first given?
"Now I realize that I'd need to restrict attention to those randomized structures in which any domain interpenetration distance is less than an atomic radius, which I'd hope would merely be those structures in which the energy is less than some value."
This is likely to be true. It sounds like you really need a proper sampling protocol for the linker, not just minimization.
I agree with you that the minimizer should be returning the input structure when it gets confused in this case. I don't know why it isn't. With severe clashes, perhaps there is enough rounding noise moving between processor's longer registers and cache/RAM that the minimizer isn't aware of the error. The minimizer isn't doing a Monte Carlo protocol so under normal circumstances there's no need to check that the output energy is lower than the input. Maybe there's just a bug somewhere...
I'm all that familiar with the minimization internals, but I also would have guessed that the "dfpmin" version shouldn't ever result in increases to energy.
That said, you may want to play around with different minimization algorithms (see http://www.rosettacommons.org/manuals/archive/rosetta3.2.1_user_guide/mi... for an overview), and see how those work in your case. I would particularly suggest trying "dfpmin_armijo_nonmonotone". It should be slightly better at escaping bad regions (although your case may be pathological enough to show problems even there.)
In addition to the other suggestions, you may be able to escape the bad regions by using some sort of Monte Carlo proceedure - instead of just minimizing, do some (small) active perturbations, with or without minimization in between, and hopefully you might wiggle yourself out of whatever strange conformations random angle selection has got you into.
Andrew tells me this is probably due to nblist_autoupdate.
The minimizer knows the neighbor lists - what residues interact with what other residues. Our energy terms have finite "reaches" beyond which we say the energy is zero, to speed computation. These lists of neighbors are cached. Minimization is expected to produce small changes, so these neighbors are not expected to change.
When you start with an awful conformation, the minimizer is encouraged to take big steps. These large steps can take it out of the initial neighbors' range, such that the neighbor list is out of date. The new neighbors are not calculated, so the minimizer is unaware it has gone uphill.
There's a command line flag to fix this, nblist_autoupdate. This causes it to update the neighbor list as it goes. There's probably some way to access this in MinMover...
Above it was suggested that rounding noise might be a cause of the failure. In some minimization failures, the energy increases by more than 100000, which seems to be quite a lot to be a result of a rounding error.
Trying the various minimization algorithms, as suggested above, did not alter the failure behavior.
nblist_autoupdate sounds like the answer, but can someone please suggest how I might access this in the MinMover.
w/r/t rounding error:
Tiny rounding errors in atom placement amplify into huge rounding errors in score, because the clash part of the van der Waals term has distance ^ 12 dependency.
Use this trick to manually activate the option:
The PyRosetta command to set this option is:
core.set_boolean_option( 'run:nblist_autoupdate', True )
w/r/t rounding error, for small separations between atoms, isn't VDW repulsive approximated as -8759.2 d/r + 5672 where d is the atom separation and r is the sum of the atomic radii? If this is correct, rounding errors may not be too important.
w/r/r nblist_autoupdate, on my system, the following commands abort with the message given below
from rosetta import *
core.set_boolean_option('run:nblist_autoupdate', True )
python26: ../src/utility/keys/KeyLookup.hh:309: static const K& utility::keys::KeyLookup::key(const std::string&) [with K = utility::options::OptionKey]: Assertion `i != m_.end()' failed.
I can't find any examples of the constants you mention in the database or code. I'll dig into it further if I get a chance. fa_rep is not calculated on the fly, it's done via precalculated table lookups for speed, so the code is extremely convoluted in the interests of efficiency.
It turn out that 'set_boolean_option' mentioned by Brian is for some reason work on Mac but not on Linux build. I am investigating this and will post more info as soon as I have it.
w/r/t the small r approximation of the repulsive VDW, the form -8759.2 d/r + 5672 is given in Table II of "Protein Structure Prediction Using Rosetta" Carol A. Rohl, Charlie E. M. Strauss, Kira M. S. Misura, and David Baker, Methods in Enzymology Vol 383, 2004. It sounds like fa_rep may have been changed in the meantime.
Allright, the option bug (in Linux version) is fixed in r42506, please get the updated version here: http://graylab.jhu.edu/pyrosetta/downloads/release/PyRosetta-r42506.linu...
Also, location of options function changed since that revision. It now rosetta.basic.options. So the code Brian posted now should look like this: