two questions about "Prepare structures for use in Rosetta"

Anonymous
Category: Structure prediction, Constraints, Scoring

I use Rosetta 3.4 on Ubuntu 12.04 64bit.

* Question 1: Positive total_energy after relax with all-heavy-atom constraints?
I have an NMR structure model which I want to "prepare" for Rosetta using relax with all-heavy-atom constraints.
Following the "short protocol" given at:
http://www.rosettacommons.org/manuals/archive/rosetta3.4_user_guide/dd/d...
I used the first model of the NMR ensemble as input and generated 200 decoys, but all of them have a positive total_energy (~120.00 REU) after relax. Does this mean the model is not very good? (The same page says "you should never have positive score12 scores for a well folded protein".) Or is relax with all-heavy-atom constraints a bad idea for my model?

* Question 2: is "-relax:thorough" sufficient?
So I tried to relax my model using different parameters.
The manual page on "relax" suggests: "For virtually all situations it should be sufficient to use either -relax:quick or -relax:thorough and not worry about all the options."
http://www.rosettacommons.org/manuals/archive/rosetta3.4_user_guide/d6/d...

After relaxing my structure with -relax:thorough, I got negative total_score values (ca. -200.00 REU) for my model. But it seems that only the backbone conformation is optimized; the rotamers are not changed. Can I relax the side chains as well?

Thanks in advance!

rmoretti
Offline
Joined: 2011-02-28

Relax with all-atom constraints will only work well if the atoms are in an appropriate location to begin with. (The point of the protocol is to avoid moving the atoms very much.) If there isn't a nearby location which Rosetta likes, you'll still end up with bad scores. For example, if two amino acids are much too close together, the 0.1 Ang movement you typically see with the protocol won't correct that. Increasing the length/type of the protocol (e.g. with -relax:thorough) won't really fix that either - the key issue is that the constraints are keeping atoms in the wrong location.

The recommended next step is to relax the model with just backbone constraints, or with no constraints, to see which residues are moving around. (Another way of getting at that information is to look at the scores in the output of the constrained relax - there are likely a few residues which contribute the most to the large positive score.)
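As a rough sketch of the three runs being compared here (all-heavy-atom constraints, backbone-only constraints, no constraints), the following Python snippet only assembles the command lines; it does not run Rosetta. The flag names are written from memory of the Rosetta 3.4 documentation and should be checked against your installation, and the executable name, database path, and input file name are placeholders.

```python
# Placeholders (assumptions): adjust to your own installation and input.
RELAX_EXE = "relax.linuxgccrelease"
DATABASE = "/path/to/rosetta_database"
INPUT_PDB = "model1.pdb"


def relax_command(constraints="all_heavy", nstruct=200):
    """Return an argv list for one relax run.

    constraints: "all_heavy", "backbone", or "none".
    Flag names are assumed from the Rosetta 3.4 docs; verify before use.
    """
    if constraints not in ("all_heavy", "backbone", "none"):
        raise ValueError("unknown constraint mode: %s" % constraints)
    cmd = [RELAX_EXE, "-database", DATABASE, "-s", INPUT_PDB,
           "-nstruct", str(nstruct)]
    if constraints in ("all_heavy", "backbone"):
        # Coordinate constraints to the starting structure, kept on throughout.
        cmd += ["-relax:constrain_relax_to_start_coords",
                "-relax:ramp_constraints", "false"]
    if constraints == "all_heavy":
        # Extend the constraints from the backbone to side-chain heavy atoms.
        cmd += ["-relax:coord_constrain_sidechains"]
    return cmd


for mode in ("all_heavy", "backbone", "none"):
    print(mode, ":", " ".join(relax_command(mode)))
```

Writing the three variants side by side makes it explicit that the only intended difference between the runs is the constraint setup, so any change in scores can be attributed to it.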

Relax, especially if you don't include any other flags, should be allowing the sidechains to move. If they don't seem to be moving, it might mean that the rotamers themselves are okay, but the backbone is off. Which energy terms are giving you the large positive energy? Is it backbone-specific things like rama and omega? It's been observed that often NMR structures have really bad rama scores. There's been some work for protocols to correct this, for example an extension to the relax protocol called "ramady", though I don't believe it's been officially published/released yet (so the options for it are currently undocumented).

attesor (not verified)

Thanks a lot, rmoretti!
I believe I got positive scores because of the all-atom constraints. The positive values come mainly from the LJ repulsive and solvation energies, and from fa_dun. rama is positive but not as large. (See below for example scores.)

SCORE: score fa_atr fa_rep fa_sol fa_intra_rep pro_close fa_pair hbond_sr_bb hbond_lr_bb hbond_bb_sc hbond_sc dslf_ss_dst dslf_cs_ang dslf_ss_dih dslf_ca_dih coordinate_constraint rama omega fa_dun p_aa_pp ref time description
SCORE: 126.418 -404.029 116.883 191.826 1.359 0.725 -12.286 -11.597 -34.792 -7.228 -7.912 0.000 0.000 0.000 0.000 84.228 16.015 14.252 215.622 -7.748 -28.900 202.000
SCORE: 126.401 -403.796 116.776 191.715 1.359 0.723 -12.286 -11.304 -34.816 -7.225 -7.921 0.000 0.000 0.000 0.000 83.827 15.906 14.187 215.920 -7.763 -28.900 200.000
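One way to read the SCORE lines above is to pair each column name with its value and rank the positive terms. A minimal Python sketch, using the first line exactly as pasted (the description column has no value in the pasted rows, so it is simply dropped); the function names are made up for this example, and for real runs you would read the lines from the score file instead of pasting them in.

```python
HEADER = ("SCORE: score fa_atr fa_rep fa_sol fa_intra_rep pro_close fa_pair "
          "hbond_sr_bb hbond_lr_bb hbond_bb_sc hbond_sc dslf_ss_dst dslf_cs_ang "
          "dslf_ss_dih dslf_ca_dih coordinate_constraint rama omega fa_dun "
          "p_aa_pp ref time description")
ROW = ("SCORE: 126.418 -404.029 116.883 191.826 1.359 0.725 -12.286 -11.597 "
       "-34.792 -7.228 -7.912 0.000 0.000 0.000 0.000 84.228 16.015 14.252 "
       "215.622 -7.748 -28.900 202.000")


def parse_score_line(header_line, data_line):
    """Map column names to float values, skipping the leading 'SCORE:' tag."""
    names = header_line.split()[1:]
    values = data_line.split()[1:]
    terms = {}
    for name, value in zip(names, values):
        try:
            terms[name] = float(value)
        except ValueError:
            continue  # non-numeric column such as a description tag
    return terms


def largest_positive_terms(terms, skip=("score", "time"), top=5):
    """Return the 'top' largest positive terms as (name, value) pairs."""
    positive = [(n, v) for n, v in terms.items() if n not in skip and v > 0]
    return sorted(positive, key=lambda kv: kv[1], reverse=True)[:top]


terms = parse_score_line(HEADER, ROW)
for name, value in largest_positive_terms(terms):
    print("%-22s %8.3f" % (name, value))

# Total score with the constraint penalty itself taken out.
print("score minus coordinate_constraint: %.3f"
      % (terms["score"] - terms["coordinate_constraint"]))
```

For this decoy the ranking is fa_dun, fa_sol, fa_rep, coordinate_constraint, rama, and the total without the constraint term is 126.418 - 84.228 = 42.190, so the constraint penalty term by itself does not account for the positive total.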

I was able to lower the Rosetta energy score of my model to ca. -200.0 after allowing the backbone and side chains to move around.

rmoretti

While your coordinate_constraint score is high, it's not overly so. (In a recent set of monomer structures I relaxed with the protocol, most of the scores were in the 20-70 range, depending on protein size, of course.) The fa_sol is completely in the expected range (I got 100-400). fa_rep is a little high (I got 20-70), but I think the big issue is the fa_dun term, which is quite a bit larger than what I would expect (80-150). But instead of comparing them with those from other, differently sized structures, you can compare these breakdowns to the scores given by the unconstrained relax of the same protein to see what the main differences are. I would also suggest comparing the per-residue scores (which should be printed at the bottom of the output PDB structures) - it's possible that a particular region is the main cause of the higher scores.
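The term-by-term comparison can be sketched in a few lines of Python. No unconstrained score line was posted in this thread, so the example below compares the two constrained decoys quoted earlier purely to show the mechanics; in practice you would pass one line from the constrained run and one from the unconstrained run of the same protein. The function names are made up for illustration.

```python
HEADER = ("SCORE: score fa_atr fa_rep fa_sol fa_intra_rep pro_close fa_pair "
          "hbond_sr_bb hbond_lr_bb hbond_bb_sc hbond_sc dslf_ss_dst dslf_cs_ang "
          "dslf_ss_dih dslf_ca_dih coordinate_constraint rama omega fa_dun "
          "p_aa_pp ref time description")
DECOY_A = ("SCORE: 126.418 -404.029 116.883 191.826 1.359 0.725 -12.286 -11.597 "
           "-34.792 -7.228 -7.912 0.000 0.000 0.000 0.000 84.228 16.015 14.252 "
           "215.622 -7.748 -28.900 202.000")
DECOY_B = ("SCORE: 126.401 -403.796 116.776 191.715 1.359 0.723 -12.286 -11.304 "
           "-34.816 -7.225 -7.921 0.000 0.000 0.000 0.000 83.827 15.906 14.187 "
           "215.920 -7.763 -28.900 200.000")


def parse_score_line(header_line, data_line):
    """Map column names to float values, skipping the leading 'SCORE:' tag."""
    pairs = zip(header_line.split()[1:], data_line.split()[1:])
    terms = {}
    for name, value in pairs:
        try:
            terms[name] = float(value)
        except ValueError:
            continue  # non-numeric column such as a description tag
    return terms


def term_differences(terms_a, terms_b, skip=("time",)):
    """Return (term, b - a) pairs, largest absolute difference first."""
    shared = [n for n in terms_a if n in terms_b and n not in skip]
    diffs = [(n, terms_b[n] - terms_a[n]) for n in shared]
    return sorted(diffs, key=lambda kv: abs(kv[1]), reverse=True)


a = parse_score_line(HEADER, DECOY_A)
b = parse_score_line(HEADER, DECOY_B)
for name, delta in term_differences(a, b)[:5]:
    print("%-22s %+8.3f" % (name, delta))
```

Sorting by absolute difference puts the terms that changed most at the top, which is usually where the constrained and unconstrained structures disagree.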

The reason you might want to dig through this is that it may be useful to know if there is a specific feature of the starting NMR model which Rosetta thinks is far from optimal. You might want to compare that to what's known experimentally about the model - perhaps the difference is specifically in a loop that's not well supported by the data, so you might not need to care that the coordinates there move. On the other hand, if the scoring difference comes from a region where there is strong experimental support, you may not want to let Rosetta move it all that much from the model positions.
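To find such a region, the per-residue table at the end of an output PDB can be ranked by total score. The sketch below assumes the table is delimited by `#BEGIN_POSE_ENERGIES_TABLE` and `#END_POSE_ENERGIES_TABLE` lines, starts with a `label` header row, and has `weights` and `pose` rows before the residues; check this against your own output files. The sample table is made up for illustration and does not come from this thread.

```python
# Made-up sample in the assumed layout; replace with the text of a real PDB.
SAMPLE_PDB_TAIL = """\
#BEGIN_POSE_ENERGIES_TABLE example.pdb
label fa_rep fa_dun total
weights 0.44 0.56 NA
pose 6.0 9.5 15.5
MET_p:NtermProteinFull_1 0.5 1.0 1.5
LYS_2 4.0 6.5 10.5
GLY_3 1.5 2.0 3.5
#END_POSE_ENERGIES_TABLE example.pdb
"""


def per_residue_scores(pdb_text, column="total"):
    """Return (residue_label, value) pairs for one column, worst first."""
    in_table = False
    col_index = None
    rows = []
    for line in pdb_text.splitlines():
        if line.startswith("#BEGIN_POSE_ENERGIES_TABLE"):
            in_table = True
            continue
        if line.startswith("#END_POSE_ENERGIES_TABLE"):
            break
        fields = line.split()
        if not in_table or not fields:
            continue
        if fields[0] == "label":
            col_index = fields.index(column)  # ValueError if column is absent
            continue
        if fields[0] in ("weights", "pose") or col_index is None:
            continue  # summary rows, not individual residues
        rows.append((fields[0], float(fields[col_index])))
    return sorted(rows, key=lambda r: r[1], reverse=True)


for label, value in per_residue_scores(SAMPLE_PDB_TAIL):
    print("%-28s %8.3f" % (label, value))
```

Passing `column="fa_dun"` or `column="fa_rep"` instead of the default ranks residues by a single term, which helps when one term dominates the total.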

Note that you can vary how stringent the constraints are with the -relax:coord_cst_stdev option. The default is 0.5, but by increasing the number you can make the constraints softer, allowing for more movement. By playing around with the value, you may be able to find a level that gets a good score, but still keeps most of the residues close to the starting conformation.
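To get a feel for how -relax:coord_cst_stdev trades movement against penalty, here is a small sketch that assumes each constrained atom contributes a harmonic penalty of the form (d / stdev)^2, where d is its distance from the starting position. That functional form, and ignoring the score-function weight on coordinate_constraint, are assumptions made for illustration and are not stated in this thread.

```python
def harmonic_penalty(displacement, stdev=0.5):
    """Assumed per-atom penalty: (displacement / stdev) squared."""
    if stdev <= 0:
        raise ValueError("stdev must be positive")
    return (displacement / stdev) ** 2


# Penalty for a few displacements (in Angstrom) at increasingly soft settings.
for stdev in (0.5, 1.0, 2.0):
    cells = ", ".join("d=%.1f: %.2f" % (d, harmonic_penalty(d, stdev))
                      for d in (0.1, 0.5, 1.0))
    print("stdev=%.1f -> %s" % (stdev, cells))
```

Under this assumption, doubling the stdev cuts the penalty for a given displacement by a factor of four, which is why larger values let atoms drift further before the constraint term starts to dominate.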

attesor (not verified)

If I thoroughly relax my structure *without* constraints on the side chains, i.e. the rotamers are optimized, the fa_dun should not be high. Is that right? Furthermore, the fa_dun score for each individual residue should also be low if all rotamers are optimized, unless by chance some rotamers are not well optimized. But in theory the fa_dun should always be low if no constraints are placed on side chain movements. Is my understanding correct?

The individual fa_dun score for most residues in my protein (106 aa) is in the range 0.0-1.0 after relax (-relax:thorough, but with backbone constraints).

Are there tables of "normal/empirical" scores for each component of the Rosetta energy, as you mentioned? Listed according to the size of the protein, of course, and for individual residues in the core, on the surface, etc. Or would such reference tables be very misleading?

rmoretti

If you do an unconstrained relax, Rosetta will attempt to optimize (lower) the energy to the best of its ability. It's often very good at doing so, but the big issue with relaxing experimental models is that it will often blow the structure apart, or at the very least significantly move sidechains and backbones to positions which are not consistent with the experimental data. The constrained relax protocol is an attempt to keep relax from moving atoms too far from where the experimentalists say they should be. (Keep in mind that the Rosetta score is far from a perfect metric for determining protein structure. Native-like structures tend to have lower Rosetta scores than most arbitrary conformations, but it isn't necessarily the case that the lowest Rosetta energy structure is the most native-like structure.)

Because you're restricting how much relax can change the protein, the scores you get with a constrained relax are almost always worse than if you did an unconstrained relax. But the difference is often not large and, depending on the use case, the lower score is not worth the positional deviations from the experimentally determined structure. The default parameters seem to be the best tradeoff between lowering the energy and keeping the structure close to the starting coordinates - although depending on the case, you may need to adjust things.

Unfortunately, while there are a couple of rules-of-thumb running around, there really isn't a table with expected energies for various residue/score combinations. Part of the complication is that the scores you see depend on how extensively you've optimized the structure. The energy landscape is rather rough, and you can easily "drill down" by doing more and more sampling in a structural neighborhood, without really getting any "better" in structure.

attesor (not verified)

Thank you very much for all the explanation, rmoretti!