Problems with always_constrained_relax_script


I am trying to prepare a structure for Rosetta and am testing various flags and methods to relax it. One way is to use the "relax_w_allatom_cst" protocol as documented here:

https://www.rosettacommons.org/manuals/archive/rosetta3.4_user_guide/d9/...

My inputs are...

Case 1. Normal relax

> /Applications/rosetta3.4/rosetta_source/bin/relax.linuxgccrelease -database /Applications/rosetta3.4/rosetta_database/ -ex1 -ex2 -relax:sequence -nstruct 1 -s mypdb.pdb

Case 2. Relax with Constraints

> python /Applications/rosetta3.4/rosetta_source/src/apps/public/relax_w_allatom_cst/sidechain_cst_3.py mypdb.pdb 0.1 0.5

> /Applications/rosetta3.4/rosetta_source/bin/relax.linuxgccrelease -database /Applications/rosetta3.4/rosetta_database/ -ex1 -ex2 -relax:sequence -constrain_relax_to_start_coords -relax:script /Applications/rosetta3.4/rosetta_source/src/apps/public/relax_w_allatom_cst/always_constrained_relax_script -constraints:cst_fa_file mypdb_sc.cst -nstruct 1 -s mypdb.pdb

I am getting the following warning in case 2, which I do not get when I run the normal relax protocol (case 1):

core.scoring.rms_util: WARNING: In CA_rmsd, residue range 1 to 253 requested but only 252 protein CA atoms found.
core.scoring.rms_util: WARNING: In CA_rmsd, residue range 1 to 253 requested but only 252 protein CA atoms found.

If I re-score the output relaxed pdbs, the scores are the same for case 1, but significantly different for case 2 (i.e. the relax protocol reports one score, and rescoring the output pdb reports a significantly different score).

Some other info: I cleaned 'mypdb.pdb' using the clean_pdb.py script before running relax. Case 1 finds the lower Rosetta score, but the rmsd to the native structure is ~0.5-0.9. In case 2 the reported Rosetta score is higher, but the structure moves by an rmsd of only ~0.05. The rmsd values were calculated in PyMOL using 'align'.

Thanks in advance.

W.

Wed, 2013-05-29 04:39
wsgosal

Those warnings are nothing to worry about. The coordinate constrained relax protocol adds a "virtual root" residue to allow the protein to match up to the given input coordinates. That 253rd residue that it mentions is the virtual root, which doesn't have a CA atom. It therefore just uses the remaining 252 residues (your protein has 252 amino acids, right?) to do the rmsd calculation, as if the virtual root didn't exist.

Regarding the scoring differences, the scores output from the constrained relax protocol contain the energy from the constraints, which in general will be non-zero. There should be a "coordinate_constraint" column in the scoring. If you subtract this from the total_score from the constrained relax, you should get about the same energy as reported in your rescoring (which won't have the constraint energy term).
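As a rough illustration of that subtraction, here is a minimal Python sketch. It assumes a whitespace-delimited score file whose header and data rows both start with "SCORE:" and which has "total_score", "coordinate_constraint" and "description" columns; the helper name and the numbers in the example are made up for illustration, so check the column names against your own score file.

```python
def unconstrained_scores(scorefile_text,
                         total_col="total_score",
                         cst_col="coordinate_constraint",
                         name_col="description"):
    """Return {description: total_score - coordinate_constraint} per SCORE row.

    Hypothetical helper: the column names are assumptions about the score
    file layout and can be overridden through the keyword arguments.
    """
    header = None
    result = {}
    for line in scorefile_text.splitlines():
        if not line.startswith("SCORE:"):
            continue  # skip SEQUENCE: lines, blanks, and anything else
        fields = line.split()[1:]  # drop the leading "SCORE:" token
        if header is None:
            header = fields  # the first SCORE: line holds the column names
            continue
        row = dict(zip(header, fields))
        total = float(row[total_col])
        # If the constraint column is absent, treat the constraint energy as 0.
        cst = float(row.get(cst_col, 0.0))
        result[row[name_col]] = total - cst
    return result


# Invented values, only to show the arithmetic.
example = """\
SEQUENCE:
SCORE: total_score coordinate_constraint fa_atr description
SCORE: -480.250 12.500 -900.000 mypdb_0001
"""

if __name__ == "__main__":
    for name, score in unconstrained_scores(example).items():
        print(name, score)
```

The value this returns per structure is the one to compare against the score you get from re-scoring the output pdb without constraints.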

Regarding the differences in energy and rmsd seen, that's about what's to be expected from the two procedures. (The paper talking about the constrained relax procedure is at http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0059004)

Wed, 2013-05-29 10:39
rmoretti

Thanks so much for clearing this up for me. Yes, the protein has 252 residues, and as you say, subtracting the coordinate_constraint term from the output score gives roughly the same value as re-scoring the output pdb.

I just have 3 follow-up questions if I may.

1. I don't think I entirely understand how 'the energy from the constraints' is determined by the protocol. It is not clear in the above paper.

2. Should it be routine then to rescore these native relaxed pdbs in subsequent plots of RMSD versus rosetta energy when evaluating downstream designs or docking?

3. From the perspective of a new user, is this new relaxing with constraints protocol the way to proceed - should this always now replace the fast relax protocol whatever the downstream application?

Thanks again,

W.

Thu, 2013-05-30 03:26
wsgosal

1) Each constraint generated has a certain functional form (this can vary based on how you generate the constraints, or if you modify the file). See https://www.rosettacommons.org/manuals/archive/rosetta3.4_user_guide/de/... for details.

For example, the sidechain_cst_3.py script spits out constraints like this:

CoordinateConstraint CB 1 CA 1 99.614 50.652 -6.636 BOUNDED 0 0.1 0.5 0.5 tag

After the definition of the parameters for the coordinate constraint is the functional form imposed. In this case it's the "BOUNDED" function, centered on zero, with a flat (zero energy) region for 0.1 on either side, followed by a linear value of slope (1/0.5)^2. (The last 0.5 controls how to transition from the flat region to the slope, and should always be set to 0.5). The parameter over which this function operates is controlled by the type of constraint, which for a coordinate constraint is the distance of the atom from input coordinates (in Angstroms).

You could also use a different functional form, like:

CoordinateConstraint CB 1 CA 1 99.614 50.652 -6.636 HARMONIC 0 0.5

This would calculate the energy of the constraint as a harmonic penalty on the deviation from the ideal, centered on zero and with a spring constant of 0.5.

In calculating the constraint energy, all individual constraints on the pose (e.g. the ones that are loaded from the file, and ones that are added later, like those on the backbone atoms) are each evaluated, and the energies calculated from the individual functions are added together to get the total constraint energy.
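To make the summation concrete, here is a small Python sketch, not Rosetta code. The bounded() function follows my understanding of the piecewise form in the Rosetta constraint file documentation (zero between the bounds, quadratic just outside them, then a linear continuation past the switch point), and harmonic() is the usual ((x - x0)/sd)^2 form. Treat both formulas as assumptions and verify them against the documentation linked above. All function names and the moved coordinates are invented for illustration; only the target coordinates come from the example constraint line.

```python
import math


def bounded(x, lb, ub, sd, rswitch=0.5):
    """Assumed BOUNDED form: zero inside [lb, ub], penalty outside."""
    if x < lb:
        delta = (lb - x) / sd
    elif x > ub:
        delta = (x - ub) / sd
    else:
        return 0.0  # flat, zero-energy region
    if x > ub + rswitch * sd:
        # Past the switch point the penalty continues linearly.
        return 2.0 * rswitch * delta - rswitch ** 2
    return delta ** 2  # quadratic close to the bounds


def harmonic(x, x0, sd):
    """Assumed HARMONIC form: squared deviation from x0, scaled by sd."""
    return ((x - x0) / sd) ** 2


def distance(a, b):
    """Euclidean distance between two (x, y, z) tuples, in Angstroms."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def total_constraint_energy(constraints, coords):
    """Sum the energy of every coordinate constraint.

    constraints: list of (atom_key, target_xyz, func), where func maps a
                 distance to an energy.
    coords:      dict mapping atom_key to the current (x, y, z) position.
    """
    return sum(func(distance(coords[key], target))
               for key, target, func in constraints)


# Target position taken from the example constraint line for CB of residue 1.
target = (99.614, 50.652, -6.636)
constraints = [
    (("CB", 1), target, lambda d: bounded(d, 0.0, 0.1, 0.5, 0.5)),
    (("CB", 1), target, lambda d: harmonic(d, 0.0, 0.5)),
]
# Hypothetical current position: the atom has moved 0.3 Angstroms along x.
coords = {("CB", 1): (99.914, 50.652, -6.636)}

if __name__ == "__main__":
    print(total_constraint_energy(constraints, coords))
```

Under these assumed forms, an atom that stays within 0.1 Angstroms of its input position contributes nothing through the BOUNDED constraint, which is why the constrained relax can still make small adjustments at no constraint cost.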

2) The relax-with-constraints protocol is intended mainly as a structure preparation protocol, rather than a diversification/sampling protocol. I wouldn't even necessarily recommend it for post-sampling-in-Rosetta optimization. If you're in a situation where the sampling protocol is producing enough backbone diversity that an rmsd/energy plot is reasonable, imposing coordinate constraints on the structure doesn't make much sense (as the constraints would be to the arbitrary atom positions put out by the sampling protocol, rather than a "gold standard" from crystallography, etc.). You might want to impose selected constraints if you can bring in external information (NMR data, structural knowledge), but in those cases the constraints provide vital structural quality information, and probably would be included in the score-rmsd plot.

3) As an author on the paper, I'd certainly recommend it in preparing structures from the PDB or produced by a non-Rosetta program for the purpose of getting a near-structurally-identical protein which performs well in the Rosetta energy function. I certainly don't see any a priori reason why it would be a bad thing to include.

Keep in mind, though, that some preparation protocols depend on the extra degrees of freedom that the constraints don't allow - they're doing more than just taking *that structure* into the Rosetta energy function. For example, in ligand docking, the structure preparation does an apo structure repack to avoid spurious energy changes that can occur when the limited repacking shell includes a position which can rearrange. Constrained relax by itself wouldn't fix that problem. What I might recommend in that case, though, is to do a constrained relax of the holo structure to fix energy function issues, and then use the standard ligand docking preparation program on the apo to treat the rotamer flipping issue.

Thu, 2013-05-30 10:58
rmoretti