RMSD values in docking2 and symmetric_docking runs

I am curious how ROSIE docking2 and symmetric_docking runs determine RMSD values.

For docking2 runs, are RMSD values determined between
the output\trigger_00001.dock\native.pdb file
and each output\trigger_00001.dock\proteins_*.pdb file?
If so, how is the native.pdb file made?
Is it a weighted average of all the proteins_*.pdb files
with more weight given to files with better scores or I_sc values?

For symmetric_docking runs, are RMSD values determined between
the output\trigger-00000.make_rms_ref\rms_ref.pdb file
and each output\trigger-00008.extract_pdbs\*protein_*.pdb file?
If so, how is the rms_ref.pdb file made?
Is it a weighted average of all the *protein_*.pdb files
with more weight given to files with better scores or I_sc values?

Where can I find more details about these things?


Sun, 2015-08-09 00:48

For symmetric_docking runs, the server documentation says the following:

"The server also returns a plot of the energies of the 400 lowest energy models. Each point on this plot represents a structure created by the server. The y-axis is the energy of the structure. The x-axis is a distance measure to a reference complex structure (Ca rmsd in Angstrom). The reference complex is the lowest energy model predicted by Rosetta. This is only true if the lowest energy model is found among the models making up the top 5 clusters. If not the reference model is selected to be the cluster center of the largest cluster. The cluster centers for the 5 largest clusters are shown in the plot as well as red points."
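To make the quoted selection rule concrete, here is a minimal sketch of it in Python. This is not the ROSIE server's code: the `Model` class, its fields, and the `pick_reference` function are hypothetical names made up for illustration, and it assumes the clustering has already been done and each model carries a cluster id.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    energy: float
    cluster: int             # hypothetical cluster id assigned by an earlier clustering step
    is_center: bool = False  # True for the representative (center) of its cluster

def pick_reference(models, n_top_clusters=5):
    """Choose the RMSD reference as described in the quoted text."""
    # Count how many models fall into each cluster.
    sizes = {}
    for m in models:
        sizes[m.cluster] = sizes.get(m.cluster, 0) + 1
    # Rank cluster ids from largest to smallest.
    ranked = sorted(sizes, key=lambda c: sizes[c], reverse=True)
    top = set(ranked[:n_top_clusters])
    # Lowest energy model overall.
    best = min(models, key=lambda m: m.energy)
    if best.cluster in top:
        return best
    # Otherwise fall back to the center of the largest cluster.
    largest = ranked[0]
    return next(m for m in models if m.cluster == largest and m.is_center)
```

Either way the reference is one of the run's own output models, so that model plots at an RMSD of exactly zero.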

Fri, 2015-08-14 14:51

From what I can tell, it looks like the RMSD in the Docking2 application is calculated to the input structure. So the "native.pdb" should be identical to the input pdb.

Tue, 2015-08-18 15:46

Thanks for your response, rmoretti, but it leaves me puzzled.

Every ROSIE symmetric_docking run I have done so far has RMSD=0.0 clearly labeled on its Score/RMSD plot.
This makes sense because each run chooses one of its models or decoys to be the run's reference for RMSD calculations.

Meanwhile, none of the ROSIE docking2 runs I have done so far have RMSD=0 labeled on their I_sc/RMSD or Score/RMSD plots.
I would think that if the proteins.pdb file used as input to a docking2 run also served as the run's reference for RMSD calculations,
at least one of the run's models or decoys would give RMSD=0. I have even done runs that used the best-scoring or best-I_sc
model/decoy from a previous run as their proteins.pdb file. You would think that at least one of these runs would have given RMSD=0.

Almost all of my docking2 runs so far have used the local_docking protocol, but I have just started doing docking2 runs
with the docking_local_refine protocol. The first of these gave a smallest RMSD value of 0.036. I will let you know if any
of these docking_local_refine runs gives RMSD=0. I think they should have a better chance than local_docking runs of
giving RMSD=0.

In my mind, the question remains, what determines the RMSD values in docking2 runs?

Tue, 2015-08-18 23:38

No, in general you wouldn't expect an output at exactly 0 Ang, as any sort of movement would cause a non-zero rmsd. The only practical way you'd get a zero rmsd is if you have absolutely no movement from the reference structure. In the symmetric_docking case, you get a zero rmsd value because one of the output structures is picked as a reference, so when you compare it to itself, you get a zero rmsd. (Note that all other structures have values which are distinctly not zero). In Docking2, it's not an output structure which is the reference, but the input. Since the reference structure isn't in the output set, you don't get a zero rmsd value in the output.

It makes sense that local refine docking would give smaller rmsds, because it does much smaller perturbations. Smaller perturbations mean that the structure stays closer to the starting structure, and the local space is more densely sampled. The more perturbation you use, the further afield you're likely to go.
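Both points can be seen in a toy calculation. The sketch below is not how Rosetta computes RMSD: it is a plain coordinate RMSD with no superposition, and `perturb` is a hypothetical stand-in for a docking move that just adds Gaussian noise to made-up coordinates.

```python
import math
import random

def rmsd(a, b):
    """Plain coordinate RMSD between two equal-length lists of (x, y, z) tuples.
    Assumes both structures are already in the same frame (no superposition)."""
    if len(a) != len(b):
        raise ValueError("structures must have the same number of atoms")
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))

def perturb(coords, sigma, rng):
    """Hypothetical stand-in for a docking move: Gaussian noise on every atom."""
    return [tuple(x + rng.gauss(0.0, sigma) for x in atom) for atom in coords]

rng = random.Random(0)
input_structure = [(float(i), 0.0, 0.0) for i in range(50)]  # made-up coordinates

# Outputs from a large-perturbation run and a small-perturbation run.
coarse = [perturb(input_structure, 1.0, rng) for _ in range(10)]
fine = [perturb(input_structure, 0.05, rng) for _ in range(10)]

# Docking2-style: the reference is the input, which is not among the outputs.
docking2_rmsds = [rmsd(input_structure, m) for m in coarse]

# symmetric_docking-style: the reference is picked from the outputs themselves.
reference = coarse[0]
symmetric_rmsds = [rmsd(reference, m) for m in coarse]

# Smaller perturbations, same input reference.
fine_rmsds = [rmsd(input_structure, m) for m in fine]

print("docking2-style min rmsd: ", min(docking2_rmsds))
print("symmetric-style min rmsd:", min(symmetric_rmsds))
print("small-perturbation max rmsd:", max(fine_rmsds))
```

Only the list that compares the reference against itself contains a zero, and the small-perturbation run stays much closer to the input than the large-perturbation run does.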

One thing to keep in mind is that "structural space" is vast, and the Rosetta energy function is rugged. You shouldn't expect to perfectly recapitulate a structure, even if you feed it back into the same protocol. Imagine a marble in a bowl. You jiggle the bowl lightly to "settle" the marble. The marble ends up at the low part of the bowl. But if you repeat the process, you wouldn't expect the marble to come to rest at *exactly* the same spot, even if it started at the bottom of the bowl - there's just too much variation in the process to do so. Same with Rosetta runs. There's always some random perturbation involved, so you're not necessarily going to get exactly the starting structure, even if you use an output structure from a previous run.
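The marble analogy can be written as a toy Metropolis Monte Carlo run on a one-dimensional bowl with energy x squared. This only illustrates the analogy and is not Rosetta's docking protocol; `settle` and its parameters `kick` and `kT` are hypothetical names and values chosen for the example.

```python
import math
import random

def settle(start, steps=200, kick=0.05, kT=0.01, seed=None):
    """Jiggle a marble in the bowl E(x) = x**2 and return where it ends up."""
    rng = random.Random(seed)
    x = start
    for _ in range(steps):
        trial = x + rng.gauss(0.0, kick)   # random jiggle
        delta = trial * trial - x * x      # change in energy
        # Metropolis criterion: always accept downhill moves,
        # accept uphill moves with probability exp(-delta / kT).
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            x = trial
    return x

# Start at the exact bottom of the bowl every time, with different random seeds.
finals = [settle(0.0, seed=s) for s in (1, 2, 3)]
print(finals)
```

Each run ends near the bottom of the bowl, but at a different spot, and none of them lands back on the starting position exactly.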

You shouldn't be expecting a "perfect minimum" from Rosetta runs - that's just not how Rosetta approaches the modeling problem. Instead, Rosetta aims to cover the search space and give you a number of "good enough" structures. It's important to remember that even high-resolution X-ray crystallography structures have some error in atom location. On top of that, protein structures aren't static, and atoms are always shaking around.

Fri, 2015-09-04 13:57