
Warning when running cluster program


I have a protein of 106 residues (numbered 1 to 106). I ran relax and clustered the resulting 200 poses.

cluster.linuxgccrelease -in:file:silent mysilentfile

But I got tons of warnings like this:

core.scoring.rms_util: WARNING: In CA_rmsd, residue range 1 to 111 requested but only 106 protein CA atoms found.

Why does cluster think I have 111 residues? Can I still trust the clustering results with all these warnings? Does this have anything to do with the "jump" concept?

Fri, 2012-06-15 02:39

Do you have any ligands present?

Fri, 2012-06-15 07:05

No, no ligands.
The warnings disappeared after I set the native structure:

cluster.linuxgccrelease -in:file:silent mysilentfile -in:file:fullatom -in:file:native mynative.pdb

Fri, 2012-06-15 09:06

Does "mynative" have the same number of residues as the poses in the silent file? The warning means that one structure has 111 CAs (C-alpha, not calcium) and the other has 106.

Mon, 2012-06-25 18:25

Yes, both the native decoy and the Rosetta output decoys have 106 CA atoms. But they are dimers (chains A and B). I have no idea where the number 111 comes from. 111 - 106 = 5, which is not even an even number (since it is a homodimer, I would expect the difference to be a multiple of 2).

I worked on another protein, a trimer. When I cluster it, I get the same warning, with 67 instead of 54. I checked the input and output; both contain only 54 residues and 54 CA atoms.
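Counting residues and CA atoms by eye is error-prone, so here is a minimal sketch of a checker. `count_residues_and_ca` and `atom_line` are hypothetical helpers, not Rosetta tools; the sketch assumes standard fixed-column PDB files and reads only ATOM records, so ligands and ions in HETATM records are skipped. It also shows why "C-alpha, not calcium" matters: by the usual PDB column convention the C-alpha atom name is " CA " (starting in column 14), while calcium is "CA  " (starting in column 13). Note that it can only count what was actually written to the file.

```python
from collections import defaultdict


def count_residues_and_ca(pdb_text):
    """Return {chain_id: (n_residues, n_ca_atoms)} from the ATOM records of PDB text."""
    residues = defaultdict(set)  # chain -> {(resSeq, insertion code)}
    ca_atoms = defaultdict(int)  # chain -> number of C-alpha atoms
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM  "):
            continue  # skip HETATM (ligands, ions), TER, REMARK, ...
        chain = line[21]                             # column 22
        res_id = (line[22:26].strip(), line[26:27])  # columns 23-27
        residues[chain].add(res_id)
        if line[12:16] == " CA ":  # C-alpha; calcium would be "CA  "
            ca_atoms[chain] += 1
    return {chain: (len(residues[chain]), ca_atoms[chain]) for chain in residues}


def atom_line(serial, name, resname, chain, resseq, record="ATOM"):
    """Build a minimal fixed-column PDB record (coordinates set to 0.0)."""
    return (f"{record:<6}{serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


pdb = "\n".join([
    atom_line(1, " N  ", "ALA", "A", 1),
    atom_line(2, " CA ", "ALA", "A", 1),
    atom_line(3, " N  ", "GLY", "A", 2),
    atom_line(4, " CA ", "GLY", "A", 2),
    atom_line(5, " N  ", "ALA", "B", 1),
    atom_line(6, " CA ", "ALA", "B", 1),
    atom_line(7, "CA  ", "CA", "C", 101, record="HETATM"),  # calcium ion, skipped
])
print(count_residues_and_ca(pdb))  # {'A': (2, 2), 'B': (1, 1)}
```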

Tue, 2012-06-26 04:41

Also, the energies calculated by cluster seem to differ from those calculated by score_jd2.
I have docking decoys in silent files. I calculated the energy of each decoy using score_jd2, then clustered the decoys using cluster. cluster outputs a list of scores like:

protocols.cluster: Adding struc: -103.753

But they are totally different from the score_jd2 values. I mean the values themselves, not their order. (I am using Rosetta v3.4.)

Tue, 2012-06-26 06:36

I can't find an answer for the # CA atoms mismatch.

If the score difference is small (a few units), it's due to imprecision of structure storage on disk, especially if you are using PDBs. PDB coordinates have only three decimal places, but the Rosetta score function is sensitive to many more. This means a PDB will never rescore exactly the same as the in-memory pose from which it was produced. The problem is reduced, if not eliminated, with binary-style silent files.
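To see why three decimal places are enough to change a score, here is a toy illustration. The pairwise Lennard-Jones function `lj_energy` and its `epsilon`/`sigma` values are made up for the example and are not a Rosetta score term; the sketch scores the same three atoms once at full floating-point precision and once rounded to three decimals, as a PDB would store them.

```python
import math


def lj_energy(coords, epsilon=0.2, sigma=3.5):
    """Toy score: sum of 12-6 Lennard-Jones terms over all atom pairs."""
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            sr6 = (sigma / math.dist(coords[i], coords[j])) ** 6
            total += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return total


# "In-memory" coordinates, with more digits than a PDB can hold
coords = [(0.0, 0.0, 0.0), (3.81234567, 0.0, 0.0), (1.9061728, 3.3012345, 0.0)]
# What a PDB keeps: three decimal places per coordinate (the %8.3f columns)
pdb_coords = [tuple(round(x, 3) for x in atom) for atom in coords]

e_mem, e_pdb = lj_energy(coords), lj_energy(pdb_coords)
print(f"in memory: {e_mem:.6f}  after PDB round trip: {e_pdb:.6f}  diff: {e_mem - e_pdb:.1e}")
```

The difference here is tiny because there are only three atoms and one smooth term; a real pose sums many steep terms over thousands of atoms, so these rounding errors accumulate.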

Tue, 2012-06-26 06:41

Yes, I am aware of this precision problem from previous threads. But the difference is huge:

score_jd2 says:

SCORE: 436.487 -380.814 669.460 182.095 0.763 0.000 0.000 -66.162 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -2.372 2.081 24.410 3.725 3.300 0.000 0.000 -13.400 0.000 mystruc_0091_0001

while cluster says:
protocols.cluster: mystruc_0091 -51.781

Tue, 2012-06-26 06:54

You're right, that's not a precision thing. Could it be a fullatom/centroid thing? Is the previous code emitting centroid or fullatom PDBs?

Tue, 2012-06-26 11:24

Yes, you are totally right. I noticed centroid atoms in the .pdb files after clustering. I had specified -in:file:fullatom when clustering, but this did not help.

Wed, 2012-06-27 01:52

Can you give me some hints on the concept of a "jump residue"?

Tue, 2012-06-26 07:12

This paper

describes the "fold tree" in Rosetta. Basically, Rosetta uses internal coordinates wherever possible, and only converts to XYZ coordinates for some score functions and to print results. Atom positions are calculated by iterating along the network of internal coordinates. The AtomTree and FoldTree specify which atoms are connected to which other atoms, and by which degrees of freedom. This lets Rosetta know things like how to move the end of a lysine when the base chi angles rotate. The tree MUST be a directed acyclic graph that contains all atoms, so that all are accounted for and singly connected. Most connections in these trees represent chemical bonds. Jumps come in where connections are needed but can't be represented by chemistry. For example, if you have two chains, there is a Jump between the first and second chains representing how the second chain is positioned with respect to the first. Jumps do other things too, especially in loop modeling, but non-chemical internal-coordinate connections between independent molecules in the pose are the common case.
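The idea of chemical edges plus a rigid-body jump can be sketched with a toy model. Everything below is hypothetical and heavily simplified: residues stand in for atoms, every chemical "bond" is a fixed 3.8 Å step along x, and the jump is a pure translation, whereas real Rosetta jumps are full rigid-body transforms (rotation plus translation) and real internal coordinates are bond lengths, angles and torsions. `build_children` and `place_residues` are illustrative names, not Rosetta's FoldTree API.

```python
from collections import deque


def build_children(n_res, peptide_edges, jumps):
    """Expand fold-tree edges into parent -> [(child, offset)] links.

    peptide_edges: (start, stop) residue ranges joined by chemical bonds.
    jumps: {(parent_res, child_res): (dx, dy, dz)} rigid offsets between chains.
    """
    children = {r: [] for r in range(1, n_res + 1)}
    n_links = 0
    for start, stop in peptide_edges:
        for r in range(start, stop):
            children[r].append((r + 1, None))  # None marks a chemical bond
            n_links += 1
    for (parent, child), offset in jumps.items():
        children[parent].append((child, offset))
        n_links += 1
    if n_links != n_res - 1:
        raise ValueError("a tree over n residues needs exactly n - 1 links")
    return children


def place_residues(n_res, children, root=1, bond=3.8):
    """Walk the tree from the root, turning internal offsets into XYZ positions."""
    xyz = {root: (0.0, 0.0, 0.0)}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        px, py, pz = xyz[parent]
        for child, offset in children[parent]:
            if child in xyz:
                raise ValueError(f"residue {child} reached twice: not a tree")
            if offset is None:
                xyz[child] = (px + bond, py, pz)  # toy bond: fixed step along x
            else:
                dx, dy, dz = offset
                xyz[child] = (px + dx, py + dy, pz + dz)  # jump: rigid translation
            queue.append(child)
    if len(xyz) != n_res:
        raise ValueError("some residues are not reachable from the root")
    return xyz


# Two-chain toy: chain A = residues 1-3, chain B = residues 4-6, one jump 1 -> 4
edges = [(1, 3), (4, 6)]
xyz = place_residues(6, build_children(6, edges, {(1, 4): (0.0, 10.0, 0.0)}))
print(xyz[3], xyz[6])  # (7.6, 0.0, 0.0) (7.6, 10.0, 0.0)

# Change only the jump: chain B shifts as a rigid body, chain A is unchanged
moved = place_residues(6, build_children(6, edges, {(1, 4): (0.0, 20.0, 0.0)}))
print(moved[3], moved[6])  # (7.6, 0.0, 0.0) (7.6, 20.0, 0.0)
```

Changing only the jump offset moves all of chain B while chain A stays put, which is the sense in which the jump describes how the second chain is positioned relative to the first. The link-count and "reached twice" checks mirror the requirement that every residue be reached exactly once.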

Tue, 2012-06-26 11:28


The missing residues the cluster program complains about are the virtual residues used in the FoldTree. Using -cluster:exclude_res to exclude them stops the cluster program from complaining. (Is it normal that cluster does not recognize the virtual residues generated by Rosetta itself?)

Mon, 2012-07-02 06:41

(Is it normal that cluster does not recognize the virtual residues generated by Rosetta itself?)

Probably. This is exactly the sort of warning developers learn to ignore, because they know what it means and know it's irrelevant. I've never seen it myself, but I've never used cluster...

Mon, 2012-07-02 07:27

I read the source code and played with my poses in PyRosetta. The cluster program uses CA_rmsd(), which checks whether the number of residues equals the number of CA atoms in the protein and, if not, prints the WARNING. Virtual residues do not have CA atoms.

It seems CA_rmsd carries on and calculates the RMSD over all the CA atoms it finds. For different poses of the same protein, the WARNING can therefore be safely ignored (unofficial judgement!).
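The behaviour described above can be mimicked in a few lines. This is a hypothetical sketch, not Rosetta's CA_rmsd source: poses are plain lists of residues (dicts mapping atom name to coordinates), the virtual residue's atom name `ORIG` is only a placeholder, and no superposition is done (the coordinates are assumed to be pre-aligned). The `exclude` argument plays the role that -cluster:exclude_res played in this thread.

```python
import math
import warnings


def ca_rmsd(pose_a, pose_b, start, end, exclude=()):
    """CA RMSD over residues start..end (1-based, inclusive), skipping residues without CA."""
    exclude = set(exclude)
    pairs = []
    for i in range(start, end + 1):
        if i in exclude:
            continue
        ca_a = pose_a[i - 1].get("CA")  # residue numbers are 1-based
        ca_b = pose_b[i - 1].get("CA")
        if ca_a is not None and ca_b is not None:
            pairs.append((ca_a, ca_b))
    n_requested = end - start + 1 - len([i for i in exclude if start <= i <= end])
    if len(pairs) != n_requested:
        # Warn, but carry on with the CA atoms that were found
        warnings.warn(f"residue range {start} to {end} requested "
                      f"but only {len(pairs)} CA atoms found")
    if not pairs:
        raise ValueError("no CA atoms to compare")
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in pairs) / len(pairs))


# Residue 4 is a virtual residue with no CA (placeholder atom name)
pose_a = [{"CA": (0.0, 0.0, 0.0)}, {"CA": (3.8, 0.0, 0.0)}, {"CA": (7.6, 0.0, 0.0)},
          {"ORIG": (0.0, 0.0, 0.0)}]
pose_b = [{"CA": (0.0, 0.0, 1.0)}, {"CA": (3.8, 0.0, 1.0)}, {"CA": (7.6, 0.0, 1.0)},
          {"ORIG": (0.0, 0.0, 0.0)}]

print(ca_rmsd(pose_a, pose_b, 1, 4))               # warning, then 1.0
print(ca_rmsd(pose_a, pose_b, 1, 4, exclude={4}))  # 1.0, no warning
```

With the CA-less residue in range, the sketch warns but still returns the RMSD over the three CAs it found; excluding that residue gives the same value without the warning.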

Mon, 2012-07-02 08:25