clustering PDBs from docking


I performed protein-protein docking in Rosetta using a single receptor structure and several conformations of the same ligand protein. Now I am trying to cluster the results (approx. 1000 structures) with the cluster application included in Rosetta by running:

cluster.linuxgccrelease -in:file:l pdblist -in::file::fullatom

The result is a relatively small number of clusters, but the spread of RMSDs inside each cluster is quite large: some structures differ by up to 50 Å in RMSD from the best-scored structure in the cluster, which makes no sense. I read some posts saying that Rosetta superposes the structures before clustering, so I guess this is the reason for what I am seeing: once superposed, all the structures are very similar and show low RMSD values, so the clustering application groups them into a few clusters.

Therefore, what I need is to perform the clustering without a prior superposition. I have read that this is possible, but I cannot find any information in the documentation pages on how to do it. I would very much appreciate it if somebody could point me to where this information can be found or tell me the right commands to achieve that.





Wed, 2024-04-24 08:34

I believe the cluster.linuxgccrelease application can take the option `-cluster:skip_align` to tell it to do the non-superposition rmsd. The logic is rather messy, though, and the non-superposition rmsd automatically implies that it's done with all protein backbone atom rmsd, rather than just the Calpha rmsd. It's also incompatible with the -cluster:exclude_res option, so it can only be used when calculating an all-against-all rmsd.
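
To make the difference concrete, here is a minimal Python sketch of an RMSD computed with and without a fit. It is not Rosetta's code: the function names and the four-point toy "ligand" are made up for illustration, and the fit is translation-only (centroid matching), which is a simplification of a real superposition that also optimizes rotation.

```python
import math

def rmsd_no_superposition(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples, with no fitting."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

def translate(coords, shift):
    """Rigidly move every point by the vector `shift`."""
    dx, dy, dz = shift
    return [(x + dx, y + dy, z + dz) for x, y, z in coords]

def center(coords):
    """Move the centroid to the origin (a translation-only 'fit')."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    return translate(coords, (-cx, -cy, -cz))

# Hypothetical toy ligand: four points standing in for backbone atoms.
ligand = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
# The same ligand docked at a different site, 50 A away (a 30-40-50 shift).
moved = translate(ligand, (30.0, 40.0, 0.0))

raw = rmsd_no_superposition(ligand, moved)                     # about 50 A
fitted = rmsd_no_superposition(center(ligand), center(moved))  # about 0 A
print(f"no fit: {raw:.2f} A, after centroid fit: {fitted:.2f} A")
```

The two poses are identical in shape, so any fit removes the difference between them, while the unfitted RMSD still reports the 50 Å displacement between binding sites. For docking decoys that share one receptor frame, the unfitted number is the one that separates binding sites.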

I believe that many people moved to Calibur as a clustering application, rather than the historical Rosetta clustering application. It may or may not have the options you're interested in.

Wed, 2024-04-24 09:00

Thanks for your answer and thanks for the info. I will try this -cluster:skip_align option and see how it goes.

I have also tried Calibur as implemented in Rosetta, but what I get is the following:

> calibur.static.linuxgccrelease -input:pdb_list pdblist
********  (C) Copyright Rosetta Commons Member Institutions.  ***************
* Use of Rosetta for commercial purposes may require purchase of a license. *
********  See or email for more details. **********
core.init: Checking for fconfig files in pwd and ./rosetta/flags  
core.init: Rosetta version: rosetta.binary.linux.release-371 r371 2024.09+release.06b3cf8 06b3cf8ad0940d628690d0ed6fa2009d72ad2b44 2024-03-01T01:30:53.796737
core.init: command: calibur.static.linuxgccrelease -input:pdb_list pdblist
basic.random.init_random_generator: 'RNG device' seed mode, using '/dev/urandom', seed=759295616 seed_offset=0 real_seed=759295616
basic.random.init_random_generator: RandomGenerator:init: Normal mode, seed=759295616 RG_type=mt19937
core.init: found database environment variable ROSETTA3_DB: /Programs/rosetta.binary.linux.release-371/main/database
Using C-alphas #1-end
Filtering on
Signature mode off
Using chains 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
'u', 'v', 'w', 'x', 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ' ' in PDB files


That is all the output; nothing else seems to happen, and no activity is visible when running the top command.

I have tried Calibur both from the Rosetta 3.14 Linux binaries and from a version compiled from source, but the result has always been the same, so I would be really happy if somebody could explain how to make it work.


Wed, 2024-04-24 09:34