I am trying out some tools for protein-protein docking analysis. Since I am only testing, I don't have many structures: just 500. I chose the 148 highest-ranking structures as input to energy_based_clustering. From the description of the simple algorithm, I assumed that every structure would be assigned to a cluster, since it says "until there are no structures remaining"; however, I only have 127 PDB files at the end. I think this question is independent of the flags in my input file, but I can provide them if needed.
I would also expect the structure with the lowest total_score in the local_refine score file to be in the first cluster, since the algorithm first does "Select the lowest-energy structure remaining in the unclustered list as the centre of the current cluster", and that should be the lowest-energy structure overall. However, the structure in c.1.1.pdb, which I assumed would have the lowest total_score of all the structures, only had the 28th-lowest total_score. Could this have something to do with a difference between energy and total_score? Is the app actually choosing the lowest energy? I tested a few energies with Amoeba and saw some resemblance between score and energy. I am going to create a graph to check this.
Curious. The app's default behaviour is to output all structures (though this can be limited with the -cluster:energy_based_clustering:limit_structures_per_cluster flag). Two possible explanations:
- You've used this flag, perhaps inadvertently, and so are getting a maximum of X structures per cluster. Omit this flag, or set it to 0 (meaning output everything).
- Rosetta is failing to read all of the input structures. Check the output log for errors.
As for the second question, Rosetta re-scores all structures with the Rosetta ref2015 energy function (or another energy function specified with the -score:weights commandline flag). It ignores scores from other apps stored in the input PDBs. Amoeba's energies are probably pretty different from Rosetta's.
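To make the two points above concrete, here is a minimal sketch of greedy, energy-ordered clustering as quoted in the question. It is not Rosetta's implementation: the function names, the toy one-atom "structures", the plain RMSD without superposition, and the `limit_per_cluster` parameter (standing in for the limit_structures_per_cluster flag) are all assumptions made for illustration.

```python
import math


def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) points (no superposition)."""
    sq = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(sq / len(a))


def greedy_energy_clustering(structures, radius=1.0, limit_per_cluster=0):
    """Cluster (name, energy, coords) tuples greedily by energy.

    Returns a list of clusters; each cluster is a list of names with the
    centre first. limit_per_cluster=0 means "output everything"
    (hypothetical stand-in for the app's per-cluster output limit).
    """
    # Sort once so the first remaining structure is always the lowest in energy.
    remaining = sorted(structures, key=lambda s: s[1])
    clusters = []
    while remaining:
        centre = remaining[0]
        # Everything within the radius of the centre joins this cluster
        # (the centre itself has RMSD 0, so it is always included).
        members = [s for s in remaining if rmsd(centre[2], s[2]) <= radius]
        member_names = {s[0] for s in members}
        remaining = [s for s in remaining if s[0] not in member_names]
        names = [s[0] for s in members]
        if limit_per_cluster > 0:
            # Every structure is still clustered; only the output is truncated.
            names = names[:limit_per_cluster]
        clusters.append(names)
    return clusters


# Toy data: one "atom" per structure, energies chosen arbitrarily.
toy = [
    ("A", -10.0, [(0.0, 0.0, 0.0)]),
    ("B", -9.0, [(0.5, 0.0, 0.0)]),
    ("C", -8.0, [(5.0, 0.0, 0.0)]),
    ("D", -7.0, [(0.2, 0.0, 0.0)]),
    ("E", -12.0, [(5.3, 0.0, 0.0)]),
]
everything = greedy_energy_clustering(toy)
limited = greedy_energy_clustering(toy, limit_per_cluster=2)
print(sum(len(c) for c in everything), sum(len(c) for c in limited))
```

In this sketch all structures are assigned to a cluster either way; a non-zero limit only reduces how many are written out per cluster, which is one way fewer files than inputs could appear. The energies passed in are whatever the caller supplies, so a ranking taken from a different score source need not match the cluster order.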
Sounds good. I reran with limit_structures_per_cluster 0 and it gave all 148 PDBs. Before, the value was 10. A quick clarification: I did run energy_based_clustering with the default Rosetta ref2015 energy function, and then also reran score_jd2 on the inputs. I have attached a plot of my "quasi pdb centroid score" based on the PDB clusters, i.e. since I have 51 clusters I subtracted 51 from each cluster index, so c.1 = -50.0, c.2 = -49.0, ... c.50 = -1.0, c.51 = 0.0. I also plotted the other members of each cluster at a value 0.5 higher than that cluster's centroid (the blue bars). The second plot shows just the centroids, and you can see that the scores from score_jd2 on the right axis don't really match what the algorithm should produce by picking the lowest score first: c.8.1.pdb clearly has a lower Rosetta ref2015 score than c.1.1.pdb. If anything, the ref2015 score of the centroids should steadily rise from one cluster to the next (become less negative, since the y-axis is all negative), at least a little more than it does. The only reason I can think of for this not happening is that the geometric clustering based on RMSD has no correlation at all with the ref2015 score. In the larger clusters in the first plot, if the RMSD is < 1.0 then you would assume the energy would be relatively stable within a cluster (blue boxes for clusters 8 and 13); however, I am seeing some clusters with more fluctuation (red boxes for clusters 5 and 20, with cluster 5 fluctuating more than cluster 20).
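The expectation described above (each later centroid should score no lower than the previous one, if the centres are picked lowest-energy-first using the same scores being plotted) can be checked with a few lines of Python. This is only a sketch: the function name is hypothetical and the scores in the example are made-up numbers, not values from this run.

```python
def out_of_order_centroids(centroid_scores):
    """Return 1-based cluster indices whose centroid scores lower than the previous one.

    centroid_scores: scores of c.1.1, c.2.1, ... in cluster order.
    Under a lowest-energy-first selection with the same scoring, the
    result would be expected to be an empty list.
    """
    flagged = []
    for i in range(1, len(centroid_scores)):
        if centroid_scores[i] < centroid_scores[i - 1]:
            flagged.append(i + 1)  # convert to 1-based cluster numbering
    return flagged


# Made-up example scores, for illustration only.
example = [-50.0, -48.5, -49.2, -47.0]
print(out_of_order_centroids(example))
```

Any index it reports marks a centroid that scores lower than the one before it, which would suggest the scores being compared are not the ones the clustering step used for ordering.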