ab initio modelling - what to do next?

Hi everyone,
I was finally able to create 10000 models of my 90-residue protein.
By running cluster.linuxgccrelease I got (I guess) 560 clusters:
Cluster: 22s
Additional Clustering: 513s
Total: 560
But now I don't know what to do.
According to the IsThisModelGood.pdf document, I now have to create a plot of RMSD vs score. As I don't have a crystal structure to use as a reference, I think the best idea would be to use the structure with the lowest energy, right?
The problem is that I wanted to use the score application to find it, but then I realised that there are 8 different executables:
score_aln2.linuxgccrelease, score_aln.linuxgccrelease, score_jd2.linuxgccrelease, score.linuxgccrelease,
score_aln2.omp.linuxgccrelease, score_aln.omp.linuxgccrelease, score_jd2.omp.linuxgccrelease and score.omp.linuxgccrelease.
Which one should I use? What is the difference between them?
The structure with the lowest energy should belong to the biggest cluster, am I correct? What happens if it doesn't?
I have tons of questions, but I can't find a tutorial out there :/

Hope someone can help me,
Best Regards,
Carlos Navarro

Thu, 2014-10-16 13:05

So on the programs: Rosetta uses the trailing parts of the name to convey information about how the programs were compiled, because it's possible and fairly routine to use differently compiled programs out of the same directory. "linuxgccrelease" means that the program was compiled for Linux using the gcc compiler in release mode. You may want to have multiple compilers because clusters sometimes use different compilers on different systems. If you're not in release mode, you're in debug mode, which has additional error checking, but is *much* slower.

That leaves the difference between score.linuxgccrelease and score.omp.linuxgccrelease. The middle part of the three-part name specifies which "extras" build you're using. "omp" means you've compiled with OpenMP support. I don't think this actually does much at the moment (OpenMP isn't actively supported in Rosetta, and is probably best classed as "experimental"), but I don't think it would be actually harmful. Other extras builds are things like "mpi", "mysql" and "static", which add in MPI parallelization support (heavily used), MySQL database support, and static compilation. (Certain extras can actually be combined simultaneously.) If you don't have any extras, the tag is "default".

The two-part name ("score.linuxgccrelease") simply refers to the most recent matching compilation. So in your case "score.linuxgccrelease" will point to "score.omp.linuxgccrelease". If you now compile a different extras build, it will point to whatever the new compilation produces. (Assuming it's still with gcc on Linux in release mode.)
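To make the naming scheme above concrete, here is a minimal Python sketch that splits an executable name into its parts. It is only an illustration and not part of Rosetta: the `BuildInfo` type and `parse_executable` function are hypothetical names, and the check for a "debug" suffix is an assumption based on the release/debug distinction described above.

```python
from typing import NamedTuple, Optional


class BuildInfo(NamedTuple):
    app: str                # application name, e.g. "score" or "score_jd2"
    extras: Optional[str]   # extras tag, e.g. "omp"; None for a two-part name
    build_tag: str          # platform/compiler/mode tag, e.g. "linuxgccrelease"
    mode: str               # "release", "debug", or "unknown"


def parse_executable(name: str) -> BuildInfo:
    """Split 'app.build' or 'app.extras.build' into its components."""
    parts = name.split(".")
    if len(parts) == 2:
        # Two-part name: points at the most recent matching compilation,
        # so the extras build cannot be read off the name itself.
        app, build_tag = parts
        extras = None
    elif len(parts) == 3:
        app, extras, build_tag = parts
    else:
        raise ValueError(f"unexpected executable name: {name!r}")

    # Assumption: the build tag ends with the compilation mode.
    if build_tag.endswith("release"):
        mode = "release"
    elif build_tag.endswith("debug"):
        mode = "debug"
    else:
        mode = "unknown"
    return BuildInfo(app, extras, build_tag, mode)


if __name__ == "__main__":
    for exe in ("score.linuxgccrelease", "score_jd2.omp.linuxgccrelease"):
        print(parse_executable(exe))
```

Note that for a two-part name the sketch reports `extras` as `None`, since the name alone doesn't say which extras build it currently points to.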

score_aln and score_aln2 score alignments, not structures, so they're not relevant in your case.

Both score and score_jd2 will rescore structures. They have slight differences in options, though. score_jd2 is on the "new" (five years old at this point) job distribution system, and as such follows standard flags. score is on an older system, and as such has idiosyncratic options. score has better support for calculating rmsd-type metrics than score_jd2 does, though. Depending on what you're doing, either will work.

Generally we assume that the lowest energy structure from the largest cluster is most likely to be the one which is most representative of the native. The thought is that native structures should have wide folding funnels to help combat Levinthal's paradox. That's the one that I would recommend you use as the mock "native" for your score-vs-rmsd plots.

Sometimes it turns out the absolute lowest energy structure isn't from the largest cluster - sometimes it's from an absolutely tiny cluster. Those cases might be indicative of a deficiency in the Rosetta scoring function. For whatever reason, Rosetta can find a really low energy structure, but it's extremely sensitive to small structural changes, such that the slightest change would greatly increase energy. This is probably not indicative of a true native.
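The selection rule and the sanity check from the last two paragraphs can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a Rosetta tool: it assumes you have already extracted a (model name, cluster id, score) triple for each model from your own output, and the function name `pick_reference` and the model names are made up.

```python
from collections import defaultdict


def pick_reference(models):
    """models: iterable of (name, cluster_id, score) tuples (hypothetical input).

    Returns the lowest-energy member of the largest cluster, plus a flag
    saying whether the overall lowest-energy model sits in that cluster.
    """
    models = list(models)
    if not models:
        raise ValueError("no models given")

    clusters = defaultdict(list)
    for name, cluster_id, score in models:
        clusters[cluster_id].append((score, name))

    # Largest cluster by member count (first one wins on ties).
    largest_id = max(clusters, key=lambda cid: len(clusters[cid]))
    ref_score, ref_name = min(clusters[largest_id])

    # Overall lowest-energy model, wherever it is.
    best_score, best_name, best_cluster = min(
        (score, name, cluster_id) for name, cluster_id, score in models
    )

    return {
        "reference": ref_name,
        "largest_cluster": largest_id,
        "global_lowest": best_name,
        "global_lowest_in_largest": best_cluster == largest_id,
    }


if __name__ == "__main__":
    # Invented example data: cluster 0 is the largest, but the single
    # lowest score belongs to a one-member cluster.
    demo = [
        ("model_a", 0, -100.0),
        ("model_b", 0, -120.0),
        ("model_c", 0, -90.0),
        ("model_d", 1, -150.0),
    ]
    print(pick_reference(demo))
```

If `global_lowest_in_largest` comes back false, that is the situation described above, and the tiny cluster holding the overall lowest score deserves a critical look before you trust it.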

You may end up with a situation where you have low energy structures spread between a number of similarly sized clusters. It's relatively easy to recompute rmsds, so you may want to make a number of score-vs-rmsd plots, each with a different reference native. You can then look at the quality of the funnels which result, and pick the reference structure with the best looking score-vs-rmsd plot. (Or you could decide that all of them look pretty bad, and either increase the sampling, or better yet, add in constraints based on experimental information. There really is no substitute for actually looking at the structures with a critical eye and determining if the structure matches what is known experimentally about the system.)
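As a rough aid for comparing several candidate references, here is a small Python sketch. It assumes you have already computed, for each candidate reference, the rmsd of every model to it, and it ranks the candidates by the mean rmsd of the lowest-scoring models. That number is only a crude stand-in of my own choosing for "how funnel-like is the plot"; it is not a Rosetta metric, the function names and data are invented, and it is no substitute for looking at the plots and structures themselves.

```python
def low_energy_mean_rmsd(scores, rmsd_to_ref, top_n):
    """Mean rmsd to one reference over the top_n lowest-scoring models.

    scores:      dict of model name -> score
    rmsd_to_ref: dict of model name -> rmsd to the candidate reference
    """
    lowest = sorted(scores, key=scores.get)[:top_n]
    if not lowest:
        raise ValueError("no models to average over")
    return sum(rmsd_to_ref[name] for name in lowest) / len(lowest)


def rank_references(scores, rmsd_tables, top_n=3):
    """Order candidate references from most to least funnel-like.

    rmsd_tables: dict of reference name -> {model name -> rmsd to it}
    A smaller mean rmsd among low-energy models means those models
    crowd around that reference.
    """
    return sorted(
        rmsd_tables,
        key=lambda ref: low_energy_mean_rmsd(scores, rmsd_tables[ref], top_n),
    )


if __name__ == "__main__":
    # Invented example: five models and two candidate references.
    scores = {"m1": -150.0, "m2": -148.0, "m3": -145.0, "m4": -120.0, "m5": -110.0}
    rmsd_tables = {
        "m1": {"m1": 0.0, "m2": 1.2, "m3": 1.5, "m4": 6.0, "m5": 7.5},
        "m4": {"m1": 6.0, "m2": 5.5, "m3": 6.2, "m4": 0.0, "m5": 2.0},
    }
    for ref in rank_references(scores, rmsd_tables):
        value = low_energy_mean_rmsd(scores, rmsd_tables[ref], 3)
        print(ref, round(value, 2))
```

A ranking like this only tells you which plots to look at first. If every candidate gives a large value, that matches the "all of them look pretty bad" case, where more sampling or experimental constraints are the better next step.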

Fri, 2014-10-17 10:13

Thanks a lot for your answer, rmoretti :)
Now I understand. Hopefully I'll see that the structure with the lowest energy belongs to the largest cluster. If not, I'll have to check the different clusters to see what is happening.
Have a nice day :)

Wed, 2014-10-22 09:31