I have an old question about loop modeling: how can I make a benchmark plot of loop RMSD vs. energy from the results? I asked this before, but it was never resolved.
Thank you very much.
I assume your loop modeling protocol is spitting out loop RMSD numbers and energy values into the scorefile? (If not, it would help to be more specific about which protocols you're running and which values, exactly, you are trying to examine.)
Assuming you have the values in a standard (tabular) Rosetta scorefile, I would recommend R ( http://en.wikipedia.org/wiki/R_%28programming_language%29 ), though any number of plotting programs are available (gnuplot is another popular one; you can even use Excel* if that's what you're comfortable with). I recommend learning R because you can do a lot of analysis with it in addition to plotting.
For R, simply load the scorefile:
scorefile <- read.table("path/to/scorefile.sc", header=TRUE)
# Sometimes the first line of the scorefile is SEQUENCE:; in that case either delete that line, or add "skip = 1" to the parameters
# You can then display a summary of the file with
summary(scorefile)
# Then plotting is as easy as
plot( scorefile$RMSD, scorefile$ENERGY ) # Replace RMSD and ENERGY with the appropriate column names
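# If you wish to save the plot to a file, just surround the call to plot with png()/pdf() and dev.off() calls, e.g.
png("rmsd_vs_energy.png") # example filename; use pdf("rmsd_vs_energy.pdf") for a PDF instead
plot( scorefile$RMSD, scorefile$ENERGY )
dev.off()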
# There are oodles of things you can adjust to change how the plot is displayed. Look at a good R tutorial, or at the built-in help (invoked with ?, e.g. ?plot)
*) For Excel, just open the scorefile, then select the column all the text is in. Go to the "Data" menu -> "Text to Columns ..." Choose "Delimited" format, "Next", and then check the boxes next to "Space" and "Treat consecutive delimiters as one", and finally "Finish". It may chug for a while, but afterward you should get a nice columnar spreadsheet, with which you can use the conventional Excel plotting tools.
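If Text to Columns is slow on a large scorefile, a possible alternative is to convert the scorefile to CSV first, which Excel opens directly. Below is a minimal sketch as a shell function using awk; it assumes every header and score line starts with "SCORE:" (as in a standard tabular scorefile), and the function name scorefile_to_csv is just a hypothetical choice.

```shell
# Convert a whitespace-delimited Rosetta scorefile into CSV on standard output.
# Assumes header and score lines start with "SCORE:"; that tag column and any
# other lines (e.g. SEQUENCE:) are dropped.
scorefile_to_csv() {
  # $1 = input scorefile
  awk '$1 == "SCORE:" {
         line = $2
         for (i = 3; i <= NF; i++) line = line "," $i
         if (header == "") header = line     # first SCORE: line is the header
         else if (line == header) next       # skip headers repeated by appended runs
         print line
       }' "$1"
}
```

Usage: scorefile_to_csv score.sc > score.csv, then open score.csv in Excel and plot as usual.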
Thank you for the kind reply. It doesn't generate a scorefile; I only get PDBs as output. Here is the protocol I use:
In Rosetta 3.4, the fastest way to get RMSDs for your pre-created PDBs is to run:
score_jd2.default.linuxgccrelease -database rosetta_database -native native.pdb -s *pdb
This will re-score your PDBs, including an RMSD against the native.pdb, and produce a scorefile score.sc.
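Once score.sc exists, it can be loaded into R as in the earlier reply, or plotted with gnuplot. For gnuplot, a sketch like the following shell function pulls two columns out by header name into a plain x/y file. The function name score_columns is hypothetical, and the column names rms and total_score in the usage line are assumptions about what this score_jd2 run writes; check the header line of your own score.sc.

```shell
# Print two scorefile columns, chosen by header name, as "x y" lines for gnuplot.
# Assumes header and data lines start with "SCORE:" and that the names passed
# in match the header exactly (e.g. rms / total_score are assumptions).
score_columns() {
  # $1 = scorefile, $2 = x-axis column name, $3 = y-axis column name
  awk -v xname="$2" -v yname="$3" '
    $1 != "SCORE:" { next }            # skip SEQUENCE: and any other lines
    xcol == 0 {                        # the first SCORE: line is the header
      for (i = 1; i <= NF; i++) {
        if ($i == xname) xcol = i
        if ($i == yname) ycol = i
      }
      if (xcol == 0 || ycol == 0) {
        print "column not found in header" > "/dev/stderr"
        exit 1
      }
      next
    }
    $xcol == xname { next }            # skip headers repeated by appended runs
    { print $xcol, $ycol }' "$1"
}
```

Usage: score_columns score.sc rms total_score > rms_vs_score.dat, then in gnuplot: plot "rms_vs_score.dat" using 1:2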
Getting just loop RMSDs is trickier. I guess the best way is to slice the loop residues out of your PDBs (with grep or awk or something) to create PDBs containing only the loop, then run that same command line on the loop-only PDBs. (Computing an RMSD over the whole pose when only the loop varies gives a misleadingly low number, because most of the atoms match perfectly.) Also, this only computes CA RMSDs.
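For the slicing step, one option is a small awk-based shell function that keeps only the ATOM lines whose residue number (PDB columns 23-26) falls inside the loop. This is a sketch, not a Rosetta tool: the function name extract_loop, the residue range 45-56, and the model_*.pdb file names in the usage line are hypothetical, and it assumes a single chain with no insertion codes.

```shell
# Write only the ATOM records of residues first..last (inclusive) from a PDB file.
# Residue numbers come from PDB columns 23-26; chain IDs and insertion codes are
# ignored, so this assumes one chain with unique residue numbers.
extract_loop() {
  # $1 = input PDB, $2 = first loop residue, $3 = last loop residue
  awk -v first="$2" -v last="$3" '
    BEGIN { lo = first + 0; hi = last + 0 }   # force numeric comparison
    /^ATOM/ {
      resnum = substr($0, 23, 4) + 0          # residue sequence number
      if (resnum >= lo && resnum <= hi) print
    }
    END { print "END" }' "$1"
}
```

Slice the native and every model with the same range, e.g. extract_loop native.pdb 45 56 > loop_native.pdb and for f in model_*.pdb; do extract_loop "$f" 45 56 > "loop_$f"; done, then run the score_jd2 command above with -native loop_native.pdb -s loop_model_*.pdb.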
I've never used it myself, but I think the Rmsd filter from RosettaScripts ( http://www.rosettacommons.org/manuals/archive/rosetta3.4_user_guide/Filt... ) should be able to give a loop RMSD. You would just specify the locations of the loops with the span sub-tags.
If you give it confidence=0 and then add it to the protocols block of your script (you don't need to use a mover), you should get a column in the output scorefile, named after the filter's name parameter, that lists the RMSD value for each structure. If you want any other metrics (per-residue scores, etc.) you can add other filters as well. (By the way, if you just want to rescore, you may want to use "-out:file:score_only scorefile.sc" to suppress the output of structures.)