
How to make a benchmark?

Dear all,

I've generated 20,000 structures using ab initio methods in Rosetta. I am wondering how to make a benchmark by plotting SCORE vs. RMSD. I would like to use the lowest-energy model as the reference model.

Thank you very much

Wed, 2012-04-04 05:23
albumns

The scorefile output of most Rosetta runs is a tabular, whitespace-separated format that can be read into your favorite plotting program. If a protocol doesn't give you a separate scorefile but instead gives you just a silent file, you can get the equivalent scorefile by grepping out the SCORE: lines of the silent file. Then you just need to use your favorite plotting program to generate your plots.

For example, I use R, so I'd probably do something like:

bash$ grep "^SCORE:" silentfile.out > scorefile.sc
bash$ R
R$ t <- read.table("scorefile.sc", header=TRUE)
R$ summary(t)   # shows whether the table was read in correctly and which columns are available (this varies a bit with protocol)
R$ plot(t$rmsd, t$total_score)   # change the names to the columns you want to plot

You don't need to use R, of course; any plotting program that reads whitespace-separated tabular data will work.
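If Python is your plotting tool instead, the grep-and-read step might look like the minimal sketch below. It is not part of Rosetta; it assumes, as described above, that the first SCORE: line holds the column names, and the column names in the usage note ("score", "rmsd") are only illustrative, since they vary by protocol.

```python
from typing import Dict, Iterable, List


def read_score_lines(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Collect SCORE: lines (from a silent file or scorefile) into row dicts.

    Assumes the first SCORE: line seen is the header with the column names.
    Later copies of the header (e.g. from concatenated runs) are skipped.
    """
    header: List[str] = []
    rows: List[Dict[str, str]] = []
    for line in lines:
        if not line.startswith("SCORE:"):
            continue  # skip SEQUENCE:, REMARK, coordinate lines, etc.
        fields = line.split()[1:]  # drop the leading "SCORE:" token
        if not header:
            header = fields
        elif fields == header:
            continue  # repeated header line
        elif len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return rows


def column(rows: List[Dict[str, str]], name: str) -> List[float]:
    """Extract one numeric column, e.g. 'rmsd' or 'total_score'."""
    return [float(row[name]) for row in rows]
```

For example, `rows = read_score_lines(open("scorefile.sc"))` followed by `matplotlib.pyplot.scatter(column(rows, "rmsd"), column(rows, "score"))` would give the SCORE vs RMSD plot, assuming matplotlib is installed and your file uses those column names.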

Wed, 2012-04-04 09:54
rmoretti

I just realized that you probably don't have the rmsd value you want in the current scorefile.

To get that, you can use the scoring applications ( http://www.rosettacommons.org/manuals/archive/rosetta3.4_user_guide/d2/d... ). If you pass the lowest energy structure (as a PDB) to the command line flag -in:file:native, your output scorefile should contain a column representing the C-alpha rmsd to the "native" (in your case the lowest energy structure).
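To choose that reference in the first place, you need the tag of the lowest-scoring model. A minimal Python sketch, assuming the SCORE: layout described above (first SCORE: line is the header, with a "description" column holding each model's tag); the score column is assumed to be called "score" here, but it may be "total_score" in your file.

```python
from typing import List, Iterable, Optional, Tuple


def lowest_energy_model(lines: Iterable[str], score_col: str = "score") -> Optional[Tuple[str, float]]:
    """Return (description, score) for the lowest-scoring SCORE: row, or None if there are no rows.

    Assumes the first SCORE: line is the header and that it contains both
    `score_col` and a 'description' column (the model's tag or file name).
    """
    header: Optional[List[str]] = None
    i_score = i_desc = -1
    best: Optional[Tuple[str, float]] = None
    for line in lines:
        if not line.startswith("SCORE:"):
            continue
        fields = line.split()[1:]  # drop the "SCORE:" token
        if header is None:
            header = fields
            i_score = header.index(score_col)
            i_desc = header.index("description")
            continue
        if fields == header or len(fields) != len(header):
            continue  # repeated header or malformed row
        value = float(fields[i_score])
        if best is None or value < best[1]:
            best = (fields[i_desc], value)
    return best
```

If the models live in a silent file, that tag still has to be written out as a PDB before it can be given to -in:file:native (for example with Rosetta's extract_pdbs application; check its options for your version).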

Wed, 2012-04-04 09:59
rmoretti

Does that trigger superposition or just RMSD calculation? (I'm thinking the latter; I've always written my own RMSD apps as needed...)

Wed, 2012-04-04 10:02
smlewis

Right, by default it looks like it does a raw rmsd calculation using current coordinates. However, there is an (apparently undocumented) -score_app:superimpose_to_native command line option which should trigger the superposition.
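The difference between the two modes is easy to see numerically. This is a generic NumPy illustration of raw versus superimposed RMSD (the superposition uses the Kabsch algorithm), not Rosetta's own code; it assumes two N-by-3 arrays of matching atoms (e.g. C-alpha) listed in the same order.

```python
import numpy as np


def raw_rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """RMSD using the coordinates exactly as they are (no fitting)."""
    diff = a - b
    return float(np.sqrt((diff * diff).sum() / len(a)))


def superposed_rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """RMSD after optimal rigid-body superposition of a onto b (Kabsch algorithm)."""
    a_c = a - a.mean(axis=0)  # remove translation
    b_c = b - b.mean(axis=0)
    h = a_c.T @ b_c  # 3x3 covariance matrix
    u, _s, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # -1 would mean a reflection; correct for it
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # rotation taking a_c onto b_c
    fitted = a_c @ rot.T
    return raw_rmsd(fitted, b_c)
```

For the same atoms, the superimposed RMSD can never exceed the raw one, because the identity transform is among the rigid-body fits the superposition considers.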

That's for the regular/old scoring application. The rmsd calculation for score_jd2 is triggered in some deep internals, and I'm not quite sure how one controls that.

Wed, 2012-04-04 10:53
rmoretti

Wheeeeeee! Undocumentation!

Wed, 2012-04-04 13:57
smlewis

Hi guys. Thanks a lot for the kind comments.

Could anyone provide more comments on how to use "-score_app:superimpose_to_native" and "score_jd2"? I can't find any documentation for either of them.

Thank you very much

Thu, 2012-04-05 00:33
albumns

As far as I can tell, documentation for neither exists.

The latter option appears to work only with the older score app, not score_jd2.

Thu, 2012-04-05 05:43
smlewis

As Steven says, there isn't currently any documentation for either of them.

score_jd2 is intended as a "jd2" (the unified scheme for file input, output, and job control which we're trying to transition Rosetta applications to) version of the older "score" application. They are unfortunately not flag/flag or even feature/feature compatible. The documentation for the scoring application was written for the older, non-jd2 score.

-score_app:superimpose_to_native is a command-line flag for the older score application which causes it to superimpose the structure onto the "native" before computing the RMSD. Unfortunately, I don't know if there is a score_jd2 equivalent for this.

Thu, 2012-04-05 09:09
rmoretti
Category: Structure prediction

Dear all,

I want to model 1000 structures with Rosetta, then plot energy vs. RMSD for each structure to find the low-energy ones. Would you please guide me on how to do it? My commands:

-loops:input_pdb Model_4gbr.pdb
-loops:loop_file 4gbr.loop_file
-loops:remodel perturb_kic
-loops:refine refine_kic
-in:file:fullatom
-out:prefix myloop
-loops:extended true
-nstruct 1000
-ex1
-ex2

Thanks in advance,
Tue, 2013-02-19 03:24
ramin

Which RMSD do you want? Whole-structure or loop RMSD? Did you run Rosetta already? Is there an RMSD column in the output you already have?

If you haven't run loop modeling yet, try adding -in:file:native to your command line, it will then generate RMSDs against that native. (You can re-use the -s argument as the -native argument). It generates backbone heavyatom loop RMSDs if you are using KIC.

Assuming you have run your 1000 structures and do not have RMSDs handy, the simplest way to generate RMSDs is to re-score the 1000 PDBs via score_jd2:

score_jd2.linuxgccrelease -l pdblist -native native.pdb -out:file:score_only scorefile.sc -database your_database

where "pdblist" was generated via "ls *pdb > pdblist" (in other words, a newline-delimited list of all PDBs to run on).

This will probably give you an output scorefile that includes an RMSD as one column and total_score as the first or second column, which you can use as input to your plotting software of choice.

If that doesn't work, let me know, and I'll fiddle with flags some more until I find something that does.

If you need loop RMSDs, the easiest way to get them is probably to write a script that extracts only the loop residues, and then use the older score executable with -native and -score_app:superimpose_to_native as Rocco suggests above.

Tue, 2013-02-19 09:12
smlewis

Not the whole structure, only the C-alpha RMSD of the loop.
But when I use -in:file:native, I get these errors:

core.pose.util: Cannot open psipred_ss2 file tt
protocols.loops.loops_main: can not open DSSP file tt

ERROR: !pdb.empty()

No, I do not have any RMSD column in my output.

Thanks,

Wed, 2013-02-20 10:05
ramin

Can you give me your complete command line here? None of the other stuff you've mentioned involves psipred files.

Wed, 2013-02-20 10:20
smlewis

Getting the RMSD specifically of the loop might be a little tricky, as you would have to specify the residues over which you wish to calculate the RMSD. If you can get the loop remodeling protocol to calculate it for you as it makes the structures, that would probably be ideal (as it knows which residues it needs to operate over). Most RMSD calculations in Rosetta would be over the entire structure (because they don't have facilities to input the loop subset designations).

Trying to calculate it after the fact may be tricky, but I think RosettaScripts (https://www.rosettacommons.org/manuals/archive/rosetta3.4_user_guide/Ros...) can permit you to do it. In RosettaScripts there's an Rmsd filter (https://www.rosettacommons.org/manuals/archive/rosetta3.4_user_guide/Fil...) which allows you to specify a subset of residues over which to calculate the RMSD. You can write a script that defines that filter, and then add the filter to the PROTOCOLS section. If you then run RosettaScripts with that script, the structures with which to calculate the RMSD as the input, and the reference structure as -in:file:native, the output scorefile should contain a column which represents the RMSD value.

One complication would be superposition. I believe the filter superimposes on the specified residues, so you probably want to set superimpose=0 and make sure the input poses and the reference structures are pre-aligned how you want them to be.

Wed, 2013-02-20 10:47
rmoretti

It is also possible to grep out the loop residues and run regular RMSD calculations on the truncated PDBs.

I agree with you that it's not appropriate to superimpose loops for RMSD calculations - their endpoints are already superimposed in the Rosetta output, and it doesn't make sense to transform the isolated loop residues without the rigid core protein context.
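The "extract the loop residues and compute the RMSD without superposition" approach can be sketched in Python. Assumptions, none of them from the thread: loop-file lines look like `LOOP start end ...`; those residue numbers count residues sequentially from 1 in file order (which is how this sketch numbers them, ignoring the PDB residue numbers); model and native contain the same residues in the same order; and both are already in the same frame, as loop-modeling output is.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def read_loop_ranges(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Parse (start, end) from 'LOOP start end ...' lines of a loop file (assumed format)."""
    ranges = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "LOOP":
            ranges.append((int(fields[1]), int(fields[2])))
    return ranges


def loop_atom_coords(pdb_lines: Iterable[str], ranges: Sequence[Tuple[int, int]],
                     atom_names: Sequence[str] = ("CA",)) -> Dict[Tuple[int, str], Coord]:
    """Collect coordinates of the selected atoms in the loop residues.

    Residues are numbered 1, 2, 3, ... in the order their ATOM records appear,
    so PDB residue-number offsets do not matter. Only the first model is read.
    """
    coords: Dict[Tuple[int, str], Coord] = {}
    index, last_res = 0, None
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break
        if not line.startswith("ATOM"):
            continue
        res_id = (line[21], line[22:27])  # chain ID, residue number + insertion code
        if res_id != last_res:
            index, last_res = index + 1, res_id
        name = line[12:16].strip()
        if name in atom_names and any(s <= index <= e for s, e in ranges):
            # setdefault keeps the first alternate location if there are several
            coords.setdefault((index, name), (float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def loop_rmsd(model_lines, native_lines, ranges, atom_names=("CA",)) -> float:
    """RMSD over the loop atoms with NO superposition (coordinates used as-is)."""
    model = loop_atom_coords(model_lines, ranges, atom_names)
    native = loop_atom_coords(native_lines, ranges, atom_names)
    shared = sorted(model.keys() & native.keys())
    if not shared:
        raise ValueError("no loop atoms in common between model and native")
    sq = sum((m - n) ** 2 for key in shared for m, n in zip(model[key], native[key]))
    return math.sqrt(sq / len(shared))
```

With placeholder file names, `ranges = read_loop_ranges(open("3pbl.loop_file"))` and `loop_rmsd(open("model.pdb").readlines(), open("native.pdb").readlines(), ranges)` would give the C-alpha loop RMSD; `atom_names=("N", "CA", "C", "O")` gives a backbone heavy-atom RMSD instead. Pass atom names as a tuple, not a bare string.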

Wed, 2013-02-20 10:52
smlewis

This is my input, but I did not get a scorefile with the RMSD as one column and total_score as the first or second column:

-loops:input_pdb Model_3pbl.pdb
-loops:loop_file 3pbl.loop_file
-loops:remodel perturb_kic
-loops:refine refine_kic
-in:file:fullatom
-in:file:native Model_3pbl.pdb
-out:prefix myloop
-loops:extended true
-nstruct 1
-ex1
-ex2
-out:file:scorefile
-overwrite
> 1.log

Wed, 2013-02-20 12:16
ramin

What does your scorefile look like? Post the first few lines (header and contents).

Also, you are using -out:file:scorefile without passing a scorefile name - this may be causing Rosetta to write to a nameless file (essentially, not writing a scorefile at all). Give an argument to that flag, like "-out:file:scorefile score.sc". score.sc should not exist beforehand.

Wed, 2013-02-20 14:20
smlewis

Yes, I gave a scorefile name, but it still does not produce a scorefile. I use this command to run it: loopmodel.default.linuxgccrelease @flag

-loops:input_pdb rec2_3pwh.pdb
-loops:loop_file 3pwh.loop_file
-loops:remodel perturb_kic
-loops:refine refine_kic
-in:file:fullatom
-in:file:native rec2_3pwh.pdb
-out:prefix myloop
-loops:extended true
-nstruct 1
-ex1
-ex2
-out:file:scorefile output.sc
> 1.log

Thu, 2013-02-21 01:54
ramin

What output does it give you? Do you get your output PDB? Does the end of the log output (1.log) suggest that Rosetta completed successfully? I'm having trouble duplicating the problem.

Thu, 2013-02-21 07:28
smlewis

Hi,

I have 10,000 PDB files whose loops I remodeled.
Now I want to make a graph of:
x-axis: RMSD based on the backbone structure
y-axis: Score
to select low-RMSD, low-energy structures for further analysis.
How can I make it?

Thanks

Sat, 2013-03-09 06:58
ramin

Assuming you have the RMSDs and scores in a simple file format, either A) the Rosetta scorefile, B) grepped from the PDBs, or C) grepped from the log files, then just take your pick of plotting software: Excel, OpenOffice Calc, gnuplot, R.... It's just a scatterplot of X versus Y.
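Once the scatterplot exists, one way to turn "low-RMSD, low-energy" into a concrete list is to take the lower-left Pareto front: the structures that no other structure matches or beats on both RMSD and score. This selection rule is a suggestion, not something Rosetta does; a minimal Python sketch, assuming you already have hypothetical (description, rmsd, score) triples:

```python
from typing import List, Tuple

Decoy = Tuple[str, float, float]  # (description, rmsd, score); hypothetical layout


def pareto_front(decoys: List[Decoy]) -> List[Decoy]:
    """Return the decoys that no other decoy matches or beats on both RMSD and score.

    Decoys are visited in order of increasing RMSD (ties broken by score);
    each one is kept only if its score is strictly lower than every score
    seen so far, i.e. lower than all decoys with smaller or equal RMSD.
    """
    front: List[Decoy] = []
    best_score = float("inf")
    for decoy in sorted(decoys, key=lambda d: (d[1], d[2])):
        if decoy[2] < best_score:
            front.append(decoy)
            best_score = decoy[2]
    return front
```

The returned list is ordered by increasing RMSD, so it traces the lower-left edge of the SCORE vs RMSD scatterplot.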

Sat, 2013-03-09 09:26
smlewis

I have a question.
When I add hydrogen atoms to my protein, I get a lot of warnings like "discarding 2 atoms at position 243 in file Best match rsd_type: ALA" ...
When I remodel my system (loops), should I add hydrogen atoms to it or not?

Thanks

Sun, 2013-03-10 09:31
ramin

Rosetta is...idiosyncratic when it comes to hydrogen atoms. Short version: do not worry about adding hydrogens, and you never need to add hydrogens to get Rosetta to work well.

Long version:
Rosetta prefers to add hydrogens itself when it loads in a structure. It will leave existing hydrogens in place if it successfully reads them in, which is dependent on using Rosetta's preferred hydrogen naming scheme (which may not match the current PDB scheme, although it matched at some point). The discarding atom messages are due to Rosetta ignoring the hydrogens whose names it does not like. If you want really specific hydrogen placements, I can work with you on getting the naming consistent with what Rosetta expects...but unless you have a really good reason to like your hydrogen placements, just leave them out and let Rosetta autoplace them and don't worry about it.
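If your input PDB already carries hydrogens in a naming scheme Rosetta doesn't recognize, one way to "just leave them out" is to strip them before the run. A minimal Python sketch, not a Rosetta tool: it uses the element column (PDB columns 77-78) when present and otherwise treats atom names that start with H (after any leading digit, as in 1HB) as hydrogens. That fallback is a heuristic that suits standard protein residues but could misfire on some ligand atoms.

```python
from typing import Iterable, List


def is_hydrogen(line: str) -> bool:
    """Guess whether an ATOM/HETATM record is a hydrogen (or deuterium)."""
    element = line[76:78].strip()  # element symbol, PDB columns 77-78
    if element:
        return element in ("H", "D")
    # No element column: fall back to the atom name (columns 13-16).
    # Heuristic: "H", "HA", "1HB" all start with H once leading digits are removed.
    name = line[12:16].strip().lstrip("0123456789")
    return name.startswith("H")


def strip_hydrogens(pdb_lines: Iterable[str]) -> List[str]:
    """Return the PDB lines with hydrogen ATOM/HETATM records removed."""
    return [
        line for line in pdb_lines
        if not (line.startswith(("ATOM", "HETATM")) and is_hydrogen(line))
    ]
```

For example, writing `"".join(strip_hydrogens(open("input.pdb")))` to a new file (placeholder name) gives a hydrogen-free copy for Rosetta to re-protonate.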

Sun, 2013-03-10 13:58
smlewis

Rule of thumb - the names that Rosetta uses in its output PDBs are the names it expects in input PDBs.

That said, most protocols will likely change the hydrogen (and some heavy atom) placements anyway, so I agree with Steven that in most cases worrying about hydrogens isn't necessary.

Sun, 2013-03-10 14:18
rmoretti