RNA Denovo RMSD data

Hi,
I am new to the Rosetta software and I would really appreciate some help from someone with experience in RNA Denovo. I am using the rna_denovo application on a cluster with Rosetta 3.3 and I want to predict 3D structures for a large number of RNA molecules.
Initially I predict 3D structures for which there is NMR data available so that I can determine how well the algorithm works. I am supplying the NMR structure as the native structure in order to plot RMSD vs. energy (score).
I would like to generate similar plots when predicting structures for which I am not supplying a native structure. As I understand it from the RNA Denovo Server documentation, RMSD is calculated between the models and the lowest-energy (best-scoring) model when no native pdb is supplied. However, I do not obtain any RMSD data in my out file when I run without a native structure. How can I calculate this? Are there any significant differences between the RNA Denovo server application and the one included in the Rosetta package?
Also, are there any post-processing scripts available that can be useful for analyzing data from RNA Denovo?
I hope that someone could clarify these points to me. Thank you.
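
For the RMSD-versus-score plots described above, a minimal Python sketch of pulling the numbers out of a silent file might look like the following. It assumes that every model has a line starting with `SCORE:`, that the first such line names the columns, and that the RMSD column is called `rms`; the column name is an assumption and is therefore a parameter. This is not a Rosetta script, only an illustration.

```python
def parse_score_lines(lines):
    """Return one dict per model from the 'SCORE:' lines of a silent file."""
    header = None
    models = []
    for line in lines:
        if not line.startswith("SCORE:"):
            continue
        fields = line.split()[1:]
        if not fields:
            continue
        if header is None or fields[0] == "score":
            # The first SCORE: line (and any repeat of it) names the columns.
            header = fields
            continue
        models.append(dict(zip(header, fields)))
    return models


def rms_vs_score(models, rms_column="rms"):
    """Return (rmsd, score, tag) tuples for models that carry an RMSD column."""
    return [
        (float(m[rms_column]), float(m["score"]), m["description"])
        for m in models
        if rms_column in m
    ]
```

With a real run one would pass the open file, for example `parse_score_lines(open("test.out"))`, and hand the resulting tuples to any plotting tool. Models from a run without a native structure would simply yield an empty list if no RMSD column is present.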

Best regards,
Emma

Thu, 2012-08-02 01:40
eremma

Hi again,
I think that it is probably not a good idea to calculate RMSD to the best scoring model as this structure differs significantly from the native one, at least as far as I have observed in my test runs. Can anyone suggest what I can look for/analyze in order to be able to extract reasonable models? Can the cluster algorithm be helpful in my case?
Thank you.
Regards,
Emma

Thu, 2012-08-02 07:14
eremma

Clustering is the conventional method for identifying the best structure in these structural prediction cases. The philosophy is that the lowest energy structure which belongs to the largest cluster is probably the one closest to the native structure. (The thought being that lower energy structures which belong to smaller clusters are likely due to deficiencies in the scoring/sampling of Rosetta, rather than being reflective of reality.)
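
The selection rule in the paragraph above can be sketched in a few lines of Python. This assumes cluster assignments are already available as (tag, score, cluster_id) tuples, however they were obtained; the tuple layout and the tag names in the test data are hypothetical, and lower scores are taken to be better.

```python
from collections import defaultdict


def best_of_largest_cluster(models):
    """models: iterable of (tag, score, cluster_id); lower score is better."""
    clusters = defaultdict(list)
    for tag, score, cluster_id in models:
        clusters[cluster_id].append((score, tag))
    # Pick the cluster with the most members; break ties by the lower best score.
    largest = max(
        clusters.values(),
        key=lambda members: (len(members), -min(members)[0]),
    )
    # Within that cluster, return the tag of the lowest-scoring member.
    return min(largest)[1]
```

Note that the overall lowest-scoring model is deliberately ignored if it sits in a small cluster, which is exactly the point of the philosophy described above.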

If you're not aware of them already, the Rosetta tutorials at http://www.meilerlab.org/index.php/jobs/resources do a good job of outlining typical post-analysis strategies.

Thu, 2012-08-02 11:20
rmoretti

Thank you for your quick response and suggestions about using clustering.
However, as far as I know it is not possible to use the cluster application for RNA in Rosetta 3.1-3.3 without serious modifications. I read in an old post that it will be introduced in the 3.4 release. Do you know if it has been included yet? I don't have access to 3.4 yet, but if I know that the cluster application for RNA is in it I will make sure to update.

Thanks,
Emma

Fri, 2012-08-03 05:06
eremma

I don't see any evidence that cluster was significantly modified between 3.3 and 3.4 (it appears not to have been modified at all except for necessary maintenance so that it will compile).

If your poses are from RNA_denovo, then they'll have Rosetta's expected nomenclature. I would guess the only modification you'd need to make is to swap DNA for RNA in the default residue type set (and/or tweak cluster so that it expects RNA poses), and possibly use an option or another code hack to ensure all-atom (instead of c-alpha) clustering.

I've sent a note to someone who does RNA to take a look here if they can.

Fri, 2012-08-03 07:39
smlewis

Skimming through the code, it looks like there is an explicit check for RNA in the cluster application (even in 3.3) so that as long as the first residue is RNA, the clustering application should be set to do an all-atom RMSD clustering. I don't know if there would be additional issues which would cause an RNA structure not to work.

I should also mention that there appears to be an rna_cluster application, as well, in release 3.4, although it doesn't appear to be in release 3.3.

Another option is to skip using Rosetta for clustering and try a different application. Although I don't know how well it handles RNA, a number of people in the Rosetta community have very good things to say about Calibur ( http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2881085/ - http://sourceforge.net/projects/calibur/ ).

Fri, 2012-08-03 11:06
rmoretti

"it looks like there is an explicit check for RNA in the cluster application (even in 3.3) so that as long as the first residue is RNA, the"

Where is this? grep rna cluster.cc returns nothing.

Fri, 2012-08-03 12:49
smlewis

It's not in the cluster application per se (i.e. apps/public/cluster.cc), instead it's in the GatherPosesMover::get_distance_measure() function of protocols/cluster/cluster.cc, which, as I read the code, is used by the ClusterPhilStyle subclass to do the structure-structure distance calculation.

Fri, 2012-08-03 18:20
rmoretti

How can I run this cluster.cc and not the default cluster.cc? I would like to try it and see if/how it works.
Thanks!

/Emma

Wed, 2012-08-08 00:33
eremma

protocols/cluster/cluster.cc isn't an application - it's the implementation of the clustering protocols. The clustering application (apps/public/cluster.cc) should use it in its implementation.

Have you tried running the regular clustering application with your RNA structures? (Assuming that the first residue of your pose is RNA.) If so, what happened? What error did you get, if any, and did you get any output?

Wed, 2012-08-08 11:09
rmoretti

I was running the regular clustering application using the command:
cluster.linuxgccrelease -database /rosetta/3.3/rosetta_database/ -in:file:s *.pdb -in::file::fullatom -out:file:silent out.out

The error is:
ERROR: unrecognized aa rG
ERROR:: Exit from: src/core/io/pdb/file_data.cc line: 655

I do not get any output.

I still have not got access to Rosetta 3.4 so I cannot try the rna_cluster application.

/Emma

Thu, 2012-08-09 06:56
eremma

Try it without in:file:fullatom. The RNA residue type set is not the fullatom residue type set.

Thu, 2012-08-09 07:01
smlewis

The error is still the same.

Thu, 2012-08-09 07:20
eremma

Where are these files coming from? Are they Rosetta outputs? If so, the residues should be recognized if you use the same database and residue type settings (e.g. fullatom/centroid, various other items) that you used in the run which generated them.

Note there are changes that need to be made to the database to get Rosetta to run with RNA (see http://www.rosettacommons.org/manuals/archive/rosetta3.3_user_guide/d6/d...). If, for example, you generated the structures on one computer with a properly converted database, but then ran the clustering application on another computer with a different database, you might not be able to properly read in the RNA.

If the structures come from a non-Rosetta source, make sure you have the naming conventions correct. For this error, the gotcha is residue name alignment. I believe Rosetta expects the three letter residue name to be " rG", and will complain if the three letter residue name is "rG ".
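
A quick way to look for this kind of alignment problem is to inspect the residue-name field directly. The sketch below relies on the standard PDB layout, in which the residue name occupies columns 18-20 of ATOM/HETATM records, and flags names that are not right-justified in that field. Whether right-justification is what a given Rosetta version expects follows the belief stated above and is an assumption here, not something checked against the Rosetta source.

```python
def misaligned_residue_names(pdb_lines):
    """Yield (line_number, field) for residue names not right-justified in columns 18-20."""
    for number, line in enumerate(pdb_lines, start=1):
        if not line.startswith(("ATOM", "HETATM")):
            continue
        field = line[17:20]  # columns 18-20, zero-based slice
        name = field.strip()
        if name and field != name.rjust(3):
            yield number, field
```

For a file on disk, `list(misaligned_residue_names(open("model.pdb")))` (file name hypothetical) returns an empty list when every residue name is right-justified.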

Thu, 2012-08-09 10:56
rmoretti

The files are generated by the rna_denovo application.
I did not know that one has to make changes in the rosetta_database to make the clustering application work with RNA. Thank you for pointing that out. I do not have permission to make the changes myself on the cluster I am running on, but I will try to make sure that someone does it and then I will try again. This is most likely the reason why I am getting the errors when running cluster.cc.
I am waiting for Rosetta 3.4 to be installed so that I can also try the rna_cluster application. What will be the difference between using cluster.cc in 3.3 (with the suggested modifications for RNA) and rna_cluster.cc in 3.4?

Fri, 2012-08-10 01:07
eremma

I made the changes to the database according to the instructions, but I still get the same error.
I do not understand whether you mean that the pdb file format is incorrect, in that " rG" should perhaps be replaced by "rG ". I am using the silent output file or the extracted pdb files from rna_denovo as input to the clustering and I get the same error for both.

Fri, 2012-08-10 05:23
eremma

If the structures are coming from a Rosetta run, then in principle you shouldn't have to do anything to read them in to another Rosetta program using the same database. (The suggestion for altering the alignment of the residue name was if you were reading in a PDB from an external source - those may or may not have things aligned the way Rosetta expects them to. But if the outputs came from Rosetta and weren't subject to any modification, they should be okay.) There may be some small residue type set issues (e.g. fullatom vs. centroid), but otherwise it should probably work.

I'm not sure of what all the differences between the cluster and the rna_cluster application are - I've never used it, I just noticed it exists. It looks like it uses a slightly different algorithm. See what documentation we have at http://www.rosettacommons.org/manuals/archive/rosetta3.4_user_guide/d2/d...

To be honest, I'm a little baffled why it isn't working, although I'll admit I have no real experience dealing with RNA. Could you copy the full tracer output (what gets printed to stdout/the console) of the cluster run to a file and attach it to a forum post? If possible, add the flag "-out:level 400" to the commandline to get the debug-level output.

Fri, 2012-08-10 14:54
rmoretti

I have run the 3.3 cluster application using:
1. The silent file from the rna_denovo run as input. This gives the console output in the attached output_error_1.txt file.
Command used: cluster.linuxgccrelease -database /c3se/apps/Glenn/rosetta/3.3/rosetta_database/ -in:file:silent test.out -out:file:silent out.out -out:level 400
2. The pdb files extracted from the silent file as input. This gives the console output in the attached output_error_2.txt file.
Command used: cluster.linuxgccrelease -database /c3se/apps/Glenn/rosetta/3.3/rosetta_database/ -in:file:s *.pdb -out:file:silent out.out -out:level 400

Mon, 2012-08-13 07:19
eremma

It's not creating any RNA residue types at all. It's only reading the centroid types. You said you made changes to the database; if you made the changes I think you made (I don't know what changes you made) then you fixed the _fullatom_ types to include RNA. So, try these command lines with -in:file:fullatom.

Mon, 2012-08-13 08:14
smlewis

I found that it was not only -in:file:fullatom that was required for running the clustering with RNA (in 3.3), but also -in:file:silent_struct_type rna.
Full command used: cluster.linuxgccrelease -database rosetta_database -in:file:silent test.out -in:file:silent_struct_type rna -in:file:fullatom -out:file:silent out.out -out:file:silent_struct_type rna
This worked for me and I got a silent output file (out_3.3.txt) and console output (output_3.3.txt). I was running this with the default clustering radius.
My concern now is how to interpret the output data. As far as I can understand the structure model S_000089 is the best scoring structure in the largest cluster and should be the most reliable structure. Is it possible to extract the cluster files c.*.*.pdb etc. somehow?
I have also got access to Rosetta 3.4 and I have now tried the rna_cluster application using the following command (with the silent output file from rna_denovo in 3.3):
rna_cluster.linuxgccrelease -database rosetta_database -in:file:silent test.out -out:file:silent out.out
This resulted in the output silent file out_3.4.txt and gave the console tracer output included in output_3.4.txt.
However, I find this output harder to interpret than the output from 3.3. I can see that the S_000089 structure is still at the top, but the output does not state the details of the clustering as it does in 3.3. Am I missing some additional option that would give clearer output?

What clustering radius would you suggest for RNA? The default 2 Å?

Tue, 2012-08-14 01:53
eremma

"As far as I can understand the structure model S_000089 is the best scoring structure in the largest cluster and should be the most reliable structure. "

That is the interpretation of the most cluster-experienced person I could get to look at this.

"Is it possible to extract the cluster files c.*.*.pdb etc. somehow?"

Yes - score_jd2 -in:file:silent ???.out -in:file:silent_struct_type rna -in:file:tags "put a list of which pdbs you want here, based on their tag in the silent file". score or extract_pdbs will probably also work.
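
As an illustration only, the command above can be assembled from a list of tags with a small Python helper. The flags are the ones quoted in this thread; the executable name `score_jd2.linuxgccrelease` and the database path are assumptions modelled on the other commands in the thread and may differ on a given installation.

```python
import shlex


def extraction_command(silent_file, tags,
                       executable="score_jd2.linuxgccrelease",  # assumed name
                       database="rosetta_database"):            # assumed path
    """Build the argument list for scoring selected tags from an RNA silent file."""
    if not tags:
        raise ValueError("need at least one tag")
    return [
        executable,
        "-database", database,
        "-in:file:silent", silent_file,
        "-in:file:silent_struct_type", "rna",
        "-in:file:tags", *tags,
    ]


cmd = extraction_command("out.out", ["S_000089"])
print(shlex.join(cmd))  # show the command line instead of running it
```

The helper only prints the command; running it is left to the user, since the exact binary and any additional output options depend on the local build.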

"What clustering radius would you suggest for RNA? The default 2 Å?"

This is empirical. Change it if you don't like the performance. The code's author probably set the default to a good value.

"Am I missing some additional command so as to get a more clear output?"

None that I can find. I found a boolean option "auto_tune" but no documentation on what it does.

Tue, 2012-08-14 08:22
smlewis

I am trying to understand how to relate an rna_denovo run on the server to an rna_denovo application run on our Linux cluster. I have been running a few test jobs on the server, but I am not sure how to interpret the results.
First of all, I cannot understand the definition of "Cluster center models". Are they the best scoring models in each cluster? It seems like the cluster center models C-01 to C-20 are identical to the top-20 lowest energy structures M-1 to M-20. That makes no sense to me. In that case, how is it possible to know which cluster is the largest one, which should contain the "best structure"?
Which clustering method is used on the server; the 3.3 (cluster) or 3.4 (rna_cluster) one?

Thu, 2012-09-06 03:56
eremma

http://rosettaserver.graylab.jhu.edu/documentation/rna_denovo doesn't seem to say. Do you have a log file from the run (the first few lines may say)? I asked Sergey (the server's administrator).

The documentation I just linked (under "Interpreting Results") says that it ranks clusters by energy of the best-scoring member of the cluster, and then returns the best-scoring member of the clusters - cluster size is apparently not part of what it's using. (I've never used any of the code in question, so I'm more or less as lost as you). This is consistent with the ranking similarity you saw. It implies a lack of convergence, though - all the top-scoring models assort into different clusters.
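
To make the ranking described above concrete, here is a small Python sketch, not the server's code: it orders clusters by the score of their best member and then checks whether the resulting cluster centers are simply the top-scoring models, which is the pattern observed here. The (tag, score, cluster_id) tuple layout and the function names are hypothetical.

```python
def rank_cluster_centers(models):
    """models: list of (tag, score, cluster_id).

    Returns [(tag, score, cluster_size)], one entry per cluster,
    ordered by the score of the cluster's best (lowest-scoring) member.
    """
    best = {}
    size = {}
    for tag, score, cluster_id in models:
        size[cluster_id] = size.get(cluster_id, 0) + 1
        if cluster_id not in best or score < best[cluster_id][0]:
            best[cluster_id] = (score, tag)
    ranked = sorted(best.items(), key=lambda item: item[1])
    return [(tag, score, size[cid]) for cid, (score, tag) in ranked]


def looks_unconverged(models, top_n=20):
    """True if the top_n cluster centers are exactly the top_n models by score."""
    centers = [tag for tag, _, _ in rank_cluster_centers(models)[:top_n]]
    lowest = [tag for tag, _, _ in sorted(models, key=lambda m: m[1])[:top_n]]
    return centers == lowest
```

If every low-energy model ends up in its own cluster, `looks_unconverged` returns True; the reported cluster sizes are then the more informative number to look at.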

Mon, 2012-09-10 07:50
smlewis

It's running off of a two-month-old version of developer trunk, so it is effectively running off of 3.4.

Mon, 2012-09-10 09:26
smlewis

Rhiju confirms that this means lack of convergence (which means probably all models are wrong).

Mon, 2012-09-10 12:04
smlewis

So you are saying that there is no way I can use the results from the denovo runs on the server?
I also have the option of running Rosetta on our cluster, but no matter what cluster radius or clustering method (3.3 cluster or 3.4 rna_cluster) I am using I do not obtain any good results; that is the best scoring model of the largest cluster is not at all a good model (very large RMSD to the native structure). I am thinking that perhaps I am doing something wrong during the process but I cannot figure it out myself. Unfortunately I do not know how to move forward with my study now.
A useful feature in the clustering apps would be an option to extract the best-scoring model of each cluster while running the app. This currently has to be done manually from the information in the command line output (unless I missed an option). Is that something that could be possible in a future release?

Wed, 2012-09-12 00:24
eremma

Would you know of anyone with specific experience in rna_denovo, and in interpreting the results from those runs, who would be willing to assist me? I cannot get much out of the documentation and I do not know how to move on.

Wed, 2012-09-19 04:08
eremma

Another thing: is it possible to run rna_denovo in parallel? I have been trying to do so, but all I get is that the application is run separately on each processor.
The command that I am using: mpiexec rna_denovo.mpi.linuxgccrelease -fasta RNA.fasta -native native.pdb -nstruct 10000 -out:file:silent test.out -cycles 10000 -minimize_rna -database rosetta_database

Tue, 2012-08-14 05:49
eremma

Recall that Rosetta is not multithreaded, so the only benefits of MPI are organizational (all results in one directory), not speed. If each of the N processors created output_0001, then each created output_0002, and so on, then MPI was not working; since your command looks right, that probably means it just doesn't work for this application. If instead it was a bunch of independent jobs creating different trajectories, then that's how it's supposed to work.

So far as I can tell, rna_denovo in 3.3 (actually, everything in the rna module) does not run in MPI - at least I can't find any hooks in the code. If you want to run on lots of processors, the easiest thing to do is to set up a script like (this is pseudocode):

for i in 1..N:
    mkdir i
    cd i
    make symlinks to inputs
    rosetta @options -constant_seed -jran (fixed_random_number + i)
    cd ..

Notice the use of a rolling argument to jran to ensure different trajectories on each processor.
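
One possible concrete version of the pseudocode, as a Python sketch that only builds the per-job working directories and command lines without launching anything. The `-constant_seed` and `-jran` flags are the ones named above; the directory naming (`job_0001`, ...), the options file name, the non-MPI executable name and the base seed are all assumptions to adapt to the local setup.

```python
def job_plan(n_jobs, base_seed,
             options_file="options",                    # assumed file name
             executable="rna_denovo.linuxgccrelease"):  # assumed non-MPI build
    """Return [(workdir, command)] with a distinct -jran seed for every job."""
    plan = []
    for i in range(1, n_jobs + 1):
        workdir = f"job_{i:04d}"
        command = [
            executable,
            f"@{options_file}",  # expected to be symlinked into workdir
            "-constant_seed",
            "-jran", str(base_seed + i),
        ]
        plan.append((workdir, command))
    return plan


for workdir, command in job_plan(3, 1111111):
    print(workdir, " ".join(command))
```

Each entry can then be submitted through whatever queueing system the cluster uses, with the job's working directory created and the inputs symlinked beforehand.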

If you *need* MPI due to sysadmin-imposed constraints, there is an unmaintained "mpilistwrapper" which lets you run a bunch of Rosetta jobs under a veneer of MPI, using non-MPI rosetta.

Tue, 2012-08-14 08:04
smlewis

If you meant rna_cluster.cc, it's in 3.4, so you'd need to upgrade.

Wed, 2012-08-08 13:15
smlewis