Rosetta 3.3
Documentation for RNA 3D structure modeling: rna_denovo, rna_database, and rna_extract applications
Author:
Rhiju Das

Metadata

This document updates documentation written in 2008 by Rhiju Das (rhiju [at] stanford.edu) into the latest documentation format. Last update: April 2011.

Code and Demo

The central code for the rna_denovo application is in src/protocols/rna/RNA_DeNovoProtocol.cc.

For a 'minimal' demo example of the RNA fragment assembly and full-atom minimization protocol and input files, see

test/integration/tests/rna_denovo/ [in the developer's SVN repository]

or

rosetta_demos/RNA_Denovo [in the public release]

References

Das, R. and Baker, D. (2007), "Automated de novo prediction of native-like RNA tertiary structures", PNAS 104: 14664-14669. [for fragment assembly]

Das, R., Kudaravalli, M., et al. (2008), "Structural inference of native and partially folded RNA by high throughput contact mapping", PNAS 105: 4144-4149. [for modeling large RNAs with constraints]

Das, R., Karanicolas, J., and Baker, D. (2010), "Atomic accuracy in predicting and designing noncanonical RNA structure". Nature Methods 7:291-294. [for high resolution refinement]

Sripakdeevong, P., Kladwang, W., and Das, R. (2011), "Resolving a sampling bottleneck in biopolymer structure prediction: RNA loops by a stepwise ansatz", submitted. [for loop modeling]

(Preprints/reprints of these papers are available at http://daslab.stanford.edu/pubs.html).

Purpose

This code is intended to give three-dimensional de novo models of single-stranded RNAs or multi-stranded RNA motifs, with the prospect of reaching high (near-atomic-resolution) accuracy.

Algorithm

The RNA structure modeling algorithm in Rosetta is based on the assembly of short (1 to 3 nucleotide) fragments from existing RNA crystal structures whose sequences match subsequences of the target RNA. The Fragment Assembly of RNA (FARNA) algorithm is a Monte Carlo process, guided by a low-resolution knowledge-based energy function. The models can then be further refined in an all-atom potential to yield more realistic structures with cleaner hydrogen bonds and fewer clashes; the resulting energies are also better at discriminating native-like conformations from non-native conformations. The two-step protocol has been named FARFAR (Fragment Assembly of RNA with Full Atom Refinement).
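The Monte Carlo fragment replacement described above can be illustrated with a toy sketch. This is not the Rosetta implementation: the fragment library, the single torsion per residue, the energy function, and the names TOY_LIBRARY, toy_energy, and assemble are all hypothetical stand-ins chosen for brevity. It only shows the general pattern of picking a sequence-matched fragment of 1 to 3 residues, inserting it, and accepting or rejecting the move with a Metropolis criterion.

```python
import math
import random

# Hypothetical fragment library: subsequence -> list of torsion tuples (degrees).
# Real fragments carry several torsions per nucleotide; one per residue is used here.
TOY_LIBRARY = {
    "g": [(60.0,), (170.0,)],
    "c": [(65.0,), (-70.0,)],
    "a": [(55.0,), (180.0,)],
    "u": [(70.0,), (-60.0,)],
    "gc": [(58.0, 62.0)],
    "cu": [(64.0, 66.0)],
    "gcu": [(59.0, 61.0, 63.0)],
}


def toy_energy(torsions):
    """Made-up score favoring angles near 60 degrees (not the FARNA energy)."""
    return sum((t - 60.0) ** 2 for t in torsions) / 1000.0


def assemble(sequence, library, cycles, kT=0.5, seed=0):
    """Toy fragment assembly: sequence-matched insertions with Metropolis acceptance."""
    rng = random.Random(seed)
    torsions = [180.0] * len(sequence)  # arbitrary extended starting state
    energy = toy_energy(torsions)
    for _ in range(cycles):
        size = rng.randint(1, 3)  # fragment length of 1 to 3 residues
        if size > len(sequence):
            continue
        start = rng.randrange(len(sequence) - size + 1)
        candidates = library.get(sequence[start:start + size])
        if not candidates:
            continue  # no library fragment matches this subsequence
        fragment = rng.choice(candidates)
        trial = torsions[:start] + list(fragment) + torsions[start + size:]
        trial_energy = toy_energy(trial)
        delta = trial_energy - energy
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if delta <= 0.0 or rng.random() < math.exp(-delta / kT):
            torsions, energy = trial, trial_energy
    return torsions, energy
```

For example, assemble("gcuagc", TOY_LIBRARY, cycles=1000) returns a list of six toy torsions and the corresponding toy energy; the sequence here is made up.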

Limitations

Modes

Input Files

Required file

You need only one input file to run RNA structure modeling: a fasta file containing the sequence of the target RNA.

Optional additional files:

How to include these files.

A sample command line is the following:

rna_denovo.<exe> -fasta chunk002_1lnt_.fasta -nstruct 2 -out::file::silent test.out -cycles 1000 -minimize_rna -database <path to database>

The code takes about 1 minute to generate two models.

The fasta file has the RNA name on the first line (after >), and the sequence on the second line. Valid letters are a, c, g, and u. The example fasta file is available in rosetta_source/test/integration/tests/rna_denovo/.
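If you are generating fasta files from a script, a small helper along the following lines may be useful. This is a sketch and not part of Rosetta: the function name write_rna_fasta is hypothetical, and lower-casing the input is a convenience assumption based on the valid letters listed above.

```python
VALID_LETTERS = set("acgu")


def write_rna_fasta(path, name, sequence):
    """Write a two-line fasta file (name line, then sequence) after validating letters."""
    seq = sequence.strip().lower()  # assumption: accept upper case and convert it
    bad = sorted(set(seq) - VALID_LETTERS)
    if not seq or bad:
        raise ValueError(f"sequence must be non-empty and use only a, c, g, u; found {bad}")
    with open(path, "w") as handle:
        handle.write(f">{name}\n{seq}\n")
    return seq
```

For example, write_rna_fasta("my_rna.fasta", "my_rna", "GCUAGC") writes a file whose second line is gcuagc; the file name and sequence are made up for illustration.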

Parameter files (".params") to specify Watson/Crick base pairs and strand boundaries

RNA motifs are typically ensconced within Watson/Crick double helices, and involve several strands. [The most conserved loop of the signal recognition particle is an example, and is included here as chunk002_1lnt_RNA.pdb.] You can specify the bounding Watson/Crick base pairs in a "params file" with lines like the following:

CUTPOINT_OPEN 6    [means that one chain ends at residue 6]
STEM PAIR 1 12 W W A    [means that residues 1 and 12 should form a base pair with their Watson-Crick edges in an antiparallel orientation]

and then run:

rna_denovo.<exe> -fasta chunk002_1lnt_.fasta -native chunk002_1lnt_RNA.pdb -params_file chunk002_1lnt_.prm -nstruct 2 -out::file::silent chunk002_1lnt.out -cycles 1000 -minimize_rna -database <path to database>

This command line also includes the "native" PDB, so heavy-atom RMSD scores will be calculated. Note that the native PDB should have residues marked rA, rC, rG, and rU (see "Note on PDB format for RNA" below). The code again takes about 1 minute to generate two models. Finally, there are some notes on forcing other kinds of pairs below, under "Can I specify non-Watson-Crick pairs?".
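To sanity-check a params file before a run, the CUTPOINT_OPEN and STEM PAIR lines shown above can be read with a few lines of Python. This is a sketch and not Rosetta's own parser: parse_params and strands are hypothetical helper names, only the two line types from the example are handled, and the total length of 12 residues in the usage note is an assumption for illustration.

```python
def parse_params(text):
    """Collect CUTPOINT_OPEN residues and STEM PAIR entries from params-file text."""
    cutpoints, pairs = [], []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        if fields[0] == "CUTPOINT_OPEN":
            cutpoints.append(int(fields[1]))
        elif fields[:2] == ["STEM", "PAIR"]:
            # residue i, residue j, edge of i, edge of j, orientation
            pairs.append((int(fields[2]), int(fields[3]), fields[4], fields[5], fields[6]))
    return cutpoints, pairs


def strands(n_res, cutpoints):
    """Turn open cutpoints into (first, last) residue ranges, one per chain."""
    ranges, start = [], 1
    for cut in sorted(cutpoints):
        ranges.append((start, cut))
        start = cut + 1
    ranges.append((start, n_res))
    return ranges
```

With the two example lines above, parse_params returns one cutpoint (6) and one pair (1, 12, W, W, A), and strands(12, [6]) splits a 12-residue sequence into residues 1-6 and 7-12.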

Use Of Alternative Fragment Sources

By default, RNA fragment assembly uses bond torsions derived from the crystal structure of the large ribosomal subunit, 1jj2, which have been pre-extracted into 1jj2.torsions (available in the database). If you want to use torsions drawn from a separate PDB (or set of PDBs), the following command will do the job.

rna_database.<exe>  -vall_torsions -s my_new_RNA1.pdb my_new_RNA2.pdb -o my_new_set.torsions
-database <path to database>

The resulting file is just a text file with the RNA's torsion angles listed for each residue. Then, when creating models, use the following flag with the rna_denovo application:

-vall_torsions my_new_set.torsions

Similarly, the database of base pair geometries can be created with rna_database -jump_library, and then specified in the rna_denovo application with -jump_library_file.

Options

Required:
-in:database                                     Path to rosetta databases. [PathVector]
-in::fasta                                       Fasta-formatted sequence file. [FileVector]

Commonly used:
-out::file::silent                               Name of output file [scores and torsions, compressed format]. Default: "default.out". [String]
-params_file                                     RNA params file name, for example: -params_file chunk002_1lnt_.prm. [String]
-in::native                                      Native PDB filename. [File]
-out::nstruct                                    Number of models to make. Default: 1. [Integer]
-minimize_rna                                    High-resolution optimization of the RNA after fragment assembly. [Boolean]
-vary_geometry                                   Vary bond lengths and angles (with harmonic constraints near Rosetta ideal values) for backbone and sugar degrees of freedom. [Boolean]

Less commonly used, but useful:
-cycles                                          Number of Monte Carlo cycles. Default: 10000. [Integer]
-filter_lores_base_pairs                         Filter for models that satisfy structure parameters. [Boolean]
-output_lores_silent_file                        If doing full-atom minimization, also save the low-resolution models after fragment assembly
                                                 but before refinement (file will be called *.LORES.out). [Boolean]
-dump                                            Output PDBs that occur during the run, even if using silent file output. [Boolean]
-vall_torsions                                   Source of RNA fragments. Default: 1jj2.torsions. [String]
-jump_library_file                               Source of base-pair rigid-body transformations if base pairs are specified.
                                                 Default: 1jj2_RNA_jump_library.dat. [String]
-close_loops                                     Attempt closure across chainbreaks by cyclic coordinate descent after fragment moves. [Boolean]
-cst_file                                        Specify constraints (typically atom pairs) in a Rosetta-style constraint file. [String]

Tips

Note on PDB format for RNA

Input and output PDB models have residues marked rA, rC, rG, and rU, for historical reasons. If you have a "standard" PDB file, there is a Python script available to convert it to Rosetta format:

demo/rna/make_rna_rosetta_ready.py <pdb file>

Can I specify non-Watson-Crick pairs?

You can also specify base pairs that must be forced, even at the expense of creating temporary chainbreaks, in the params file, with a line like:

OBLIGATE PAIR 2 11 W W A

This also allows the specification of non-Watson-Crick base pairs. In the line above, you can change the W's to H (Hoogsteen edge) or S (sugar edge), and the A to P (antiparallel to parallel). The base edges are essentially the same as those defined in the classification by Leontis & Westhof. The orientations (A/P) are determined by the relative orientation of the base normals. [The cis/trans classification of Leontis & Westhof would be an alternative to A/P, but we found A/P more convenient to compute and to visually assess.] The base pairs are drawn from a library of base pairs extracted from the crystallographic model of the large ribosomal subunit 1JJ2.

When specifying pairs, if there are not sufficient CUTPOINT_OPEN's to allow all the pairs to form, the code will attempt to choose a (non-stem) RNA suite to put in a cutpoint, which can be closed during fragment assembly with the -close_loops option. If you want to pre-specify where this cutpoint will be chosen, add a line like

CUTPOINT_CLOSED 6
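When params files are written by a script, it can help to validate the edge and orientation letters before writing the line. The sketch below is not part of Rosetta; the function names obligate_pair_line and cutpoint_closed_line are hypothetical, and the allowed letters are taken only from the description above (W, H, S for edges; A, P for orientation).

```python
EDGES = {"W": "Watson-Crick", "H": "Hoogsteen", "S": "sugar"}
ORIENTATIONS = {"A": "antiparallel", "P": "parallel"}


def obligate_pair_line(i, j, edge_i="W", edge_j="W", orientation="A"):
    """Format an OBLIGATE PAIR line after checking residues, edges, and orientation."""
    if i < 1 or j < 1 or i == j:
        raise ValueError("residue numbers must be distinct positive integers")
    for edge in (edge_i, edge_j):
        if edge not in EDGES:
            raise ValueError(f"unknown base edge {edge!r}; expected one of {sorted(EDGES)}")
    if orientation not in ORIENTATIONS:
        raise ValueError(f"unknown orientation {orientation!r}; expected A or P")
    return f"OBLIGATE PAIR {i} {j} {edge_i} {edge_j} {orientation}"


def cutpoint_closed_line(residue):
    """Format a CUTPOINT_CLOSED line for a pre-specified closable cutpoint."""
    if residue < 1:
        raise ValueError("residue number must be positive")
    return f"CUTPOINT_CLOSED {residue}"
```

Calling obligate_pair_line(2, 11) reproduces the example line above, and obligate_pair_line(2, 11, "H", "S", "P") would request a Hoogsteen/sugar-edge pair in a parallel orientation.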

What do the scores mean?

The most common question we get is on what the terms in the 'SCORE lines' of silent files mean. Here's a brief rundown, with more explanation in the papers cited above.

***Energy interpreter for low resolution silent output:
score                                            Final total score
rna_rg                                           Radius of gyration for RNA
rna_vdw                                          Low resolution clash check for RNA
rna_base_backbone                                Bases to 2'-OH, phosphates, etc.
rna_backbone_backbone                            2'-OH to 2'-OH, phosphates, etc.
rna_repulsive                                    Mainly phosphate-phosphate repulsion
rna_base_pair_pairwise                           Base-base interactions (Watson-Crick and non-Watson-Crick)
rna_base_pair                                    Base-base interactions (Watson-Crick and non-Watson-Crick)
rna_base_axis                                    Force base normals to be parallel
rna_base_stagger                                 Force base pairs to be in same plane
rna_base_stack                                   Stacking interactions
rna_base_stack_axis                              Stacking interactions should involve parallel bases.
atom_pair_constraint                             Harmonic constraints between atoms involved in Watson-Crick base
                                                 pairs specified by the user in the params file
rms                                              all-heavy-atom RMSD to the native structure

***Energy interpreter for fullatom silent output:
score                                            Final total score
fa_atr                                           Lennard-Jones attractive between atoms in different residues
fa_rep                                           Lennard-Jones repulsive between atoms in different residues
fa_intra_rep                                     Lennard-Jones repulsive between atoms in the same residue
lk_nonpolar                                      Lazaridis-Karplus solvation energy, over nonpolar atoms
hack_elec_rna_phos_phos                          Simple electrostatic repulsion term between phosphates
hbond_sr_bb_sc                                   Backbone-sidechain hbonds close in primary sequence
hbond_lr_bb_sc                                   Backbone-sidechain hbonds distant in primary sequence
hbond_sc                                         Sidechain-sidechain hydrogen bond energy
ch_bond                                          Carbon hydrogen bonds
geom_sol                                         Geometric Solvation energy for polar atoms
rna_torsion                                      RNA torsional potential.
atom_pair_constraint                             Harmonic constraints between atoms involved in Watson-Crick base pairs
                                                 specified by the user in the params file
angle_constraint                                 (not in use)
rms                                              all-heavy-atom RMSD to the native structure

How do I just score?

To score an input PDB, you can run the 'denovo' protocol but request no fragment assembly cycles and no rounds of minimization:

rna_denovo.<exe> -database <path to database> -cycles 0 -minimize_rna  -minimize_rounds 0  -s <your pdb file>  -fasta <the sequence of the pdb> -out:file:silent SCORE.out

Then you can check the score in SCORE.out:

 grep SCORE SCORE.out
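For more than a quick look, the SCORE lines can be loaded into Python and sorted. The sketch below assumes a layout that is not spelled out in this document: that a SCORE: line whose first field is not a number is a header naming the columns, and that each later SCORE: line holds one model's values in the same order, ending with a model tag. The helper name read_scores and the sample text are made up for illustration.

```python
def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def read_scores(lines):
    """Return one dict per model from the SCORE: lines of a silent file."""
    header, rows = None, []
    for line in lines:
        if not line.startswith("SCORE:"):
            continue  # skip everything that is not a score line
        fields = line.split()[1:]
        if not fields:
            continue
        if not _is_number(fields[0]):
            header = fields  # assumed header line with the column names
            continue
        if header is None or len(fields) != len(header):
            continue  # cannot interpret this line without a matching header
        rows.append({name: float(value) if _is_number(value) else value
                     for name, value in zip(header, fields)})
    return rows


# Made-up sample in the assumed layout (not real Rosetta output).
SAMPLE = """\
SEQUENCE: gcuagc
SCORE:     score   rna_rg      rms description
SCORE:   -12.500    8.200    3.100 S_000001
SCORE:   -15.250    7.900    2.400 S_000002
"""

rows = read_scores(SAMPLE.splitlines())
best = min(rows, key=lambda row: row["score"])
```

For a real file, pass an open file handle instead of SAMPLE.splitlines(), for example read_scores(open("SCORE.out")).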

How do I just minimize?

If you take a PDB created outside Rosetta, very small clashes may be strongly penalized by the Rosetta all-atom potential. Instead of scoring it directly, you should probably do a short minimization first. Run:

rna_denovo.<exe> -database <path to database> -cycles 0 -minimize_rna  -s <your pdb file> -fasta <fasta with sequence of the pdb> -out:file:silent MINIMIZE.out

Then you can check the score in MINIMIZE.out:

 grep SCORE MINIMIZE.out

You will have to change the "-out:file:silent <file>" flag for each input file, or you will get a message that the job is already done. This is admittedly cumbersome; future releases will include a separate executable for minimizing.
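Until then, one way to handle several input files is to generate one command per PDB, each with its own silent file name. The sketch below only builds the argument lists, using the flags from the minimize command above; the function name minimize_commands, the ".MINIMIZE.out" naming scheme, and the executable and database paths in the usage note are assumptions for illustration.

```python
from pathlib import Path


def minimize_commands(exe, database, jobs):
    """Build one minimize command per (pdb, fasta) pair, each with a unique silent file."""
    commands, seen = [], set()
    for pdb, fasta in jobs:
        silent = Path(pdb).stem + ".MINIMIZE.out"  # hypothetical naming scheme
        if silent in seen:
            raise ValueError(f"two inputs would share the silent file {silent}")
        seen.add(silent)
        commands.append([
            exe, "-database", database,
            "-cycles", "0", "-minimize_rna",
            "-s", pdb, "-fasta", fasta,
            "-out:file:silent", silent,
        ])
    return commands
```

Each returned list can be passed to subprocess.run(command, check=True). For example, minimize_commands("rna_denovo.<exe>", "<path to database>", [("a.pdb", "a.fasta"), ("b.pdb", "b.fasta")]) yields two commands that write a.MINIMIZE.out and b.MINIMIZE.out.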

Expected Outputs

You will typically use the protocol to produce a silent file -- how do you get the models out?

Post Processing

Extraction Of Models Into PDB Format

The models from the above run are stored in compressed format in the file test.out, along with lines representing the score components. You can extract the models into PDB format with the following conversion command:

rna_extract.<exe>  -in:file:silent test.out -in:file:silent_struct_type  rna -database <path to database>

Note that the PDBs have residue types marked as rA, rC, rG, and rU.

New things since last release

The code has not been changed since the first release (Rosetta 3.0), but the code was removed in release 3.3 because the documentation was not upgraded to the Rosetta community standards. Rosetta 3.4 onwards restores rna_denovo with proper documentation!
