Rosetta 3.5
Documentation for RNA 3D structure modeling: rna_denovo, rna_database, rna_extract, rna_score, rna_minimize, and rna_cluster applications
Author
Rhiju Das

Metadata

Written in 2008. Last update: Nov. 2011 by Rhiju Das (rhiju [at] stanford.edu).

Code and Demo

The central code for the rna_denovo application is in src/protocols/rna/RNA_DeNovoProtocol.cc.

For a 'minimal' demo example of the RNA fragment assembly and full-atom minimization protocol and input files, see

rosetta_demos/public/RNA_Denovo

References

Das, R. and Baker, D. (2007), "Automated de novo prediction of native-like RNA tertiary structures", PNAS 104: 14664-14669. [for fragment assembly]

Das, R., Kudaravalli, M., et al. (2008), "Structural inference of native and partially folded RNA by high-throughput contact mapping", PNAS 105: 4144-4149. [for modeling large RNAs with constraints]

Das, R., Karanicolas, J., and Baker, D. (2010), "Atomic accuracy in predicting and designing noncanonical RNA structure", Nature Methods 7: 291-294. [for high resolution refinement]

Sripakdeevong, P., Kladwang, W., and Das, R. (2011), "An enumerative stepwise ansatz enables atomic-accuracy RNA loop modeling", PNAS 108: 20573-20578. [for loop modeling]

Purpose

This code is intended to give three-dimensional de novo models of single-stranded RNAs or multi-stranded RNA motifs, with the prospect of reaching high (near-atomic-resolution) accuracy.

Algorithm

The RNA structure modeling algorithm in Rosetta is based on the assembly of short (1 to 3 nucleotide) fragments from existing RNA crystal structures whose sequences match subsequences of the target RNA. The Fragment Assembly of RNA (FARNA) algorithm is a Monte Carlo process, guided by a low-resolution knowledge-based energy function. The models can then be further refined in an all-atom potential to yield more realistic structures with cleaner hydrogen bonds and fewer clashes; the resulting energies are also better at discriminating native-like conformations from non-native conformations. The two-step protocol has been named FARFAR (Fragment Assembly of RNA with Full Atom Refinement).
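
To make the Monte Carlo idea concrete, here is a minimal toy sketch of fragment insertion with a Metropolis acceptance test. It is not the Rosetta implementation: the function names, the single torsion per residue, the random (sequence-independent) fragment choice, and the toy energy are all assumptions made for illustration only.

```python
import math
import random


def toy_energy(torsions):
    # Hypothetical stand-in for a low-resolution score: penalize deviation
    # from an arbitrary target angle. This is NOT the Rosetta energy function.
    target = -60.0
    return sum((t - target) ** 2 for t in torsions) / 1000.0


def fragment_assembly(n_res, fragment_library, cycles=1000, kT=1.0, seed=0):
    """Toy Metropolis Monte Carlo over fragment insertions.

    fragment_library is a list of fragments, each a list of 1 to 3 torsion
    values (one value per residue here, purely for simplicity).
    Returns (best_energy, best_torsions).
    """
    rng = random.Random(seed)
    torsions = [180.0] * n_res          # start from an arbitrary extended chain
    energy = toy_energy(torsions)
    best = (energy, list(torsions))
    for _ in range(cycles):
        frag = rng.choice(fragment_library)
        start = rng.randrange(0, n_res - len(frag) + 1)
        trial = list(torsions)
        trial[start:start + len(frag)] = frag   # insert the fragment
        trial_energy = toy_energy(trial)
        delta = trial_energy - energy
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            torsions, energy = trial, trial_energy
            if energy < best[0]:
                best = (energy, list(torsions))
    return best
```

In the real protocol the fragments come from crystal structures and are matched to the target sequence, and the accepted low-resolution model is then passed on to all-atom refinement; the sketch only shows the accept/reject loop.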

Limitations

Modes

Input Files

Required file

You need only one input file to run RNA structure modeling: a fasta file containing the sequence of the target RNA.

Optional

How to run with this file.

A sample command line is the following:

rna_denovo.<exe> -fasta chunk002_1lnt_.fasta -nstruct 2 -out::file::silent test.out -cycles 1000
-minimize_rna -database <path to database>

The code takes about 1 minute to generate two models.

The fasta file has the RNA name on the first line (after >), and the sequence on the second line. Valid letters are a, c, g, and u. The example fasta file is available in rosetta_source/test/integration/tests/rna_denovo/.
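
If you build fasta files from a script, a small helper along these lines can catch invalid letters before a run. This is a sketch, not part of Rosetta: the function name is hypothetical, and lowercasing upper-case input is an assumption made for convenience.

```python
def format_rna_fasta(name, sequence):
    """Return two-line fasta text: '>name' then the sequence in a, c, g, u."""
    seq = sequence.strip().lower()   # assumption: accept upper case and lowercase it
    bad = sorted(set(seq) - set("acgu"))
    if not seq or bad:
        raise ValueError("invalid RNA sequence %r (bad letters: %s)" % (sequence, bad))
    return ">%s\n%s\n" % (name, seq)


if __name__ == "__main__":
    # Hypothetical file name and sequence, for illustration only.
    with open("my_rna.fasta", "w") as handle:
        handle.write(format_rna_fasta("my_rna", "ggaccuugc"))
```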

Parameter files (".params") to specify Watson/Crick base pairs and strand boundaries

RNA motifs are typically ensconced within Watson/Crick double helices, and involve several strands. [The most conserved loop of the signal recognition particle is an example, and is included here as chunk002_1lnt_RNA.pdb.] You can specify the bounding Watson/Crick base pairs in a "params file" with lines like the following:

CUTPOINT_OPEN 6    [means that one chain ends at residue 6]
STEM PAIR 1 12 W W A    [means that residues 1 and 12 should form a base pair with their Watson-Crick edges in
an antiparallel orientation]

and then run:

rna_denovo.<exe> -fasta chunk002_1lnt_.fasta -native chunk002_1lnt_RNA.pdb -params_file chunk002_1lnt_.prm -nstruct 2
-out::file::silent chunk002_1lnt.out -cycles 1000 -minimize_rna -database <path to database>

This command line also includes the "native" pdb, and will result in heavy-atom rmsd scores being calculated. Note again that the native pdb should have residues marked rA, rC, rG, and rU (see notes on PDB below). The code again takes about 1 minute to generate two models. Finally, there are some notes on forcing other kinds of pairs below [Can I specify non-Watson-Crick pairs?].
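
For motifs with many helix-bounding pairs, the params file can be generated rather than typed. The sketch below writes only the two line types shown above, one directive per line; the function name is hypothetical, and the bracketed annotations in the example are explanations, not file content.

```python
def make_params_text(chain_ends, stem_pairs):
    """Build params-file text.

    chain_ends: residue numbers at which a chain ends (CUTPOINT_OPEN).
    stem_pairs: (i, j) residue pairs forming antiparallel Watson-Crick pairs.
    """
    lines = []
    for res in chain_ends:
        lines.append("CUTPOINT_OPEN %d" % res)
    for i, j in stem_pairs:
        # W W A = Watson-Crick edges on both bases, antiparallel orientation
        lines.append("STEM PAIR %d %d W W A" % (i, j))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Reproduces the two example lines for the 12-residue motif above.
    print(make_params_text([6], [(1, 12)]), end="")
```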

Use Of Alternative Fragment Sources

By default the RNA fragment assembly makes use of bond torsions derived from the large ribosomal subunit crystal structure 1jj2, which have been pre-extracted in 1jj2.torsions (available in the database). If you want to use torsions drawn from a separate PDB (or set of PDBs), the following command will do the job.

rna_database.<exe>  -vall_torsions -s my_new_RNA1.pdb my_new_RNA2.pdb -o my_new_set.torsions
-database <path to database>

The resulting file is just a text file with the RNA's torsion angles listed for each residue. Then, when creating models, use the following flag with the rna_denovo application:

-vall_torsions my_new_set.torsions

Similarly, the database of base pair geometries can be created with rna_database -jump_library, and then specified in the rna_denovo application with -jump_library_file.

Options

Required:
-in:database                                     Path to rosetta databases. [PathVector]
-in:fasta                                        Fasta-formatted sequence file. [FileVector]

Commonly used:
-out:file:silent                                 Name of output file [scores and torsions, compressed format]. default="default.out" [String]
-params_file                                     RNA params file name.[String]. For Example: -params_file chunk002_1lnt_.prm
-in:native                                       Native PDB filename. [File].
-out:nstruct                                     Number of models to make. default: 1. [Integer]
-minimize_rna                                    Run high resolution optimization of the RNA after fragment assembly. [Boolean]
-vary_geometry                                   Vary bond lengths and angles (with harmonic constraints near Rosetta ideal) for backbone and sugar degrees of freedom [Boolean]

Less commonly used, but useful
-cycles                                          Number of Monte Carlo cycles. default: 10000. [Integer]
-filter_lores_base_pairs                         Filter for models that satisfy structure parameters. [Boolean]
-output_lores_silent_file                        If doing full-atom minimization, also save models after fragment assembly but before refinement (file will be called *.LORES.out). [Boolean]
-dump                                            Output PDBs that occur during the run, even if using silent file output. [Boolean]
-vall_torsions                                   Source of RNA fragments. default: 1jj2.torsions. [String]
-jump_library_file                               Source of base-pair rigid body transformations if base pairs are specified.
                                                   [default: 1jj2_RNA_jump_library.dat] [String]
-close_loops                                     Attempt closure across chainbreaks by cyclic coordinate descent after fragment moves. [Boolean]
-cst_file                                        Specify constraints (typically atom pairs) in Rosetta-style constraint file. [String]

Advanced [used in rna_assembly]
-in:file:silent                                  List of input files (in 'silent' format) that specify potential template structures or 'chunks'
-chunk_res                                       Positions at which 'chunks' are applied. If there is more than one chunk file, specify indices for
                                                   the first file and then the second file, etc.

Tips

Note on PDB format for RNA

Input and output PDB models have residues marked rA, rC, rG, and rU, due to historical reasons. If you have a "standard" PDB file, there is a python script available to convert it to Rosetta format:

rosetta_tools/rna/make_rna_rosetta_ready.py <pdb file>

Can I specify non-Watson-Crick pairs?

You can also specify base pairs that must be forced, even at the expense of creating temporary chainbreaks, in the params file, with a line like:

OBLIGATE PAIR 2 11 W W A

This also allows the specification of non-Watson-Crick base pairs. In the line above, you can change the W's to H (Hoogsteen edge) or S (sugar edge), and the A to P (antiparallel to parallel). The base edges are essentially the same as those defined in the classification by Leontis & Westhof. The orientations (A/P) are determined by the relative orientation of the base normals. [The cis/trans classification of Leontis & Westhof would be an alternative to A/P, but we found A/P more convenient to compute and to visually assess.] The base pairs are drawn from a library of base pairs extracted from the crystallographic model of the large ribosomal subunit 1JJ2.

When specifying pairs, if there are not sufficient CUTPOINT_OPEN's to allow all the pairs to form, the code will attempt to choose a (non-stem) RNA suite to put in a cutpoint, which can be closed during fragment assembly with the -close_loops option. If you want to pre-specify where this cutpoint will be chosen, add a line like

CUTPOINT_CLOSED 6
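
Because the edge and orientation codes are easy to mistype, it can help to check pair lines before launching a run. The sketch below validates the seven-field STEM PAIR and OBLIGATE PAIR forms shown in this document; the function and dictionary names are hypothetical, and the check is an illustration rather than the parser Rosetta uses.

```python
VALID_EDGES = {"W": "Watson-Crick", "H": "Hoogsteen", "S": "sugar"}
VALID_ORIENTATIONS = {"A": "antiparallel", "P": "parallel"}


def parse_pair_line(line):
    """Parse a line such as 'OBLIGATE PAIR 2 11 W W A' into a dict."""
    fields = line.split()
    if (len(fields) != 7 or fields[0] not in ("STEM", "OBLIGATE")
            or fields[1] != "PAIR"):
        raise ValueError("not a pair line: %r" % line)
    res1, res2 = int(fields[2]), int(fields[3])
    edge1, edge2, orientation = fields[4], fields[5], fields[6]
    if edge1 not in VALID_EDGES or edge2 not in VALID_EDGES:
        raise ValueError("edges must be W, H, or S: %r" % line)
    if orientation not in VALID_ORIENTATIONS:
        raise ValueError("orientation must be A or P: %r" % line)
    return {
        "kind": fields[0],
        "res1": res1,
        "res2": res2,
        "edge1": VALID_EDGES[edge1],
        "edge2": VALID_EDGES[edge2],
        "orientation": VALID_ORIENTATIONS[orientation],
    }
```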

What do the scores mean?

The most common question we get is on what the terms in the 'SCORE lines' of silent files mean. Here's a brief rundown, with more explanation in the papers cited above.

***Energy interpreter for low resolution silent output:
score                                            Final total score
rna_rg                                           Radius of gyration for RNA
rna_vdw                                          Low resolution clash check for RNA
rna_base_backbone                                Bases to 2'-OH, phosphates, etc.
rna_backbone_backbone                            2'-OH to 2'-OH, phosphates, etc.
rna_repulsive                                    Mainly phosphate-phosphate repulsion
rna_base_pair_pairwise                           Base-base interactions (Watson-Crick and non-Watson-Crick)
rna_base_pair                                    Base-base interactions (Watson-Crick and non-Watson-Crick)
rna_base_axis                                    Force base normals to be parallel
rna_base_stagger                                 Force base pairs to be in same plane
rna_base_stack                                   Stacking interactions
rna_base_stack_axis                              Stacking interactions should involve parallel bases.
atom_pair_constraint                             Harmonic constraints between atoms involved in Watson-Crick base
                                                 pairs specified by the user in the params file
rms                                              all-heavy-atom RMSD to the native structure

***Energy interpreter for fullatom silent output:
score                                            Final total score
fa_atr                                           Lennard-Jones attractive between atoms in different residues
fa_rep                                           Lennard-Jones repulsive between atoms in different residues
fa_intra_rep                                     Lennard-Jones repulsive between atoms in the same residue
lk_nonpolar                                      Lazaridis-Karplus solvation energy, over nonpolar atoms
hack_elec_rna_phos_phos                          Simple electrostatic repulsion term between phosphates
hbond_sr_bb_sc                                   Backbone-sidechain hbonds close in primary sequence
hbond_lr_bb_sc                                   Backbone-sidechain hbonds distant in primary sequence
hbond_sc                                         Sidechain-sidechain hydrogen bond energy
ch_bond                                          Carbon hydrogen bonds
geom_sol                                         Geometric Solvation energy for polar atoms
rna_torsion                                      RNA torsional potential.
atom_pair_constraint                             Harmonic constraints between atoms involved in Watson-Crick base pairs
                                                 specified by the user in the params file
angle_constraint                                 (not in use)

N_WC                                             number of Watson-Crick base pairs
N_NWC                                            number of non-Watson-Crick base pairs
N_BS                                             number of base stacks

[Following are provided if the user gives a native structure for reference]
rms                                              all-heavy-atom RMSD to the native structure
rms_stem                                         all-heavy-atom RMSD to helical segments in the native structure, defined by 'STEM' entries in the parameters file.
f_natWC                                          fraction of native Watson-Crick base pairs recovered
f_natNWC                                         fraction of native non-Watson-Crick base pairs recovered
f_natBP                                          fraction of base pairs recovered
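
The terms above appear as columns of the SCORE lines, so they are easy to pull into a script. The sketch below assumes the usual silent-file layout in which the first line starting with "SCORE:" names the columns and each later "SCORE:" line holds one model's values followed by its tag; the function name and the sample values in the usage example are hypothetical.

```python
def parse_score_lines(lines):
    """Return one dict per model, keyed by the column names in the header."""
    header = None
    rows = []
    for line in lines:
        if not line.startswith("SCORE:"):
            continue                      # skip sequence, remark, and coordinate lines
        fields = line.split()[1:]
        if header is None:
            header = fields               # first SCORE line names the columns
            continue
        if fields == header:
            continue                      # repeated header, e.g. concatenated files
        row = {}
        for name, value in zip(header, fields):
            try:
                row[name] = float(value)
            except ValueError:
                row[name] = value         # non-numeric column such as the model tag
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Made-up example lines, for illustration only.
    example = [
        "SCORE:     score   rna_rg      rms description",
        "SCORE:   -12.500   10.000    3.200 S_000001",
        "SCORE:   -20.000    9.000    2.100 S_000002",
    ]
    for row in sorted(parse_score_lines(example), key=lambda r: r["score"]):
        print(row["description"], row["score"], row["rms"])
```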

How do I just score?

To score an input PDB without running any fragment assembly cycles or rounds of minimization, use the rna_score application:

rna_score.<exe> -database <path to database>  -s <pdb file> [<pdb file 2> ...] -out:file:silent SCORE.out  [-native <native pdb>]

If you want to score under the low resolution RNA potential (used in FARNA), add the flag '-score:weights rna_lores.wts'. Then you can check the score in SCORE.out:

 grep SCORE SCORE.out

But this is not recommended if you are trying to score a model deposited in the PDB or created by other software; see the next section.

How do I just minimize?

If you take a PDB created outside Rosetta, very small clashes may be strongly penalized by the Rosetta all-atom potential. Instead of scoring directly, you should probably run a short minimization first:

rna_minimize.<exe> -database <path to database>  -s <pdb file> [<pdb file 2> ...] -out:file:silent MINIMIZE.out  [-native <native pdb>]

If you want to minimize under the low resolution RNA potential (used in FARNA), add the flag '-score:weights rna_lores.wts'. Then check out the scores in MINIMIZE.out.

 grep SCORE MINIMIZE.out

You can extract models from silent files as described in [Extraction Of Models Into PDB Format], but you'll also get models with the same names as your input with the suffix '_minimize.pdb'.

Other options

Check this section: Documentation for RNA assembly with experimental pair-wise constraints, using rna_denovo and rna_helix executables.

Expected Outputs

You will typically use the protocol to produce a silent file – how do you get the models out?

Post Processing

Extraction Of Models Into PDB Format

The models from the above run are stored in compressed format in the file test.out, along with lines representing the score components. You can convert the models to PDB format with the following command:

rna_extract.<exe>  -in:file:silent test.out -in:file:silent_struct_type  rna -database <path to database>

Note that the PDBs have residue types marked as rA, rC, rG, and rU.

How can I cluster models?

There is one executable for clustering; it currently requires that all the models be in a silent file and have scores. (If you don't have such a silent file, use the rna_score executable described in How do I just score?). Here's the command line:

 rna_cluster.<exe>   -database <path to database>    -in:file:silent <silent file with models> -out:file:silent <silent file with clustered models>   [-cluster:radius <rmsd threshold>] [-nstruct <maximum number of clusters>]

The clustering simply goes through the models in order of energy; if a model is farther than the rmsd threshold from all existing clusters, it spawns a new cluster.
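
The greedy scheme can be sketched in a few lines. This is an illustration, not the rna_cluster code: the function names are hypothetical, the distance used here is a plain coordinate RMSD without superposition, and dropping models that would start a cluster beyond the maximum is an assumption.

```python
import math


def coordinate_rmsd(a, b):
    # Assumption: both models list the same atoms in the same order and are
    # already in a common frame (no superposition is performed here).
    total = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(total / len(a))


def greedy_cluster(models, radius, max_clusters=None, distance=coordinate_rmsd):
    """models: list of (score, coords). Returns (centers, members)."""
    centers = []   # lowest-energy model of each cluster
    members = []   # members[k] holds all models assigned to centers[k]
    for score, coords in sorted(models, key=lambda m: m[0]):
        for k, (_, center_coords) in enumerate(centers):
            if distance(coords, center_coords) <= radius:
                members[k].append((score, coords))
                break
        else:
            # Farther than the threshold from every existing cluster.
            if max_clusters is None or len(centers) < max_clusters:
                centers.append((score, coords))
                members.append([(score, coords)])
    return centers, members
```

Because models are visited from lowest to highest energy, each cluster center is the best-scoring member of its cluster.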

New things since last release

Added applications rna_minimize, rna_helix, rna_cluster. Updated torsional potential to be smooth.