March 2012 by Parin Sripakdeevong (sripakpa [at] stanford.edu) and Rhiju Das (rhiju [at] stanford.edu).
This code builds single-stranded RNA loops using a deterministic, enumerative sampling method called Stepwise Assembly. A demo example for building the first (5' most) nucleotide of a 6-nucleotide loop is given in
rosetta_demos/public/SWA_RNA_Loop/. The same protocol can then be recursively applied to build the remaining nucleotides in the loop, one individual nucleotide at a time.
(Note: the Stepwise Assembly method constructs full-length RNA loops by recursively building each individual RNA nucleotide over multiple steps. The enumerative nature of the method makes the full calculation computationally expensive, requiring, for example, 15,000 CPU hours to build a single 6-nucleotide RNA loop. While the full calculation is now feasible on a high-performance computer cluster, running it within the demo would be excessive.)
The central code is located in the
src/protocols/swa_rna/ folder. The applications are in apps/public/swa_rna_main and apps/public/swa_rna_util.
Sripakdeevong, P., Kladwang, W. & Das, R. (2012), “An enumerative stepwise ansatz enables atomic-accuracy RNA loop modeling”, Proc Natl Acad Sci USA. doi:10.1073/pnas.1106516108
This method builds single-stranded RNA loops using a deterministic, enumerative sampling method called Stepwise Assembly. The modeling situation considered here is the lock-and-key problem. Given a template PDB that contains nucleotides surrounding a missing RNA loop, the Stepwise Assembly method finds the loop conformation (the key) that best fits the surrounding structure (the lock).
There is only one mode to run SWA_RNA_Loop at present.
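You need two files: a template PDB file containing the nucleotides surrounding the missing loop (passed with the "-s" flag) and a fasta file (passed with the "-fasta" flag). A native PDB file (passed with the "-native_pdb" flag) is optional.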
An input PDB file can be converted into the Rosetta RNA PDB format using the following command:
rosetta_tools/SWA_RNA_python/SWA_dagman_python/misc/SWA_extract_pdb.py IN_PDB.pdb -output_pdb OUT_PDB.pdb
Replace 'IN_PDB.pdb' with your input PDB filename and 'OUT_PDB.pdb' with the filename to which the converted PDB should be written.
The SWA_RNA_python package located at rosetta_tools/SWA_RNA_python/ contains the scripts necessary to set up and run the Stepwise Assembly protocol. Instructions are provided in steps 1)-4) below:
1) Specify the location of the rosetta bin folder and rosetta database folder by editing the file rosetta_tools/SWA_RNA_python/SWA_dagman_python/utility/USER_PATHS.py
For example, if the main rosetta folder is located at ~/rosetta/, then the file should look as follows:
#!/usr/bin/python
USER_ROSETTA_BIN_FOLDER="~/rosetta/rosetta_source/bin/"
USER_ROSETTA_DATABASE_FOLDER="~/rosetta/rosetta_database/"
2) Add the SWA_RNA_python package location to the PYTHONPATH environment variable. For bash shell users, this can be done by adding an export line for PYTHONPATH to the ~/.bashrc file.
3) After the paths are correctly specified, the following command is used to set up everything needed to run the Stepwise Assembly job:
rosetta_tools/SWA_RNA_python/SWA_dagman_python/SWA_DAG/setup_SWA_RNA_dag_job_files.py -s template.pdb -fasta fasta -sample_res 3-8 -single_stranded_loop_mode True -local_demo True -native_pdb native.pdb
The "-s" flag specifies the template_PDB file.
The "-fasta" flag specifies the fasta file.
The "-sample_res" flag specifies the sequence numbers of the nucleotides in the missing loop. For example, 3-8 means that the missing loop nucleotides are located at sequence positions 3, 4, 5, 6, 7 and 8.
The "-single_stranded_loop_mode" flag specifies that the job involves modeling a single-stranded loop (i.e. the lock-and-key problem).
The "-local_demo" flag indicates that this is a demo to be run on a local laptop or desktop. The calculation performed here only builds the first (5' most) nucleotide of the 6-nucleotide RNA loop.
The "-native_pdb" flag specifies the native_PDB file and is optional.
4) Type "source LOCAL_DEMO" to execute the Rosetta protocol.
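As an illustration of the "-sample_res" notation used in step 3, the following minimal Python sketch expands a range such as 3-8 into the individual sequence positions. The helper name expand_sample_res is hypothetical and is not part of the SWA_RNA_python package; how the setup script itself parses the flag is not shown here.

```python
def expand_sample_res(spec):
    """Expand a range string such as "3-8" into [3, 4, 5, 6, 7, 8].

    Hypothetical helper for illustration only. A single number such as "5"
    is treated as a one-residue range.
    """
    first, _, last = spec.partition("-")
    start = int(first)
    end = int(last) if last else start
    if end < start:
        raise ValueError("range end precedes range start: %r" % spec)
    return list(range(start, end + 1))


if __name__ == "__main__":
    # Positions of the missing loop nucleotides for "-sample_res 3-8".
    print(expand_sample_res("3-8"))
```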
The instructions above allow the user to build the first (5' most) nucleotide of an N-nucleotide loop. As previously stated, the full calculation to build full-length RNA loops is computationally expensive and is beyond the scope of this documentation. The SWA_RNA_python package is, however, equipped to run this recursive full calculation on a high-performance computer cluster. The package uses concepts familiar from the Map/Reduce Directed Acyclic Graph framework to order the calculation steps and allocate resources, recursively building the full-length RNA loop over multiple steps, one individual RNA nucleotide at a time. Interested users should contact Parin Sripakdeevong (sripakpa [at] stanford.edu), and we will be happy to provide additional instructions.
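The idea of ordering calculation steps with a directed acyclic graph can be sketched in a few lines of Python. This is not the SWA_RNA_python scheduler. It assumes, purely for illustration, that the loop is built in one direction (5' to 3'), that each residue has one "sample" job followed by one "cluster" job, and that each sample job depends on the cluster job of the previous residue. All job names are hypothetical.

```python
from collections import deque


def build_loop_dag(sample_res):
    """Return {job: set of prerequisite jobs} for a one-directional build.

    Assumption for illustration: one sample job and one cluster job per
    residue, chained from the first residue to the last.
    """
    dag = {}
    previous_cluster = None
    for res in sample_res:
        sample_job = "sample_res_%d" % res    # hypothetical job name
        cluster_job = "cluster_res_%d" % res  # hypothetical job name
        dag[sample_job] = {previous_cluster} if previous_cluster else set()
        dag[cluster_job] = {sample_job}
        previous_cluster = cluster_job
    return dag


def execution_order(dag):
    """Kahn's algorithm: list jobs so that prerequisites always come first."""
    remaining = {job: set(deps) for job, deps in dag.items()}
    dependents = {job: [] for job in dag}
    for job, deps in dag.items():
        for dep in deps:
            dependents[dep].append(job)
    ready = deque(job for job, deps in remaining.items() if not deps)
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for child in dependents[job]:
            remaining[child].discard(job)
            if not remaining[child]:
                ready.append(child)
    if len(order) != len(dag):
        raise ValueError("dependency cycle detected")
    return order


if __name__ == "__main__":
    # Six loop residues at sequence positions 3 to 8.
    for job in execution_order(build_loop_dag(range(3, 9))):
        print(job)
```

On a cluster, jobs whose prerequisites are all finished could be submitted in parallel; the chain above has only one ready job at a time because of the simplifying assumptions.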
The expected outputs are two silent_files:
A) region_0_1_sample.out: This silent_file contains 108 structures, corresponding to the 108 lowest energy conformations.
B) region_0_1_sample.cluster.out: Same as A) but after clustering of the models to remove redundant conformations.
In both silent_files, the total energy score is found under the 'score' column. If the "native_pdb" flag was included, then the RMSD (in angstrom units) between the native_pdb and each Rosetta model is found under the 'NAT_rmsd' column.
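The 'score' and 'NAT_rmsd' columns can be read with a short script. The sketch below assumes the usual text silent_file layout, in which the first line beginning with "SCORE:" lists the column names, later "SCORE:" lines hold one model each, and the final 'description' column carries the model tag. The function names are hypothetical, and the sample values are made-up placeholders, not real output.

```python
def read_score_table(lines):
    """Parse SCORE: lines into a list of {column name: string value} dicts."""
    header = None
    rows = []
    for line in lines:
        if not line.startswith("SCORE:"):
            continue
        fields = line.split()[1:]
        if header is None or fields == header:
            header = fields  # column names (possibly repeated in the file)
            continue
        if len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return rows


def lowest_energy(rows, n_models):
    """Return the n_models rows with the lowest total energy score."""
    return sorted(rows, key=lambda row: float(row["score"]))[:n_models]


# Made-up placeholder lines for illustration only (not real Rosetta output).
SAMPLE = """\
REMARK illustrative placeholder
SCORE:     score NAT_rmsd description
SCORE:    -10.0      2.0 S_0
SCORE:    -12.5      1.5 S_1
SCORE:     -8.0      3.0 S_2
""".splitlines()

if __name__ == "__main__":
    for row in lowest_energy(read_score_table(SAMPLE), 2):
        print(row["description"], row["score"], row["NAT_rmsd"])
```

For a real file, pass an open file handle instead of SAMPLE, for example read_score_table(open("region_0_1_sample.cluster.out")).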
Finally, use the following command to extract the five lowest-energy cluster centers:
rosetta_tools/SWA_RNA_python/SWA_dagman_python/misc/SWA_extract_pdb.py -tag S_0 S_1 S_2 S_3 S_4 -silent_file region_0_1_sample.cluster.out
After running the command, the extracted PDB files should appear in the pose_region_0_1_sample.cluster.out/ subfolder.
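To extract a different number of cluster centers, the tag list can be generated rather than typed. The sketch below only assembles the argument list from the flags used in the extraction command; it assumes that the tags in the cluster file follow the S_0, S_1, ... pattern seen in that command, and the helper name extract_command is hypothetical.

```python
SCRIPT = "rosetta_tools/SWA_RNA_python/SWA_dagman_python/misc/SWA_extract_pdb.py"


def extract_command(silent_file, n_models):
    """Build the argument list for extracting the first n_models tags.

    Assumes tags are named S_0, S_1, ... as in the documented command.
    """
    tags = ["S_%d" % i for i in range(n_models)]
    return [SCRIPT, "-tag"] + tags + ["-silent_file", silent_file]


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run to execute it.
    print(" ".join(extract_command("region_0_1_sample.cluster.out", 5)))
```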
This application is new as of Rosetta 3.4.