A collection of tools for PDB editing, cluster submission, 'silent-file' processing, and setting up rna_denovo and ERRASER jobs.
Under active development by the Das laboratory (Stanford).
This documentation describes how to set up code in:
Note: these setup instructions are for bash. If you're using a different shell, you will need to modify the commands accordingly.
Step 1: Add the following lines to your
.bashrc (may be
.bash_profile on some systems), replacing
YOUR_PATH_TO_ROSETTA with your actual path to Rosetta (for example:
export ROSETTA='YOUR_PATH_TO_ROSETTA'
export RNA_TOOLS=$ROSETTA/tools/rna_tools/
export PATH=$RNA_TOOLS/bin/:$PATH
export PYTHONPATH=$PYTHONPATH:$RNA_TOOLS/bin/
Step 2: Type source ~/.bashrc (or, if you edited .bash_profile, type source ~/.bash_profile instead) to activate these paths and tools.
Step 3: Type
Step 4: Verify the setup by typing
rna_helix.py -h. This should print out usage instructions for
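If the verification step fails, it can help to check that the variables from Step 1 are actually visible to your shell. The following is a minimal sketch, not part of rna_tools: the function name check_rna_tools_env is hypothetical, and it only assumes the four export lines shown in Step 1.

```python
import os


def check_rna_tools_env(env):
    """Return a list of problems found in an environment mapping (hypothetical helper)."""
    problems = []
    rosetta = env.get("ROSETTA", "")
    rna_tools = env.get("RNA_TOOLS", "")
    if not rosetta:
        problems.append("ROSETTA is not set")
    if not rna_tools:
        problems.append("RNA_TOOLS is not set")
    else:
        # Step 1 adds $RNA_TOOLS/bin/ to both PATH and PYTHONPATH.
        bin_dir = os.path.normpath(os.path.join(rna_tools, "bin"))
        for var in ("PATH", "PYTHONPATH"):
            entries = [
                os.path.normpath(p)
                for p in env.get(var, "").split(os.pathsep)
                if p
            ]
            if bin_dir not in entries:
                problems.append(f"{var} does not include {bin_dir}")
    return problems


if __name__ == "__main__":
    # Check the environment of the current shell session.
    for message in check_rna_tools_env(os.environ) or ["environment looks consistent"]:
        print(message)
```

An empty result only means the variables are consistent with each other; it does not confirm that the Rosetta path itself is correct.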
Following are example command lines for several of these Python-based tools:
To change the residue numbers and chains in a file:
renumber_pdb_in_place.py mymodel.pdb A:1-4 B:5-9
renumber_pdb_in_place.py mymodel.pdb 1-4 5-9
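To make the residue-range notation concrete, here is a minimal sketch of how tags such as A:1-4 or 5-9 expand into individual residues. This is an illustration only and not the parser that renumber_pdb_in_place.py uses; the function name parse_residue_tags is hypothetical, and negative residue numbers are not handled.

```python
def parse_residue_tags(tags):
    """Expand tags like 'A:1-4' or '5-9' into (chain, residue_number) pairs.

    The chain is None when the tag carries no 'CHAIN:' prefix.
    """
    residues = []
    for tag in tags:
        chain, colon, span = tag.rpartition(":")
        if not colon:
            chain = None  # tag had no chain prefix, e.g. '5-9'
        start, dash, end = span.partition("-")
        first = int(start)
        last = int(end) if dash else first  # a bare '7' means a single residue
        residues.extend((chain, number) for number in range(first, last + 1))
    return residues


print(parse_residue_tags(["A:1-4", "B:5-9"]))
```

Under this reading, the two commands above both describe nine residues; the first one additionally assigns them to chains A and B.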
To generate a fasta file with the sequences from a PDB:
pdb2fasta.py mymodel.pdb [ > mymodel.fasta ]
To pull out a particular chain from a PDB file:
extract_chain.py mymodel.pdb A
To slice out particular residues from a PDB file:
pdbslice.py mymodel.pdb -subset 1-5 9-12 mysubset_
The last argument is a prefix for the sliced PDB file.
To excise particular residues from a PDB file:
pdbslice.py mymodel.pdb -excise 6-8 excised_
Again, the last argument is a prefix for the sliced PDB file.
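The difference between -subset and -excise can be sketched in a few lines of Python. This is an assumption-laden illustration, not the code in pdbslice.py: the function name slice_pdb_lines is hypothetical, chains are ignored, and the only fact relied on is that the PDB format stores the residue number in columns 23-26 of ATOM and HETATM records.

```python
def slice_pdb_lines(lines, residue_numbers, excise=False):
    """Keep (or, with excise=True, drop) atom records for the given residue numbers."""
    selected = set(residue_numbers)
    kept = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # this sketch discards non-coordinate records
        residue_number = int(line[22:26])  # PDB columns 23-26, 1-indexed
        # subset mode keeps selected residues; excise mode keeps the others
        if (residue_number in selected) != excise:
            kept.append(line)
    return kept
```

For a model numbered 1-12, slice_pdb_lines(lines, [1, 2, 3, 4, 5, 9, 10, 11, 12]) and slice_pdb_lines(lines, [6, 7, 8], excise=True) select the same atoms, which mirrors the two command lines above.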
'Silent files' use Rosetta's compressed file format that concatenates the scores for each model as well as the model coordinates (sometimes in a UU-encoded compressed format that looks like gobbledygook).
To quickly get the number that goes with each score column (useful before making scatterplots in
To find the lowest energy 20 models within the silent file and run Rosetta's
extract_pdbs executable to extract them:
extract_lowscore_decoys.py mysilentfile.out 20
To create a new silent file with just the lowest energy 20 models:
extract_lowscore_decoys_outfile.py mysilentfile.out -out 20
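The selection of low-energy models can be pictured with the following sketch. It is not the implementation of either script above. It assumes that the silent file has lines beginning with SCORE:, that one such line is a header naming the columns (including score and description), and that the remaining SCORE: lines hold one model each; the function name lowest_score_tags is hypothetical.

```python
def lowest_score_tags(silent_lines, n, column="score"):
    """Return the tags of the n models with the lowest value in the given column."""
    header = None
    scored = []
    for line in silent_lines:
        if not line.startswith("SCORE:"):
            continue  # skip sequence, annotation, and coordinate lines
        fields = line.split()[1:]
        if column in fields and "description" in fields:
            header = fields  # header line (may repeat in concatenated files)
            continue
        if header is None or len(fields) != len(header):
            continue  # malformed or truncated score line
        row = dict(zip(header, fields))
        scored.append((float(row[column]), row["description"]))
    scored.sort()
    return [tag for _, tag in scored[:n]]


example = [
    "SEQUENCE: ggcc",
    "SCORE: score fa_atr description",
    "SCORE: -10.5 -3.0 S_000001",
    "SCORE: -20.25 -4.0 S_000002",
    "SCORE: -15.0 -2.0 S_000003",
]
print(lowest_score_tags(example, 2))
```

The column names and tags in the example are made up for illustration; real silent files contain many more score terms.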
To concatenate several outfiles, renaming model tags to be unique:
cat_outfiles.py mysilentfile1.out mysilentfile2.out [ ... ]
To generate a near-ideal A-form RNA helix that has good Rosetta energy (requires that the rna_helix Rosetta executable is compiled):
rna_helix.py -seq aacg cguu -o myhelix.pdb [ -resnum A:5-8 A:20-23 -extension static.linuxgccrelease ]
extension is the extension of your rna_helix executable. To find this, type ls $ROSETTA/main/source/bin/rna_helix* (requires that the ROSETTA environment variable is set, see Setup). This will print something like this to your screen:
The extension is everything that comes after
rna_helix., so here the extension would be
static.linuxgccrelease. If the executable is simply named
/YOUR_PATH_TO_ROSETTA/main/source/bin/rna_helix, then you do not need to provide the extension flag to
To strip out residues and HETATMs that are not recognizable as RNA from a PDB file:
Legacy [the following functionality is now directly available in the rna_denovo application]: To prepare files for an RNA de novo (fragment assembly of RNA with full atom refinement, FARFAR) job:
See also: rna_denovo_setup.
rna_denovo_setup.py -fasta mysequence.fasta -secstruct_file mysecstruct.txt
These are largely geared towards clusters that the Das lab uses (stampede and other XSEDE resources; the BioX3 computer at Stanford), but are meant to be easily generalized to new ones.
To create submission files for a set of Rosetta command lines in a
rosetta_submit.py README OUTDIR 40 [number of hours]
OUTDIR will contain directories
39/ for 40 jobs. Your command line should have a flag like -out:file:silent mymodels.out, which will be expanded into flags like -out:file:silent OUTDIR/0/mymodels.out in the 40 jobs, so that their outfiles go into separate subdirectories, preventing file I/O conflicts.
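The per-job rewriting of the output flag can be sketched as follows. This illustrates the idea described above and is not the code in rosetta_submit.py; the function name expand_command and the executable name in the example are hypothetical.

```python
def expand_command(command, outdir, n_jobs, flag="-out:file:silent"):
    """Return n_jobs copies of command, each writing its outfile under outdir/<job>/."""
    tokens = command.split()
    if flag not in tokens or tokens.index(flag) + 1 >= len(tokens):
        raise ValueError(f"command must contain '{flag} <filename>'")
    position = tokens.index(flag) + 1  # index of the outfile name
    commands = []
    for job in range(n_jobs):
        job_tokens = list(tokens)
        # e.g. mymodels.out -> OUTDIR/0/mymodels.out for job 0
        job_tokens[position] = "/".join([outdir, str(job), tokens[position]])
        commands.append(" ".join(job_tokens))
    return commands


jobs = expand_command(
    "some_rosetta_executable -out:file:silent mymodels.out", "OUTDIR", 40
)
print(jobs[0])
print(jobs[39])
```

Because each job writes to its own subdirectory, no two jobs ever append to the same silent file.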
Submission scripts for Condor, LSF, PBS, and SLURM (stampede) will show up in the directory along with (hopefully) a suggestion for how to run them on your cluster. Default number of hours is 16, but can be changed above. If you set up on a new cluster, please update rosetta_submit.py so that others can take advantage of your work.
While running or after running, bring together models from the different outfiles into a single silent file by: