Author: Fang-Chieh Chou
Mar. 2012 by Fang-Chieh Chou (fcchou [at] stanford.edu).
The full ERRASER pipeline is controlled by a set of Python scripts in
src/apps/public/ERRASER/ . The main applications used are erraser_minimizer, swa_rna_analytical_closure and swa_rna_main. The central code for the SWA (StepWise Assembly) applications is in
src/protocols/stepwise/legacy/rna/ . The electron density scoring function used in ERRASER is in
For a minimal demonstration of ERRASER, see:
Chou, F.C., Sripakdeevong, P., Dibrov, S.M., Hermann, T., and Das, R. Correcting pervasive errors in RNA crystallography with Rosetta, arXiv:1110.0276. [For ERRASER. To be published]
Sripakdeevong, P., Kladwang, W., and Das, R. (2011) "An enumerative stepwise ansatz enables atomic-accuracy RNA loop modeling", PNAS 108:20573-20578. [For stepwise assembly algorithm (SWA)]
This code improves a given RNA crystallographic model and reduces the number of potential errors in it (which can be evaluated with MolProbity), under the constraint of the experimental electron density map.
This method pipelines Rosetta full-atom minimization and single-residue stepwise assembly rebuilding to improve a given RNA crystallographic model. An electron density score is used to constrain the model during modeling.
ERRASER currently works only for RNA. Other parts of the crystallographic model, including proteins, modified bases and ligands, are not modeled. Remodeling of RNA residues that are in close contact with these components may be problematic. We are planning to tackle these issues in the future, but for now ERRASER seems to work well for most RNA residues. Residues in close contact with non-RNA components can also be held fixed in ERRASER to avoid problematic rebuilding.
Crystal contacts are currently not modeled, which is known to cause problems in a few test cases where the RNA interacts strongly with its crystal-packing partner (e.g. base-pairing and base-stacking). For now this problem can be resolved by manually adding the crystal-packing partner to the starting pdb file. We are planning to model crystal packing in the future.
The PHENIX refinement package is required for the ERRASER pipeline. Users can download PHENIX from http://www.phenix-online.org (free for academic use).
There is only one mode to run ERRASER at present.
You need two files:
The starting structure in standard pdb format. ERRASER takes the standard pdb file directly and converts it to Rosetta format automatically, so no pre-processing is required.
A CCP4 electron density map file. This can be created by PHENIX or other refinement packages. The input map must be a CCP4 2mFo-DFc map. To avoid overfitting, the Rfree reflections should be excluded when the map file is created.
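Before launching a run, the two inputs can be sanity-checked with a few lines of Python. The sketch below is not part of ERRASER: the function name check_inputs is hypothetical, and it assumes a Python 3 interpreter. It relies on two format facts only: coordinate records in a pdb file start with ATOM or HETATM, and a CCP4 map stores the characters 'MAP ' at byte offset 208 of its 1024-byte header.

```python
import os


def check_inputs(pdb_path, map_path):
    """Return a list of problems found with the two ERRASER input files."""
    problems = []
    for path in (pdb_path, map_path):
        if not os.path.isfile(path):
            problems.append("missing file: " + path)
    if problems:
        return problems

    # A standard pdb file should contain at least one coordinate record.
    with open(pdb_path, "r", errors="replace") as handle:
        has_atoms = any(line.startswith(("ATOM", "HETATM")) for line in handle)
    if not has_atoms:
        problems.append("no ATOM/HETATM records in " + pdb_path)

    # CCP4 maps store the identifier 'MAP ' in header bytes 208-211.
    with open(map_path, "rb") as handle:
        header = handle.read(1024)
    if len(header) < 1024 or header[208:212] != b"MAP ":
        problems.append("not a CCP4 map (bad header): " + map_path)
    return problems
```

An empty list means both files look plausible. The check cannot tell whether the map is a 2mFo-DFc map or whether the Rfree reflections were excluded; that still has to be ensured when the map is created.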
Prior to running ERRASER, the following setup is required:
Ensure you have set up PHENIX correctly. As a check, run the following command and see if it works:
Check if you have the latest python (v2.7) installed. If not, go to the
rosetta/rosetta_tools/ERRASER/ folder and run
This changes the default python used by the code to the python built into PHENIX, instead of the system python.
Set the environment variable $ROSETTA to point to the Rosetta folder. If you use bash, append the following lines to
ROSETTA=<YOUR_ROSETTA_PATH>; export ROSETTA
Also add the ERRASER script folder to $PATH. Here is a bash example:
Now you are ready to go!
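To confirm that the environment variables from the setup steps are in place, a small check along the following lines can help. This is a sketch under stated assumptions, not an ERRASER utility: check_setup is a hypothetical name, the code targets Python 3 (shutil.which is not available in Python 2.7), and it does not test the PHENIX installation.

```python
import os
import shutil


def check_setup(environ=None):
    """Return a list of setup problems; an empty list means the checks passed."""
    environ = os.environ if environ is None else environ
    problems = []

    # $ROSETTA must be set and point to an existing folder.
    rosetta = environ.get("ROSETTA", "")
    if not rosetta:
        problems.append("ROSETTA is not set")
    elif not os.path.isdir(rosetta):
        problems.append("ROSETTA does not point to a directory: " + rosetta)

    # shutil.which looks for an executable erraser.py in the given PATH.
    search_path = environ.get("PATH", "")
    if shutil.which("erraser.py", path=search_path) is None:
        problems.append("erraser.py not found on PATH")
    return problems
```

Calling check_setup() with no argument inspects the current environment; passing a dictionary makes it possible to test a planned configuration before editing the shell startup file.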
ERRASER can be simply run with the python script
erraser.py in the
rosetta_tools/ERRASER/ directory. If you followed the setup instructions above, you should now be able to run ERRASER directly from the command line:
erraser.py -pdb 1U8D_cut.pdb -map 1U8D_cell.ccp4 -map_reso 1.95 -fixed_res A33-37 A61 A65
The first two arguments, the input pdb file and the CCP4 map file, are required. The last two are optional; they supply the map resolution and the residues to be held fixed during rebuilding.
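When ERRASER is driven from another script, the same command line can be assembled programmatically. The following is a minimal sketch that uses only the four options shown in the example above; the helper names build_erraser_command and run_erraser are hypothetical, and erraser.py is assumed to be on $PATH.

```python
import subprocess


def build_erraser_command(pdb, map_file, map_reso=None, fixed_res=None):
    """Assemble the erraser.py argument list from the options shown above."""
    command = ["erraser.py", "-pdb", pdb, "-map", map_file]
    if map_reso is not None:
        command += ["-map_reso", str(map_reso)]
    if fixed_res:
        # Each residue or range (e.g. "A33-37") is passed as its own argument.
        command += ["-fixed_res"] + list(fixed_res)
    return command


def run_erraser(pdb, map_file, map_reso=None, fixed_res=None):
    """Run erraser.py; raises CalledProcessError if it exits with an error."""
    subprocess.check_call(
        build_erraser_command(pdb, map_file, map_reso, fixed_res)
    )
```

For instance, run_erraser("1U8D_cut.pdb", "1U8D_cell.ccp4", map_reso=1.95, fixed_res=["A33-37", "A61", "A65"]) issues the same command as the example above. Passing the arguments as a list avoids shell quoting problems with file names.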
You can see examples of the output pdb file in
The above workflow should work, but it's worth looking at the Rosetta command lines called by the python scripts to see what's going on.
The minimization step:
erraser_minimizer.<exe> -database <path to database> -native <input pdb> -out_pdb <output pdb> -score::weights rna/rna_hires_elec_dens -score:rna_torsion_potential RNA09_based_2012_new -vary_geometry true -fixed_res <fixed residue list> -edensity:mapfile <map file> -edensity:mapreso 2.0 -edensity:realign no
The rebuilding step with loop closure:
swa_rna_analytical_closure.<exe> -database <path to database> -algorithm rna_resample_test -s <input pdb> -native <native pdb> -out:file:silent blah.out -sampler_extra_syn_chi_rotamer true -sampler_cluster_rmsd 0.3 -native_edensity_score_cutoff 0.9 -sampler_native_rmsd_screen true -sampler_native_screen_rmsd_cutoff 2.0 -sampler_num_pose_kept 30 -PBP_clustering_at_chain_closure true -allow_chain_boundary_jump_partner_right_at_fixed_BP true -add_virt_root true -sample_res 2 -cutpoint_closed 2 -fasta fasta -input_res 1 3-4 -fixed_res 1 3-4 -jump_point_pairs NOT_ASSERT_IN_FIXED_RES 1-4 -alignment_res 1-4 -rmsd_res 4 -score:weights rna/rna_hires_elec_dens -edensity:mapfile <map file> -edensity:mapreso 2.0 -edensity:realign no -score:rna_torsion_potential RNA09_based_2012_new
The rebuilding step at terminal residue:
swa_rna_main.<exe> -database <path to database> -algorithm rna_resample_test -s <input pdb> -native <native pdb> -out:file:silent blah.out -sampler_extra_syn_chi_rotamer true -sampler_cluster_rmsd 0.3 -native_edensity_score_cutoff 0.9 -sampler_native_rmsd_screen true -sampler_native_screen_rmsd_cutoff 2.0 -sampler_num_pose_kept 30 -PBP_clustering_at_chain_closure true -allow_chain_boundary_jump_partner_right_at_fixed_BP true -add_virt_root true -sample_res 2 -cutpoint_closed 2 -fasta fasta -input_res 1-4 -fixed_res 2-4 -jump_point_pairs NOT_ASSERT_IN_FIXED_RES 1-4 -alignment_res 1-4 -rmsd_res 4 -score:weights rna/rna_hires_elec_dens -edensity:mapfile <map file> -edensity:mapreso 2.0 -edensity:realign no -score:rna_torsion_potential RNA09_based_2012_new
Below is a list of available arguments for
Required:

-pdb
Format: -pdb <input pdb>
The starting structure in standard pdb format.

-map
Format: -map <map file>
2mFo-DFc map file in CCP4 format. Rfree should be excluded.

Commonly used:

-map_reso
Format: -map_reso <float> / Default: 2.0
The resolution of the input density map. It is highly recommended to supply the map resolution whenever possible for better results.

-out_pdb
Format: -out_pdb <string> / Default: <input pdb name>_erraser.pdb
The user can write the output to a different file name using this option.

-n_iterate
Format: -n_iterate <int> / Default: 1
The number of rebuild-minimization iterations in ERRASER. The user can increase this number to achieve the best performance. Usually 2-3 rounds are enough. Alternatively, the user can take an ERRASER-refined model as the input for another ERRASER run to iterate manually.

-fixed_res
Format: -fixed_res <list> / Default: <empty>
(Example: A1 A14-19 B9 B10-13 #chain ID followed by residue numbers)
This allows users to fix selected RNA residues during ERRASER. For example, because proteins and ligands are not modeled in ERRASER, we recommend fixing RNA residues that interact strongly with these unmodeled atoms. ERRASER will automatically detect residues covalently bonded to removed atoms and hold them fixed during the rebuild, but users need to specify residues having non-covalent interactions with removed atoms manually.

-kept_temp_folder
Format: -kept_temp_folder <True/False> / Default: False
Enabling this option allows the user to examine the intermediate output files stored in the temp folder. The default is to remove the temp folder after job completion.

Other:

-rebuild_extra_res
Format/Default: Same as -fixed_res
This allows users to specify extra residues and force ERRASER to rebuild them. ERRASER will automatically pick out incorrect residues, but the user may find particular residues that were not fixed after one ERRASER run. The user can then re-run ERRASER with the -rebuild_extra_res argument and force ERRASER to remodel these residues.

-cutpoint_open
Format/Default: Same as -fixed_res
This allows users to specify cutpoints (where a nucleotide is not connected to the one next to it) in the starting model. Since ERRASER detects cutpoints in the model automatically, users usually do not need to specify this option.

-use_existing_temp_folder
Format: -use_existing_temp_folder <True/False> / Default: True
When True, ERRASER will use any previous data stored in the existing temp folder and skip steps that have already been done. This is useful when a job stopped abnormally and the user re-runs the same job. Disable it for a fresh run that does not use previously computed data.

-rebuild_all
Format: -rebuild_all <True/False> / Default: False
When True, ERRASER will rebuild all residues instead of just the erroneous ones. Residues in "-fixed_res" (see above) are still kept fixed during rebuilding. This is more time consuming but does not necessarily lead to a better result. Standard rebuilding with more iteration cycles is usually preferred.

-native_screen_RMSD
Format: -native_screen_RMSD <float> / Default: 2.0
In default ERRASER rebuilding, only conformations within 2.0 A of the starting model (which is the "native" here) are sampled. The user can modify this RMSD cutoff. If the value of native_screen_RMSD is larger than 10.0, the RMSD screening is turned off.
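The residue list format shared by -fixed_res, -rebuild_extra_res and -cutpoint_open (for example A1 A14-19 B9 B10-13) can be illustrated with a short parser. This sketch is not ERRASER's own parsing code; expand_residue_list is a hypothetical helper, and it assumes a single-letter chain ID followed by a non-negative residue number or an ascending range.

```python
import re

# One chain letter, a residue number, and an optional "-<end>" range suffix.
_RES_TOKEN = re.compile(r"^([A-Za-z])(\d+)(?:-(\d+))?$")


def expand_residue_list(tokens):
    """Expand tokens such as 'A14-19' into (chain, residue number) pairs."""
    residues = []
    for token in tokens:
        match = _RES_TOKEN.match(token)
        if match is None:
            raise ValueError("cannot parse residue token: %r" % token)
        chain = match.group(1)
        start = int(match.group(2))
        end = start if match.group(3) is None else int(match.group(3))
        if end < start:
            raise ValueError("descending range in token: %r" % token)
        residues.extend((chain, number) for number in range(start, end + 1))
    return residues
```

Expanding a list this way before a run makes it easy to confirm that a range such as A14-19 covers exactly the residues meant to be fixed or rebuilt.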
At the end you will get an output pdb file in standard PDB format, which inherits all the ligands, metals and waters from the input pdb file. You can then further refine the output model directly using PHENIX or other refinement packages without any post-processing.
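One way to confirm that the ligands, metals and waters were carried over is to compare the HETATM groups of the input and output files. The sketch below assumes Python 3 and the fixed-column layout of the PDB format; the function names are hypothetical and nothing here is part of ERRASER.

```python
from collections import Counter


def hetero_groups(pdb_path):
    """Count distinct HETATM groups in a pdb file, keyed by residue name."""
    seen = set()
    with open(pdb_path, "r", errors="replace") as handle:
        for line in handle:
            if line.startswith("HETATM"):
                # Pad short lines so the fixed-column slices below are safe.
                line = line.rstrip("\n").ljust(27)
                # Columns: resName 18-20, chainID 22, resSeq + iCode 23-27.
                seen.add((line[17:20].strip(), line[21], line[22:27]))
    return Counter(name for name, _chain, _number in seen)


def missing_hetero_groups(input_pdb, output_pdb):
    """Return residue names that occur less often in the output file."""
    before = hetero_groups(input_pdb)
    after = hetero_groups(output_pdb)
    return {name: count - after[name]
            for name, count in before.items() if after[name] < count}
```

An empty dictionary from missing_hetero_groups indicates that every HETATM group in the input is also present in the output.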
elec_dens_atomwise is used in ERRASER. ERRASER also uses an updated RNA torsional potential based on the RNA09 dataset.