The scripts and input files that accompany this demo can be found in the
demos/public directory of the Rosetta weekly releases.
KEYWORDS: UTILITIES GENERAL
We looked for a way to simultaneously minimize rosetta energy and keep all heavy atoms in a crystal structure as close as possible to their starting positions. As hard-won experience has shown, simply running relax on a structure will often move the backbone a few Angstroms. The best way we have found to perform the simultaneous optimization is to run relax with constraints always turned on (typically constraints ramp down in the late cycles of a relax run) and to constrain not just backbone but also sidechain atoms. This protocol has been tested on a benchmark set of 51 proteins and found to increase sequence recovery in enzyme design by 5% as compared with design in raw pdb structures. It accomplishes this with only .077 Angstrom RMSD over the set of proteins (C-alpha RMSD) from raw pdb to relaxed-with-csts pdb. A more complete description of the data leading to this protocol is below.
The required files are in:
Many pdbs have features, such as non-canonical amino acids, which will cause rosetta to fail. This script will be able to process most input pdbs so they are ready for a relax run. The script will "clean" structures to replace non-canonical amino acids with their closest counterparts:
rosetta/rosetta_source/src/apps/public/relax_w_allatom_cst/clean_pdb_keep_ligand.py python rosetta/rosetta_source/srsrc/apps/public/relax_w_allatom_cst/clean_pdb_keep_ligand.py your_structure_original.pdb -ignorechain
Relax with all-heavy-atom constraints is built into the relax application itself. If this is a new structure you may want to first clean it up using the above script. Relax proceeds as follows: (Note that these flags will not preserve any ligands in the structure unless the ligand params file is added.)
$> cp starting_inputs/1A99_1A99.pdb . $> $ROSETTA3/bin/relax.default.linuxgccrelease -s 1A99_1A99.pdb @starting_inputs/flags2 > log2.txt
These flags are required:
The flags2 file includes a set of recommended flags:
-relax:constrain_relax_to_start_coords -relax:coord_constrain_sidechains -relax:ramp_constraints false
-ignore_unrecognized_res -relax:constrain_relax_to_start_coords -relax:coord_constrain_sidechains -relax:ramp_constraints false -ex1 -ex2 -use_input_sc -correct -no_his_his_pairE -no_optH false -flip_HNQ
In general the short protocol is preferred for most applications, since this version is more complicated and the two give nearly identical results. In this protocol a separate script first generates sidechain atom constraints from an input pdb, then the relax protocol is run with this pre-generated constraitn file. The shorter protocol does this all in one step, and this longer version is largely deprecated. Certain users might prefer this protocol because it allows you to see a list of all constraints, and perhaps to modify constraints using other scripts/data, prior to relax.
(a) Generate sidechain coordinate constraints on your pdb using sidechain_cst_3.py:
$> cp starting_inputs/1A99_1A99.pdb . $> $ROSETTA3/src/apps/public/relax_w_allatom_cst/sidechain_cst_3.py 1A99_1A99.pdb 0.1 0.5 [output:1A99_1A99_sc.cst]
(b) Run relax using these constraints and with a custom relax script to force constraints to stay on during the entire run. Note that these flags will not preserve any ligands in the structure unless the ligand params file is added. If you want to keep the ligand, simply copy it over from the input pdb.
$> $ROSETTA3/bin/relax.default.linuxgccrelease -s 1A99_1A99.pdb -constraints:cst_fa_file 1A99_1A99_sc.cst @starting_inputs/flags > log.txt
note: Including extra rotamers is important if your goal is to keep all sidechain atoms tightly constrained; if that is not important for your applications, by all means exclude the ex1 and ex2.
-constrain_relax_to_start_coords -relax:script ../../../../rosetta_source/src/apps/public/relax_w_allatom_cst/always_constrained_relax_script -ignore_unrecognized_res -preserve_header -ex1 -ex2 -use_input_sc -correct -no_his_his_pairE -score::hbond_params correct_params -lj_hbond_hdis 1.75 -lj_hbond_OH_donor_dis 2.6 -linmem_ig 10 -nblist_autoupdate true -dun08 false -no_optH false -flip_HNQ -chemical:exclude_patches LowerDNA UpperDNA Cterm_amidation SpecialRotamer VirtualBB ShoveBB VirtualDNAPhosphate VirtualNTerm CTermConnect sc_orbitals pro_hydroxylated_case1 pro_hydroxylated_case2 ser_phosphorylated thr_phosphorylated tyr_phosphorylated tyr_sulfated lys_dimethylated lys_monomethylated lys_trimethylated lys_acetylated glu_carboxylated cys_acetylated tyr_diiodinated N_acetylated C_methylamidated MethylatedProteinCterm
To test protocols for relaxation of input pdbs we used the 51 scaffold test set for enzdes. We run design over the input structures and calculate the percent of residues which come back with the native identity -- sequence recovery, only over the designed residues. This is of course an imperfect metric (the original sequence might not be fully optimal), but it allows us to ask how many residues rosetta will correctly choose, assuming that the input structure is already at a minimum in sequence space for the ligand in question. It had already been found that relax alone (with no constraints) will distort most structures, and that those structures will give a much higher sequence recovery in design, but this is a result of the distortion to the input structure.
We decided to test a few protocols to find the one that would best minimize RMSD from the original pdb while maximizing sequence recovery. All calculations are averages over 50 runs of the 51 scaffold set. All RMSDs are to native C-alpha. It is also possible to read in electron density for a pdb and use that as a constraint during relax, but electron density is not uniformly available, and we were looking for a protocol that would work in every case even if the electron density was lacking. We chose the first protocol as the best because it minimized rmsd (and sidechain motions) while maximizing sequence recovery.
Running relax with sidechain coordinate constraints and bb coord constraints: (ex flags and use native; ex flags on in enzdes)
0.447 sequence recovery (0.077 RMSD) [-557 totalscore]
Running relax with sc-sc distance constraints at 3.5 distance cutoff. Note that this gets similar rmsd minimization but doesn't maximize sequence recovery or minimize score as well as coordinate constraints. We tested this protocol with a variety of sc-sc distance constraint cutoff values and found it to slightly but systematically underperform coordinate constraints:
0.436 sequence recovery (0.0706 RMSD) (-534 totalscore)
Running elax with backbone constraints only. Note that this does worse in terms of rmsd and that many sidechains are a few angstroms off:
0.488 sequences recovery (0.098 RMSD) (-633 totalscore)
No relax benchmark and rosetta scoring of native input structures:
0.40 (0 RMSD by definition) (-194.7 totalscore)
At this point, the astute reader might ask, what score terms became 438 Rosetta Energy Units (REU) better? We ranked the difference in scores over all structures, comparing the all-atom coordinate constraint protocol and the non-relaxed input structure. The biggest difference is fa_dun (-192.4), followed by fa_rep (-108.1), pro_close (-26.4), hbond_sc (-10.8) and omega (-8.5). Many input rotamers are close to but not in a "good" Dunbrack rotamer, and the backbone has to be slightly tweaked in order for that residue to get a good dunbrack score. Also many atoms are slightly too close, and they give the fa_rep contribution.