KEYWORDS: EXPERIMENTAL_DATA STRUCTURE_PREDICTION

Written October 26, 2010 Modified August 27, 2013 Modified June 24, 2016

This document briefly walks through the use of Rosetta to solve difficult molecular replacement problems. These tools assume that the user has access to the Phenix suite of crystallographic software (in particular, phaser and the mapbuilding script mtz2map); however, all intermediate files are included so that if the user does not, most of the demo may still be run.

The basic protocol is done in 5 steps; each step has a corresponding script in the folder:

Using HHSearch, find potential homologs of the target sequence. Use a Rosetta "helper script" to prepare templates (and Rosetta inputs for subsequent computations).
Use PHASER to search for placement of the trimmed templates within the unit cell.
Generate a map corresponding to each putative MR solution.
Using Rosetta, rebuild gaps and refine each template/orientation, constrained by the density of each solution. After rescoring with PHASER, the best template/orientation should be clear (if the correct solution was among the starting models).

Step 1: prepare_template_for_MR.sh

This command-line illustrates the use of my script for preparing templates for an initial phaser run. Functionally, it does the same thing as the crystallographic software 'Sculptor', but it doesn't remap the residues as Sculptor does, and it is easier to run with different alignments. The script takes just one argument: an HHR-format alignment file.

Alignments generally come from HHsearch's web interface (http://toolkit.tuebingen.mpg.de/hhpred). After submitting the sequence through their website, export the results to a .hhr file. Results may be trimmed so only alignments with a reasonable e-value and sequence coverage are included.
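The trimming step described above can be sketched in a few lines of Python. This is only an illustration: the `Hit` record, the `filter_hits` helper, the cutoffs (`max_e_value`, `min_coverage`) and the example hits are all hypothetical placeholders, not values recommended by this demo or produced by HHsearch. Coverage is taken here to mean aligned columns divided by the query length, which is an assumption about what "sequence coverage" should mean for your case.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment from the hit list (fields chosen for illustration)."""
    template: str      # template identifier
    e_value: float     # E-value reported for the hit
    aligned_cols: int  # number of aligned columns for the hit


def filter_hits(hits, query_length, max_e_value=1e-3, min_coverage=0.5):
    """Keep hits with a small E-value that cover enough of the query.

    The default cutoffs are arbitrary placeholders; pick values that
    make sense for your target.
    """
    kept = []
    for hit in hits:
        coverage = hit.aligned_cols / query_length
        if hit.e_value <= max_e_value and coverage >= min_coverage:
            kept.append(hit)
    return kept


# Made-up hits for a made-up 133-residue query.
hits = [
    Hit("tmplA", 1e-30, 125),  # strong hit, good coverage
    Hit("tmplB", 5.0, 120),    # poor E-value
    Hit("tmplC", 1e-10, 30),   # good E-value, poor coverage
]
selected = filter_hits(hits, query_length=133)
print([h.template for h in selected])
```

In practice the same decision is usually made by eye when exporting the .hhr file; the point is only that both criteria should be checked for each alignment.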

The script parses the .hhr file, downloads each template PDB, and trims the PDB to the aligned residues. In addition, the script produces a 'rosetta-style' alignment file; the format is briefly introduced below. These alignment files are used in Rosetta model-building.

## 1CRB_ 2qo4b
# hhsearch
scores_from_program: 0 1.00
2 DFNGYWKMLSNENFEEYLRALDVNVALRKIANLLKPDKEIVQDGDHMIIRTLSTFRNYIMDFQVGKEFEEDLTGIDDRKCMTTVSWDGDKLQCVQKGEKEGRGWTQWIEGDELHLEMRAEGVTCKQVFKKV
0 AFSGTWQVYAQENYEEFLRAISLPEEVIKLAKDVKPVTEIQQNGSDFTITSKTPGKTVTNSFTIGKEAEIT--TMDGKKLKCIVKLDGGKLVCRTD----RFSHIQEIKAGEMVETLTVGGTTMIRKSKKI
--

The first line is '##' followed by a code for the target and one for the template. The second line identifies the source of the alignment; the third should be kept as it is. The fourth line is the target sequence and the fifth is the template sequence. The number at the start of each sequence line is an 'offset' identifying where the sequence starts; it does not use the PDB residue numbering but simply counts residues starting at 0. The sixth line is '--'.
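If an alignment file needs to be checked or hand-edited, a small parser can help confirm that it is still well formed. The following is a minimal sketch based only on the six-line layout described above; the function names and the toy alignment (`TARG_`, `tmplA`) are made up for illustration, and the interpretation of the offset as a 0-based index of the first listed residue is my reading of the description, not something taken from the Rosetta source.

```python
def parse_rosetta_alignment(text):
    """Parse one six-line 'rosetta-style' alignment block into a dict."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    if len(lines) != 6 or not lines[0].startswith("##") or lines[5] != "--":
        raise ValueError("expected a six-line block ending in '--'")
    target_id, template_id = lines[0][2:].split()
    target_offset, target_seq = lines[3].split()
    template_offset, template_seq = lines[4].split()
    if len(target_seq) != len(template_seq):
        raise ValueError("aligned sequences must have the same length")
    return {
        "target_id": target_id,
        "template_id": template_id,
        "target_offset": int(target_offset),
        "template_offset": int(template_offset),
        "target_seq": target_seq,
        "template_seq": template_seq,
    }


def residue_pairs(aln):
    """Return (target_index, template_index) for every aligned column.

    Indices count residues from 0, starting at each sequence's offset;
    columns with a gap ('-') on either side produce no pair.
    """
    pairs = []
    t = aln["target_offset"]
    m = aln["template_offset"]
    for a, b in zip(aln["target_seq"], aln["template_seq"]):
        if a != "-" and b != "-":
            pairs.append((t, m))
        if a != "-":
            t += 1
        if b != "-":
            m += 1
    return pairs


# Toy alignment, invented for this example.
EXAMPLE = """\
## TARG_ tmplA
# hhsearch
scores_from_program: 0 1.00
2 ACDEFG
0 AC--FG
--
"""

aln = parse_rosetta_alignment(EXAMPLE)
pairs = residue_pairs(aln)
print(pairs)
```

Here the two target residues aligned to gaps have no template partner; these are the positions that have to be rebuilt in the later Rosetta step.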

The results for this demo appear in the folder 'templates'. For each alignment in the starting .hhr file, 3 files are produced.

You can run the file either by running the provided .sh file or:

$> $ROSETTA3/src/apps/public/electron_density/prepare_template_for_MR.pl inputs/1crb.hhr

where $ROSETTA3=path-to-Rosetta/main/source

Steps 2 & 3: run_phaser.sh and make_maps.sh

This command line shows the use of Phaser to generate initial molecular replacement solutions. For each template we run phaser to find potential placements of each template in the unit cell.

NOTE: these steps require PHENIX to be installed.

The example scripts here only generate a single model from a single template, but for a real-world case, one will often want to use many different templates and may want to generate more than one possible solution using 'TOPFILES n'. In general, though, we have found it is better to use fewer potential solutions from more templates than many solutions from few templates.

Sometimes weak hits may be found by lowering the rotation function cutoff in phaser by adding the line 'SELECT ROT FINAL PERCENT 0.65' (or even 0.5) to the phaser script. Increasing the packing function threshold (with PACK 10) may also help in some cases.

Finally, for each template/orientation, we generate the 2mFo-DFc map for input to Rosetta in the next step.

Steps 4A & 4B: run_rosetta_mr.sh

The final step illustrates the use of Rosetta's comparative modeling into density. After template preparation and an initial phaser run, density maps are generated from each phaser hit, and comparative modeling into density is performed. The flag -MR::mode cm is used to run this mode. This first application does not try to rebuild gaps in the alignment; it just performs the threading and runs relax into density. Thus, the only inputs needed are the target fasta file, the rosetta-style ali file, and the template pdb. Because there is no rebuilding, not many models are needed to adequately cover conformational space; generally 10-20 is sufficient.

The second application is the same as above, but also rebuilds gaps in the alignment. The main difference is that a non-zero value is given for -MR::max_gaplength_to_model; additionally, some flags must be given that describe how rosetta should rebuild gaps.

Several additional input files must be provided as well. Rebuilding of gaps is done by fragment insertion (as in Rosetta ab initio); thus two backbone fragment files (3-mers and 9-mers) must be given. The application for building these is included with rosetta but requires several external tools and databases. The easiest way to generate fragments is to use the Robetta server (http://robetta.bakerlab.org/fragmentsubmit.jsp). The fragment files should be built with the full-length sequence; rosetta handles remapping the fragments if not all gaps are rebuilt. A brief overview of flags is given below:

  -in::file::fasta inputs/1crb.fasta
  -in::file::alignment inputs/1crb_2qo4.ali
  -in::file::template_pdb inputs/2qo4.PHASER.1.pdb

The fasta, alignment and template PDBs. See section 1 for the input file format if it needs to be hand-edited.

  -edensity:mapfile inputs/sculpt_2QO4_A.PHASER.1.map
  -edensity:mapreso 3.0
  -edensity:grid_spacing 1.5

This is how the density map and scorefunction parameters are given to Rosetta. The input map (-edensity:mapfile) is in CCP4 format. The flag 'mapreso' defines the resolution of the calculated density; if the data is high-resolution, it is often good to limit this to 2.5 or 3. The grid spacing should be no more than 1/2 the map resolution; if -MR::fast is used (see below), then the grid_spacing flag may be omitted (since more finely sampled grids will have much less of a speed penalty).
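The spacing rule above is simple arithmetic, and it can be checked before a run is launched. The sketch below assumes nothing beyond the three flags shown above; the helper names `max_grid_spacing` and `density_flags` are hypothetical and are not part of Rosetta.

```python
def max_grid_spacing(mapreso):
    """Largest grid spacing allowed by the rule: half the map resolution."""
    return mapreso / 2.0


def density_flags(mapfile, mapreso, grid_spacing):
    """Build the density flags as a list, rejecting a grid that is too coarse."""
    limit = max_grid_spacing(mapreso)
    if grid_spacing > limit:
        raise ValueError(
            f"grid_spacing {grid_spacing} exceeds half of mapreso ({limit})"
        )
    return [
        "-edensity:mapfile", mapfile,
        "-edensity:mapreso", str(mapreso),
        "-edensity:grid_spacing", str(grid_spacing),
    ]


# Values from the flags shown above: 1.5 is exactly half of 3.0, so it passes.
flags = density_flags("inputs/sculpt_2QO4_A.PHASER.1.map", 3.0, 1.5)
print(" ".join(flags))
```

With mapreso limited to 2.5, the same grid spacing of 1.5 would be rejected, since the limit drops to 1.25.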

  -MR::cen_dens_wt 4.0
  -MR::fa_dens_wt 1.0

This controls the weight on the experimental density data during the two stages of refinement. The second flag (fa_dens_wt) has the greatest impact on the final models. If, after model generation and visual inspection, models don't seem to be fitting the density well (or are overfitting to the density), this may be adjusted accordingly. If omitted, the values shown above are the defaults that are used; these defaults are sufficient for many cases.

  -MR::fast

A special faster density scoring formulation is used. Off by default, but it is recommended.

  -MR::max_gaplength_to_model 8

Rosetta will close gaps up to this width; the larger this value is, the more sampling is required. Values higher than 10 will often return incorrect loop conformations, although for very restrained segments, or largely helical segments, large insertions may be successfully modeled.

  -nstruct 20

The number of output structures. Generally 10-20 is sufficient, unless a large 'max_gaplength_to_model' is given.

  -ignore_unrecognized_res

If the template contains nonstandard residues/ligands/waters, this tells Rosetta to ignore them. This flag is recommended.

  -loops::frag_files inputs/aa1crb_09_05.200_v1_3.gz inputs/aa1crb_03_05.200_v1_3.gz none

(Optional) Fragment files from Robetta. If omitted, MR-Rosetta will automatically generate fragments for the input structure; this may slightly reduce final model accuracy. (The two separate command lines illustrate using and omitting this flag.)

Since each model is independently generated, multiple processes may be used to produce all the necessary models. To manage the output, either each process can be run from a separate directory, or '-out:prefix <prefix>' can be used to keep jobs from overwriting each other's structures. Rosetta workloads may also be split using MPI; see the rosetta documentation for more details.

For a short test of these, you can run these commands for steps 4 and 5, respectively, using provided inputs. Please note that for Step 4 you need to generate fragments (by submitting your pdb to http://robetta.bakerlab.org/).

$> $ROSETTA/bin/mr_protocols.default.linuxclangrelease -in::file::fasta inputs/1crb.fasta -in::file::alignment templates/2qo4.ali -in::file::template_pdb phaser/2qo4_mr.PHASER.1.pdb -loops::frag_files inputs/frags.200.3mers inputs/frags.200.3mers none -edensity:mapreso 3.0 -edensity:grid_spacing 1.5 -edensity:mapfile phaser/2qo4_mr.PHASER.1_2mFo-DFc.ccp4 -MR::max_gaplength_to_model 8 -MR::fast -nstruct 1 -ignore_unrecognized_res -overwrite

$> $ROSETTA/bin/mr_protocols.default.linuxclangrelease -in::file::fasta inputs/1crb.fasta -in::file::alignment templates/2qo4.ali -in::file::template_pdb phaser/2qo4_mr.PHASER.1.pdb -edensity:mapreso 3.0 -edensity:grid_spacing 1.5 -edensity:mapfile phaser/2qo4_mr.PHASER.1_2mFo-DFc.ccp4 -MR::max_gaplength_to_model 8 -MR::fast -nstruct 1 -ignore_unrecognized_res -overwrite

Alternatively, there is a compact output format, 'silent files' (see 'Controlling Input and Output in Rosetta'), to which structures can be written. Simply add the flags '-out:file:silent <silent_filename> -out:file:silent_struct_type binary' and all structures from one process will be written to this compact file. Then the rosetta program 'extract_pdbs' can be used to extract them:

$ bin/extract_pdbs.default.linuxgccrelease -database $DB -in:file:silent <silent_filename> -silent_struct_type binary
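The advice above about splitting model generation across several processes can be sketched as a small script that writes out one command line per job, each with its own '-out:prefix'. This is a sketch under stated assumptions: the helper `job_commands` and the prefix scheme `job0_`, `job1_`, ... are made up for illustration, the binary name must be adjusted to your build and path, and only a subset of the flags from the test command above is included.

```python
import shlex

# A subset of the flags from the test command above; add the rest as needed.
BASE_ARGS = [
    "-in::file::fasta", "inputs/1crb.fasta",
    "-in::file::alignment", "templates/2qo4.ali",
    "-in::file::template_pdb", "phaser/2qo4_mr.PHASER.1.pdb",
    "-edensity:mapfile", "phaser/2qo4_mr.PHASER.1_2mFo-DFc.ccp4",
    "-edensity:mapreso", "3.0",
    "-MR::fast",
    "-ignore_unrecognized_res",
]


def job_commands(binary, n_jobs, nstruct_per_job):
    """Return one shell command per job, each with a distinct output prefix."""
    commands = []
    for i in range(n_jobs):
        args = [
            binary,
            *BASE_ARGS,
            "-nstruct", str(nstruct_per_job),
            "-out:prefix", f"job{i}_",  # hypothetical prefix scheme
        ]
        commands.append(shlex.join(args))
    return commands


# Four jobs of five models each give the 20 models suggested above.
commands = job_commands(
    "mr_protocols.default.linuxclangrelease", n_jobs=4, nstruct_per_job=5
)
for command in commands:
    print(command)
```

The printed lines can be pasted into a job script or handed to whatever queueing system is available; running each job from its own directory, as mentioned above, works equally well.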

