The scripts and input files that accompany this demo can be found in the
demos/public directory of the Rosetta weekly releases.
KEYWORDS: DESIGN ENZYMES
Tutorial for a complete de novo enzyme design run, using the TIM reaction as an example, as published in
Tutorial written at RosettaCon2011 by Florian Richter (floric at uw dot edu), with help from Patrick Conway, Amanda Loshbaugh, Neil King, and Gert Kiss.
The contents of the demo directory should be:
starting_files/ -- directory in which the raw input files are given - these are provided for you and serve as the base for your tutorial rosetta_inputs/ -- directory in which the modified starting files should be placed which will be used as inputs to Rosetta. You may need to make modifications like stripping extra chains from the input PDB; store the modified PDB here and leave the unaltered one in starting_files scripts/ -- python scripts, shell scripts, used in the workflow -- awk, grep, sed, etc. command lines README.dox -- A prose or list description of how to perform the protocol FOR_AUTHORS.txt -- A description for the demo creators of what their demo should achieve. -- Most of what starts in this file should end up in the README file as well.
The purpose of this tutorial is to reenact all steps described in the PLoS ONE paper.
A .params file for all residues used in the theozyme. The Rosetta3 database has .params files for all amino acids, but they will have to be generated for the ligand/reaction substrate. Refer to ligand docking documentation under the link above.
A .cst (constraints) file. This file is written by the user and defines the geometry of the theozyme in Rosetta format (see enzdes cstfile documentation linked above) for use in subsequent steps of the design process.
Defining a theozyme:
A theozyme is not defined using Rosetta, but usually in one of the following ways: 1) through quantum mechanical calculations (see references of Ken Houk's work in the PLoS ONE paper), 2) from chemical intuition, or 3) by stealing or using as inspiration a naturally occurring enzyme’s active site. In this tutorial, we will design a novel triose phosphate isomerase (TIM) based upon a naturally occurring TIM active site. Our theozyme will consist of three catalytic residues and the DHAP ligand from the S. cerevisiae TIM (PDB code 1ney).
Making and checking a .cst file:
Now that we have dreamed up our theozyme, it needs to be expressed in a format that Rosetta can read. In general, the most unambiguous or precisely defined interaction should come first in the .cst file. We use the Rosetta executable CstfileToTheozymePDB.linuxgccrelease to generate .pdb format models from our .cst file so that we can visually check that it defines our theozyme correctly. The command:
$> $ROSETTA3/main/source/bin/CstfileToTheozymePDB.linuxgccrelease -extra_res_fa rosetta_inputs/1n1.params -match:geometric_constraint_file rosetta_inputs/mocktim_first_2interactions_example.cst
produces a file called
PDB_Model_mocktim_first_2interactions_example.cst.pdb in the working
directory which can then be visualized with PyMOL. Inspecting this output
.pdb file will ensure that the theozyme geometry that is given to Rosetta
is as the user intends. See the additional comments on running
CstfileToTheozymePDB in the .cst file documentation linked above.
(i.e. finding suitable sites for the active site theozyme in a library of scaffold proteins)
The .params file from Step 1.
The .cst file from Step 1.
A library of scaffold protein structures in .pdb format. The scaffold library should be as big as possible. Refer to the matcher documentation linked above for instruction on how to prepare a scaffold for matching (which is mainly deciding where the binding site is and what positions will be considered for theozyme residue attachment). In this tutorial, we will match our theozyme into one scaffold, PDB code 1tml, which can be found here: rosetta_inputs/scaffolds/1tml_11.pdb.
A scaffold position file for each scaffold that defines which residues in the scaffold structure will be considered for theozyme residue attachment. The format of scaffold position files, and instructions on how to generate them, can be found in the matcher documentation. In this tutorial, our scaffold position file can be found here: rosetta_inputs/scaffolds/1tml_11.pos.
Optionally, a ligand grid file that defines where in three-dimensional space the ligand should be placed during matching. In this tutorial we will not use a ligand grid file, but in case one wants to make sure that the ligand is confined to a certain region of space, it is recommended that one be used. More information on ligand grid files can be found in the matcher documentation linked above.
A .pdb file for each “match”. Each match file contains the three-dimensional coordinates of both the scaffold protein and the theozyme, including the ligand, and will be used as an input in step
The number of matches found in the scaffold depends on the complexity of the theozyme, and be anywhere between 0 and hundreds.
The command line
$> $ROSETTA3/main/source/bin/match.linuxgccrelease @rosetta_inputs/general_match.flags @rosetta_inputs/1tml_sys.flags
finds a bunch of matches (~11 in this tutorial) and writes them to the working directory. In a real-life enzdes project, one should look at a few of the matches in PyMOL to make sure that they look roughly as envisioned. Alternatively, the matches could be ranked by similarity to the ideal theozyme (watch for Scott Johnson’s / Ken Houk's EDGE publication).
Usually, every match from step 2 is designed several times (between 10-100, depending on computational resources) with the Rosetta enzyme_design executable. For a detailed explanation of what this executable does (and potential options), refer to documentation/paper linked above. Briefly, all residues that are within a given (8A) sphere of the ligand (excepting theozyme residues) are identified and considered changeable in design. These residues are mutated to alanine, and the resulting structure (scaffold with theozyme residues and ligand in a poly-ala cavity) is minimized with respect to the theozyme geometry as specified in the .cst file. Then, 2-3 cycles of constrained sequence design and minimization are carried out, and the final sequence is subjected to an unconstrained repack and minimization. In this tutorial, we will run this stage for one of the matches (rosetta_inputs/UM_1_D41H116K189_1tml_11_mocktim_1.pdb). The command line:
$> $ROSETTA3/main/source/bin/enzyme_design.linuxgccrelease @rosetta_inputs/general_design.flags -s rosetta_inputs/UM_1_D41H116K189_1tml_11_mocktim_1.pdb -out:file:o scorefile.txt
generates a designed protein in .pdb format and a score file that has one line of values for several score terms and other metrics. In a real life example, there would be a .pdb file for every design as well as a score file that contains one line of values for each design. An example of such a score file with information about all designs is rosetta_inputs/mocktim_all_design_scores.out, which was taken from the runs done for the calculations in the PLoS paper.
The purpose of this stage is to reduce the number of candidate designs to an amount tractable for visual inspection, and get rid of designs that have obvious defects. In short, one is looking for designs that have 1) good catalytic geometry, 2) good ligand binding score, 3) a preformed/preorganized active site and 4) a well behaved protein (expressible, soluble, stably folded, etc).
For 1), the constraint energy is taken as a metric, for 2) there is the straight rosetta ligand binding score, for 3), the designed site was repacked without the ligand in the design calculation, and for 4) the designed protein is compared to the scaffold it came from at the end of the design calculation. For each of these 4 criteria, there are terms in the scorefile (refer to documentation linked above). The script DesignSelect.pl, which is part of the Rosetta3 distribution, can read the output scorefile, as well as a file that specifies required values for certain columns, and will then output only those designs in the scorefile that satisfy all required values:
$> $ROSETTA3/src/apps/public/enzdes/DesignSelect.pl -d rosetta_inputs/mocktim_all_design_scores.out -c rosetta_inputs/mocktim_design_selectreqs.txt > selected_designs.txt
These commands will output 44 designs from the 3720 produced for the PLoS ONE paper. These 44 would then be visually examined for whether any of them look promising enough to be expressed and characterized.