"Skeleton" XML format
Copy, paste, fill in, and enjoy
<ROSETTASCRIPTS>
  <SCOREFXNS>
  </SCOREFXNS>
  <TASKOPERATIONS>
  </TASKOPERATIONS>
  <FILTERS>
  </FILTERS>
  <MOVERS>
  </MOVERS>
  <APPLY_TO_POSE>
  </APPLY_TO_POSE>
  <PROTOCOLS>
  </PROTOCOLS>
</ROSETTASCRIPTS>
Anything outside of the < > notation is ignored and can be used to comment the xml file.
General Description and Purpose
RosettaScripts is meant to provide an xml-scriptable interface to the tasks that interface-design developers have implemented. The hope is that such a scriptable interface will let non-programmers 'mix-and-match' different design strategies and apply them to their own needs, and that a common interface will make code-sharing between different people smoother. Note that at this point, the only movers and filters that are implemented in this application are the ones described below. More will be made available in future releases. At this point these include protocols from the protein-interface design, protein docking, enzyme-design, ligand-docking and -design, monomer design, and DNA-interface design groups. General movers for loop modeling and structure relaxation are also available.
A paper describing RosettaScripts is available at: Fleishman et al. (2011) PLoS ONE 6:e20161
At the most abstract level, all of the computations that are needed in interface design fall into two categories: Movers and Filters. Movers change the conformation of the complex by acting on it, e.g., docking/design/minimization, and filters decide whether a given conformation should go on to the subsequent steps. Filters are meant to reduce the amount of computation that is conducted on conformations that show no promise. Then, a RosettaScript is merely a sequence of movers and filters.
The implementation for this behaviour is done by the following components:
- ParsedProtocol, Filter, and Mover
ParsedProtocol maintains a vector of pairs of movers and their associated filters. By using the TrueFilter or the NullMover, filters and movers can be essentially decoupled in any protocol. The setup of having pairs of movers and filters is used simply because in most contexts filters will be conceptually associated with a mover and vice versa.
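The pairing described above can be illustrated with a minimal Python sketch. It is not the ParsedProtocol implementation: the names run_protocol, null_mover, and true_filter are hypothetical, the pose is a plain dictionary, and the early exit on a failed filter follows the description of filters given earlier.

```python
from typing import Callable, Dict, List, Tuple

Pose = Dict[str, float]              # hypothetical stand-in for a Rosetta pose
Mover = Callable[[Pose], None]       # a mover changes the pose in place
Filter = Callable[[Pose], bool]      # a filter only inspects the pose


def null_mover(pose: Pose) -> None:
    """Stand-in for NullMover: does nothing to the pose."""


def true_filter(pose: Pose) -> bool:
    """Stand-in for TrueFilter: always passes."""
    return True


def run_protocol(pose: Pose, steps: List[Tuple[Mover, Filter]]) -> bool:
    """Apply each (mover, filter) pair in order; stop at the first failed filter."""
    for mover, flt in steps:
        mover(pose)
        if not flt(pose):
            return False  # trajectory is abandoned
    return True


def dock(pose: Pose) -> None:
    # Hypothetical mover: pretend docking lowers the score by 5 units.
    pose["score"] -= 5.0


def score_below_zero(pose: Pose) -> bool:
    return pose["score"] < 0.0


pose = {"score": 3.0}
# A mover with no real filter, then a filter with no real mover.
passed = run_protocol(pose, [(dock, true_filter), (null_mover, score_below_zero)])
```

Pairing dock with true_filter and score_below_zero with null_mover shows how the two placeholders decouple movers from filters while keeping the pair structure.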
- DockDesignParser.cc This function parses an xml file and populates DockDesignMover with pairs of Movers and Filters. All of the movers and filters that are supported should also be defined in this function.
This application is not yet, strictly speaking, part of RosettaScripts, but it is strongly related to the design purposes of RS. Work is ongoing to supersede this application with a more useful RS implementation. In the meantime, here is an explanation.
The application was described in:
Fleishman et al. Science 332: 816. Here is the relevant excerpt:
For each design that passed the abovementioned filters, the contribution of each amino-acid substitution at the interface is assessed by singly reverting residues to their wildtype identities and testing the effects of the reversion on the computed binding energy. If the difference in binding energy between the designed residue and the reverted one is less than 0.5 R.e.u. in favor of the design, then the position is reverted to its wildtype identity. A Rosetta application to compute these values is available in the Rosetta release and is called revert_design_to_native. A report of all residue changes was produced and each suggestion was reviewed manually.
Usage: revert_design_to_native -revert_app:wt <Native protein PDB> -revert_app:design <Designed PDB> -ex1 -ex2 -use_input_sc -database <> > log
Keep the log. At its end you'll find a summary of all mutations attempted and their significance for binding energy.
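The reversion rule quoted in the excerpt can be written as a small decision function. This is only a sketch of the stated criterion, not the code of revert_design_to_native: the function name and arguments are hypothetical, and it assumes that a lower (more negative) binding energy is more favorable.

```python
def should_revert(ddg_designed: float, ddg_reverted: float,
                  threshold: float = 0.5) -> bool:
    """Return True if a position should go back to its wildtype identity.

    Assumption: lower binding energy is better, so the advantage of the
    designed residue is how much higher the energy becomes on reversion.
    """
    design_advantage = ddg_reverted - ddg_designed
    # Revert unless the design is better by at least `threshold` R.e.u.
    return design_advantage < threshold


# Hypothetical binding energies (R.e.u.) for one position:
keep_design = not should_revert(ddg_designed=-10.0, ddg_reverted=-9.0)
```

Under this reading, a designed residue that makes binding worse than wildtype is also reverted, since its advantage is negative.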
Example XML file
The following simple example will compute ala-scanning values for each residue in the protein interface:
<ROSETTASCRIPTS>
  <SCOREFXNS>
    <interface weights=interface/>
  </SCOREFXNS>
  <FILTERS>
    <AlaScan name=scan partner1=1 partner2=1 scorefxn=interface interface_distance_cutoff=10.0 repeats=5/>
    <Ddg name=ddg confidence=0/>
    <Sasa name=sasa confidence=0/>
  </FILTERS>
  <MOVERS>
    <Docking name=dock fullatom=1 local_refine=1 score_high=soft_rep/>
  </MOVERS>
  <APPLY_TO_POSE>
  </APPLY_TO_POSE>
  <PROTOCOLS>
    <Add mover_name=dock filter_name=scan/>
    <Add filter_name=ddg/>
    <Add filter_name=sasa/>
  </PROTOCOLS>
</ROSETTASCRIPTS>
Rosetta will carry out the order of operations specified in PROTOCOLS, starting with docking (in this case this is all-atom docking using the soft_rep weights). It will then apply alanine scanning, repeated 5 times for better convergence, for every residue on both sides of the interface computing the binding energies using the interface weight set (counting mostly attractive energies). The binding energy (ddg) and surface area (sasa) will also be computed. All of the values will be output in a .report file. Notice that since ddg and sasa are assigned confidence=0, they are not used here as filters that can terminate a trajectory per se, but rather for reporting the values for the complex. An important point is that filters never change the sequence or conformation of the structure, so the ddg and sasa values are reported for the input structure following docking, with the alanine-scanning results ignored.
Additional example xml scripts, including examples for docking, protein interface design, and prepacking a protein complex, amongst others, can be found in the rosetta/rosetta_demos/public/rosetta_scripts/ directory. (Or online at https://svn.rosettacommons.org/trac/browser/trunk/rosetta/rosetta_demos/public/rosetta_scripts for those with svn access.)
The following command line would run the above protocol, given that the protocol file name is ala_scan.xml
bin/rosetta_scripts.linuxgccrelease -s < INPUT PDB FILE NAME > -use_input_sc -nstruct 20 -jd2:ntrials 2 -database ~/minirosetta_database/ -ex1 -ex2 -parser:protocol ala_scan.xml -parser:view
The ntrials flag specifies how many trajectories to start per nstruct. In this case, each of 20 trajectories would make two attempts at outputting a structure. If no ntrials is specified, a default value of 1 is assumed.
The parser:view flag may be used with rosetta executables that have been compiled using the extras=graphics switch in the following way (from the Rosetta root directory):
scons mode=release -j3 bin extras=graphics
When running with -parser:view, a graphical viewer will open that shows many of the steps in a trajectory. This is extremely useful for making sure that sampling is following the intended trajectory.
Input and Output Files
Running a typical protocol requires input of an xml file and a starting pdb file, as in the example commandline above. Alternatively, to run the protocol on many structures, save a simple list of the pdb files to be used and replace the flag -s <INPUT PDB FILE NAME> in the commandline with -l <INPUT LIST FILE NAME>. Some movers and filters require specific input files (for example, a pdb file containing stub residues for hot-spot residue placement for PlaceStub or PlaceSimultaneously movers), and in such cases the required input file/s are described below and are generally called via the xml script.
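A list file for the -l flag can be generated with a few lines of Python. This is a sketch under assumptions: the helper name write_pdb_list and the file name inputs.list are hypothetical, and the list is assumed to hold one pdb path per line.

```python
from pathlib import Path
from typing import List


def write_pdb_list(directory: str, list_file: str) -> List[str]:
    """Write the paths of all .pdb files in `directory` to `list_file`, one per line."""
    pdbs = sorted(str(p) for p in Path(directory).glob("*.pdb"))
    text = "\n".join(pdbs)
    if pdbs:
        text += "\n"  # end the file with a newline
    Path(list_file).write_text(text)
    return pdbs


# Usage (hypothetical paths): write_pdb_list("inputs", "inputs.list")
# and then pass "-l inputs.list" on the command line in place of "-s ...".
```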
During a run, if any defined filters are not satisfied then the trajectory will be killed and no output files returned, and Rosetta will continue on to the next ntrial (or if all ntrials have been attempted and failed, Rosetta will continue with any remaining nstructs as defined in the commandline). For a successful run in which all filters are satisfied, the output will include a pdb file and a score.sc file. The pdb file ends with an energy table for all residues and lists the values of any filters in the same order they are used in the xml protocol. The output pdb name is identical to the input pdb file name with a suffix denoting the nstruct number. The score.sc file tabulates the energy terms and filter values for every successful nstruct.
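The interplay of nstruct and ntrials described above can be sketched as two nested loops. This is an illustration of the described behaviour, not the job distributor's code: run_job and the trajectory callback are hypothetical, and the callback simply reports whether all filters were satisfied.

```python
from typing import Callable, List


def run_job(nstruct: int, ntrials: int,
            trajectory: Callable[[int, int], bool]) -> List[int]:
    """Return the nstruct numbers that produced output.

    For each nstruct, up to `ntrials` trajectories are attempted; the first
    one that satisfies all filters produces output, and a failed one is
    discarded before the next trial starts.
    """
    succeeded = []
    for n in range(1, nstruct + 1):
        for trial in range(1, ntrials + 1):
            if trajectory(n, trial):
                succeeded.append(n)
                break  # output written, move on to the next nstruct
    return succeeded


def toy_trajectory(n: int, trial: int) -> bool:
    # Hypothetical, deterministic stand-in for "all filters were satisfied".
    return (n + trial) % 3 == 0


outputs = run_job(nstruct=4, ntrials=2, trajectory=toy_trajectory)
```

In this toy run the third nstruct fails both of its trials and therefore produces no output, while the others succeed on their first or second attempt.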
Using an IntelliSense editor to help with generating RosettaScripts
Editing RosettaScripts in emacs
The nXML emacs add-on is compatible with the RosettaScripts.rnc schema (found in src/apps/public/rosetta_scripts/RosettaScripts.rnc).
- Download nXML from http://www.thaiopensource.com/nxml-mode/
- Read the nXML portion of the emacsWiki at http://www.emacswiki.org/cgi-bin/wiki/NxmlMode
- Load the RosettaScripts.rnc file into emacs+nXML
- Load your protocol
- Have fun!
Editing RosettaScripts in VisualStudio
The following instructions were provided by Avner Aharoni:
- Download VB express from http://www.microsoft.com/express/download/
- Save the schema in the following folder C:\Program Files\Microsoft Visual Studio 9.0\Xml\Schemas
- Create an empty xml file on disk (a file with the .xml suffix)
- Open it in Visual Studio Express, go to its properties (View -> Properties Window, F4) and set the RosettaScripts.xsd schema for use.
Options Available in the XML Protocol File
This file lists the Movers and Filters recognized by RosettaScripts, along with their defaults, meanings, and uses. Protocols are written in an xml format, and many free editors (e.g., vi) will highlight key xml notation, so long as the file has the extension .xml
Whenever an xml statement is shown, the following convention will be used:
- <...> defines a branch statement (a statement that has more leaves).
- <.../> defines a leaf statement.
- "" defines input expected from the user, with an ampersand (&) defining the type that is expected (string, float, etc.).
- () defines the default value that the parser will use if that is not provided by the protocol.
Occasionally it is desirable to run a series of different runs with slightly different parameters. Instead of creating a number of slightly different XML files, one can use script variables to do the job.
If the -parser:script_vars option is set on the command line, every time a string like "%%variable_name%%" is encountered in the XML file, it is replaced with the corresponding value from the command line.
For example, a line in the XML like
<AlaScan name=scan partner1=1 partner2=1 scorefxn=interface interface_distance_cutoff=%%cutoff%% repeats=%%repeat%%/>
can be turned into
<AlaScan name=scan partner1=1 partner2=1 scorefxn=interface interface_distance_cutoff=10.0 repeats=5/>
with the command line option
-parser:script_vars repeat=5 cutoff=10.0
These values can be changed at will for different runs, for example:
-parser:script_vars repeat=5 cutoff=15.0
-parser:script_vars repeat=2 cutoff=10.0
-parser:script_vars repeat=1 cutoff=9.0
Multiple instances of the "%%var%%" string will all be substituted, as well as in any subroutine XML files. Note that while currently script_vars are implemented as pure macro text substitution, this may change in the future, and any use aside from substituting tag values may not work. Particularly, any use of script variables to change the parsing structure of the XML file itself is explicitly *not* supported, and you have a devious mind for even considering it.
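The macro-style substitution described above can be mimicked in a few lines of Python, which is handy for previewing what a script will look like after substitution. This is a sketch, not the parser's code: substitute_script_vars is a hypothetical helper, and raising an error for an undefined variable is an assumption of this sketch.

```python
import re
from typing import List


def substitute_script_vars(xml_text: str, assignments: List[str]) -> str:
    """Replace every %%name%% in xml_text using name=value assignments."""
    values = dict(item.split("=", 1) for item in assignments)

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in values:
            # Assumption of this sketch: an undefined variable is an error.
            raise KeyError(f"no value given for script variable {name!r}")
        return values[name]

    return re.sub(r"%%(\w+)%%", replace, xml_text)


template = ("<AlaScan name=scan partner1=1 partner2=1 scorefxn=interface "
            "interface_distance_cutoff=%%cutoff%% repeats=%%repeat%%/>")
expanded = substitute_script_vars(template, ["repeat=5", "cutoff=10.0"])
```

With the assignments repeat=5 and cutoff=10.0, the template expands to the AlaScan line shown in the example above.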
The following are defined internally in the parser, and the protocol can use them without defining them explicitly.
- NullMover: Has an empty apply. Will be used as the default mover in <PROTOCOLS> if no mover_name is specified. Can be explicitly specified, with the name "null".
- TrueFilter: Returns true. Useful for defining a mover without using a filter.
- score12: The default all-atom scorefunction used by rosetta ab-initio and design
- score_docking: high resolution docking scorefxn (standard+docking_patch)
- score_docking_low: low resolution docking scorefxn (interchain_cen)
- soft_rep: soft_rep_design weights.
- score4L: low resolution scorefunction used for loop remodeling (chainbreak weight on)
- score_empty: all weights = 0.
The SCOREFXNS section defines scorefunctions that will be used in Filters and Movers. This can be used to define any of the scores defined in the rosetta_database.
<"scorefxn_name" weights=("empty" &string) patch=(&string)>
  <Reweight scoretype=(&string) weight=(&Real)/>
  <Set (option name)=(value)/>
</"scorefxn_name">
where scorefxn_name will be used in the Movers and Filters sections to refer to the scorefunction. The name should therefore be unique and not repeat the predefined score names. Reweight tags are optional; each one changes or adds the weight for a given scoretype, and more than one may be given. The Set tag is also optional and allows you to change certain scorefunction options.
One or more options can be specified per Set tag:
- exclude_protein_protein_hack_elec=(&bool) - Don't compute hack_elec energies for protein-protein interactions (equivalent to the -ligand::old_estat command line option for ligand_dock/enzyme_design)
- decompose_bb_hb_into_pair_energies=(&bool) - Store backbone hydrogen bonds in the energy graph on a per-residue basis (this doubles the number of calculations, so is off by default)
Global Scorefunction modifiers
The apply_to_pose section may set up constraints, in which case it becomes necessary to set the weights in all of the scorefunctions that are defined. The default weights for all the scorefunctions are defined globally in the apply_to_pose section, but each scorefunction definition may change this weight. For example, to set the HotspotConstraint (backbone_stub_constraint) value to 6.0:
<my_spiffy_score weights="soft_rep_design" patch="dock" hs_hash=6.0/>
The following modifiers are recognized:
hs_hash=(the value set by apply_to_pose for hotspot_hash &float)
To properly score symmetric poses, they must be scored with a symmetric score function. To declare a scorefunction symmetric, simply add symmetric=1 to its definition.
For example, symmetric score12:
<score12_symm weights="score12_full" symmetric=1/>
TaskOperations are used by movers to tell the "packer" which residues/rotamers to use in reorganizing/mutating sidechains. When used by certain Movers, the TaskOperations control what happens during packing, usually by restriction "masks". TaskOperations can also be used by movers to specify sets of residues to act upon in non-packer contexts.
The APPLY_TO_POSE section is used to change the input structure. The most likely use for this is to define constraints to a structure that has been read from disk.
Sets constraints on the sequence of the pose that can be based on a sequence alignment or an amino-acid transition matrix.
<profile weight=(0.25 &Real) file_name=(<input file name >.cst &string)/>
sets residue_type type constraints to the pose based on a sequence profile. file_name defaults to the input file name with the suffix changed to ".cst". So, a file called xxxx_yyyy.25.jjj.pdb would imply xxxx_yyyy.cst. To generate sequence-profile constraint files with these defaults use DockScripts/seq_prof/seq_prof_wrapper.sh
SetupHotspotConstraints (formerly hashing_constraints)
<SetupHotspotConstraints stubfile=(stubs.pdb &string) redesign_chain=(2 &integer) cb_force=(0.5 &float) worst_allowed_stub_bonus=(0.0 &float) apply_stub_self_energies=(1 &bool) apply_stub_bump_cutoff=(10.0 &float) pick_best_energy_constraint=(1 &bool) backbone_stub_constraint_weight=(1.0 &Real)>
  <HotspotFiles>
    <Add file_name=(&string) nickname=(&string) stub_num=(&integer)/>
    ...
  </HotspotFiles>
</SetupHotspotConstraints>
- stubfile: a pdb file containing the hot-spot residues
- redesign_chain: which is the host_chain for design. Anything other than chain 2 has not been tested.
- cb_force: the Hooke's law spring constant to use in setting up the harmonic restraints on the Cb atoms.
- worst_allowed_stub_bonus: triage stubs that have energies higher than this cutoff.
- apply_stub_self_energies: evaluate the stub's energy in the context of the pose.
- pick_best_energy_constraint: when more than one restraint is applied to a particular residue, only sum the one that makes the highest contribution.
- backbone_stub_constraint_weight: the weight on the score-term in evaluating the constraint. Notice that this weight can be overridden in the individual scorefxns.
- HotspotFiles: You can specify a set of hotspot files to be read individually. Each one is associated with a nickname for use in the placement movers/filters. You can set to keep in memory only a subset of the read stubs using stub_num. If stubfile in the main branch is not specified, only the stubs in the leaves will be used.
Each mover definition has the following structure
<"mover_name" name="&string" .../>
where "mover_name" belongs to a predefined set of possible movers that the parser recognizes and that are listed below, and name is a unique identifier for this mover definition. It is followed by any number of parameters that the mover needs in order to be defined.
Each filter definition has the following format:
<"filter_name" name="&string" ... confidence=(1 &Real)/>
where "filter_name" belongs to a predefined set of possible filters that the parser recognizes and that are listed below, and name is a unique identifier for this filter definition. It is followed by any number of parameters that the filter needs in order to be defined.
If confidence is 1.0, then the filter is evaluated as in predicate logic (T/F). If the value is less than 0.999, then the filter is evaluated as fuzzy, so that it will return True in (1.0 - confidence) fraction of times it is probed. This should be useful for cases in which experimental data are ambiguous or uncertain.
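The effect of the confidence parameter can be sketched as follows. This is an illustration of the rule as stated, not the Rosetta implementation: apply_with_confidence is a hypothetical name, and treating every value of 0.999 or above as a strict filter is an assumption of this sketch.

```python
import random


def apply_with_confidence(predicate_result: bool, confidence: float,
                          rng: random.Random) -> bool:
    """Combine a filter's true/false result with its confidence setting."""
    if confidence >= 0.999:
        # Strict predicate logic: report exactly what the filter found.
        return predicate_result
    if rng.random() < confidence:
        return predicate_result
    # In a (1.0 - confidence) fraction of probes the filter passes regardless.
    return True


rng = random.Random(0)  # fixed seed so the illustration is repeatable
# A filter whose predicate always fails, probed many times at confidence 0.25:
probes = [apply_with_confidence(False, 0.25, rng) for _ in range(10000)]
pass_fraction = sum(probes) / len(probes)  # expected to be close to 0.75
```

With confidence=0 the function always returns True, which matches the example protocol above, where Ddg and Sasa are given confidence=0 so that they report values without ever terminating a trajectory.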
<[name_of_this_ligand_area] chain="&string" cutoff=(float) add_nbr_radius=[true|false] all_atom_mode=[true|false] minimize_ligand=[float] Calpha_restraints=[float] high_res_angstroms=[float] high_res_degrees=[float] tether_ligand=[float] />
LIGAND_AREAS describe parameters specific to each ligand, useful for multiple ligand docking studies. "cutoff" is the maximum distance in angstroms between the ligand and an amino acid's C-beta atom for that residue to still be part of the interface. "all_atom_mode" can be true or false. If all_atom_mode is true, then a residue becomes part of the interface if any ligand atom is within "cutoff" of its C-beta atom. If false, only the ligand neighbor atom is used to decide if the protein residue is part of the interface. "add_nbr_radius" increases the cutoff by the size of the ligand neighbor atom's radius specified in the ligand .params file. This size can be adjusted to represent the size of the ligand, without entering all_atom_mode. Thus all_atom_mode should not be used with add_nbr_radius.
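The interface test described above can be sketched with plain coordinates. This is an illustration, not the Rosetta code: in_interface and its arguments are hypothetical, "within cutoff" is taken to mean a distance less than or equal to the cutoff, and the sketch simply ignores add_nbr_radius when all_atom_mode is on, since the text advises against combining them.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def in_interface(cbeta: Point, ligand_atoms: Sequence[Point], nbr_atom: Point,
                 cutoff: float, all_atom_mode: bool = False,
                 add_nbr_radius: bool = False, nbr_radius: float = 0.0) -> bool:
    """Decide whether a residue, represented by its C-beta atom, is in the interface."""
    if all_atom_mode:
        # Any ligand atom close enough to the C-beta atom is sufficient.
        return any(math.dist(cbeta, atom) <= cutoff for atom in ligand_atoms)
    # Otherwise only the ligand neighbor atom is considered, optionally
    # with the cutoff enlarged by the neighbor atom's radius.
    effective_cutoff = cutoff + (nbr_radius if add_nbr_radius else 0.0)
    return math.dist(cbeta, nbr_atom) <= effective_cutoff


# Hypothetical geometry: the neighbor atom is 8 A away, another ligand atom 5 A away.
cb = (0.0, 0.0, 0.0)
nbr = (8.0, 0.0, 0.0)
atoms = [nbr, (5.0, 0.0, 0.0)]
by_nbr_atom = in_interface(cb, atoms, nbr, cutoff=6.0)
by_all_atoms = in_interface(cb, atoms, nbr, cutoff=6.0, all_atom_mode=True)
by_nbr_radius = in_interface(cb, atoms, nbr, cutoff=6.0,
                             add_nbr_radius=True, nbr_radius=3.0)
```

The three calls show the same residue being excluded by the neighbor-atom test alone, and included once either all_atom_mode or add_nbr_radius widens the criterion.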
Ligand minimization can be turned on by specifying a minimize_ligand value greater than 0. This value represents the size of one standard deviation of ligand torsion angle rotation (in degrees). By setting Calpha_restraints greater than 0, backbone flexibility is enabled. This value represents the size of one standard deviation of Calpha movement, in angstroms.
During high resolution docking, small amounts of ligand translation and rotation are coupled with cycles of rotamer trials or repacking. These values can be controlled by the 'high_res_angstroms' and 'high_res_degrees' values respectively. A tether_ligand value (in angstroms) will constrain the ligand so that multiple cycles of small translations don't add up to a large translation.
<[name_of_this_interface_builder] ligand_areas=(comma separated list of predefined ligand_areas) extension_window=(int)/>
An interface builder describes how to choose residues that will be part of a protein-ligand interface. These residues are chosen for repacking, rotamer trials, and backbone minimization during ligand docking. The initial XML parameter is the name of the interface_builder (for later reference). "ligand_areas" is a comma separated list of strings matching LIGAND_AREAS described previously. Finally 'extension_window' surrounds interface residues with residues labeled as 'near interface'. This is important for backbone minimization, because a residue's backbone can't really move unless it is part of a stretch of residues that are flexible.
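The role of extension_window can be sketched as a simple expansion along the sequence. This is an illustration only: extend_interface is a hypothetical name, residues are assumed to be numbered 1..N along a single chain, and chain breaks are ignored.

```python
from typing import Set


def extend_interface(interface: Set[int], window: int, n_residues: int) -> Set[int]:
    """Return residues within `window` sequence positions of an interface residue.

    The returned residues play the role of 'near interface' residues; the
    interface residues themselves are not included.
    """
    near = set()
    for res in interface:
        first = max(1, res - window)
        last = min(n_residues, res + window)
        for i in range(first, last + 1):
            if i not in interface:
                near.add(i)
    return near


# Hypothetical 11-residue chain with interface residues 5 and 10:
near_interface = extend_interface({5, 10}, window=2, n_residues=11)
```

Here residues 3, 4, 6, 7, 8, 9, and 11 are labeled as near the interface, giving each interface residue a flexible stretch around it.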
<[name_of_this_movemap_builder] sc_interface=(string) bb_interface=(string) minimize_water=[true|false]/>
A movemap builder constructs a movemap. A movemap is a 2xN table of true/false values, where N is the number of residues in your protein/ligand complex. The two columns are for backbone and side-chain movements. The MovemapBuilder combines previously constructed backbone and side-chain interfaces (see previous section). Leave out bb_interface if you do not want to minimize the backbone. The minimize_water option is a global option. If you are docking water molecules as separate ligands (multi-ligand docking) these should be described through LIGAND_AREAS and INTERFACE_BUILDERS.
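The 2xN table can be pictured with a short Python sketch. It is not the MovemapBuilder itself: build_movemap is a hypothetical name, the two interfaces are given as sets of residue numbers (1..N), and omitting bb_interface is taken to mean that no backbone is allowed to move.

```python
from typing import Dict, List, Optional, Set


def build_movemap(n_residues: int, sc_interface: Set[int],
                  bb_interface: Optional[Set[int]] = None) -> Dict[str, List[bool]]:
    """Build a table with one true/false entry per residue for 'bb' and for 'sc'."""
    bb = bb_interface if bb_interface is not None else set()
    residues = range(1, n_residues + 1)
    return {
        "bb": [i in bb for i in residues],            # backbone may move
        "sc": [i in sc_interface for i in residues],  # side chain may move
    }


# Hypothetical 5-residue complex: side chains 2-4 flexible, backbone of residue 3 only.
movemap = build_movemap(5, sc_interface={2, 3, 4}, bb_interface={3})
# Leaving out bb_interface keeps the whole backbone fixed.
sidechain_only = build_movemap(5, sc_interface={2, 3, 4})
```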