Scripter Documentation

Table of contents

11.1 Special Movers

11.2 General Movers

11.3 Protein Interface Design Movers

11.4 Ligand-centric Movers

"Skeleton" XML format

Copy, paste, fill in, and enjoy

<dock_design>
        <SCOREFXNS>
        </SCOREFXNS>
        <TASKOPERATIONS>
        </TASKOPERATIONS>
        <FILTERS>
        </FILTERS>
        <MOVERS>
        </MOVERS>
        <APPLY_TO_POSE>
        </APPLY_TO_POSE>
        <PROTOCOLS>
        </PROTOCOLS>
</dock_design>

Anything outside of the < > notation is ignored and can be used to comment the xml file.

General Description and Purpose

RosettaScripts provides an xml-scriptable interface for the tasks that interface-design developers have implemented. The hope is that such a scriptable interface will let non-programmers 'mix-and-match' different design strategies and apply them to their own needs, and that a common interface will make code-sharing between different people smoother. Note that at this point, the only movers and filters implemented in this application are the ones described below; more will be made available in future releases. These currently include protocols from the protein-interface design, protein docking, enzyme-design, ligand-docking and -design, and DNA-interface design groups. General movers for loop modeling are also available.

At the most abstract level, all of the computations that are needed in interface design fall into two categories: Movers and Filters. Movers change the conformation of the complex by acting on it, e.g., docking/design/minimization, and filters decide whether a given conformation should go on to the subsequent steps. Filters are meant to reduce the amount of computation that is conducted on conformations that show no promise. Then, a RosettaScript is merely a sequence of movers and filters.

The implementation for this behaviour is done by the following components:

Example XML file

The following simple example will compute ala-scanning values for each residue in the protein interface:

<dock_design>
        <SCOREFXNS>
                <interface weights=interface/>
        </SCOREFXNS>
        <FILTERS>
                <AlaScan name=scan partner1=1 partner2=1 scorefxn=interface interface_distance_cutoff=10.0 repeats=5/>
                <Ddg name=ddg confidence=0/>
                <Sasa name=sasa confidence=0/>
        </FILTERS>
        <MOVERS>
                <Docking name=dock fullatom=1 local_refine=1 score_high=soft_rep/>
        </MOVERS>
        <APPLY_TO_POSE>
        </APPLY_TO_POSE>
        <PROTOCOLS>
                <Add mover_name=dock filter_name=scan/>
                <Add filter_name=ddg/>
                <Add filter_name=sasa/>
        </PROTOCOLS>
</dock_design>

Rosetta will carry out the order of operations specified in PROTOCOLS, starting with docking (in this case this is all-atom docking using the soft_rep weights). It will then apply alanine scanning, repeated 5 times for better convergence, for every residue on both sides of the interface computing the binding energies using the interface weight set (counting mostly attractive energies). The binding energy (ddg) and surface area (sasa) will also be computed. All of the values will be output in a .report file. Notice that since ddg and sasa are assigned confidence=0, they are not used here as filters that can terminate a trajectory per se, but rather for reporting the values for the complex. An important point is that filters never change the sequence or conformation of the structure, so the ddg and sasa values are reported for the input structure following docking, with the alanine-scanning results ignored.

Additional example xml scripts, including examples for docking, protein interface design, and prepacking a protein complex, amongst others, can be found at: https://svn.rosettacommons.org/trac/browser/trunk/mini/demo/rosetta_scripts/

Example commandline

The following command line would run the above protocol, given that the protocol file name is ala_scan.xml

bin/rosetta_scripts.linuxgccrelease -s <INPUT PDB FILE NAME> -use_input_sc -nstruct 20 -jd2:ntrials 2 -database ~/minirosetta_database/ -ex1 -ex2 -parser:protocol ala_scan.xml -parser:view

The ntrials flag specifies how many trajectories to start per nstruct. In this case, each of 20 trajectories would make two attempts at outputting a structure. If no ntrials is specified, a default value of 1 is assumed.

The parser:view flag may be used with rosetta executables that have been compiled using the extras=graphics switch in the following way (from the Rosetta root directory):

scons mode=release -j3 bin extras=graphics

When running with -parser:view, a graphical viewer will open that shows many of the steps in a trajectory. This is extremely useful for making sure that sampling is following the intended trajectory.

Input and Output Files

Running a typical protocol requires input of an xml file and a starting pdb file, as in the example commandline above. Alternatively, to run the protocol on many structures, save a simple list of the pdb files to be used and replace the flag -s <INPUT PDB FILE NAME> in the commandline with -l <INPUT LIST FILE NAME>. Some movers and filters require specific input files (for example, a pdb file containing stub residues for hot-spot residue placement for PlaceStub or PlaceSimultaneously movers), and in such cases the required input file/s are described below and are generally called via the xml script.

During a run, if any defined filters are not satisfied then the trajectory will be killed and no output files returned, and Rosetta will continue on to the next ntrial (or if all ntrials have been attempted and failed, Rosetta will continue with any remaining nstructs as defined in the commandline). For a successful run in which all filters are satisfied, the output will include a pdb file and a score.sc file. The pdb file ends with an energy table for all residues and lists the values of any filters in the same order they are used in the xml protocol. The output pdb name is identical to the input pdb file name with a suffix denoting the nstruct number. The score.sc file tabulates the energy terms and filter values for every successful nstruct.

Using an IntelliSense editor to help with generating RosettaScripts

An xml-schema was generated for us by Avner Aharoni (Microsoft) using Visual Studio. Using this schema in a compatible editor provides a specific editor for writing RosettaScripts, complete with word completion, grammatical error warnings and help with options. We are currently aware of two editors that are fully compatible with this schema:

Editing RosettaScripts in emacs

The nXML emacs add-on is compatible with the RosettaScripts.rnc schema (found in src/apps/public/rosetta_scripts/RosettaScripts.rnc).

  1. Download nXML from http://www.thaiopensource.com/nxml-mode/
  2. Read the nXML portion of the emacsWiki at http://www.emacswiki.org/cgi-bin/wiki/NxmlMode
  3. Load the RosettaScripts.rnc file into emacs+nXML
  4. Load your protocol
  5. Have fun!

Editing RosettaScripts in VisualStudio

MS-Windows users can download Visual Studio Express (free of charge) which provides an xml editor that is compatible with the RosettaScripts.xsd schema (found in src/apps/public/rosetta_scripts/RosettaScripts.xsd). The following instructions were provided by Avner Aharoni:

  1. Download VB express from http://www.microsoft.com/express/download/
  2. Save the schema in the following folder C:\Program Files\Microsoft Visual Studio 9.0\Xml\Schemas
  3. Create an empty xml file on disk (a file with the .xml suffix)
  4. Open it in Visual Studio Express, go to its properties (view -> property window, F4) and set the RosettaScripts.xsd schema for use.

Options Available in the XML Protocol File

General Comments

This file lists the Movers, Filters, their defaults, meanings and uses as recognized by RosettaScripts. It is written in an xml format, and many free viewers (e.g., vi) will highlight key xml notations, so long as the file has the extension .xml.

Whenever an xml statement is shown, the following convention will be used:

<...> to define a branch statement (a statement that has more leaves)
<.../> a leaf statement.
"" defines input expected from the user with ampersand (&) defining the type that is expected (string, float, etc.)
() defines the default value that the parser will use if that is not provided by the protocol.

Predefined Movers

The following are defined internally in the parser, and the protocol can use them without defining them explicitly.

NullMover

Has an empty apply. Will be used as the default mover in <PROTOCOLS> if no mover_name is specified. Can be explicitly specified, with the name "null".

Predefined Filters

TrueFilter

Returns true. Useful for defining a mover without using a filter.
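As a minimal sketch of how these two predefined defaults show up in practice, the fragment below leaves one attribute off each line of a PROTOCOLS section. The names dock and ddg are borrowed from the example file above and are assumed to be defined in MOVERS and FILTERS; the free text between tags is ignored by the parser and serves as comments.

```xml
<PROTOCOLS>
        Mover only: the filter defaults to true_filter, which always passes
        <Add mover_name="dock"/>
        Filter only: the mover defaults to the predefined null mover
        <Add filter_name="ddg"/>
</PROTOCOLS>
```

Writing filter_name="true_filter" on the first line, or mover_name="null" on the second, should be equivalent to omitting them.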

CompoundStatement filter

This is a special filter that uses previously defined filters to construct a compound logical statement with AND, OR, XOR, NAND and NOR operations. By making compound statements of compound statements, essentially all logical statements can be defined.

<CompoundStatement name=(&string)>
	<OPERATION filter_name=(true_filter &string)/>
	<....
</CompoundStatement>

where OPERATION is any of the operations defined in CAPS above. Note that the operations are performed in the order in which they are defined. No precedence rules are enforced, so any precedence has to be written explicitly by making compound statements of compound statements. Also note that the first OPERATION is ignored, and the value of the first filter is simply assigned to the filter's results.
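The following sketch shows how explicit precedence could be expressed by nesting. The filter names filter_a, filter_b and filter_c are hypothetical and are assumed to be defined earlier in the FILTERS section; the intent is to build (filter_a AND filter_b) OR filter_c.

```xml
<FILTERS>
        filter_a, filter_b and filter_c are hypothetical, previously defined filters
        <CompoundStatement name="a_and_b">
                <AND filter_name="filter_a"/>
                <AND filter_name="filter_b"/>
        </CompoundStatement>
        <CompoundStatement name="a_and_b_or_c">
                <AND filter_name="a_and_b"/>
                <OR filter_name="filter_c"/>
        </CompoundStatement>
</FILTERS>
```

In each statement the operation on the first line is ignored, so the first AND only seeds the result with that filter's value. Grouping the AND into its own statement makes the intended precedence explicit rather than dependent on evaluation order.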

Predefined Scorefunctions

SCOREFUNCTIONS

This section defines scorefunctions that will be used in Filters and Movers. This can be used to define any of the scores defined in the rosetta_database

<"scorefxn_name" weights=(standard &string) patch="&string">
    <Reweight scoretype="&string" weight="&Real"/>
</"scorefxn_name">

where scorefxn_name will be used in the Movers and Filters sections to use the scorefunction. The name should therefore be unique and not repeat the predefined score names. The Reweight tag is optional and allows you to change/add the weight for a given scoretype.
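A minimal sketch of a scorefunction definition with one reweighted term follows. The name my_soft_score and the weight value 0.5 are illustrative assumptions; the weights and patch names are the ones used elsewhere in this document, and fa_rep is used as an example scoretype.

```xml
<SCOREFXNS>
        <my_soft_score weights="soft_rep_design" patch="dock">
                <Reweight scoretype="fa_rep" weight="0.5"/>
        </my_soft_score>
</SCOREFXNS>
```

Movers and filters defined later would then refer to this scorefunction as my_soft_score.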

Global Scorefunction modifiers

The apply_to_pose section may set up constraints, in which case it becomes necessary to set the weights in all of the scorefunctions that are defined. The default weights for all the scorefunctions are defined globally in the apply_to_pose section, but each scorefunction definition may change this weight. For example, to multiply the favor_native_residue bonus by 6.0:

<my_spiffy_score weights="soft_rep_design" patch="dock" fnr=6.0/>

The following modifiers are recognized:

FavorNativeResidue

fnr=(the value set by apply_to_pose for favor_native_residue &float)

HotspotConstraints modifications

hs_hash=(the value set by apply_to_pose for hotspot_hash &float)

TASKOPERATIONS

See TaskOperations (Parser/RosettaScripts).

APPLY_TO_POSE

This is a section that is used to change the input structure. The most likely use for this is to define constraints to a structure that has been read from disk.

Sequence-profile Constraints

Sets constraints on the sequence of the pose that can be based on a sequence alignment or an amino-acid transition matrix.

<profile weight=(0.25 &Real) file_name=(<input file name >.cst &string)/>

Sets residue_type constraints on the pose based on a sequence profile. file_name defaults to the input file name with the suffix changed to ".cst". Note that everything from the first period onward is treated as the suffix, so a file called xxxx_yyyy.25.jjj.pdb would imply xxxx_yyyy.cst. To generate sequence-profile constraint files with these defaults use DockScripts/seq_prof/seq_prof_wrapper.sh
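As an illustration, for an input file called xxxx_yyyy.25.jjj.pdb the following fragment simply spells out the default values explicitly; it is a sketch rather than a required form.

```xml
<APPLY_TO_POSE>
        <profile weight="0.25" file_name="xxxx_yyyy.cst"/>
</APPLY_TO_POSE>
```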


SetupHotspotConstraints (formerly hashing_constraints)

<SetupHotspotConstraints stubfile=(stubs.pdb &string) redesign_chain=(2 &integer) cb_force=(0.5 &float) worst_allowed_stub_bonus=(0.0 &float) apply_stub_self_energies=(1 &bool) apply_stub_bump_cutoff=(10.0 &float) pick_best_energy_constraint=(1 &bool) backbone_stub_constraint_weight=(1.0 &Real)/>

MOVERS

Each mover definition has the following structure

<"mover_name" name="&string" .../>

where "mover_name" belongs to the predefined set of movers that the parser recognizes (listed below), name is a unique identifier for this mover definition, and the remaining attributes are whatever parameters the mover needs.

Special Movers

ParsedProtocol (formerly DockDesign)

This is a special mover that allows making a single compound mover and filter vector (just like the PROTOCOLS section).

<ParsedProtocol name=( &string)>
	<Add mover_name=( null &string) filter_name=( true_filter &string)/>
	...
</ParsedProtocol>
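
A minimal sketch of wrapping several steps into one compound mover is shown below. The mover dock and the filters ddg and sasa are the ones from the example file above and are assumed to be defined; the name dock_and_check is illustrative.

```xml
<MOVERS>
        <Docking name="dock" fullatom="1" local_refine="1" score_high="soft_rep"/>
        <ParsedProtocol name="dock_and_check">
                <Add mover_name="dock" filter_name="ddg"/>
                <Add filter_name="sasa"/>
        </ParsedProtocol>
</MOVERS>
<PROTOCOLS>
        <Add mover_name="dock_and_check"/>
</PROTOCOLS>
```

The compound mover can then be passed by name to any other mover that takes a mover_name.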

IfMover

Implements a simple IF( filter( pose ) ) THEN mover( pose )

<If name=( &string) filter_name=(&string) mover_name=(&string)/>

LoopOver

Allows looping over a mover using either iterations or a filter as a stopping condition (whichever turns true first). Combining the ParsedProtocol (formerly DockDesign) mover above with LoopOver can be useful, e.g., if making certain moves is expensive and we then want to exhaust other, shorter moves.

<LoopOver name=(&string) mover_name=(&string) filter_name=( false_filter &string) iterations=(10 &Integer) drift=(true &bool)/>

drift: true- the state of the pose at the end of the previous iteration will be the starting state for the next iteration. false- the state of the pose at the start of each iteration will be reset to the state when the mover is first called. Note that "falling off the end" of the iteration will revert to the original input pose, even if drift is set to true.

This mover is somewhat deprecated in favor of the more general GenericMonteCarlo mover.
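
As a sketch, the fragment below loops a previously defined mover for up to 10 iterations, using a filter as the alternative stopping condition. The names dock and ddg are taken from the example file above and are assumed to be defined; loop_dock is illustrative.

```xml
<MOVERS>
        <LoopOver name="loop_dock" mover_name="dock" filter_name="ddg" iterations="10" drift="1"/>
</MOVERS>
```

With drift="1" each iteration starts from the pose left by the previous one.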

GenericMonteCarlo

Allows sampling structures by MonteCarlo with a mover. The pose can be scored during MC either by a filter (filterOP) that supports report_sm(), or by a score function (ScoreFunctionOP).
Either of the following two formats can be used.

1) scoring by filterOP

<GenericMonteCarlo name=(&string) mover_name=(&string) filter_name=(&string) trials=(10 &integer) sample_type=(low, &string) temperature=(0, &Real) drift=(1 &bool)/>

2) scoring by ScoreFunctionOP

<GenericMonteCarlo name=(&string) mover_name=(&string) scorefxn_name=(&string) trials=(10 &integer) sample_type=(low, &string) temperature=(0, &Real) drift=(1 &bool)/>

sample_type: low- sample structures having lower scores; high- sample structures having higher scores.
drift: true- the state of the pose at the end of the previous iteration will be the starting state for the next iteration.
false- the state of the pose at the start of each iteration will be reset to the state when the mover is first called (of course, this is then not MC).
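
The two formats might be filled in as follows. This is a sketch only: dock, ddg and the interface scorefunction come from the example file above and are assumed to be defined, and the remaining values are the documented defaults written out explicitly.

```xml
<MOVERS>
        Scoring by a filter
        <GenericMonteCarlo name="mc_by_filter" mover_name="dock" filter_name="ddg" trials="10" sample_type="low" temperature="0" drift="1"/>
        Scoring by a scorefunction
        <GenericMonteCarlo name="mc_by_score" mover_name="dock" scorefxn_name="interface" trials="10" sample_type="low" temperature="0" drift="1"/>
</MOVERS>
```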

Subroutine

Calling another RosettaScript from within a RosettaScript

<Subroutine name=(&string) xml_fname=(&string)/>

This definition in effect generates a Mover that can then be incorporated into the RosettaScripts PROTOCOLS section. This allows a simplification and modularization of RosettaScripts.
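
A minimal sketch of modularizing a protocol this way is given below. The file name prepack.xml and the mover name prepack_sub are hypothetical; dock and ddg are assumed to be defined as in the example file above.

```xml
<MOVERS>
        <Subroutine name="prepack_sub" xml_fname="prepack.xml"/>
</MOVERS>
<PROTOCOLS>
        <Add mover_name="prepack_sub"/>
        <Add mover_name="dock" filter_name="ddg"/>
</PROTOCOLS>
```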

Recursions are allowed but will cause havoc.

Placement and Placement-associated Movers & Filters

All movers described in this section are described by their developers as highly experimental. The objective of the placement methods is to help in the task of generating hot-spot based designs of protein binders. The starting point for all of them is a protein target (typically chain A), libraries of hot-spot residues, and a scaffold protein.

A few keywords used throughout the following section have special meaning and are briefly explained here.

Auction

This is a special mover associated with PlaceSimultaneously, below. It carries out the auctioning of residues on the scaffold to hotspot sets without actually designing the scaffold. If pairing is unsuccessful Auction will report failure.

<Auction name=( &string) host_chain=(2 &integer) max_cb_dist=(3.0 &Real) cb_force=(0.5 &Real)>
   <StubSets>
     <Add stubfile=(&string)/>
   </StubSets>
</Auction>

Note that none of the options, except for name, needs to be set up by the user if PlaceSimultaneously is notified of it. If PlaceSimultaneously is notified of this Auction mover, PlaceSimultaneously will set all of these options.

MapHotspot

Map out the residues that might serve as a hotspot region on a target. This requires massive user guidance. Each hot-spot residue should be roughly placed by the user (at least as backbone) against the target. Each hot-spot residue should have a different chain ID. The method iterates over all allowed residue identities and all allowed rotamers for each residue. It tests its filters and, for the subset that passes, selects the lowest-energy residue by score12. Once the first hot-spot residue is identified, it iterates over the next, and so on until all hot-spot residues are placed. The output contains one file per residue identity combination.

<MapHotspot name="&string" clash_check=(0 &bool) file_name_prefix=(map_hs &string)>
   <Jumps>
     <Add jump=(&integer) explosion=(0 &integer) filter_name=(true_filter & string) allowed_aas=("ADEFIKLMNQRSTVWY" &string) scorefxn_minimize=(score12 &string) mover_name=(null &string)/>
     ....
   </Jumps>
</MapHotspot>

PlacementMinimization

This is a special mover associated with PlaceSimultaneously, below. It carries out the rigid-body minimization towards all of the stubsets.

<PlacementMinimization name=( &string) minimize_rb=(1 &bool) host_chain=(2 &integer) optimize_foldtree=(0 &bool) cb_force=(0.5 &Real)>
  <StubSets>
    <Add stubfile=(&string)/>
  </StubSets>
</PlacementMinimization>

PlaceOnLoop

Remodels loops using kinematic loop closure, including insertion and deletion of residues. Handles hotspot constraint application through these sequence changes.

<PlaceOnLoop name=( &string) host_chain=(2 &integer) loop_begin=(&integer) loop_end=(&integer) minimize_toward_stub=(1&bool) stubfile=(&string) score_high=(score12 &string) score_low=(score4L&string) closing_attempts=(100&integer) shorten_by=(&comma-delimited list of integers) lengthen_by=(&comma-delimited list of integers)/>

Currently only minimize_toward_stub is available. closing_attempts: how many kinematic-loop closure cycles to use. shorten_by, lengthen_by: by how many residues to change the loop. A change of zero residues (no change) is also added by default.

At each try, a random choice of loop change will be picked and attempted. If the loop cannot close, failure will be reported.


PlaceStub

Hotspot-based sidechain placement. This is the main workhorse of the hot-spot centric method for protein-binder design. A paper describing the method and a benchmark will be published soon. The "stub" (hot-spot residue) is chosen at random from the provided stub set. To minimize towards the stub (during placement), the user can define a series of movers (StubMinimize tag) that can be combined with a weight. The weight determines the strength of the backbone stub constraints that will influence the mover it is paired with. Finally, a series of user-defined design movers (DesignMovers tag) are made and the result is filtered according to final_filter. There are two main ways to use PlaceStub:

  1. PlaceStub (default). Move the stub so that it's on top of the current scaffold position, then move forward to try to recover the original stub position.
  2. PlaceScaffold. Move the scaffold so that it's on top of the stub. You'll keep the wonderful hotspot interactions, but suffer from lever effects on the scaffold side. PlaceScaffold can be used as a replacement for docking by deactivating the "triage_positions" option.

<PlaceStub name=(&string) place_scaffold=(0 &bool) triage_positions=(1 &bool) chain_to_design=(2 &integer) score_threshold=(0.0 &Real) allowed_host_res=(&string) stubfile=(&string) minimize_rb=(0 &bool) after_placement_filter=(true_filter &string) final_filter=(true_filter &string) max_cb_dist=(4.0 &Real) hurry=(1 &bool) add_constraints=(1 &bool) stub_energy_threshold=(1.0 &Real) leave_coord_csts=(0 &bool) post_placement_sdev=(1.0 &Real)>
     <StubMinimize>
        <Add mover_name=(&string) bb_cst_weight=(10, &Real)/>
     </StubMinimize>
     <DesignMovers>
        <Add mover_name=(&string) use_constraints=(1 &bool) coord_cst_std=(0.5 &Real)/>
     </DesignMovers>
     <NotifyMovers>
        <Add mover_name=(&string)/>
     </NotifyMovers>
</PlaceStub>

The available tracers are:

Submovers: Submovers are used to determine what moves are used following stub placement. For example, once a stub has been selected, a StubMinimize mover can try to optimize the current pose towards that stub. A DesignMover can be used to design the pose around that stub. Using DesignMover submovers within PlaceStub (instead of RepackMinimize movers outside PlaceStub) allows one to have a "memory" of which stub has been used. In this way, a DesignMover can fail a filter without causing the trajectory to completely reset. Instead, the outer PlaceStub mover will select another stub, and the trajectory will continue.
There are three types of submover sections that can be specified within the mover.

  1. StubMinimize
    Without defining this submover, the protocol will simply perform a rigid body minimization as well as sc minimization of previous placed stubs in order to minimize towards the stub. Otherwise, a series of previously defined movers can be added, such as backrub, that will be applied for the stub minimization step. Before and after the list of stub minimize movers, there will be a rigid body minimization and a sc minimization of previously placed stubs. The bb_cst_weight determines how strong the constraints are that are derived from the stubs.
    • mover_name: a user previously defined design or minimize mover.
    • bb_cst_weight: determines the strength of the constraints derived from the stubs. This value is a weight on the cb_force, so larger values are stronger constraints.

    Valid/sensible StubMinimize movers are:

    • Backrub
    • LoopRemodel
  2. DesignMovers
    Design movers are typically used once the stubs are placed to fill up the remaining interface, since placestub does not actually introduce any further design other than stub placement.
    • mover_name: a user previously defined design or minimize mover.
    • use_constraints: whether we should use coordinate constraints during this design mover
    • coord_cst_std: the std of the coordinate constraint for this mover. The coord constraints are harmonic, and the force constant, k=1/std. The smaller the std, the stronger the constraint

    Valid/sensible DesignMovers are:

    • RepackMinimize
  3. NotifyMovers
    Movers placed in this section will be notified not to repack the PlaceStub-placed residues. This is not necessary if placement movers are used in a nested (recursive) fashion, as the placement movers automatically notify movers nested in them of the hot-spot residues. Essentially, you want to make the downstream movers (those you list under this section) aware of the placement decisions made in this upstream mover. These movers will not be run within this PlaceStub, but will be aware of the placed residues in subsequent use. This is useful for running design moves after PlaceStub is done, e.g., in loops. Put task awareness only in the deepest PlaceStub mover (if PlaceStub is nested), where the final decisions about which residues harbour hot-spot residues are made.
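
A sketch of a PlaceStub definition that uses all three submover sections follows. The names br, des and des_late are hypothetical and are assumed to refer to movers defined earlier (for example a Backrub mover and two RepackMinimize movers); stubs.pdb and the ddg filter are likewise assumptions, and unlisted options are left at their defaults.

```xml
<MOVERS>
        br, des and des_late are hypothetical, previously defined movers
        <PlaceStub name="place" stubfile="stubs.pdb" chain_to_design="2" final_filter="ddg">
                <StubMinimize>
                        <Add mover_name="br" bb_cst_weight="10"/>
                </StubMinimize>
                <DesignMovers>
                        <Add mover_name="des" use_constraints="1" coord_cst_std="0.5"/>
                </DesignMovers>
                <NotifyMovers>
                        <Add mover_name="des_late"/>
                </NotifyMovers>
        </PlaceStub>
</MOVERS>
```

Here des runs inside PlaceStub after placement, whereas des_late is only informed of the placed residues so that it can be run later in the protocol.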

PlaceSimultaneously

Places hotspot residues simultaneously on a scaffold, rather than iteratively as in PlaceStub. Because it is faster, it allows more backbone sampling, and it should be useful in placing more than 2 hotspots.

<PlaceSimultaneously name=(&string) chain_to_design=(2 &Integer) repack_non_ala=(1 &bool) optimize_fold_tree=(1 &bool) after_placement_filter=(true_filter &string) auction=(&string) stub_score_filter=(&string)>
     <DesignMovers>
        <Add mover_name=(null_mover &string) use_constraints=(1 &bool) coord_cst_std=(0.5 &Real)/>
     </DesignMovers>
     <StubSets explosion=(0 &integer) stub_energy_threshold=(1.0 &Real)  max_cb_dist=(3.0 &Real) cb_force=(0.5 &Real)>
        <Add stubfile=(& string) filter_name=(&string)/>
     </StubSets>
     <StubMinimize min_repeats_before_placement=(0&Integer) min_repeats_after_placement=(1&Integer)>
       <Add mover_name=(null_mover &string) bb_cst_weight=(10.0 &Real)/>
     </StubMinimize>
     <NotifyMovers>
       <Add mover_name=(&string)/>
     </NotifyMovers>
</PlaceSimultaneously>

Most of the options are similar to PlaceStub above. Differences are mentioned below:

rb_stub_minimization, auction and stub_score_filter allow the user to specify the first moves and filtering steps of PlaceSimultaneously before PlaceSimultaneously is called proper. In this way, a configuration can be quickly triaged if it isn't compatible with placement (through Auction's filtering). If the configuration passes these filters and movers then PlaceSimultaneously can be run within loops of docking and placement, until a design is identified that produces reasonable ddg and sasa.

StubScore

This is actually a filter (and should go under FILTERS), but it is tightly associated with the placement movers, so it's placed here. It is a special filter that is associated with the PlaceSimultaneously mover. It checks whether, in the current configuration, the scaffold is 'feeling' any of the hotspot stub constraints. This is useful for quick triaging of hopeless configurations.

<StubScore name=(&string) chain_to_design=(2 &integer) cb_force=(0.5 &Real)>
  <StubSets>
     <Add stubfile=(&string)/>
  </StubSets>
</StubScore>

Note that none of the flags of this filter need to be set if PlaceSimultaneously is notified of it. In that case, PlaceSimultaneously will set this StubScore filter's internal data to match its own.
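
The notification mechanism described above might be wired up as in the following sketch, where the Auction mover and the StubScore filter are declared with only a name and PlaceSimultaneously is pointed at them. All names and the stub file are hypothetical, and a real protocol may need further options (for example DesignMovers) that are omitted here.

```xml
<FILTERS>
        <StubScore name="stub_score"/>
</FILTERS>
<MOVERS>
        <Auction name="auction"/>
        <PlaceSimultaneously name="place_sim" chain_to_design="2" auction="auction" stub_score_filter="stub_score">
                <StubSets>
                        <Add stubfile="stubs.pdb"/>
                </StubSets>
        </PlaceSimultaneously>
</MOVERS>
```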

General Movers

These movers are general and should work in most cases. They are usually not aware of things like interfaces, so may be most appropriate for monomers or basic tasks.

FavorNativeResidue

<FavorNativeResidue bonus=(1.5 &Real)/>

Sets residue_type_constraint on the pose, with the bonus defaulting to 1.5.

MinMover

Does minimization over sidechain and/or backbone

<MinMover name="&string" scorefxn=(score12 &string) chi=(&bool) bb=(&bool) jump=(&string) type=(dfpmin_armijo_nonmonotone &string) tolerance=(0.01&Real)/>

Note that defaults are as for the MinMover class! Check MinMover.cc for the default constructor.
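
Because the defaults come from the MinMover class, it can be safer to spell out the degrees of freedom explicitly. The following is a sketch of a sidechain-only minimization; the name min_sc is illustrative and the jump option is deliberately left unset.

```xml
<MOVERS>
        <MinMover name="min_sc" scorefxn="score12" chi="1" bb="0" type="dfpmin_armijo_nonmonotone" tolerance="0.01"/>
</MOVERS>
```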

TaskAwareMinMover

Performs minimization. Accepts TaskOperations via the task_operations option, e.g.

task_operations=(&string,&string,&string)

to configure which positions are minimized. The options chi=(&bool) and bb=(&bool) control sidechain or backbone freedom; the default is sidechain minimization. The options scorefxn, jump, type, and tolerance are passed to the underlying MinMover.

FastRelax

Performs the fast relax protocol.

<FastRelax name="&string" scorefxn=(score12 &string) repeats=(8 &int) task_operations=(&string,&string,&string)>
   <MoveMap>
      <Chain number=(&integer) chi=(&bool) bb=(&bool)/>
      <Jump number=(&integer) setting=(&bool)/>
      <Span begin=(&integer) end=(&integer) chi=(&bool) bb=(&bool)/>
   </MoveMap>
</FastRelax>

Options include:

The MoveMap is initially set to minimize all degrees of freedom. The movemap lines are read in the order in which they are written in the xml file, and can be used to turn on or off dofs. The movemap is parsed only at apply time, so that the foldtree and the kinematic structure of the pose at the time of activation will be respected.
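
The ordering rule can be illustrated with the sketch below, which assumes a hypothetical two-chain pose in which residues 10-20 belong to chain 1; all numbers and the mover name are illustrative.

```xml
<FastRelax name="relax_loop" scorefxn="score12" repeats="8">
   <MoveMap>
      <Chain number="1" chi="1" bb="0"/>
      <Jump number="1" setting="0"/>
      <Span begin="10" end="20" chi="1" bb="1"/>
   </MoveMap>
</FastRelax>
```

Since the lines are read in order, the Chain line first freezes the backbone of chain 1, the Jump line turns off jump 1, and the later Span line turns backbone freedom back on for residues 10-20 only.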

MakePolyX

Convert pose into poly XXX ( XXX can be any amino acid )

<MakePolyX name="&string" aa="&string" keep_pro=(0 &bool)  keep_gly=(1 &bool) keep_disulfide_cys=(0 &bool) />

Options include:

PackRotamersMover

Repacks sidechains with user-supplied options, including TaskOperations

<PackRotamersMover name="&string" scorefxn=(&string) task_operations=(&string,&string,&string)/>

ConstraintSetMover

Applies a file-defined constraint set to the pose

<ConstraintSetMover name="&string" cst_file=(&string)/>

ProteinInterfaceMS

Multistate design of a protein interface. The target state is the bound (input) complex and the two competitor states are the unbound partners and the unbound, unfolded partners. Uses genetic algorithms to select, mutate and recombine among a population of starting designed sequences. See Havranek & Harbury NSMB 10, 45 for details.

<ProteinInterfaceMS name="&string" generations=(20 &integer) pop_size=(100 &integer) num_packs=(1 &integer) pop_from_ss=(0 &integer) numresults=(1 &integer) fraction_by_recombination=(0.5 &real) mutate_rate=(0.5 &real) boltz_temp=(0.6 &real) anchor_offset=(5.0 &real) checkpoint_prefix=("" &string) gz=(0 &bool) checkpoint_rename=(0 &bool) scorefxn=(score12 &string) unbound=(1 &bool) unfolded=(1&bool) input_is_positive=(1&bool) task_operations=(&comma-delimited list) unbound_for_sequence_profile=(unbound &bool) profile_bump_threshold=(1.0 &Real) compare_to_ground_state=(see below & bool) output_fname_prefix=("" &string)>
   <Positive pdb=(&string) unbound=(0&bool) unfolded=(0&bool)/>
   <Negative pdb=(&string) unbound=(0&bool) unfolded=(0&bool)/>
   .
   .
   .
</ProteinInterfaceMS>

The input file (-s or -l) is considered as either a positive or negative state (depending on option, input_is_positive). If unbound and unfolded is true in the main option line, then the unbound and the unfolded states are added as competitors. Any number of additional positive and negative states can be added. Unbound and unfolded takes a different meaning for these states: if unbound is checked, the complex will be broken apart and the unbound state will be added. If unfolded is checked, then the unbound and unfolded protein will be added.

unbound_for_sequence_profile: use the unbound structure to generate an ala pose and prune out residues that would clash in the monomeric structure. Defaults to true if unbound is used as a competitor state. profile_bump_threshold: the bump threshold to use for this pruning. The difference between the computed bump and the bump in the ala pose is compared to this threshold.

compare_to_ground_state: by default, if you add states to the list using the Positive/Negative tags, then the energies of all additional states are zeroed at their 'best-score' values. This allows the user to override this behaviour. See code for details.

output_fname_prefix: All of the positive/negative states that are defined by the user will be output at the end of the run using this prefix. Each state will have its sequence changed according to the end sequence and then a repacking and scoring of all states will take place according to the input taskfactory.

Rules of thumb for parameter choice. The Fitness F is defined as:

 F = Sum_+( exp(-E/T) ) / ( Sum_+( exp(-E/T) ) + Sum_-( exp(-E/T) ) + Sum_+( exp(-(E+anchor)/T) ) )

where Sum_- and Sum_+ are the sums over the negative and positive states, respectively.

the values for F range from 1 (perfect bias towards +state) to 0 (perfect bias towards -state). The return value from the PartitionAggregateFunction::evaluate method is -F, with values ranging from -1 to 0, correspondingly. You can follow the progress of MSD by looking at the reported fitnesses for variants within a population at each generation. If all of the parameters are set properly (temperature etc.) expect to see a wide range of values in generation 1 (-0.99 - 0), which is gradually replaced by higher-fitness variants. At the end of the simulation, the population will have shifted to -1.0 - -0.5 or so.

For rules of thumb, it's useful to consider a two-state, +/- problem, ignoring the anchor (see below; that's tantamount to setting anchor very high). In this case the fitness simplifies to:

 F = 1/(exp( (dE)/T ) + 1 )

and the magnitude of its derivative with respect to dE is:

 |F'| = 1/( T*(exp(-dE/T) + exp(dE/T) + 2) )

where dE=E_+ - E_-

A good value for T would then be such where F' is sizable (let's say more than 0.05) at the dE values that you want to achieve between the positive and negative state. Since solving F' for T is not straightforward, you can plot F and F' at different temperatures to identify a reasonable value for T, where F'(dE, T) is above a certain threshold. If you're lazy like me, set T=dE/3. So, if you want to achieve differences of at least 4.5 e.u between positive and negative states, use T=1.5.
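
As a worked check of the T=dE/3 rule of thumb, the two-state form of F given above (anchor ignored) can be evaluated at T = 1.5 for a positive state that is 4.5 e.u. lower than, equal to, or 4.5 e.u. higher than the negative state. The specific dE values are chosen here for illustration only.

```latex
\begin{align*}
F(dE, T) &= \frac{1}{e^{dE/T} + 1}, \qquad dE = E_+ - E_- \\
F(-4.5,\ 1.5) &= \frac{1}{e^{-3} + 1} \approx 0.953 \\
F(0,\ 1.5) &= \frac{1}{e^{0} + 1} = 0.5 \\
F(+4.5,\ 1.5) &= \frac{1}{e^{3} + 1} \approx 0.047
\end{align*}
```

So at T = 1.5, reaching the targeted 4.5 e.u. gap in favour of the positive state already brings F to within about 5% of its maximum, while the same gap in the wrong direction leaves F close to zero.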

To make a plot of these functions use MatLab or some webserver, e.g., http://www.walterzorn.com/grapher/grapher_e.htm.

The anchor_offset value is used to set a competitor (negative) state at a certain energy above the best energy of the positive state. This is a computationally cheap assurance that as the specificity changes in favour of the positive state, the stability of the system is not overly compromised. Set anchor_offset to a value that corresponds to the amount of energy that you're willing to forgo in favour of specificity.

SidechainMC

The "off rotamer" sidechain-only Monte Carlo sampler. For a rather large setup cost, individual moves can be made efficiently.

The underlying mover is still under development/benchmarking, so it may or may not work with backbone flexibility or amino acid identity changes.

<SidechainMC name="&string" scorefxn=(&string) ntrials=(&int) temperature=(&real) preserve_detailed_balance=(&bool) prob_uniform=(&real) prob_withinrot=(&real) prob_random_pert_current=(&real)/>

RotamerTrialsMover

This mover goes through each repackable/redesignable position in the pose, taking every permitted rotamer in turn, and evaluating the energy. Each position is then updated to the lowest energy rotamer. It does not consider coordinated changes at multiple residues, and may need several invocations to reach convergence.

In addition to the score function, the mover takes a list of task operations to specify which residues to consider. (See TaskOperations(Parser/RosettaScripts).)

<RotamerTrialsMover name="&string" scorefxn=(&string) task_operations=(&string,&string,&string) show_packer_task=(0 &bool) />

RotamerTrialsMinMover

This mover goes through each repackable/redesignable position in the pose, taking every permitted rotamer in turn, minimizing it in the context of the current pose, and evaluating the energy. Each position is then updated to the lowest energy minimized rotamer. It does not consider coordinated changes at multiple residues, and may need several invocations to reach convergence.

In addition to the score function, the mover takes a list of task operations to specify which residues to consider. (See TaskOperations(Parser/RosettaScripts).)

<RotamerTrialsMinMover name="&string" scorefxn=(&string) task_operations=(&string,&string,&string)/>

Protein Interface Design Movers

These movers are at least somewhat specific to the design of protein-protein interfaces. Attempting to use them with, for example, protein-DNA complexes may result in unexpected behavior.

Computational 'affinity maturation' movers (highly experimental)

These movers are meant to take an existing complex and improve it by subtly changing all relevant degrees of freedom while optimizing the interactions of key sidechains with the target. The basic idea is to carry out iterations of relax and design of the binder, designing a large sphere of residues around the interface (to get second/third shell effects).

We start by generating high affinity residue interactions between the design and the target. The foldtree of the design is cut such that each target residue has a cut N- and C-terminally to it, and jumps are introduced from the target protein to the target residues on the design, and then the system is allowed to relax. This produces deformed designs with high-affinity interactions to the target surface. We then use the coordinates of the target residues to generate harmonic coordinate restraints and send this to a second cycle of relax, this time without deforming the backbone of the design. Example scripts are available in demo/rosetta_scripts/computational_affinity_maturation/

RandomMutation

Introduces a random mutation at a position that is allowed to be redesigned, to one of the allowed residue identities. Control the residues and the target identities through task_operations. This can be used in conjunction with GenericMonteCarlo to generate trajectories of affinity maturation.

<RandomMutation name=(&string) task_operations=(&string comma-separated taskoperations) scorefxn=(score12 &string)/>


HotspotDisjointedFoldTree

Creates a disjointed foldtree where each selected residue has cuts N- and C-terminally to it.

<HotspotDisjointedFoldTree name=(&string) ddG_threshold=(1.0 &Real) resnums=("" comma-delimited list of residues &string) scorefxn=(score12 &string) chain=(2 &Integer) radius=(8.0 &Real)/>

AddSidechainConstraintsToHotspots

Adds harmonic constraints to sidechain atoms of target residues (to be used in conjunction with HotspotDisjointedFoldTree). Save the log files, as they will be needed for the next stage in affinity maturation.

<AddSidechainConstraintsToHotspots name=(&string) chain=(2 &Integer) coord_sdev=(1.0 &Real) resnums=(comma-delimited list of residue numbers)/>

  • resnums: the residues for which to add constraints. Notice that this list will be treated in addition to any residues that have cut points on either side.
  • coord_sdev: the standard deviation on the coordinate restraints. The lower the tighter the restraints.
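As a rough sketch of how the two hotspot movers above might be chained, the fragment below declares them in MOVERS and runs them in order from PROTOCOLS. The mover names (disjoint_ft, hotspot_csts) and the residue list 45,72 are hypothetical; the other values are the defaults listed above. The relax steps described earlier are omitted, so treat this as an illustration and not a tested protocol. Explanatory text is written outside the < > notation, which the parser ignores.

```xml
<MOVERS>
    cut the foldtree on both sides of each selected residue on chain 2
    <HotspotDisjointedFoldTree name="disjoint_ft" chain="2" resnums="45,72" scorefxn="score12"/>
    add harmonic sidechain restraints to the same residues
    <AddSidechainConstraintsToHotspots name="hotspot_csts" chain="2" coord_sdev="1.0" resnums="45,72"/>
</MOVERS>
<PROTOCOLS>
    <Add mover_name="disjoint_ft"/>
    <Add mover_name="hotspot_csts"/>
</PROTOCOLS>
```

Lowering coord_sdev in this sketch would tighten the restraints, as noted in the parameter list above.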

Docking

Does both centroid and full-atom docking.

<Docking name="&string" score_low=(score_docking_low &string) score_high=(score12 &string) fullatom=(0 &bool) local_refine=(0 &bool) movable_jumps=(1 &Integer vector) optimize_fold_tree=(1 &bool) conserve_foldtree=(0 &bool) design=(0 &bool) task_operations=("" comma-separated list)/>

Prepack

Performs something approximating r++ prepacking (but less rigorously, without rotamer-trial minimization) by doing sc minimization and repacking. Separates chains based on jump_number, does prepacking, then reforms the complex. If jump_number=0, then it will NOT separate chains at all.

<Prepack name=(&string) scorefxn=(score12 &string) jump_number=(1 &integer) task_operations=(comma-delimited list)/>

RepackMinimize

RepackMinimize does the design/repack and minimization steps using different score functions as defined by the protocol. repack_partner1 (and 2) defines which of the partners to design. If no particular residues are defined, the interface is repacked/designed. If specific residues are defined, then a shell of residues around those target residues is repacked/designed and minimized. repack_non_ala decides whether or not to change positions that are not ala. This is useful for designing an ala_pose so that positions that have been changed in previous steps are not redesigned. min_rigid_body minimizes the rigid-body orientation (as in docking).

<RepackMinimize name="&string" scorefxn_repack=(score12 &string) scorefxn_minimize=(score12 &string) repack_partner1=(1 &bool) repack_partner2=(1 &bool) design_partner1=(0 &bool) design_partner2=(1 &bool) interface_cutoff_distance=(8.0 &Real) repack_non_ala=(1 &bool) minimize_bb=(1 &bool * see below for more details) minimize_rb=(1 &bool) minimize_sc=(1 &bool) optimize_fold_tree=(1 &bool) task_operations=("" &string)>
    <residue pdb_num/res_num, see below/>
</RepackMinimize>

If no repack_partner1/2 options are set, you can specify repack=0/1 to control both. Similarly with design_partner1/2 and design=0/1
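A filled-in RepackMinimize definition might look like the sketch below. The mover name des_min and the two target residues are hypothetical (the residue forms are borrowed from the DesignMinimizeHbonds template), and backbone minimization is switched off here purely as an example setting.

```xml
<MOVERS>
    repack both partners, design only partner 2, in a shell around two target residues
    <RepackMinimize name="des_min" scorefxn_repack="score12" scorefxn_minimize="score12" repack_partner1="1" repack_partner2="1" design_partner1="0" design_partner2="1" interface_cutoff_distance="8.0" minimize_bb="0" minimize_rb="1" minimize_sc="1">
        <residue pdb_num="31B"/>
        <residue res_num="212"/>
    </RepackMinimize>
</MOVERS>
```

Dropping the two residue lines would, per the description above, fall back to repacking/designing the whole interface.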

DesignMinimizeHbonds

Same as for RepackMinimize, with the addition that a list of target residues to be hbonded can be defined. Within a sphere of 'interface_cutoff_distance' of the target residues, the residues will be set to be designed. The residues that are allowed for design are restricted to hbonding residues according to whether donors (STRKWYQN) or acceptors (EDQNSTY) or both are defined. If residues have been designed that do not, after design, form hbonds to the target residues with energies lower than the hbond_energy, then those are turned to Ala.

<DesignMinimizeHbonds name=(design_minimize_hbonds &string) hbond_weight=(3.0 &float) scorefxn_design=(score12 &string) scorefxn_minimize=(score12 &string) donors="design donors? &bool" acceptors="design acceptors? &bool" bb_hbond=(0 &bool) sc_hbond=(1 &bool) hbond_energy=(-0.5 &float) interface_cutoff_distance=(8.0 &float) repack_partner1=(1 &bool) repack_partner2=(1 &bool) design_partner1=(0 &bool) design_partner2=(1 &bool) repack_non_ala=(1 &bool) min_rigid_body=(1 &bool) task_operations=("" &string)>
        <residue pdb_num="pdb residue and chain, e.g., 31B &string"/>
        <residue res_num="serially defined residue number, e.g., 212 &integer"/>
</DesignMinimizeHbonds>

build_Ala_pose

Turns either or both sides of an interface to Alanines (except for prolines and glycines that are left as in input) in a sphere of 'interface_distance_cutoff' around the interface. Useful as a step before design steps that try to optimize a particular part of the interface. The alanines are less likely to 'get in the way' of really good rotamers.

<build_Ala_pose name=(ala_pose &string) partner1=(0 &bool) partner2=(1 &bool) interface_distance_cutoff=(8.0 &float) task_operations=("" &string)/>

SaveAndRetrieveSidechains

To be used after an ala pose was built (and the design moves are done) to retrieve the sidechains from the input pose that were set to Ala by build_Ala_pose. OR, to be used inside mini to recover sidechains after switching residue typesets. By default, sidechains that are different than Ala will not be changed, unless allsc is true. Please note that naming your mover "SARS" is almost certainly bad luck and strongly discouraged.

<SaveAndRetrieveSidechains name=(save_and_retrieve_sidechains &string) allsc=(0 &bool) task_operations=("" &string)/>
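The intended order of operations (build the Ala pose, design, then recover the original sidechains) could be sketched as follows. The mover names are hypothetical, RepackMinimize stands in for whatever design step is actually used, and all attribute values are the documented defaults.

```xml
<MOVERS>
    <build_Ala_pose name="ala_pose" partner1="0" partner2="1" interface_distance_cutoff="8.0"/>
    <RepackMinimize name="design" design_partner1="0" design_partner2="1"/>
    <SaveAndRetrieveSidechains name="retrieve_sc" allsc="0"/>
</MOVERS>
<PROTOCOLS>
    <Add mover_name="ala_pose"/>
    <Add mover_name="design"/>
    put back the input sidechains at positions that are still Ala
    <Add mover_name="retrieve_sc"/>
</PROTOCOLS>
```

With allsc="0", positions that the design step changed away from Ala are left alone, as described above.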

AtomTree

Sets up an atom tree for use with subsequent movers. Connects pdb_num on host_chain to the nearest residue on the neighboring chain. The connection is made through connect_to on the host_chain pdb_num residue.

<AtomTree name=(&string) docking_ft=(0 &bool) pdb_num/res_num=(see above) connect_to=(see below for defaults &string) anchor_res=(pdb numbering) connect_from=(see below) host_chain=(2 &integer)/>

SpinMover

Allows random spin around an axis that is defined by the jump. Works best in combination with a LoopOver or, better still, a GenericMonteCarlo, together with other movers. Use SetAtomTree to define the jump atoms.

<SpinMover name=(&string) jump=(1 &integer)/>

TryRotamers

Produces a set of rotamers from a given residue. Use after AtomTree to generate inverse rotamers of a given residue.

<TryRotamers name=(&string) pdb_num/res_num=(see above) automatic_connection=(1 &bool) jump_num=(1 &Integer) scorefxn=(score12 &string) explosion=(0 &integer) shove=(&comma-separated residue identities)/>

Backrub

Do backrub-style backbone and sidechain sampling.

<Backrub name=(backrub &string) partner1=(0 &bool) partner2=(1 &bool) interface_distance_cutoff=(8.0 &Real) moves=(1000 &integer) sc_move_probability=(0.25 &float) scorefxn=(score12 &string) small_move_probability=(0.0 &float) bbg_move_probability=(0.25 &float) temperature=(0.6 &float) task_operations=("" &string)>
        <residue pdb_num="pdb residue and chain, e.g., 31B &string"/>
        <residue res_num="serially defined residue number, e.g., 212 &integer"/>
        <span begin="pdb or rosetta-indexed number, eg 10 or 12B &string" end="pdb or rosetta-indexed number, e.g., 20 or 30B &string"/>
</Backrub>

With the values defined above, backrub will only happen on residues 31B, serial 212, and the serial span 10-20. If no residues and spans are defined, then all of the interface residues on the defined partner will be backrubbed by default. Note that setting partner1=1 makes all of partner1 flexible. Adding segments has the effect of adding these spans to the default interface definition.

Temperature controls the Monte Carlo accept temperature. A setting of 0.1 allows only very small moves, whereas 0.6 (the default) allows more exploration. Note that small moves and bbg_moves introduce motions that, unlike backrub, are not confined to the region that is being manipulated and can cause downstream structural elements to move as well. This might cause large lever motions if the epitope that is being manipulated is a hinge. To prevent lever effects, all residues in a chain that is allowed to backrub will be subject to small moves. Set small_move_probability=0 and bbg_move_probability=0 to eliminate such motions.

bbg_moves are backbone-Gaussian moves. See J. Chem. Phys., Vol. 114, pp. 8154-8158.
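For instance, a Backrub definition that avoids the lever effects discussed above by disabling small moves and bbg_moves might look like this sketch. The mover name and the span 10-20 are hypothetical; the remaining values are the documented defaults.

```xml
<MOVERS>
    backrub and sidechain moves only, restricted to one span on partner 2
    <Backrub name="backrub_loop" partner1="0" partner2="1" moves="1000" temperature="0.6" sc_move_probability="0.25" small_move_probability="0.0" bbg_move_probability="0.0">
        <span begin="10" end="20"/>
    </Backrub>
</MOVERS>
```

Lowering temperature towards 0.1 in this sketch would restrict sampling to very small moves.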

BestHotspotCst

Removes Hotspot BackboneStub constraints from all but the best_n residues, then reapplies constraints to only those best_n residues with the given cb_force constant. Useful to prune down a hotspot-derived constraint set so that multiple residues do not become frustrated during minimization.

<BestHotspotCst name=(&string) chain_to_design=(2 &integer) best_n=(3 &integer) cb_force=(1.0 &Real)/>

DumpPdb

Dumps a pdb. Recommended ONLY for debugging, as you can't change the name of the file during a run.

<DumpPdb name=(&string) fname=(dump.pdb &string)/>

DomainAssembly (Not tested thoroughly)

Do domain-assembly sampling by fragment insertion in a linker region. frag3 and frag9 specify the fragment-file names for 3-mer and 9-mer fragments, respectively.

<DomainAssembly name=(&string) linker_start_(pdb_num/res_num, see above) linker_end_(pdb_num/res_num, see above) frag3=(&string) frag9=(&string)/>

LoopFinder

Finds loops in the current pose and loads them into the DataMap for use by subsequent movers (e.g., LoopRemodel).

<LoopFinder name="&string" interface=(1 &Size) ch1=(0 &bool) ch2=(1 &bool) min_length=(3 &Integer) max_length=(1000 &Integer) iface_cutoff=(8.0 &Real) resnum/pdb_num=(see above) CA_CA_distance=(15.0 &Real) mingap=(1 &Size)/>

LoopRemodel

Perturbs and/or refines a set of user-defined loops. Useful to sample a variety of loop conformations.

<LoopRemodel name="&string" auto_loops=(0 &bool) loop_start_(pdb_num/res_num, see above) loop_end_(pdb_num/res_num, see above) hurry=(0 &bool) cycles=(10 &Size) protocol=(ccd &string) perturb_score=(score4L &string) refine_score=(score12 &string) perturb=(0 &bool) refine=(1 &bool) design=(0 &bool)/>


DisulfideMover

Introduces a disulfide bond into the interface. The best-scoring position for the disulfide bond is selected from among the residues listed in targets. This could be quite time-consuming, so specifying a small number of residues in targets is suggested.

If no targets are specified on either interface partner, all residues on that partner are considered when searching for a disulfide. Thus including only a single residue for targets results in a disulfide from that residue to the best position across the interface from it, and omitting the targets param altogether finds the best disulfide over the whole interface.

Disulfide bonds created by this mover, if any, are guaranteed to pass a DisulfideFilter.

<DisulfideMover name="&string" targets=(&string)/>

ConstraintSetMover

Adds constraints to the pose using the constraints' read-from-file functionality.

<ConstraintSetMover name=(&string) cst_file=(&string)/>

cst_file: the file containing the constraint data, e.g.:

...
CoordinateConstraint CA 1 CA 380   27.514  34.934  50.283 HARMONIC 0 1
CoordinateConstraint CA 1 CA 381   24.211  36.849  50.154 HARMONIC 0 1
...

MutateResidue

Change a single residue to a different type. For instance, mutate Arg31 to an Asp.

<MutateResidue name=(&string) target=(&string) new_res=(&string) />
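The Arg31-to-Asp example could be written roughly as below. This is a sketch under two assumptions that the template above does not spell out: that target accepts a pdb-numbered residue in the same "31B" form used for pdb_num elsewhere in this document, and that new_res takes a three-letter residue name. The mover name r31d is hypothetical.

```xml
<MOVERS>
    mutate residue 31 on chain B to aspartate
    <MutateResidue name="r31d" target="31B" new_res="ASP"/>
</MOVERS>
```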

InterfaceRecapitulation

Test a design mover for its recapitulation of the native sequence. Similar to SequenceRecovery filter below, except that this mover encompasses a design mover more specifically.

<InterfaceRecapitulation name=(&string) mover_name=(&string)/>

The specified mover needs to be derived from either the DesignRepackMover or the PackRotamersMover base class and to have the packer task show which residues have been designed. The mover then computes how many residues were allowed to be designed and the number of residues that have changed and produces the sequence recapitulation rate. The pose at parse-time is used for the comparison.
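A minimal sketch of wrapping a design mover is shown below. It assumes that RepackMinimize is one of the movers derived from DesignRepackMover; the mover names design and recap are hypothetical.

```xml
<MOVERS>
    <RepackMinimize name="design" design_partner1="0" design_partner2="1"/>
    <InterfaceRecapitulation name="recap" mover_name="design"/>
</MOVERS>
<PROTOCOLS>
    only the wrapper is listed here, since it encompasses the design mover
    <Add mover_name="recap"/>
</PROTOCOLS>
```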

VLB (aka Variable Length Build)

Under development! All kudos to Andrew Ban of the Schief lab for making this possible. Inserts, deletes, and rebuilds segments of variable length. This mover will ONLY work with non-overlapping segments!

IMPORTANT NOTE!!!!: VLB uses its own internal tracking of ntrials! This allows VLB to cache fragments between ntrials, saving a very significant amount of time. But each ntrial trajectory will also get ntrials extra internal VLB apply calls. For example, "-jd2:ntrials 5" will cause a maximum of 25 VLB runs (5 for each ntrial). Success of a VLB move will break out of this internal loop, allowing the trajectory to proceed as normal.

<VLB name=(&string) scorefxn=(string)>
    <VLB TYPES GO HERE/>
</VLB>

Default scorefxn is score4L. If you use another scorefxn, make sure the chainbreak weight is > 0. Do not use a full atom scorefxn with VLB!

There are several move types available to VLB, each with its own options. The most popular movers will probably be SegmentRebuild and SegmentInsert.

<SegmentInsert left=(&integer) right=(&integer) ss=(&string) aa=(&string) pdb=(&string) side=(&string) keep_bb_torsions=(&bool)/> 

Insert a pdb into an existing pose. To perform a pure insertion without replacing any residues within a region, use an interval with a zero as the left endpoint, e.g. [0, insert_after_this_residue]. If inserting before the first residue of the Pose, then interval = [0,0]. If inserting after the last residue of the Pose, then interval = [0, last_residue].

*ss = secondary structure specifying the flanking regions, with a character '^' specifying where the insert is to be placed. Default is L^L.
*aa = amino acids specifying the flanking regions, with a character '^' specifying the insert.
*keep_bb_torsions = attempt to keep a few torsions from around the insert. This should be false for pure insertions. (default false)
*side = specifies insertion on its N-side ("N"), C-side ("C") or decide randomly between the two (default "RANDOM"). Random is only random on parsing, not per ntrial.
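A pure insertion after a given residue might be written as in the sketch below. The mover name, the residue number 45 and the file name insert.pdb are hypothetical, and leaving out aa assumes that attribute is optional, which the template above does not state.

```xml
<VLB name="insert_domain" scorefxn="score4L">
    pure insertion after residue 45: the left endpoint is 0, so nothing is replaced
    <SegmentInsert left="0" right="45" ss="L^L" pdb="insert.pdb" side="N" keep_bb_torsions="0"/>
</VLB>
```

Following the interval rules above, right="0" would place the insert before the first residue of the Pose.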

<SegmentRebuild left=(&integer) right=(&integer) ss=(&string) aa=(&string)/>

Instruction to rebuild a segment. Can also be used to insert a segment, by specifying secondary structure longer than the original segment. Very touchy. Watch out.

<SegmentSwap left=(&integer) right=(&integer) pdb=(&string)/>

Instruction to swap a segment with an external pdb.

<Bridge left=(&integer) right=(&integer) ss=(&string) aa=(&string)/>

Connects two contiguous but disjoint sections of a Pose into one continuous section.

<ConnectRight left=(&integer) right=(&integer) pdb=(&string)/>

Instruction to connect one PDB onto the right side of another.

<GrowLeft pos=(&integer) ss=(&string) aa=(&string)/>

Use this for n-side insertions, but typically not n-terminal extensions unless necessary. It does not automatically cover the additional residue on the right endpoint that needs to move during n-terminal extensions due to invalid phi torsion. For that case, use the SegmentRebuild class replacing the n-terminal residue with desired length+1.

<GrowRight pos=(&integer) ss=(&string) aa=(&string)/>

Instruction to create a c-side extension.

For more information, see the various BuildInstructions in src/protocols/forge/build/

EnzRepackMinimize

EnzRepackMinimize, similar in spirit to the RepackMinimize mover, does the design/repack followed by minimization of a protein-ligand (or TS model) interface with enzyme-design-style constraints (if present; see the AddOrRemoveMatchCsts mover) using the specified score functions and minimization dofs. Design/repack only, or minimization only, can be done by setting the appropriate tags. A shell of residues around the ligand is repacked/designed and/or minimized. If constrained optimization (cst_opt) is specified, ligand neighbors are converted to Ala, minimization is performed, and the original neighbor sidechains are placed back.

<EnzRepackMinimize name="&string" scorefxn_repack=(score12 &string) scorefxn_minimize=(score12 &string) cst_opt=(0 &bool) repack_only=(0 &bool) design=(0 &bool) constraints=(1 &bool) fix_catalytic=(0 &bool) minimize_rb=(1 &bool) minimize_bb=(0 &bool) minimize_sc=(1 &bool) minimize_lig=(0 &bool) min_in_stages=(0 &bool) backrub=(0 &bool) cycles=(1 &integer)/>

AddOrRemoveMatchCsts

Add or remove enzyme-design style pairwise (residue-residue) geometric constraints to/from the pose. A cstfile specifies these geometric constraints, which can be supplied in the flags file (-enzdes:cstfile) or in the mover tag (see below).

The "-run:preserve_header" option should be supplied on the command line to allow the parser to read constraint specifications in the pdb's REMARK lines. (The "-enzdes:parser_read_cloud_pdb" also needs to be specified for the parser to read the matcher's CloudPDB default output format.)

<AddOrRemoveMatchCsts name="&string" cst_instruction=( "void", "&string") cstfile="&string" keep_covalent=(0 &bool) accept_blocks_missing_header=(0 &bool) fail_on_constraints_missing=(1 &bool)/>

Ligand-centric Movers

Movers for ligand docking

These movers replace the executable for ligand docking and provide greater flexibility to the user in customizing the docking protocol. An example XML file for ligand docking is found here (link forthcoming). The movers below are listed in the order found in the old executable.

StartFrom
<StartFrom name="&string" chain="&string">
   <Coordinates x=(&float) y=(&float) z=(&float)/>
</StartFrom>

Provide a list of XYZ coordinates. One starting coordinate will be chosen at random and the specified chain will be recentered at this location.

Translate
<Translate name="&string" chain="&string" distribution=[uniform|gaussian] angstroms=(&float) cycles=(&int)/>

The Translate mover is for performing a coarse random movement of a small molecule in xyz-space. This movement can be anywhere within a sphere of radius specified by "angstroms". The chain to move should match that found in the PDB file (a 1-letter code). "cycles" specifies the number of attempts to make such a movement without landing on top of another molecule. The first random move that does not produce a positive repulsive score is accepted. The random move can be chosen from a uniform or gaussian distribution. This mover uses an attractive-repulsive grid for lightning fast score lookup.

Rotate
<Rotate name="&string" chain="&string" distribution=[uniform|gaussian] degrees=(&int) cycles=(&int)/>

The Rotate mover is for performing a coarse random rotation throughout all rotational degrees of freedom. Usually 360 is chosen for "degrees" and 1000 is chosen for "cycles". Rotate accumulates poses that pass an attractive and repulsive filter, and are different enough from each other (based on an RMSD filter). From this collection of diverse poses, 1 pose is chosen at random. "cycles" represents the maximum # of attempts to find diverse poses with acceptable attractive and repulsive scores. If a sufficient # of poses are accumulated early on, fewer rotations than specified by "cycles" will occur. This mover uses an attractive-repulsive grid for lightning fast score lookup.

SlideTogether
<SlideTogether name="&string" chain="&string"/>

The initial translation and rotation may move the ligand to a spot too far away from the protein for docking. Thus, after an initial low resolution translation and rotation of the ligand it is necessary to move the small molecule and protein into close proximity. If this is not done then high resolution docking will be useless. Simply specify which chain to move. This mover then moves the small molecule toward the protein 2 angstroms at a time until the two clash (evidenced by repulsive score). It then backs up the small molecule. This is repeated with decreasing step sizes, 1A, 0.5A, 0.25A, 0.125A.

HighResDocker
<HighResDocker name="&string" repack_every_Nth=(&int) scorefxn="&string" movemap_builder="&string" />

The high res docker performs cycles of rotamer trials or repacking, coupled with small perturbations of the ligand(s). The "movemap_builder" describes which side-chain and backbone degrees of freedom exist. The Monte Carlo mover is used to decide whether to accept the result of each cycle. Ligand and backbone flexibility as well as which ligands to dock are described by LIGAND_AREAS provided to INTERFACE_BUILDERS, which are used to build the movemap according to the XML option.

FinalMinimizer
<FinalMinimizer name="&string" scorefxn="&string" movemap_builder="&string"/>

Do a gradient based minimization of the final docked pose. The "movemap_builder" makes a movemap that will describe which side-chain and backbone degrees of freedom exist.

InterfaceScoreCalculator
<InterfaceScoreCalculator name=(string) chains=(comma separated chars) scorefxn=(string) native=(string)/>

InterfaceScoreCalculator calculates a myriad of ligand specific scores and appends them to the output file. After scoring the complex the ligand is moved 1000 Å away from the protein. The model is then scored again. An interface score is calculated for each score term by subtracting separated energy from complex energy. If a native structure is specified, 4 additional score terms are calculated:

  1. ligand_centroid_travel. The distance between the native ligand and the ligand in our docked model.
  2. ligand_radius_of_gyration. An outstretched conformation would have a high radius of gyration. Ligands tend to bind in outstretched conformations.
  3. ligand_rms_no_super. RMSD between the native ligand and the docked ligand.
  4. ligand_rms_with_super. RMSD between the native ligand and the docked ligand after aligning the two in XYZ space. This is useful for evaluating how much ligand flexibility was sampled.
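Put together in the order listed above, the ligand docking movers might be arranged as in the following sketch. All names and values here are hypothetical: chain X, the starting coordinates, the score function names (soft_rep, hard_rep), the movemap builder names (docking, final) and native.pdb are placeholders that would have to be defined elsewhere in the script or on disk. Only "degrees" and "cycles" for Rotate follow the usual values mentioned above.

```xml
<MOVERS>
    <StartFrom name="start_from" chain="X">
        <Coordinates x="-1.7" y="13.5" z="-5.2"/>
        <Coordinates x="4.0" y="9.1" z="2.3"/>
    </StartFrom>
    <Translate name="translate" chain="X" distribution="uniform" angstroms="5.0" cycles="50"/>
    <Rotate name="rotate" chain="X" distribution="uniform" degrees="360" cycles="1000"/>
    <SlideTogether name="slide_together" chain="X"/>
    <HighResDocker name="high_res_docker" repack_every_Nth="3" scorefxn="soft_rep" movemap_builder="docking"/>
    <FinalMinimizer name="final_min" scorefxn="hard_rep" movemap_builder="final"/>
    <InterfaceScoreCalculator name="add_scores" chains="X" scorefxn="hard_rep" native="native.pdb"/>
</MOVERS>
<PROTOCOLS>
    low-resolution placement first, then high-resolution docking and scoring
    <Add mover_name="start_from"/>
    <Add mover_name="translate"/>
    <Add mover_name="rotate"/>
    <Add mover_name="slide_together"/>
    <Add mover_name="high_res_docker"/>
    <Add mover_name="final_min"/>
    <Add mover_name="add_scores"/>
</PROTOCOLS>
```

Because a native is given in this sketch, the four additional score terms listed above would be reported alongside the interface scores.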

Movers for ligand design

These movers work in conjunction with ligand docking movers. An example XML file for ligand design is found here (link forthcoming). These movers presuppose the user has created or acquired a fragment library. Fragments have incomplete connections as specified in their params files. Combinatorial chemistry is the degenerate case in which a core fragment has several connection points and all library fragments have only one connection point.

GrowLigand
<GrowLigand name="&string" chain="&string"/>

Randomly connects a fragment from the library to the growing ligand. The connection point for connector atom1 must specify that it connects to atoms of connector atom2's type, and vice versa.

AddHydrogens
<AddHydrogens name="&string" chain="&string"/>

Saturates the incomplete connections with H. Currently, the length of the bonds created to these hydrogens is incorrect: each will be as long as the bond between connector atoms 1 and 2 would be.

FILTERS

Each filter definition has the following format:

<"filter_name" name="&string" ... confidence=(1 &Real)/>

where "filter_name" belongs to a predefined set of possible filters that the parser recognizes and that are listed below, name is a unique identifier for this filter definition, followed by any number of parameters that the filter needs to be defined.

If confidence is 1.0, then the filter is evaluated as in predicate logic (T/F). If the value is less than 0.999, then the filter is evaluated as fuzzy, so that it will return True in a (1.0 - confidence) fraction of the times it is probed. This should be useful for cases in which experimental data are ambiguous or uncertain.
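To illustrate the confidence attribute, the sketch below defines the same filter twice, once strict and once fuzzy. It uses the Ddg filter documented further down with its default threshold; the filter names are hypothetical.

```xml
<FILTERS>
    strict: evaluated as a plain true/false predicate
    <Ddg name="ddg_strict" scorefxn="score12" threshold="-15" confidence="1"/>
    fuzzy: by the rule above, returns True in a 0.25 fraction of the times it is probed
    <Ddg name="ddg_fuzzy" scorefxn="score12" threshold="-15" confidence="0.75"/>
</FILTERS>
```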

General Filters

AtomicContact

Do two residues have any pair of atoms within a cutoff distance? Somewhat more subtle than ResidueDistance (which works by neighbour atoms). Iterates over all atom types of a residue, according to the user-specified restrictions (sidechain, backbone, protons).

<AtomicContact name=(&string) residue1=(&integer) residue2=(&integer) sidechain=1 backbone=0 protons=0 distance=(4.0 &Real)/>

Some movers (e.g., PlaceSimultaneously) can set a filter's internal residue on-the-fly during protocol operation. To get this behaviour, do not specify residue2.

ContingentFilter

A special filter that allows movers to set its value (pass/fail). This value can then be used in the protocol together with IfMover to control the flow of execution depending on the success of the mover. Currently, none of the movers uses this filter.

<ContingentFilter name=(&string)/>

DesignableResidues

Reports to tracers which residues are repackable/designable according to user-defined task_operations. Useful for automatic interface detection (use the ProteinInterfaceDesign task operation for that). The residue numbers reported use pdb numbering.

<DesignableResidues name=(&string) task_operations=(comma-separated list) designable=(1 &bool) packable=(0 &bool)/>

InterfaceHoles

Looks for voids at protein/protein interfaces using Will Sheffler's packstat. The number reported is the difference in the holes score between bound/unbound conformations. Be sure to set the -holes:dalphaball option!

<InterfaceHoles name=(&string) jump=(1 &integer) threshold=(200 &integer)/>

Rmsd

Calculates the Calpha RMSD over a user-specified set of residues. Superimposition is optional. Selections are additive, so choosing a chain, an individual residue, and a span will result in RMSD calculation over all residues selected. If no residues are selected, the filter uses all residues in the pose. Use -in:file:native <filename> to choose an alternate reference pose.

<Rmsd name=(&string) chains=("" &string) threshold=(5 &integer) superimpose=(1 &bool)>
    <residue res/pdb_num=(see above) />
    <span begin_(res/pdb_num)=("" &integer) end_(res/pdb_num)=(""&integer)/>
</Rmsd>
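Because selections are additive, an Rmsd filter combining a chain, a single residue and a span might look like the sketch below. The attribute spellings res_num, begin_res_num and end_res_num are assumptions read off the res/pdb_num placeholders in the template above, and the chain letter and residue numbers are hypothetical.

```xml
<FILTERS>
    RMSD over chain B plus residue 212 plus residues 10 to 20, after superimposition
    <Rmsd name="rmsd_sel" chains="B" threshold="5" superimpose="1">
        <residue res_num="212"/>
        <span begin_res_num="10" end_res_num="20"/>
    </Rmsd>
</FILTERS>
```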

SequenceRecovery

Calculates the fraction sequence recovery of a pose compared to a reference pose. This is similar to InterfaceRecapitulation mover above, but does not require a design mover. Instead, the user can provide a list of task operations that describe which residues are designable in the pose.

<SequenceRecovery name=(&string) rate_threshold=(0.0 &Real) task_operations=(comma-delimited list of task_operations) />

The reference pose against which the recovery rate will be computed can be defined using the -in:file:native command-line flag. If that flag is not defined, the starting pose will be used as a reference.

TerminusDistance

True if all residues in the interface are more than <distance> residues from the N or C terminus. If the filter fails, it reports how far the failing residue was from the terminus. If it passes, it returns "1000".

<TerminusDistance name=(&string) jump_number=(1 &integer) distance=(5 &integer)/>


Ddg

Computes the binding energy for the complex and returns true if it is below the threshold, false otherwise. Useful for identifying complexes that have poor binding energy and killing their trajectory.

<Ddg name=(ddg &string) scorefxn=(score12 &string) threshold=(-15 &float) jump=(1 &Integer) repeats=(1 &Integer) repack=(true &bool)/>

ResInInterface

Computes the number of residues in the interface specified by jump_number and returns true if it is above the threshold, false otherwise. Useful as a quick and ugly filter after docking for making sure that the partners make contact.

<ResInInterface name=(riif &string) residues=(20 &integer) jump_number=(1 &integer)/>


HbondsToResidue

This filter checks whether residues defined by res_num/pdb_num are hbonded with as many hbonds as defined by partners, where each hbond needs to have at most energy_cutoff energy.

<HbondsToResidue name=(hbonds_filter &string) partners="how many hbonding partners are expected &integer" energy_cutoff=(-0.5 &float) backbone=(0 &bool) sidechain=(1 &bool) res_num/pdb_num=(&string - see above)/>

RotamerBoltzmannWeight

Approximates the Boltzmann probability for the occurrence of a rotamer. Residues to be tested are defined using a task_factory (set all inert residues to no repack). A first-pass alanine scan looks at which residues contribute substantially to binding affinity. Then, the rotamer set for each of these residues is taken, each rotamer is imposed on the pose, the surrounding shell is repacked and minimized and the energy is summed to produce a Boltzmann probability. Can be computed in both the bound and unbound state.

This is apparently a good discriminator between designs and natives, with many designs showing high probabilities for their highly contributing rotamers in both the bound and unbound states.

The filter also reports a modified value for the complex ddG. It computes the starting ddG and then subtracts from this energy a fraction of the interaction energy of each residue whose rotamer probability is below a certain threshold. The interaction energy is computed only for the residue under study and its contacts with residues on another chain.

<RotamerBoltzmannWeight name=(&string) task_operations=(comma-delimited list) radius=(6.0 &Real) jump=(1 &Integer) unbound=(1 &bool) ddG_threshold=(1.5 &Real) scorefxn=(score12 &string) temperature=(0.8 &Real) energy_reduction_factor=(0.5 &Real) repack=(1&bool) skip_ala_scan=(0 &bool)>
   <??? threshold_probability=(&Real)/>
   .
   .
   .
</RotamerBoltzmannWeight>
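For orientation, the standard Boltzmann form of a rotamer probability is sketched below. It is an assumption that the filter evaluates exactly this expression. Here E_j stands for the energy obtained after imposing rotamer j of the residue and repacking and minimizing the surrounding shell, the sum runs over that residue's rotamer set, and T is taken to be the temperature attribute, assumed to be expressed in the energy units of the score function.

```latex
P_i = \frac{\exp(-E_i / T)}{\sum_{j} \exp(-E_j / T)}
```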

Sasa

Computes the interface SASA and passes if it is **higher** than the threshold.

<Sasa name=(sasa_filter &string) threshold=(800 &float) hydrophobic=(0&bool) polar=(0&bool) jump=(1 &integer)/>

hydrophobic/polar are computed by discriminating each atom into polar (acceptor/donor or polar hydrogen) or hydrophobic (all else) and summing the delta SASA over each category. Notice that at this point only total sasa can be computed across jumps other than 1. Trying to compute hydrophobic or polar sasa across any other jump will cause an exit during parsing.
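A hedged example of restricting the calculation to hydrophobic SASA follows. Because hydrophobic and polar SASA can only be computed across jump 1, the sketch keeps jump at 1. The filter name and the threshold of 600 are hypothetical, and it is assumed that the threshold is applied to the hydrophobic component when hydrophobic=1.

```xml
<FILTERS>
        <!-- Hypothetical example: hydrophobic interface SASA must exceed 600 -->
        <Sasa name="hydrophobic_sasa" threshold="600" hydrophobic="1" polar="0" jump="1"/>
</FILTERS>
```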

NeighborType

Filter for poses that place a neighbour of the types specified around a target residue in the partner protein.

<NeighborType name=(neighbor_filter &string) "res_num/pdb_num see above" distance=(8.0 &Real)>
        <Neighbor type=(&3-letter aa code)/>
</NeighborType>
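The following sketch fills in the template above for an illustrative case. The residue number 45 is hypothetical, and listing more than one Neighbor child is an assumption based on the description mentioning "types" in the plural.

```xml
<FILTERS>
        <!-- Hypothetical example: look for a Tyr or Trp neighbour within 8.0 of residue 45 -->
        <NeighborType name="neighbor_filter" res_num="45" distance="8.0">
                <Neighbor type="TYR"/>
                <Neighbor type="TRP"/>
        </NeighborType>
</FILTERS>
```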


ResidueBurial

How many residues are within an interaction distance of target_residue across the interface. When used with neighbors=1 this degenerates to just checking whether or not a residue is at the interface.

<ResidueBurial name=(&string) "res_num/pdb_num see above" distance=(8.0 &Real) neighbors=(1 &Integer)/>


BuriedUnsatHbonds

Maximum number of buried unsatisfied H-bonds allowed. If a jump number is specified (default=1), then this number is calculated across the interface of that jump. If jump_number=0, then the filter is calculated for a monomer. Note that #unsat for monomers is often much higher than 20. Notice that water is not assumed in these calculations.

<BuriedUnsatHbonds name=(&string) jump_number=(1 &Size) cutoff=(20 &Size)/>
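As a sketch of the monomer case described above, jump_number can be set to 0 and the cutoff raised above the default of 20. The filter name and the cutoff of 40 are illustrative assumptions, not recommended values.

```xml
<FILTERS>
        <!-- Hypothetical example: monomer calculation with a more permissive cutoff -->
        <BuriedUnsatHbonds name="monomer_unsat" jump_number="0" cutoff="40"/>
</FILTERS>
```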


ResidueDistance

What is the distance between two residues? Based on each residue's neighbor atom (usually Cbeta)

<ResidueDistance name=(&string) res1_"res_num/pdb_num see above" res2_"resnum/pdb_num" distance=(8.0 &Real)/>


EnergyPerResidue

Tests the energy of a particular residue. If whole_interface is set to 1, it computes all the energies for the interface residues defined by the jump_number and the interface_distance_cutoff. Helpful for post-design analyses.

<EnergyPerResidue name=(energy_per_res_filter &string) scorefxn=(score12 &string) 
score_type=(total_score &string) pdb_num/res_num(see above) energy_cutoff=(0.0 &float)
whole_interface=(0 &bool) jump_number=(1 &int) interface_distance_cutoff=(8.0 &float)/>


ScoreType

Computes the energy of a particular score type for the entire pose and if that energy is lower than threshold, returns true.

<ScoreType name=(score_type_filter &string) scorefxn=(score12 &string) score_type=(&string) threshold=(&float)/>
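A minimal, hedged example is shown below. It uses total_score, the score type that appears as the default for EnergyPerResidue, and a hypothetical threshold of -100. A suitable threshold depends on the size of the pose and the score function.

```xml
<FILTERS>
        <!-- Hypothetical example: pass only if total_score for the whole pose is below -100 -->
        <ScoreType name="total_score_filter" scorefxn="score12" score_type="total_score" threshold="-100"/>
</FILTERS>
```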


AlaScan

Substitutes Ala for each interface position separately and measures the difference in ddg compared to the starting structure. The filter always returns true. The output is only placed in the .report file. Setting repeats above 1 causes multiple ddg calculations to be averaged, giving better-converged values.

<AlaScan name=(&string) scorefxn=(score12 &string) jump=(1 &Integer) interface_distance_cutoff=(8.0 &Real) partner1=(0 &bool) partner2=(1 &bool) repeats=(1 &Integer) repack=(1 &bool)/>
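The sketch below keeps every documented default except repeats, which is raised so that several ddg calculations are averaged. The filter name and the value of 5 are illustrative only.

```xml
<FILTERS>
        <!-- Hypothetical example: average 5 ddg calculations per alanine substitution -->
        <AlaScan name="ala_scan" scorefxn="score12" jump="1" interface_distance_cutoff="8.0" partner1="0" partner2="1" repeats="5" repack="1"/>
</FILTERS>
```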

DisulfideFilter

Require a disulfide bond between the interfaces to be possible. 'Possible' is taken fairly loosely; a reasonable centroid disulfide score is required (fairly close CB atoms without too much angle strain).

Residues from targets are considered when searching for a disulfide bond. As for DisulfideMover, if no residues are specified from one interface partner all residues on that partner will be considered.

<DisulfideFilter name="&string" targets=(&string)/>


LigDSasa

Computes the fractional interface delta_sasa for a ligand on a ligand-protein interface and checks to see if it is *between* the lower and upper threshold. A DSasa of 1 means the ligand is totally buried (loses all its accessible surface area), 0 means totally accessible (loses none upon interface formation).

<LigDSasa name=(&string) lower_threshold=(0.0 &float) upper_threshold=(1.0 &float)/>
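For illustration, the fragment below would accept only poses in which the ligand loses between 70% and 100% of its accessible surface area on binding. The filter name and the lower threshold of 0.7 are hypothetical.

```xml
<FILTERS>
        <!-- Hypothetical example: require a mostly buried ligand (fractional DSasa between 0.7 and 1.0) -->
        <LigDSasa name="ligand_burial" lower_threshold="0.7" upper_threshold="1.0"/>
</FILTERS>
```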


DiffAtomBurial

Compares the DSasa of two specified atoms and checks to see if one is greater or less than the other. This is useful for figuring out whether a ligand is oriented in the correct way (i.e. whether in the designed interface one atom is more/less exposed than another).

<DiffAtomBurial name=(&string)  res1_res_num/res1_pdb_num=(0, see res_num/pdb_num convention) res2_res_num/res2_pdb_num=(0, see convention) atomname1=(&string) atomname2=(&string) sample_type=(&string)/>


LigInterfaceEnergy

Calculates interface energy across a ligand-protein interface taking into account (or not) enzdes style cst_energy.

<LigInterfaceEnergy name=(&string)  scorefxn=(&string) include_cstE=(0 &bool) jump_number=(last_jump &integer) energy_cutoff=(0.0 &float)/>

include_cstE=1 will *not* subtract out the cst energy from interface energy. jump_number defaults to the last jump in the pose (assumed to be associated with the ligand). The energy must be less than energy_cutoff to pass.


EnzScore

Calculates scores of a pose (e.g. a ligand-protein interface), taking into account (or not) enzdes style cst_energy. Residues can be accessed by res_num/pdb_num or their constraint id. One and only one of the res_num/pdb_num, cstid, and whole_pose tags can be specified. The energy must be less than energy_cutoff to pass.

<EnzScore name=(&string)  scorefxn=(&string, score12) whole_pose= (&bool,0) score_type = (&string) res_num/pdb_num = (see convention) cstid =  (&string) energy_cutoff=(0.0 &float)/>


RepackWithoutLigand

Calculates delta_energy or RMSD of protein residues in a protein-ligand interface when the ligand is removed and the interface repacked. RMSD of a subset of these repacked residues (such as catalytic residues) can be accessed by setting the appropriate tags.

<RepackWithoutLigand name=(&string)  scorefxn=(&string, score12) target_res = (&string) target_cstids =  (&string) energy_threshold=(0.0 &float) rms_threshold=(0.5 &float)/>

Ligand-centric Filters

HeavyAtom

<HeavyAtom name="&string" chain="&string" heavy_atom_limit=(&int)/>

Stop growing this designed ligand once we reach this heavy atom limit

CompleteConnections

<CompleteConnections name="&string" chain="&string"/>

Are there any connections left to fulfill? If not, stop growing ligand

LIGAND_AREAS

<[name_of_this_ligand_area] chain="&string" cutoff=(float) add_nbr_radius=[true|false] all_atom_mode=[true|false] minimize_ligand=[float] Calpha_restraints=[float] high_res_angstroms=[float] high_res_degrees=[float] tether_ligand=[float] />

LIGAND_AREAS describe parameters specific to each ligand, useful for multiple ligand docking studies. "cutoff" is the maximum distance in angstroms between the ligand and an amino acid's C-beta atom for that residue to still be part of the interface. "all_atom_mode" can be true or false. If all_atom_mode is true, then a residue becomes part of the interface if any ligand atom is within "cutoff" of its C-beta atom. If false, only the ligand neighbor atom is used to decide if the protein residue is part of the interface. "add_nbr_radius" increases the cutoff by the size of the ligand neighbor atom's radius specified in the ligand .params file. This size can be adjusted to represent the size of the ligand, without entering all_atom_mode. Thus all_atom_mode should not be used with add_nbr_radius.

Ligand minimization can be turned on by specifying a minimize_ligand value greater than 0. This value represents the size of one standard deviation of ligand torsion angle rotation (in degrees). By setting Calpha_restraints greater than 0, backbone flexibility is enabled. This value represents the size of one standard deviation of Calpha movement, in angstroms.

During high resolution docking, small amounts of ligand translation and rotation are coupled with cycles of rotamer trials or repacking. The amounts of translation and rotation can be controlled by the 'high_res_angstroms' and 'high_res_degrees' values respectively. A tether_ligand value (in angstroms) will constrain the ligand so that multiple cycles of small translations don't add up to a large translation.

INTERFACE_BUILDERS

<[name_of_this_interface_builder] ligand_areas=(comma separated list of predefined ligand_areas) extension_window=(int)/>

An interface builder describes how to choose residues that will be part of a protein-ligand interface. These residues are chosen for repacking, rotamer trials, and backbone minimization during ligand docking. The initial XML parameter is the name of the interface_builder (for later reference). "ligand_areas" is a comma separated list of strings matching LIGAND_AREAS described previously. Finally 'extension_window' surrounds interface residues with residues labeled as 'near interface'. This is important for backbone minimization, because a residue's backbone can't really move unless it is part of a stretch of residues that are flexible.

MOVEMAP_BUILDERS

<[name_of_this_movemap_builder] sc_interface=(string) bb_interface=(string) minimize_water=[true|false]/>

A movemap builder constructs a movemap. A movemap is a 2xN table of true/false values, where N is the number of residues in your protein/ligand complex. The two columns are for backbone and side-chain movements. The MovemapBuilder combines previously constructed backbone and side-chain interfaces (see previous section). Leave out bb_interface if you do not want to minimize the backbone. The minimize_water option is a global option. If you are docking water molecules as separate ligands (multi-ligand docking) these should be described through LIGAND_AREAS and INTERFACE_BUILDERS.
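The way these three sections reference one another can be sketched as follows. All element names chosen here (docking_sidechain, final_backbone, side_chain_for_docking, backbone, final), the chain ID X, and every numeric value are hypothetical. The enclosing LIGAND_AREAS, INTERFACE_BUILDERS and MOVEMAP_BUILDERS blocks are assumed from the section headings, and their placement within the script is not specified in this section.

```xml
<LIGAND_AREAS>
        <!-- add_nbr_radius is used, so all_atom_mode is left off, as advised above -->
        <docking_sidechain chain="X" cutoff="6.0" add_nbr_radius="true" all_atom_mode="false" minimize_ligand="10"/>
        <!-- Calpha_restraints greater than 0 enables backbone flexibility -->
        <final_backbone chain="X" cutoff="7.0" add_nbr_radius="true" all_atom_mode="false" Calpha_restraints="0.3"/>
</LIGAND_AREAS>
<INTERFACE_BUILDERS>
        <!-- Each builder refers to ligand areas by the names defined above -->
        <side_chain_for_docking ligand_areas="docking_sidechain"/>
        <backbone ligand_areas="final_backbone" extension_window="3"/>
</INTERFACE_BUILDERS>
<MOVEMAP_BUILDERS>
        <!-- Combines a side-chain interface and a backbone interface into one movemap -->
        <final sc_interface="side_chain_for_docking" bb_interface="backbone" minimize_water="true"/>
</MOVEMAP_BUILDERS>
```

Dropping bb_interface from the movemap builder in this sketch would leave the backbone fixed and allow only side-chain movement.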