Rosetta 3.1 Release Manual
Metadata
This document was written by Sarel Fleishman, Jacob Corn, Eva Strauch, Justin Ashworth, and Spencer Blivens.
rosetta_scripts is meant to provide an XML-scriptable interface for conducting all of the tasks that interface-design developers produce. With such a scriptable interface, it is hoped that non-programmers will be able to 'mix and match' different design strategies and apply them to their own needs. It is also hoped that a common interface will make code sharing between different people smoother. Note that at this point, the only movers and filters implemented in this application are the ones described below; more will be made available in future releases. At the most abstract level, all of the computations needed in interface design fall into two categories: Movers and Filters. Movers change the conformation of the complex by acting on it (e.g., docking, design, minimization), and filters decide whether a given conformation should go on to the subsequent steps. Filters are meant to reduce the amount of computation spent on conformations that show no promise. A rosetta_scripts protocol is then merely a sequence of movers and filters.
Copy, paste, fill in, and enjoy
<dock_design>
<SCOREFXNS>
</SCOREFXNS>
<TASKOPERATIONS>
</TASKOPERATIONS>
<FILTERS>
</FILTERS>
<MOVERS>
</MOVERS>
<APPLY_TO_POSE>
</APPLY_TO_POSE>
<PROTOCOLS>
</PROTOCOLS>
</dock_design>
The following simple example will compute ala-scanning values for each residue in the protein interface:
<dock_design>
<SCOREFXNS>
<interface weights=interface/>
</SCOREFXNS>
<FILTERS>
<AlaScan name=scan partner1=1 partner2=1 scorefxn=interface interface_distance_cutoff=10.0 repeats=5/>
<Ddg name=ddg confidence=0/>
<Sasa name=sasa confidence=0/>
</FILTERS>
<MOVERS>
<Docking name=dock fullatom=1 local_refine=1 score_high=soft_rep/>
</MOVERS>
<APPLY_TO_POSE>
</APPLY_TO_POSE>
<PROTOCOLS>
<Add mover_name=dock filter_name=scan/>
<Add filter_name=ddg/>
<Add filter_name=sasa/>
</PROTOCOLS>
</dock_design>
Rosetta will carry out the operations in the order specified in PROTOCOLS, starting with docking (in this case, all-atom docking using the soft_rep weights). It will then apply alanine scanning, repeated 5 times for better convergence, to every residue on both sides of the interface, computing binding energies with the interface weight set (which counts mostly attractive energies). The binding energy (ddg) and surface area (sasa) will also be computed. All of the values will be output in a .report file. Notice that since ddg and sasa are assigned confidence=0, they are not used here as filters that can terminate a trajectory per se, but rather to report values for the complex. An important point is that filters never change the sequence or conformation of the structure, so the ddg and sasa values are reported for the input structure following docking, without any of the alanine-scanning mutations.
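The execution model described above can be sketched as a minimal Python model; it is an illustration, not Rosetta's C++ implementation. Each Add line pairs a mover with a filter: the mover changes the pose, the filter only inspects it, and a failing filter ends the trajectory. The names run_protocol, null_mover and true_filter, and the use of a plain dict as the pose, are hypothetical stand-ins for NullMover, TrueFilter and a real pose.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins: a pose is a plain dict, a mover edits it in place,
# and a filter only reads it and returns True (continue) or False (stop).
Pose = dict
Mover = Callable[[Pose], None]
Filter = Callable[[Pose], bool]


def null_mover(pose: Pose) -> None:
    """Models NullMover: leaves the pose untouched."""


def true_filter(pose: Pose) -> bool:
    """Models TrueFilter: always passes."""
    return True


def run_protocol(pose: Pose, steps: List[Tuple[Mover, Filter]]) -> bool:
    """Apply each (mover, filter) pair in order; stop at the first failing filter."""
    for mover, filt in steps:
        mover(pose)         # only movers change the pose
        if not filt(pose):  # filters never modify it
            return False    # trajectory terminated, nothing is output
    return True             # every filter passed, so the pose would be output
```

In this model, <Add filter_name=ddg/> corresponds to the pair (null_mover, ddg_filter), and a filter with confidence=0 behaves like true_filter.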
The following command line would run the above protocol, assuming the protocol file is named ala_scan.xml:
bin/rosetta_scripts.linuxgccrelease -s < INPUT PDB FILE NAME > -nstruct 20 -jd2:ntrials 100 -database ~/rosetta_database/ -ex1 -ex2 -parser:protocol ala_scan.xml -parser:view
Rather than the total number of attempts, -jd2:ntrials now means the number of attempts to make per output structure. So -nstruct 20 -jd2:ntrials 100 means that, for each of 20 trajectories, up to 100 attempts are made, for a total of up to 2000 attempts. Another way of looking at it: nstruct is the maximum number of structures to output, and nstruct x ntrials is the maximum number of times to try. This matters when you are actively filtering results rather than outputting decoys with certainty; in the latter case, you won't notice the difference.
The -parser:view flag may be used with rosetta executables that have been compiled with the extras=graphics switch, built as follows (from the Rosetta root directory):
scons mode=release -j3 bin extras=graphics
When running with -parser:view, a graphical viewer opens that shows many of the steps in a trajectory. This is extremely useful for making sure that sampling follows the intended trajectory.
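The -nstruct / -jd2:ntrials arithmetic above can be checked with a small Python model. It assumes (this is not stated above) that once an attempt passes all filters for a given output slot, no further attempts are made for that slot; run_jobs and passes are hypothetical names.

```python
from typing import Callable, Tuple


def run_jobs(nstruct: int, ntrials: int, passes: Callable[[], bool]) -> Tuple[int, int]:
    """Return (structures_written, attempts_made) with ntrials counted per output structure."""
    written = attempts = 0
    for _ in range(nstruct):        # one slot per requested output structure
        for _ in range(ntrials):    # up to ntrials attempts for this slot
            attempts += 1
            if passes():            # every filter passed: write and move to the next slot
                written += 1
                break
    return written, attempts
```

With filters that always fail, run_jobs(20, 100, lambda: False) makes the maximum 20 x 100 = 2000 attempts and writes nothing; with no effective filtering, run_jobs(20, 100, lambda: True) makes only 20 attempts, which is why the difference goes unnoticed in that case.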
This file lists the Movers and Filters recognized by rosetta_scripts, together with their defaults, meanings, and uses. It is written in an XML format, and many free text editors (e.g., vi) will highlight key XML notation, so long as the file has the extension .xml.
- XML basics: whenever an XML statement is shown, the following conventions are used:
<...> defines a branch statement (a statement that has more leaves).
<.../> defines a leaf statement.
"" encloses input expected from the user, with an ampersand (&) marking the expected type (string, float, etc.).
() encloses the default value that the parser will use if none is provided by the protocol.
The following are defined internally in the parser, and the protocol can use them without defining them explicitly.
- Predefined Movers:
NullMover: Has an empty apply. Useful for defining a filter without using a mover.
- Predefined Filters:
TrueFilter: Returns true. Useful for defining a mover without using a filter.
- CompoundStatement filter:
This is a special filter that uses previously defined filters to construct a compound logical statement with AND, OR, XOR, NAND and NOR operations. By making compound statements of compound statements, essentially all logical statements can be defined.
<CompoundStatement name=(&string)>
<OPERATION filter_name=(true_filter &string)/>
<....
</CompoundStatement>
where OPERATION is any of the operations listed in CAPS above. Note that the operations are performed in the order in which they are defined. No precedence rules are enforced, so any precedence has to be written explicitly by making compound statements of compound statements. Also note that the first OPERATION is ignored: the value of the first filter is simply assigned to the filter's result.
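The evaluation rules above (strict left-to-right order, no precedence, first OPERATION ignored) can be sketched in Python. This models the documented behavior only; compound_statement is a hypothetical name, and each filter is represented by the boolean it has already returned.

```python
from typing import Callable, Dict, List, Tuple

# Truth tables for the five operations CompoundStatement accepts.
OPS: Dict[str, Callable[[bool, bool], bool]] = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "XOR": lambda a, b: a != b,
    "NAND": lambda a, b: not (a and b),
    "NOR": lambda a, b: not (a or b),
}


def compound_statement(statement: List[Tuple[str, bool]]) -> bool:
    """Fold (operation, filter result) pairs strictly left to right, with no precedence."""
    if not statement:
        raise ValueError("a CompoundStatement needs at least one filter")
    _, result = statement[0]         # first OPERATION is ignored; take the value as-is
    for op, value in statement[1:]:  # combine the rest in the order written
        result = OPS[op](result, value)
    return result
```

For example, [("AND", True), ("OR", False), ("AND", False)] evaluates as (True OR False) AND False, which is False, whereas conventional precedence would give True OR (False AND False), which is True. To get the latter, the AND of the last two filters would have to be defined as its own CompoundStatement.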
- Predefined ScoreFunctions:
score12: The default all-atom scorefunction used by rosetta ab-initio and design
score_docking: high resolution docking scorefxn (standard+docking_patch)
score_docking_low: low resolution docking scorefxn (interchain_cen)
soft_rep: soft_rep_design weights.
The SCOREFXNS section defines scorefunctions that will be used in Filters and Movers. It can be used to define any of the scorefunctions in the rosetta_database with the following statement:
<"scorefxn_name" weights=(standard &string) patch="&string"/>
where scorefxn_name will be used in the Movers and Filters sections to use the scorefunction. The name should therefore be unique and not repeat the predefined score names.
- Scorefunction modifiers
The apply_to_pose section may set up constraints, in which case it becomes necessary to set the weights in all of the scorefunctions that are defined. The default weights for all the scorefunctions are defined globally in the apply_to_pose section, but each scorefunction definition may change this weight. The following modifiers are recognized:
fnr=(the value set by apply_to_pose for favor_native_residue &float)
hs_hash=(the value set by apply_to_pose for hotspot_hash &float)
For example:
<my_favorite_score weights="soft_rep_design" patch="dock" fnr=6.0/>
will multiply the favor_native_residue bonus by 6.0
The APPLY_TO_POSE section is used to change the input structure. Its most likely use is to define constraints for a structure that has been read from disk.
- Recognized movers:
<favor_native_residue bonus=(1.5 &float)/>
The TASKOPERATIONS section defines instances of the TaskOperation class hierarchy. They become available in the DataMap.
TaskOperation classes are used by TaskFactory to configure the behavior of PackerTask when it is generated on-demand for routines that use the "packer" to reorganize/mutate sidechains. When used by certain Movers (at present, the PackerRotamersMover and its subclasses), the TaskOperations control what happens during packing, usually by restriction "masks." A basic example:
...
<TASKOPERATIONS>
<ReadResfile name=rrf/>
<ReadResfile name=rrf2 resfile=resfile2/>
<PreventRepacking name=NotBeingUsedHereButPresenceOkay/>
<RestrictToRepacking name=rtrp/>
<OperateOnCertainResidues name=NoPackNonProt>
<PreventRepackingRLT/>
<ResidueLacksProperty property=PROTEIN/>
</OperateOnCertainResidues>
</TASKOPERATIONS>
...
<MOVERS>
<PackRotamersMover name=packrot scorefxn=sf task_operations=rrf,NoPackNonProt,rtrp/>
</MOVERS>
...
In the rosetta code, TaskOperation instances are registered with, and later created by, a TaskOperationFactory. The factory calls the virtual function parse_tag(), which does nothing in the base class. However, some TaskOperation classes (e.g., OperateOnCertainResidues and ReadResfile above) implement parse_tag, so their behavior can be configured with additional options in the XML tag definition.
List of current TaskOperation classes in the core library (* indicates use-at-own-risk/not sufficiently tested/still under development):
InitializeFromCommandline
IncludeCurrent
ReadResfile
RestrictToRepacking
RestrictResidueToRepacking
PreventRepacking
OperateOnCertainResidues
RestrictAbsentCanonicalAAS*
InitializeExtraRotsFromCommandline*
SetRotamerCouplings*
AppendRotamer*
AppendRotamerSet*
PreserveCBeta*
SetSurfaceScoreWeight*
RestrictYSDesign*
// ResLvlTaskOperations: as a subtag for special OperateOnCertainResidues TaskOperation (one only)
RestrictToRepackingRLT
PreventRepackingRLT
AddBehaviorRLT
RestrictAbsentCanonicalAASRLT*
// ResFilters: as a subtag for special OperateOnCertainResidues TaskOperation (one only)
ResidueHasProperty
ResidueLacksProperty
ResidueName3Is*
Each mover definition has the following structure:
<"mover_name" name="&string" .../>
where "mover_name" belongs to the predefined set of movers that the parser recognizes (listed below), name is a unique identifier for this mover definition, and any number of parameters that the mover needs may follow.
- Special movers:
-
DockDesign mover:
This is a special mover that combines a sequence of movers and filters into a single compound mover (just like PROTOCOLS).
<DockDesign name=( &string)>
<Add mover_name=( null &string) filter_name=( true_filter &string)/>
...
</DockDesign>
-
LoopOver mover:
Repeatedly applies a mover, using either an iteration limit or a filter as the stopping condition (whichever is met first). Combining LoopOver with the DockDesign mover above can be useful, e.g., when certain moves are expensive and we want to exhaust other, cheaper moves first.
<LoopOver name=(&string) mover_name=(&string) filter_name=( true_filter &string) iterations=(10 &Integer)/>
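A minimal Python model of the LoopOver stopping rule follows. It assumes (this is not stated above) that the mover is applied before the filter is first checked, so the mover always runs at least once; loop_over is a hypothetical name. Under that assumption, the default true_filter stops the loop after a single application.

```python
from typing import Callable

Pose = dict  # hypothetical stand-in for a real pose


def loop_over(pose: Pose, mover: Callable[[Pose], None],
              filt: Callable[[Pose], bool], iterations: int = 10) -> int:
    """Apply mover until filt passes or iterations runs out; return how many times it ran."""
    count = 0
    while count < iterations:
        mover(pose)
        count += 1
        if filt(pose):  # the filter became true before the iteration limit
            break
    return count
```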
- Recognized movers:
-
Docking: score_low is the scorefxn used for centroid-level docking; score_high is the scorefxn used for full-atom docking; rb_jump is the jump number over which to carry out rigid-body motions.
<Docking name="&string" score_low=(score_docking_low &string) score_high=(score_docking &string) fullatom=(0 &bool) local_refine=(0 &bool) rb_jump=(1 &Integer)/>
-
Prepack:
Performs something approximating rosetta++ prepacking (but less rigorously, without rotamer-trial minimization) by doing sidechain minimization and repacking. Separates chains based on jump_number, does prepacking, then reforms the complex. If jump_number=0, it will NOT separate chains at all.
<Prepack name=(&string) scorefxn=(score_docking &string) jump_number=(1 &integer)/>
-
RepackMinimize does the design/repack and minimization steps, using different score functions as defined by the protocol. repack_partner1 (and 2) defines which of the partners to design. If no particular residues are defined, the interface is repacked/designed. If specific residues are defined, then a shell of residues around those target residues is repacked/designed and minimized. repack_non_ala decides whether or not to change positions that are not Ala; turning it off is useful when designing an Ala pose, so that positions changed in previous steps are not redesigned. min_rigid_body minimizes the rigid-body orientation (as in docking).
<RepackMinimize name="&string" scorefxn_repack=(score12 &string) scorefxn_minimize=(score12 &string) repack_partner1=(0 &bool) repack_partner2=(1 &bool) design=(1 &bool) interface_cutoff_distance=(8.0 &Real) repack_non_ala=(1 &bool) min_rigid_body=(1 &bool)>
<residue pdb_num/res_num, see below/>
</RepackMinimize>
-
DesignMinimizeHbonds: Same as RepackMinimize, with the addition that a list of target residues to be hbonded can be defined. Residues within a sphere of 'interface_cutoff_distance' around the target residues are set to be designed. The identities allowed for design are restricted to hbonding residues, according to whether donors (STRKWYQN), acceptors (EDQNSTY), or both are requested. Designed residues that do not form hbonds to the target residues with energies lower than hbond_energy are then turned to Ala.
<DesignMinimizeHbonds name=(design_minimize_hbonds &string) hbond_weight=(3.0 &float) scorefxn_design=(score12 &string) scorefxn_minimize=(score12 &string) donors="design donors? &bool" acceptors="design acceptors? &bool" bb_hbond=(0 &bool) sc_hbond=(1 &bool) hbond_energy=(-0.5 &float) interface_cutoff_distance=(8.0 &float) design_partner1=(0 &bool) design_partner2=(1 &bool) repack_non_ala=(1 &bool) min_rigid_body=(1 &bool)>
<residue pdb_num="pdb residue and chain, e.g., 31B &string"/>
<residue res_num="serially defined residue number, e.g., 212 &integer"/>
</DesignMinimizeHbonds>
hbond_weight: the factor by which the hbonding terms are multiplied (an n-fold increase) in each of the scorefunctions that are defined.
bb_hbond: do backbone-backbone hbonds count?
sc_hbond: do backbone-sidechain and sidechain-sidechain hbonds count?
hbond_energy: the energy threshold below which an hbond is counted as such.
repack_non_ala: see RepackMinimize
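The residue restrictions and the Ala reversion step of DesignMinimizeHbonds can be sketched in Python. The donor and acceptor sets are copied from the description above; the helper names and the input format for designed (a mapping from position to a one-letter identity and that residue's best hbond energy to a target) are hypothetical.

```python
from typing import Dict, Set, Tuple

DONORS = set("STRKWYQN")    # hbond-donor identities listed above
ACCEPTORS = set("EDQNSTY")  # hbond-acceptor identities listed above


def allowed_identities(donors: bool, acceptors: bool) -> Set[str]:
    """One-letter codes allowed at designable positions for the chosen donors/acceptors flags."""
    allowed: Set[str] = set()
    if donors:
        allowed |= DONORS
    if acceptors:
        allowed |= ACCEPTORS
    return allowed


def revert_non_binders(designed: Dict[str, Tuple[str, float]],
                       hbond_energy: float = -0.5) -> Dict[str, str]:
    """Keep a designed identity only if its best hbond to a target is below hbond_energy; else Ala."""
    return {pos: (aa if best_energy < hbond_energy else "A")
            for pos, (aa, best_energy) in designed.items()}
```

With both flags set, the union contains 10 identities, since Q, N, S, T and Y appear in both lists.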
-
build_Ala_pose: Turns either or both sides of an interface to Ala (except for prolines and glycines, which are left as in the input) within a sphere of 'interface_distance_cutoff' around the interface. Useful as a step before design steps that try to optimize a particular part of the interface: Ala residues are less likely to 'get in the way' of really good rotamers.
<build_Ala_pose name=(ala_pose &string) partner1=(0 &bool) partner2=(1 &bool) interface_distance_cutoff=(8.0 &float)/>
-
SaveAndRetrieveSidechains: To be used after an Ala pose was built (and the design moves are done) to retrieve the sidechains from the input pose that were set to Ala by build_Ala_pose. Sidechains that are different from Ala will not be changed. Please note that naming your mover "SARS" is almost certainly bad luck and strongly discouraged.
<SaveAndRetrieveSidechains name=(save_and_retrieve_sidechains &string)/>
-
Backrub: Carries out backrub-style backbone and sidechain motions. With the values defined below, backrub will only happen on residues 31B, serial 212, and the serial span 10-20. If no residues or spans are defined, all of the interface residues on the defined partner are backrubbed by default.
<Backrub name=(backrub &string) partner1=(0 &bool) partner2=(1 &bool) interface_distance_cutoff=(8.0 &Real) moves=(1000 &integer) sc_move_probability=(0.25 &float) scorefxn=(score12 &string)>
<residue pdb_num="pdb residue and chain, e.g., 31B &string"/>
<residue res_num="serially defined residue number, e.g., 212 &integer"/>
<span begin="serially defined residue number, e.g., 10 &integer" end="serially defined residue number, e.g., 20 &integer"/>
</Backrub>
-
DumpPdb: Dumps a pdb. Recommended ONLY for debugging, as you can't change the name of the file during a run.
<DumpPdb name=(&string) fname=(dump.pdb &string)/>
-
DomainAssembly: Experimental and not sufficiently tested. Performs domain-assembly sampling by fragment insertion in a linker region. These moves change the disposition of one domain with respect to another in a multidomain protein. frag3 and frag9 specify the fragment-file names for 3-mer and 9-mer fragments, respectively.
<DomainAssembly name=(&string) linker_start_(pdb_num/res_num, see above) linker_end_(pdb_num/res_num, see above) frag3=(&string) frag9=(&string)/>
-
AtomTree: Sets up an atom tree for use with subsequent movers. Connects pdb_num on host_chain to the nearest residue on the neighboring chain. The connection is made through the connect_to atom of the pdb_num residue on host_chain.
<AtomTree name=(&string) pdb_num/res_num=(see above) connect_to=(CA &string) host_chain=(2 &integer)/>
-
TryRotamers: Produces a set of rotamers from a given residue. Use after AtomTree to generate inverse rotamers of a given residue.
<TryRotamers name=(&string) pdb_num/res_num=(see above) jump_num=(1 &Integer) scorefxn=(score12 &string) explosion=(0 &integer)/>
explosion: ranges from 0 to 4 and controls how much rotamer explosion to include. Explosion in this context means EX_FOUR_HALF_STEP_STDDEVS sampling (+/- 0.5, 1.0, 1.5, 2.0 sd): 1 = explode chi1; 2 = explode chi1,2; 3 = explode chi1,2,3; 4 = explode chi1,2,3,4.
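To illustrate how the explosion level multiplies the number of rotamers, here is a Python sketch. It assumes that EX_FOUR_HALF_STEP_STDDEVS keeps the base chi value and adds samples at +/- 0.5, 1.0, 1.5 and 2.0 standard deviations (9 values per exploded chi), and that chis beyond the explosion level stay at their mean; exploded_chis and its (mean, sd) input format are hypothetical.

```python
from itertools import product
from typing import List, Tuple

# Assumed EX_FOUR_HALF_STEP_STDDEVS offsets in standard deviations; 0.0 keeps the base chi.
OFFSETS = [0.0, -0.5, 0.5, -1.0, 1.0, -1.5, 1.5, -2.0, 2.0]


def exploded_chis(chis: List[Tuple[float, float]], explosion: int) -> List[Tuple[float, ...]]:
    """chis holds (mean, sd) per chi angle; chi1..chi<explosion> are exploded, the rest stay at their mean."""
    if not 0 <= explosion <= 4:
        raise ValueError("explosion must be between 0 and 4")
    per_chi = []
    for index, (mean, sd) in enumerate(chis):
        if index < explosion:
            per_chi.append([mean + k * sd for k in OFFSETS])
        else:
            per_chi.append([mean])
    return list(product(*per_chi))  # every combination of the per-chi samples
```

Under these assumptions, a residue with four chi angles yields 9 x 9 = 81 chi combinations per base rotamer at explosion=2, and 9^4 = 6561 at explosion=4.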
-
DisulfideMover: Introduces a disulfide bond into the interface. The best-scoring position for the disulfide bond is selected from among the residues listed in targets. This can be quite time-consuming, so specifying a small number of residues in targets is suggested. If no targets are specified on an interface partner, all residues on that partner are considered when searching for a disulfide. Thus, listing only a single residue in targets yields a disulfide from that residue to the best position across the interface from it, and omitting the targets parameter altogether finds the best disulfide over the whole interface. Disulfide bonds created by this mover, if any, are guaranteed to pass a DisulfideFilter.
<DisulfideMover name="&string" targets=(&string)/>
targets: A comma-separated list of residue numbers. These can use either rosetta numbering (raw integer) or pdb numbering (integer followed by the chain letter, e.g., '123A'). Targets are required to be located in the interface. Optional; default: all residues in the interface.
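The targets format can be illustrated with a small Python parser. This is a sketch, not Rosetta's own parser: parse_targets is hypothetical, and tolerating whitespace around commas is an assumption. A bare integer is returned with chain None (rosetta numbering), and an integer followed by a chain letter as (number, chain) (pdb numbering).

```python
import re
from typing import List, Optional, Tuple

# A residue number, optionally followed by one chain letter (e.g. '212' or '123A').
_TARGET = re.compile(r"^(\d+)([A-Za-z]?)$")


def parse_targets(targets: str) -> List[Tuple[int, Optional[str]]]:
    """Split a targets string into (number, chain) pairs; chain is None for rosetta numbering."""
    parsed: List[Tuple[int, Optional[str]]] = []
    for token in targets.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate empty input and stray commas
        match = _TARGET.match(token)
        if match is None:
            raise ValueError(f"unrecognized target: {token!r}")
        number, chain = match.groups()
        parsed.append((int(number), chain or None))
    return parsed
```

For instance, "212,123A" is read as rosetta residue 212 plus pdb residue 123 on chain A.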
Each filter definition has the following format:
<"filter_name" name="&string" ... confidence=(1 &Real)/>
where "filter_name" belongs to the predefined set of filters that the parser recognizes (listed below), name is a unique identifier for this filter definition, and any number of parameters that the filter needs may follow. If confidence is 1.0, the filter is evaluated as in predicate logic (T/F). If the value is less than 0.999, the filter is evaluated as fuzzy, so that it returns True in (1.0 - confidence) fraction of the times it is probed. This is useful for cases in which experimental data are ambiguous or uncertain.
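One way to read the confidence rule, sketched in Python: a fuzzy filter gets a 'free pass' with probability 1 - confidence and otherwise runs the real test. This reading is consistent with the ala-scan example, where confidence=0 filters never terminate a trajectory, but the exact mechanism inside Rosetta is an assumption here; fuzzy is a hypothetical helper.

```python
import random
from typing import Callable


def fuzzy(filt: Callable[[], bool], confidence: float, rng: random.Random) -> bool:
    """Evaluate a filter under the confidence rule described above."""
    if confidence >= 0.999:
        return filt()                    # crisp predicate logic (T/F)
    if rng.random() < 1.0 - confidence:  # free pass with probability 1 - confidence
        return True
    return filt()                        # otherwise defer to the real test
```

Under this reading, a filter that would always fail still passes about 30% of the time at confidence=0.7, and always passes at confidence=0.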
- Recognized Filters:
-
Ddg: Computes the binding energy for the complex and returns true if it is below the threshold, false otherwise. Useful for identifying complexes that have poor binding energy and killing their trajectories.
<Ddg name=(ddg &string) scorefxn=(score12 &string) threshold=(-15 &float)/>
-
TerminusDistance: True if all residues in the interface are more than distance residues from the N or C terminus. If the filter fails, it reports how far the failing residue was from the terminus; if it passes, it reports 1000.
<TerminusDistance name=(&string) jump_number=(1 &integer) distance=(5 &integer)/>
jump_number: which jump to use for calculating the interface.
distance: how many residues (in sequence) each interface residue must be from a terminus.
-
ResInInterface: Computes the number of residues in the interface specified by jump_number and returns true if it is above the threshold (residues), false otherwise. Useful as a quick and ugly filter after docking for making sure that the partners make contact.
<ResInInterface name=(riif &string) residues=(20 &integer) jump_number=(1 &integer)/>
-
HbondsToResidue: This filter checks whether the residues defined by res_num/pdb_num are hbonded with as many hbonds as defined by partners, where each hbond needs to have at most energy_cutoff energy.
<HbondsToResidue name=(hbonds_filter &string) partners="how many hbonding partners are expected &integer" energy_cutoff=(-0.5 &float) backbone=(0 &bool) sidechain=(1 &bool) "res_num/pdb_num see above"/>
backbone: should backbone-backbone hbonds be counted?
sidechain: should backbone-sidechain and sidechain-sidechain hbonds be counted?
-
Sasa: Computes the interface SASA and passes if it is **higher** than threshold.
<Sasa name=(sasa_filter &string) threshold=(800 &float)/>
-
NeighborType: Filters for poses that place a neighbor of the specified types around a target residue in the partner protein.
<NeighborType name=(neighbor_filter &string) "res_num/pdb_num see above" distance=(8.0 &Real)>
<Neighbor type=(&3-letter aa code)/>
</NeighborType>
-
ResidueBurial: Counts how many residues across the interface are within an interaction distance (distance) of the target residue, and passes if there are at least neighbors of them. With neighbors=1, this degenerates to checking whether or not the residue is at the interface.
<ResidueBurial name=(&string) "res_num/pdb_num see above" distance=(8.0 &Real) neighbors=(1 &Integer)/>
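A Python sketch of the ResidueBurial count follows. It assumes that each residue is represented by a single coordinate (which atom Rosetta actually uses is not stated here) and that a residue lying exactly at distance counts as a neighbor; residue_burial and its coordinate inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]  # one representative point per residue (assumption)


def residue_burial(target: Coord, partner_residues: Sequence[Coord],
                   distance: float = 8.0, neighbors: int = 1) -> bool:
    """Pass if at least `neighbors` partner residues lie within `distance` of the target."""
    count = sum(1 for xyz in partner_residues if math.dist(target, xyz) <= distance)
    return count >= neighbors
```

With the default neighbors=1, the function only asks whether any partner residue is close enough, which is the "is this residue at the interface?" check mentioned above.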
-
BuriedUnsatHbonds: Maximum number of buried unsatisfied H-bonds allowed (cutoff). If a jump number is specified (default=1), this number is calculated across the interface of that jump. If jump_number=0, the filter is calculated for a monomer. Note that the number of unsatisfied H-bonds in monomers is often much higher than 20.
<BuriedUnsatHbonds name=(&string) jump_number=(1 &Size) cutoff=(20 &Size)/>
-
ResidueDistance: Measures the distance between two residues.
<ResidueDistance name=(&string) res1_"res_num/pdb_num see above" res2_"res_num/pdb_num see above" distance=(8.0 &Real)/>
-
EnergyPerResidue: Tests the energy of a particular residue.
<EnergyPerResidue name=(energy_per_res_filter &string) scorefxn=(score12 &string) score_type=(total_score &string) pdb_num/res_num(see above) energy_cutoff=(0.0 &float)/>
-
ScoreType: Computes the energy of a particular score type for the entire pose and returns true if that energy is lower than threshold.
<ScoreType name=(score_type_filter &string) scorefxn=(score12 &string) score_type=(&string) threshold=(&float)/>
Don't use these energy filters directly after centroid level moves, because the energies are likely to be extremely high.
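Several filters above compare a value against a threshold, but not all in the same direction. The following Python summary encodes the directions as described in this section. The function names are hypothetical, and using strict comparisons at the boundaries is an assumption drawn from the wording ('below', 'above', 'higher', 'lower', 'maximum allowed'), not from the source code.

```python
# Direction of each threshold comparison, as described in this section.


def ddg_passes(ddg: float, threshold: float = -15.0) -> bool:
    return ddg < threshold        # binding energy must be below threshold


def res_in_interface_passes(n_residues: int, residues: int = 20) -> bool:
    return n_residues > residues  # interface must be larger than the cutoff


def sasa_passes(sasa: float, threshold: float = 800.0) -> bool:
    return sasa > threshold       # for Sasa, higher passes


def score_type_passes(energy: float, threshold: float) -> bool:
    return energy < threshold     # lower energy passes


def buried_unsat_passes(n_unsat: int, cutoff: int = 20) -> bool:
    return n_unsat <= cutoff      # cutoff is the maximum number allowed
```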
-
AlaScan: Substitutes Ala for each interface position separately and measures the difference in ddg compared to the starting structure. The filter always returns true; the output is only placed in the .report file. repeats causes multiple ddg calculations to be averaged, giving better-converged values.
<AlaScan name=(&string) scorefxn=(score12 &string) interface_distance_cutoff=(8.0 &Real) partner1=(0 &bool) partner2=(1 &bool) repeats=(1 &Integer)/>
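The averaging that repeats performs can be sketched in Python. As a simplification, this assumes that both the Ala mutant and the starting structure are evaluated repeats times and the averages are subtracted; ala_scan_value and its two ddg callables are hypothetical. With ddg as a binding energy (more negative means tighter binding), a positive result means the Ala substitution weakens binding.

```python
from statistics import mean
from typing import Callable


def ala_scan_value(ddg_mutant: Callable[[], float], ddg_start: Callable[[], float],
                   repeats: int = 1) -> float:
    """Average `repeats` ddg evaluations of the Ala mutant and of the start; return their difference."""
    mutant = mean(ddg_mutant() for _ in range(repeats))
    start = mean(ddg_start() for _ in range(repeats))
    return mutant - start
```

Averaging matters because each ddg calculation involves repacking, so individual evaluations can fluctuate; with repeats=5, as in the ala_scan.xml example, five evaluations are averaged per position.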
-
DisulfideFilter: Requires that a disulfide bond between the interface partners be possible. 'Possible' is taken fairly loosely: a reasonable centroid disulfide score is required (fairly close CB atoms without too much angle strain). Residues from targets are considered when searching for a disulfide bond. As for DisulfideMover, if no residues are specified from one interface partner, all residues on that partner are considered.
<DisulfideFilter name="&string" targets=(&string)/>
targets: A comma-separated list of residue numbers. These can use either rosetta numbering (raw integer) or pdb numbering (integer followed by the chain letter, e.g., '123A'). Targets are required to be located in the interface. Optional; default: all residues in the interface.
Generated on Tue Apr 20 07:50:05 2010 for Rosetta Projects by doxygen 1.5.2
© Copyright Rosetta Commons Member Institutions. For more information, see http://www.rosettacommons.org.