ResidueSelectors define a subset of residues from a Pose. Their apply() method takes a Pose and returns a ResidueSubset (a utility::vector1< bool >). This vector1 will be as large as there are residues in the input Pose, and its ith entry will be "true" if residue i has been selected.
Once you have a ResidueSubset, you can use it in a number of ways. For instance, you can use the OperateOnResidueSubset task operation to combine a ResidueSelector with a Residue Level TaskOperation to modify the ResidueLevelTasks which have a "true" value in the ResidueSubset. ResidueSelectors should be declared in their own block and named, or declared as subtags of other ResidueSelectors or of TaskOperations that accept ResidueSelectors (such as the OperateOnResidueSubset task operation).
Note that certain other Rosetta modules (e.g. the ReadResfile TaskOperation, which is not a Residue Level TaskOperation but can still accept a ResidueSelector as input) may also use ResidueSelectors. Ultimately, it is hoped that many Rosetta components will be modified to permit this standardized method of selecting residues.
The purpose of separating the residue selection logic from the modifications that TaskOperations perform on a PackerTask is to make available the complicated logic of selecting residues that often lives in TaskOperations. If you have a complicated TaskOperation, consider splitting it into a ResidueSelector and operations on the residues it selects.
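As a minimal sketch of this split, the fragment below declares a selector once and then hands it to the OperateOnResidueSubset task operation. The names "chA" and "freeze_chA" are made up for illustration, and the choice of PreventRepackingRLT as the Residue Level TaskOperation is an assumption; any other Residue Level TaskOperation could take its place.

```xml
<RESIDUE_SELECTORS>
  <!-- Selection logic only: which residues? -->
  <Chain name="chA" chains="A"/>
</RESIDUE_SELECTORS>
<TASKOPERATIONS>
  <!-- Modification logic only: what to do with the selected residues -->
  <OperateOnResidueSubset name="freeze_chA" selector="chA">
    <PreventRepackingRLT/>
  </OperateOnResidueSubset>
</TASKOPERATIONS>
```

Because the selection lives in its own named tag, the same "chA" selector can be reused by other selectors, task operations, or movers in the script.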
ResidueSelectors can be declared in their own block, outside of the TaskOperation block. For example:
<RESIDUE_SELECTORS> <Chain name="chA" chains="A"/> <Index name="res1to10" resnums="1-10"/> </RESIDUE_SELECTORS>
Some ResidueSelectors can nest other ResidueSelectors in their definition; e.g.
<RESIDUE_SELECTORS> <Neighborhood name="chAB_neighbors"> <Chain chains="A,B"/> </Neighborhood> </RESIDUE_SELECTORS>
In which case, the structure of the Neighborhood ResidueSelector will be stated as
<Neighborhood name=(%string) > <(Selector)> </Neighborhood>
with the <(Selector)> subtag designating that any ResidueSelector can be nested inside it.
A mover that can handle ResidueSelectors will take the following option:
<Not name="(&string)" selector="(&string)"/>
<Not name="(&string)"> <(Selector) .../> </Not>
Any ResidueSelector can be defined as a subtag of the Not selector. You cannot, however, pass the subselector by name except by using the "selector" option.
<Not name="all_but_chA"> <Chain chains="A"/> </Not>
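The same selection can also be written with the "selector" option instead of a subtag. The sketch below assumes a Chain selector named "chA" has been declared earlier in the same block; both names are illustrative.

```xml
<RESIDUE_SELECTORS>
  <Chain name="chA" chains="A"/>
  <!-- Refers to the previously declared selector by name -->
  <Not name="all_but_chA" selector="chA"/>
</RESIDUE_SELECTORS>
```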
<And name="(&string)" selectors="(&string)"> <(Selector1)/> <(Selector2)/> ... </And>
<Or name="(&string)" selectors="(&string)"> <(Selector1)/> <(Selector2)/> ... </Or>
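A short sketch of the two ways of supplying subselectors, following the tag syntax above: by name through the "selectors" option, or as subtags. The selector names are illustrative, and the comments assume the usual logical meaning of AND and OR.

```xml
<RESIDUE_SELECTORS>
  <Chain name="chA" chains="A"/>
  <Index name="res1to10" resnums="1-10"/>
  <!-- Residues that are in chain A and also among residues 1-10 -->
  <And name="chA_and_1to10" selectors="chA,res1to10"/>
  <!-- Residues that are in chain A, among residues 1-10, or both -->
  <Or name="chA_or_1to10">
    <Chain chains="A"/>
    <Index resnums="1-10"/>
  </Or>
</RESIDUE_SELECTORS>
```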
<True name="full_pose" />
Selects residues in the pose at random. Note that this residue selector is stochastic. That is, it will return a different set of residues every time it is called. However, the randomly selected residues can be saved using the StoreResidueSubsetMover and retrieved using the StoredResidueSubset selector.
<RandomResidue name="(&string)" selector="(TrueSelector &string)" num_residues="(1 &int)" />
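The following sketch shows one way to make a random selection reproducible within a script, using the StoreResidueSubset mover and StoredResidueSubset selector syntax documented further down this page. All names are illustrative, and it is assumed here that the "selector" option restricts the pool from which residues are drawn.

```xml
<RESIDUE_SELECTORS>
  <Chain name="chA" chains="A"/>
  <!-- Draws five residues; a fresh draw is made on every call -->
  <RandomResidue name="five_random_in_chA" selector="chA" num_residues="5"/>
  <!-- Retrieves the draw that was cached by the mover below -->
  <StoredResidueSubset name="same_five" subset_name="random_five"/>
</RESIDUE_SELECTORS>
<MOVERS>
  <StoreResidueSubset name="store_random" residue_selector="five_random_in_chA" subset_name="random_five"/>
</MOVERS>
```

Once "store_random" has run, later steps can refer to "same_five" rather than to the stochastic selector itself.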
<Index name="(&string)" resnums="(&string)" error_on_out_of_bounds_index="(true &bool)" reverse="(false &bool)" />
Selects residues by their full Rosetta residue type name. At least one of residue_names and residue_name3 must be specified.
<ResidueName name="(&string)" residue_names="(&string)" residue_name3="(&string)" />
You should provide at least one of these:
residue_names - A comma-separated list of Rosetta residue names (including patches). For example, "CYD" will select all disulfides, and "CYD,SER:NtermProteinFull,ALA" will select all disulfides, alanines, and N-terminal serines -- all other residues will not be selected (i.e. be false in the ResidueSubset object).
residue_name3 - A comma-separated list of 3-letter Rosetta residue names. These will be selected regardless of variant type. For example, "SER" will select residues named "SER", "SER:NtermProteinFull", and "SER:Phosphorylated".
Example This example will select all variants of ALA, C-terminal ASN residues, and disulfides:
<ResidueName name="change" residue_names="ASN:CtermProteinFull,CYD" residue_name3="ALA" />
Find sequence motifs in a protein
These Residue selectors use the underlying Antibody Modeling and Design Framework, and require a renumbered antibody structure. Please see General Antibody Options and Tips for more. If your antibody was output by RosettaAntibody, it is already renumbered into the Chothia Scheme, which is the default.
Selects CDR residues in an antibody or camelid antibody.
<AntibodyRegion name="(&string)" region="(&string)" numbering_scheme="(&string)" cdr_definition="(&string)" />
cdr_definition (&string): Set the cdr definition you want to use. Must also set the numbering_scheme XML option.
Selects CDR residues in an antibody or camelid antibody.
<CDR name="(&string)" cdrs="(&string,&string)" numbering_scheme="(&string)" cdr_definition="(&string)" />
cdr_definition (&string): Set the cdr definition you want to use. Must also set the numbering_scheme XML option.
These Residue selectors use the underlying RosettaCarbohydrate Framework.
The BinSelector selects residues that fall in a named mainchain torsion bin (e.g. the "A" bin, corresponding to alpha-helical residues by the "ABEGO" nomenclature). Non-polymeric residues are ignored. By default, only alpha-amino acids are selected, though this can be disabled.
<Bin name="(&string)" bin="(&string)" bin_params_file="('ABEGO' &string)" select_only_alpha_aas="(true &bool)" />
This example selects all residues that are in the region of Ramachandran space accessible to D-proline (which can be useful in the context of a script that attempts to design such positions to D-proline):
<Bin name="select_d_pro_positions" bin="DPRO" bin_params_file="PRO_DPRO" />
The BinSelector can be combined with AND, OR, or NOT selectors to select multiple regions. For example, the following would select residues that are in the right- or left-handed helical regions of Ramachandran space:
<Bin name="right_handed_helices" bin="A" bin_params_file="ABBA" /> <Bin name="left_handed_helices" bin="Aprime" bin_params_file="ABBA" /> <Or name="right_or_left_handed_helices" selectors="right_handed_helices,left_handed_helices" />
The BondedResidueSelector requires either a residue selector or a comma-separated list of residue numbers, and selects all residues with chemical bonds to the input residues. This will include both primary sequence neighbors and any other covalently bound residues, including but not limited to bound metal ions (if set up using -auto_setup_metals), carbohydrates, disulfide partners, etc.
<Bonded name="(&string)" resnums="(&string)" selector="(&string)"/>
The BondedResidueSelector can also take a residue selector as a subtag. Only one residue selector may be provided, and it is mutually exclusive with the resnum list.
<Bonded name="(&string)" > <Index resnums="2,3" /> </Bonded>
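For comparison, here is a sketch of the other two input forms: a previously declared selector passed by name, and a plain resnum list. The selector names are illustrative; the ResidueName selector with "CYD" is used only as a convenient input.

```xml
<RESIDUE_SELECTORS>
  <ResidueName name="disulfides" residue_names="CYD"/>
  <!-- Input given as a named selector -->
  <Bonded name="bonded_to_disulfides" selector="disulfides"/>
  <!-- Input given as a resnum list (cannot be combined with a selector) -->
  <Bonded name="bonded_to_2_and_3" resnums="2,3"/>
</RESIDUE_SELECTORS>
```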
HBondSelector selects all residues with hydrogen bonds to the residues specified in the input (either by a comma-separated resnum list or by a residue selector). If no input residues are selected, then all residues in the pose forming hydrogen bonds stronger than the specified energy cutoff are selected.
<HBond name="(&string)" resnums="(&string)" residue_selector="(&string)" include_bb_bb="(false &bool)" hbond_energy_cutoff="(-0.5 &Real)" scorefxn="(&string)" />
The HBondSelector can also take a residue selector as a subtag. Only one residue selector may be provided, and it is mutually exclusive with the resnum list.
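A minimal sketch of the subtag form follows. The scorefunction name "sfxn" and its ref2015 weights are assumptions made for the sake of a complete fragment; any scorefunction declared in the script could be named instead.

```xml
<SCOREFXNS>
  <ScoreFunction name="sfxn" weights="ref2015"/>
</SCOREFXNS>
<RESIDUE_SELECTORS>
  <!-- Residues hydrogen-bonded to chain A, using the default cutoff explicitly -->
  <HBond name="hbonds_to_chA" include_bb_bb="false" hbond_energy_cutoff="-0.5" scorefxn="sfxn">
    <Chain chains="A"/>
  </HBond>
</RESIDUE_SELECTORS>
```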
<InterfaceByVector name="(%string)" cb_dist_cut="(11.0&float)" nearby_atom_cut="(5.5%float)" vector_angle_cut="(75.0&float)" vector_dist_cut="(9.0&float)" grp1_selector="(%string)" grp2_selector="(%string)"/>
<InterfaceByVector name="(%string)" cb_dist_cut="(11.0&float)" nearby_atom_cut="(5.5%float)" vector_angle_cut="(75.0&float)" vector_dist_cut="(9.0&float)"> <(Selector1)/> <(Selector2)/> </InterfaceByVector>
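As a sketch of the first form, the two groups can be supplied as named selectors, leaving the cutoffs at their defaults. The selector names are illustrative, and it is assumed that the two groups are the two sides of the interface of interest.

```xml
<RESIDUE_SELECTORS>
  <Chain name="chA" chains="A"/>
  <Chain name="chB" chains="B"/>
  <InterfaceByVector name="AB_interface" grp1_selector="chA" grp2_selector="chB"/>
</RESIDUE_SELECTORS>
```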
The LayerSelector lets a user select residues by burial. Burial can be assessed by number of sidechain neighbors within a cone along the CA-CB vector (the default method), or by SASA. When using SASA, the solvent exposure of the designed position depends on the conformation of neighboring side chains; this is useful when you are making one or two mutations and not changing many neighboring amino acids. When using side chain neighbors, solvent exposure depends on which direction the amino acid side chain is pointed; this is useful for de novo design or protocols in which many amino acids will be designed simultaneously.
<Layer name="(&string)" select_core="(false &bool)" select_boundary="(false &bool)" select_surface="(false &bool)" ball_radius="(2.0 &Real)" use_sidechain_neighbors="(true &bool)" sc_neighbor_dist_exponent="(1.0 &Real)" sc_neighbor_dist_midpoint="(9.0 &Real)" sc_neighbor_denominator="(1.0 &Real)" sc_neighbor_angle_shift_factor="(0.5 &Real)" sc_neighbor_angle_exponent="(2.0 &Real)" core_cutoff="(5.2 &Real)" surface_cutoff="(2.0 &Real)" />
Sidechain neighbor-specific options:
Neighbor residues are counted, weighted by a factor that is a distance factor multiplied by an angle factor. The two factors are calculated as follows:
distance factor = 1 / (1 + exp( n*(d - m) ) ), where d is the distance of the neighbor from the residue CA, m is the midpoint of the distance falloff, and n is a falloff exponent factor that determines the sharpness of the distance falloff (with higher values giving sharper falloff near the midpoint distance).
angle factor = ( (cos(theta)+a)/(1+a) )^b, where theta is the angle between the CA-CB vector and the CA-neighbor vector, a is an offset factor that widens the cone somewhat, and b is an exponent that determines the sharpness of the angular falloff (with lower values resulting in a broader cone with a sharper edge falloff).
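Written out, the two factors and a few sample values look as follows. The sample values assume that the Layer tag defaults map onto the symbols as n = sc_neighbor_dist_exponent = 1.0, m = sc_neighbor_dist_midpoint = 9.0, a = sc_neighbor_angle_shift_factor = 0.5, and b = sc_neighbor_angle_exponent = 2.0; this mapping is inferred from the option names rather than stated in the text.

```latex
\[
  f_{\mathrm{dist}}(d) = \frac{1}{1 + e^{\,n (d - m)}},
  \qquad
  f_{\mathrm{angle}}(\theta) = \left( \frac{\cos\theta + a}{1 + a} \right)^{b}
\]
% Sample values with n = 1.0, m = 9.0, a = 0.5, b = 2.0:
\[
  f_{\mathrm{dist}}(7) \approx 0.881, \quad
  f_{\mathrm{dist}}(9) = 0.5, \quad
  f_{\mathrm{dist}}(11) \approx 0.119
\]
\[
  f_{\mathrm{angle}}(0^\circ) = 1, \quad
  f_{\mathrm{angle}}(60^\circ) = \left(\tfrac{1.0}{1.5}\right)^{2} \approx 0.444, \quad
  f_{\mathrm{angle}}(90^\circ) = \left(\tfrac{0.5}{1.5}\right)^{2} \approx 0.111
\]
% A neighbor at d = 7 and theta = 60 degrees therefore contributes
% roughly 0.881 * 0.444 = 0.39 to the weighted neighbor count.
```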
The parameters above generally need not be changed from their default values. If the user wishes to change them, though, they can do so by altering the following:
LigandMetalContactSelector selects all residues which form contacts with metal atoms, either as single ions or as part of a larger complex. It optionally takes a residue selector (as a subtag or previously defined selector) or a resnum list to indicate which metal-containing residues' contacts should be selected. Contacts are identified using the same procedure as the SetupMetalsMover and the -auto_setup_metals flag (see Metals); a potential metal-binding atom is considered to bind a metal if the distance between it and the metal ion is no greater than the sum of its van der Waals radius and that of the metal multiplied by the provided dist_cutoff_multiplier.
<LigandMetalContactSelector name="(&string;)" residue_selector="(&string;)" dist_cutoff_multiplier="(1 &real;)" />
<LigandMetalContactSelector name="(&string;)" dist_cutoff_multiplier="(1 &real;)" > <Residue Selector Tag ... /> </LigandMetalContactSelector>
<LigandMetalContactSelector name="(&string;)" dist_cutoff_multiplier="(1 &real;)" resnums="(&resnum_list_with_ranges;)" />
residue_selector: Name of the residue selector for the ligand
dist_cutoff_multiplier: Multiplier for the distance from the metal atom for contact detection (default 1.0)
resnums: List of residue numbers indicating which ligands' contacts should be selected. Cannot be used with a residue selector.
<Neighborhood name=(%string) resnums=(%string) distance=(10.0%float)/>
<Neighborhood name=(%string) selector=(%string) distance=(10.0%float)/>
<Neighborhood name=(%string) distance=(10.0%float)> <Selector ... /> </Neighborhood>
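A sketch of the first two forms with concrete values; the names and distances are illustrative only.

```xml
<RESIDUE_SELECTORS>
  <Chain name="chA" chains="A"/>
  <!-- Focus residues given as a named selector -->
  <Neighborhood name="near_chA" selector="chA" distance="8.0"/>
  <!-- Focus residues given as a resnum list -->
  <Neighborhood name="near_res_5_to_7" resnums="5-7" distance="6.0"/>
</RESIDUE_SELECTORS>
```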
<NumNeighbors name="(%string)" count_water="(false&bool)" threshold="(17%integer)" distance_cutoff="(10.0&float)"/>
<Phi name="(&string)" select_positive_phi="(true &bool)" ignore_unconnected_upper="(true &bool)" />
ignore_unconnected_upper: If true (the default) then C-terminal residues and other residues with nothing connected at the upper connection are not selected. If false, then these residues can be selected, depending on their phi values. Note that anything lacking a lower connection is never selected.
The PhiSelector selects alpha-amino acids that are in either the positive phi or negative phi region of Ramachandran space. Ligands and polymeric residues that are not alpha-amino acids are never selected. Alpha-amino acids with no lower connection (or nothing connected at the lower connection) are also never selected. By default, alpha-amino acids with no upper connection are not selected, though this can be disabled.
The PhiSelector is convenient for:
Counting and limiting the number of positive-phi positions when sampling loop conformations.
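A brief sketch of both settings, using only the options listed in the tag syntax; the selector names are illustrative.

```xml
<RESIDUE_SELECTORS>
  <!-- Positive-phi positions, with the default treatment of unconnected upper termini -->
  <Phi name="positive_phi" select_positive_phi="true"/>
  <!-- Negative-phi positions, allowing residues with nothing at the upper connection -->
  <Phi name="negative_phi_incl_cterm" select_positive_phi="false" ignore_unconnected_upper="false"/>
</RESIDUE_SELECTORS>
```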
<PairedSheetResidueSelector name=(%string) secstruct=(%string, "") sheet_topology=(%string) use_dssp=(%bool, True) />
The PairedSheetResidueSelector selects all residues involved in strand-strand pairings. The set of paired residues is computed by a combination of the secondary structure and user-specified sheet topology. For example, consider an antiparallel beta sheet with secondary structure "LEEEELLEEEEEELLEEEEEEL". In this case strand 1 is four residues (2-5), and strands 2 (residues 8-13) and 3 (residues 16-21) each have 6 residues. If the given sheet topology is '1-2.A.-1' (i.e. strands 1 and 2 are paired in an antiparallel direction with register shift -1), the paired residues in those strands are 2-11, 3-10, 4-9, and 5-8. Thus, residues 2, 3, 4, 5, 8, 9, 10, and 11 will be selected. Although residues 12 and 13 are in strand 2, they are not paired with anything in the given topology and will not be selected. Similarly, a given topology of '1-2.A.-1;2-3.A.0' will select all residues marked 'E' by DSSP (2, 3, 4, 5, 8, 9, 10, and 11 from the '1-2.A.-1' pairing, plus 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 20, and 21 from the '2-3.A.0' pairing).
secstruct - Secondary structure to be used to determine strand residues. If not specified, secondary structure will be chosen based on the value of the 'use_dssp' option
use_dssp - If true, and secstruct is not given, the input secondary structure will be computed by DSSP. If false, the secondary structure saved in the pose will be used. Note this secondary structure saved in the pose may not always reflect the pose contents unless it is explicitly set via DsspMover.
sheet_topology - String describing sheet topology, of the format A-B.P.R, where A is the strand number of the first strand in primary space, B is the strand number of the second strand in primary space, P is 'P' for parallel and 'A' for antiparallel, and R is the register shift.
The following example will select all paired residues among strands 1 and 2, using a secondary structure computed by DSSP:
<PairedSheetResidueSelector name="E1-E2_pairs" sheet_topology="1-2.A.-1" use_dssp="1" />
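The three-strand case described above can be sketched by supplying the secondary structure string directly, so that the result does not depend on DSSP. The selector name is illustrative.

```xml
<RESIDUE_SELECTORS>
  <!-- Strand 1 pairs with strand 2 (register shift -1); strand 2 pairs with strand 3 (register shift 0) -->
  <PairedSheetResidueSelector name="all_paired"
      secstruct="LEEEELLEEEEEELLEEEEEEL"
      sheet_topology="1-2.A.-1;2-3.A.0" />
</RESIDUE_SELECTORS>
```

With this input, every residue marked 'E' in the string takes part in at least one pairing, as worked through in the description of the selector.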
<PrimarySequenceNeighborhood name=(%string) selector=(%string) lower=(1%int) upper=(1%int) />
<PrimarySequenceNeighborhood name=(%string) lower=(1%int) upper=(1%int) > <Selector ... /> </PrimarySequenceNeighborhood>
lower - Number of residues to select lower (i.e. N-terminal) to the input selection. Default=1
upper - Number of residues to select upper (i.e. C-terminal) to the input selection. Default=1
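A small sketch with asymmetric flanks; the name and residue number are illustrative. Assuming the input selection itself is kept in the output, this would be expected to cover residues 8 through 13.

```xml
<RESIDUE_SELECTORS>
  <!-- Two residues N-terminal and three residues C-terminal to residue 10 -->
  <PrimarySequenceNeighborhood name="res10_and_flanks" lower="2" upper="3">
    <Index resnums="10"/>
  </PrimarySequenceNeighborhood>
</RESIDUE_SELECTORS>
```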
The RamaMutationSelector selects positions based on their rama_prepro score. Optionally, it can select based on the rama_prepro score that a position would have if it were mutated to a user-defined residue type. This is useful for selecting positions that are in regions of Ramachandran space that would be strongly favoured by a given conformationally-constrained type, such as L- or D-proline, 2-aminoisobutyric acid (AIB), etc.
Note that this selector does not select residues at termini or cutpoints, since these do not have defined rama_prepro scores. It also ignores non-polymeric residue types.
The following example selects all residues that are in regions of Ramachandran space in which proline would have a rama_prepro score less than or equal to -0.5 Rosetta energy units.
<RamaMutationSelector name="pro_positions" target_type="PRO" score_threshold="-0.5" />
<RamaMutationSelector name="(&string)" target_type="(&string, '')" score_threshold="(&real, 0.0)" rama_prepro_multiplier="(&real, 0.45)" />
target_type - If not specified, the rama_prepro score of the existing type at each position is used. If specified, then the rama_prepro score of the specified type, given the conformation at the position, is used. Note that this is a full name, not a three-letter code (e.g. "DPRO" for D-proline instead of "DPR").
score_threshold - The threshold for the rama_prepro score. Positions that, when mutated to the specified type, have a rama_prepro score lower than this threshold are selected. Default 0.0 Rosetta energy units.
rama_prepro_multiplier - A multiplier for the rama_prepro term. The score is multiplied by this value before being compared to the threshold. This defaults to 0.45, to match the weight of the rama_prepro term.
Note that the rama_prepro energy is a two-body energy dependent on a residue's conformation, its identity, and the identity of its C-terminal neighbour (with different lookup tables used for residues preceding proline and residues not preceding proline). Because it is a two-body energy, the score for a particular position is divided over that position and the i+1 position. This means that the final score table will have values that do not correspond to the values used for evaluating this selector, since each position's rama_prepro energy is the sum of its own energy and that of the i-1 position.
<SSElement name="(&string)" selection="(&string)" to_selection="(&string)" reassign_short_terminal_loop="(2 &int)" chain="(&string)" />
selection + to_selection: selects both selections and the residues between the selections
reassign_short_terminal_loop: how many residues at each terminus to ignore if they are loops
Notation for selection and to_selection:
<SSElement name="(&string)" selection="1,H,M" to_selection="1,H,L" chain="A"/>
SecondaryStructureSelector selects all residues with given secondary structure. For example, you might use it to select all loop residues in a pose. SecondaryStructureSelector uses the following rules to determine the pose secondary structure:
1. If pose_secstruct is specified, it is used.
2. If use_dssp is true, DSSP is used on the input pose to determine its secondary structure.
3. If use_dssp is false, the secondary structure stored in the pose is used.
<SecondaryStructure name="(&string;)" overlap="(0 &non_negative_integer;)" minH="(1 &non_negative_integer;)" minE="(1 &non_negative_integer;)" include_terminal_loops="(false &bool;)" use_dssp="(true &bool;)" pose_secstruct="(&string;)" ss="(&string;)" />
Example The example below selects all residues in the pose with secondary structure 'H' or 'E'.
<SecondaryStructure name="all_non_loop" ss="HE" />
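The loop-selection use case mentioned above can be sketched in the same way. The name is illustrative; include_terminal_loops is switched on here on the assumption that terminal loop residues are also wanted.

```xml
<RESIDUE_SELECTORS>
  <!-- Loop residues as assigned by DSSP, including loops at the termini -->
  <SecondaryStructure name="all_loops" ss="L" use_dssp="true" include_terminal_loops="true" />
</RESIDUE_SELECTORS>
```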
<SymmetricalResidue name=(%string) selector=(%string) />
The SymmetricalResidueSelector, when given a selector, will return all symmetrical copies (including the original selection) of those residues. While the packer is symmetry aware, not all filters are. This selector is useful when you need to explicitly give residue numbers but you are not sure which symmetry subunit you need.
<Task name=(%string) fixed=(%bool, False) packable=(%bool, True) designable=(%bool, True) task_operations=(%string) />
The TaskSelector uses user-provided task operations to define a selection. Task operations are run on the pose, and residues are selected based on their status in the resulting PackerTask (designable, packable, or fixed). Note that if all of these options are false, no residue will be selected. This is useful for legacy protocols which still use task operations to select residues (which were written before ResidueSelectors existed). New protocols should use ResidueSelectors to select residues.
task_operations - Required. The task operations used to define the selection.
fixed - If true, residues in the PackerTask marked as fixed (i.e. not packable or designable) will be included in the selection. Default = False
packable - If true, residues in the PackerTask marked as packable will be included in the selection. Default = True
designable - If true, residues in the PackerTask marked as designable will be included in the selection. Default = True
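A sketch of wrapping a legacy task operation in a TaskSelector follows. The ReadResfile task operation is used as the input; the resfile name "design.resfile" and both tag names are hypothetical.

```xml
<TASKOPERATIONS>
  <ReadResfile name="resfile" filename="design.resfile" />
</TASKOPERATIONS>
<RESIDUE_SELECTORS>
  <!-- Select only the positions that the resfile leaves designable -->
  <Task name="designable_positions" designable="true" packable="false" fixed="false" task_operations="resfile" />
</RESIDUE_SELECTORS>
```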
<ResiduePDBInfoHasLabel name=(%string) property=(%string) />
The ResiduePDBInfoHasLabel residue selector selects all residues with the given PDB residue label. Some protocols (e.g. MotifGraft, Disulfidize) use these labels to mark residues, and this selector allows those residues to be selected without the user's knowledge of which residues were marked.
property - Required. The PDB residue info label to be selected. (e.g. "DISULFIDIZE")
Example The example below selects all residues that were converted to disulfides by the Disulfidize mover.
<ResiduePDBInfoHasLabel name="all_disulf" property="DISULFIDIZE" />
The UnsatSelector selects all the backbone amines or carbonyls (but not both) that are not satisfied by a hydrogen bond. The general format of the selector is:
<Unsat name="(&string)" consider_mainchain_only="(true &bool)" check_acceptors="(true &bool)" hbond_energy_cutoff="(-0.5 &real)" scorefxn="(&string)" legacy="(false &bool)"/>
This example selects all residues in the structure that have a backbone carbonyl that is not satisfied by a hydrogen bond from the backbone:
<Unsat name="select_unsat_carbonyl" scorefxn="score"/>
This example selects all residues in the structure that have a backbone amine that is not satisfied by a hydrogen bond from the backbone or a side chain:
<Unsat name="select_unsat_amine" scorefxn="score" check_acceptors="false" consider_mainchain_only="false"/>
Creates a residue subset by retrieving a residue subset that has been cached into the current pose by the StoreResidueSubsetMover. The pose length must be the same as when the subset was stored.
<StoredResidueSubset name="(&string)" subset_name="(&string)" />
<RESIDUE_SELECTORS>
  <!-- Creates a subset consisting of whatever is currently chain B -->
  <Chain name="chainb" chains="B" />
  <!-- Retrieves the residue subset created by the "StoreResidueSubset" mover -->
  <StoredResidueSubset name="get_original_chain_b" subset_name="original_chain_b" />
</RESIDUE_SELECTORS>
<MOVERS>
  <!-- stores a subset consisting of whatever is in chain B when this mover is called -->
  <StoreResidueSubset name="store_subset" residue_selector="chainb" subset_name="original_chain_b" />
</MOVERS>
A ResidueSelector that applies a given residue selector to the native pose. If the native pose is shorter than the trajectory pose, extra 'false' values will be appended to the end of the selection to make it the correct size. Conversely, values are removed from the end of the selection if the native pose is longer than the trajectory pose.
<NativeSelector name="(string)" residue_selector="(string)" />
This script prints the sequence for residues that are defined as buried in the pose
passed using the flag
<ROSETTASCRIPTS>
  <RESIDUE_SELECTORS>
    <Layer name="core_res" select_core="1" select_boundary="0" select_surface="0" />
    <NativeSelector name="original_core" residue_selector="core_res" />
  </RESIDUE_SELECTORS>
  <SIMPLE_METRICS>
    <SequenceMetric name="seq" residue_selector="original_core" />
  </SIMPLE_METRICS>
  <MOVERS>
    <RunSimpleMetrics name="run_metrics" metrics="seq" prefix="seq_" />
  </MOVERS>
  <PROTOCOLS>
    <Add mover="run_metrics"/>
  </PROTOCOLS>
</ROSETTASCRIPTS>
Scores a copy of the pose and selects residues that score within the specified limits for a chosen score type. This could be used to select residue positions that score poorly and design such that the score improves. The options score_type, lower_threshold, and upper_threshold are required.
<ScoreTermValueBased name=(%string) score_type=(%string) lower_threshold=(%real) upper_threshold=(%real) score_fxn=(%string, default from command line) selector=(%string) />
<ScoreTermValueBased name=(%string) score_type=(%string) lower_threshold=(%real) upper_threshold=(%real) score_fxn=(%string, default from command line) resnums=(%string, ALL)/>
<ScoreTermValueBased name=(%string) score_type=(%string) lower_threshold=(%real) upper_threshold=(%real) score_fxn=(%string, default from command line)> <Selector ... /> </ScoreTermValueBased>
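As a rough sketch of the "select poorly scoring positions" use case, the fragment below looks for residues in chain A with a high repulsive score. The names and threshold values are purely illustrative, fa_rep is chosen only as an example score type, and it is assumed that the "selector" option limits which residues are considered.

```xml
<RESIDUE_SELECTORS>
  <Chain name="chA" chains="A"/>
  <!-- Residues in chain A whose fa_rep score falls between the two thresholds -->
  <ScoreTermValueBased name="clashing_in_chA" score_type="fa_rep"
      lower_threshold="5.0" upper_threshold="1000.0" selector="chA" />
</RESIDUE_SELECTORS>
```

Since score_fxn is omitted, the scorefunction would be taken from the command line, as indicated in the tag syntax.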