
Documentation added 25 Mar 2021 by Vikram K. Mulligan, Flatiron Institute (vmulligan@flatironinstitute.org).

trRosettaConstraintGenerator

The trRosettaConstraintGenerator takes as input a multiple sequence alignment, converts this to a one-hot 3D input tensor, and runs this through the trRosetta neural network to generate predictions of the probability distributions of inter-residue distances and orientations. It then converts this information to a list of Rosetta constraints.

These constraints can be useful for de novo structure prediction. For that purpose, however, it is not necessary to script this constraint generator directly. The trRosetta application and the trRosettaProtocol mover wrap it and provide a full structure prediction pipeline that takes an MSA as input and produces a pose as output. Direct access to the trRosettaConstraintGenerator is mainly useful for more exotic tasks, such as using trRosetta distance and orientation constraints in design, docking, or loop modelling.
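As a minimal sketch of such direct use, the script below adds trRosetta constraints to an existing full-atom input structure and then runs a constrained FastRelax. The FastRelax stands in for whatever design, docking, or loop-modelling mover a real protocol would use, and the MSA path is hypothetical. It is assumed here that the alignment's query sequence corresponds to the input pose.

```xml
<ROSETTASCRIPTS>
	<SCOREFXNS>
		<!-- Constraint-aware full-atom score function, as in the example protocol below -->
		<ScoreFunction name="r15_cst" weights="ref2015_cst.wts" />
	</SCOREFXNS>
	<CONSTRAINT_GENERATORS>
		<!-- Hypothetical MSA path; assumed to describe the input pose's sequence -->
		<trRosettaConstraintGenerator name="tr_csts" msa_file="inputs/my_protein.a3m" />
	</CONSTRAINT_GENERATORS>
	<MOVERS>
		<!-- Runs the trRosetta network and adds the resulting constraints to the pose -->
		<AddConstraints name="add_tr_csts" constraint_generators="tr_csts" />
		<!-- Stand-in for a design, docking, or loop-modelling mover using a constraint-aware score function -->
		<FastRelax name="frlx_cst" repeats="1" scorefxn="r15_cst" />
	</MOVERS>
	<PROTOCOLS>
		<Add mover="add_tr_csts" />
		<Add mover="frlx_cst" />
	</PROTOCOLS>
</ROSETTASCRIPTS>
```

The constraints only influence the mover through the constraint score terms, so whichever mover replaces FastRelax should be given a score function with nonzero atom_pair_constraint, angle_constraint, and dihedral_constraint weights.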

A note on nomenclature

Although "omega" and "phi" are commonly used to refer to the third and first backbone dihedrals of an alpha-amino acid, respectively, and "theta" to the second backbone dihedral of a beta-amino acid, these Greek letters take on different meanings in trRosetta-related protocols. Here, "omega" refers to the inter-residue dihedral angle defined by the CA and CB atoms of a first residue and the CB and CA atoms of a second residue. "Theta" refers to the inter-residue dihedral angle defined by the N, CA, and CB atoms of a first residue and the CB atom of a second residue. And "phi" refers to the inter-residue angle (not a dihedral) defined by the CA and CB atoms of a first residue and the CB atom of a second residue.
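Written out for a residue pair (i, j), the three definitions above read as follows. The notation dihedral(·) and ∠(·) is introduced here only for illustration. Because a dihedral angle is unchanged when its four atoms are listed in reverse order, omega is symmetric in i and j, whereas theta and phi depend on which residue is listed first.

```latex
\begin{aligned}
\omega_{ij} &= \operatorname{dihedral}\left(\mathrm{CA}_i,\ \mathrm{CB}_i,\ \mathrm{CB}_j,\ \mathrm{CA}_j\right) = \omega_{ji} \\
\theta_{ij} &= \operatorname{dihedral}\left(\mathrm{N}_i,\ \mathrm{CA}_i,\ \mathrm{CB}_i,\ \mathrm{CB}_j\right) \neq \theta_{ji} \ \text{(in general)} \\
\phi_{ij}   &= \angle\left(\mathrm{CA}_i,\ \mathrm{CB}_i,\ \mathrm{CB}_j\right) \neq \phi_{ji} \ \text{(in general)}
\end{aligned}
```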

Compilation requirements

The trRosettaConstraintGenerator requires that Rosetta be compiled with TensorFlow support. See the autogenerated description below for details on how to compile Rosetta and link TensorFlow.

All options

Autogenerated Tag Syntax Documentation:


The trRosettaConstraintGenerator takes as input a file containing a multiple sequence alignment, feeds this to the trRosetta neural network, and uses the output to generate distance and angle constraints between pairs of residues as described in Yang et al. (2020) Improved protein structure prediction using predicted interresidue orientations. Proc. Natl. Acad. Sci. USA 117(3):1496-503. https://doi.org/10.1073/pnas.1914677117.

The trRosettaConstraintGenerator requires compilation with TensorFlow support. To compile with TensorFlow support:

  1. Download the TensorFlow 1.15 precompiled libraries for your operating system from one of the following URLs. (Note that the GPU versions require CUDA drivers; see https://www.tensorflow.org/install/lang_c for more information.)
       • Linux/CPU: https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-cpu-linux-x86_64-1.15.0.tar.gz
       • Linux/GPU: https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-gpu-linux-x86_64-1.15.0.tar.gz
       • Windows/CPU: https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-cpu-windows-x86_64-1.15.0.zip
       • Windows/GPU: https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-gpu-windows-x86_64-1.15.0.zip
       • MacOS/CPU: https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-cpu-darwin-x86_64-1.15.0.tar.gz
       • MacOS/GPU: none available.

  2. Unzip/untar the archive into a suitable directory (~/mydir/ is used here as an example), and set the following environment variables:
       • Linux, Windows:
           LIBRARY_PATH=$LIBRARY_PATH:~/mydir/lib
           LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/mydir/lib
       • MacOS:
           LIBRARY_PATH=$LIBRARY_PATH:~/mydir/lib
           DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:~/mydir/lib

  3. Edit your user.settings file (Rosetta/main/source/tools/build/user.settings), and uncomment (i.e. remove the octothorp from the start of) the following lines:
           import os
           'program_path' : os.environ['PATH'].split(':'),
           'ENV' : os.environ,

  4. Compile Rosetta, appending extras=tensorflow (for CPU-only) or extras=tensorflow_gpu (for GPU) to your scons command. For example:
           ./scons.py -j 8 mode=release extras=tensorflow bin

References and author information for the trRosettaConstraintGenerator constraint generator:

trRosetta Neural Network's citation(s): Yang J, Anishchenko I, Park H, Peng Z, Ovchinnikov S, and Baker D. (2020). Improved protein structure prediction using predicted interresidue orientations. Proc Natl Acad Sci USA 117(3):1496-503. doi: 10.1073/pnas.1914677117.

trRosettaConstraintGenerator ConstraintGenerator's author(s): Vikram K. Mulligan, Systems Biology, Center for Computational Biology, Flatiron Institute vmulligan@flatironinstitute.org

<trRosettaConstraintGenerator name="(&string;)" msa_file="(&string;)"
        generate_distance_constraints="(true &bool;)"
        generate_omega_constraints="(true &bool;)"
        generate_theta_constraints="(true &bool;)"
        generate_phi_constraints="(true &bool;)"
        distance_constraint_prob_cutoff="(0.05 &real;)"
        omega_constraint_prob_cutoff="(0.55 &real;)"
        theta_constraint_prob_cutoff="(0.55 &real;)"
        phi_constraint_prob_cutoff="(0.65 &real;)"
        distance_constraint_weight="(1.0 &real;)"
        omega_constraint_weight="(1.0 &real;)"
        theta_constraint_weight="(1.0 &real;)"
        phi_constraint_weight="(1.0 &real;)" />
  • msa_file: Filename for a multiple sequence alignment file, in a3m format. Dashes indicate gaps, and lowercase characters are removed (with the flanking regions ligated). If not provided, the commandline option -trRosetta:msa_file will be used. One or the other is required.
  • generate_distance_constraints: Set whether this will generate distance constraints. Distance constraints constrain the distance between pairs of amino acids. These are symmetric, and are only generated once per amino acid pair (since dist(a,b) == dist(b,a)). Defaults to the commandline setting -trRosetta:use_distance_constraints.
  • generate_omega_constraints: Set whether this will generate omega dihedral constraints. Omega constraints constrain the dihedral between CA1-CB1-CB2-CA2 in pairs of amino acids. These are symmetric, and are only generated once per amino acid pair (since omega(a,b) == omega(b,a)). Note that this is NOT the backbone omega dihedral! Defaults to the commandline setting -trRosetta:use_omega_constraints.
  • generate_theta_constraints: Set whether this will generate theta dihedral constraints. Theta constraints constrain the dihedral between N1-CA1-CB1-CB2 in pairs of amino acids. These are asymmetric (i.e. theta(a,b) != theta(b,a)), so there are two per amino acid pair (unless a == b, which is skipped). Defaults to the commandline setting -trRosetta:use_theta_constraints.
  • generate_phi_constraints: Set whether this will generate phi angle constraints. Phi constraints constrain the angle between CA1-CB1-CB2 in pairs of amino acids. These are asymmetric (i.e. phi(a,b) != phi(b,a)), so there are two per amino acid pair (unless a == b, which is skipped). Note that this is NOT the backbone phi dihedral! Defaults to the commandline setting -trRosetta:use_phi_constraints.
  • distance_constraint_prob_cutoff: Set the probability cutoff below which we omit a distance constraint. Default 0.05, or whatever is set on the commandline with the -trRosetta:distance_constraint_prob_cutoff commandline option.
  • omega_constraint_prob_cutoff: Set the probability cutoff below which we omit an omega dihedral constraint. Default 0.55, or whatever is set on the commandline with the -trRosetta:omega_constraint_prob_cutoff commandline option.
  • theta_constraint_prob_cutoff: Set the probability cutoff below which we omit a theta dihedral constraint. Default 0.55, or whatever is set on the commandline with the -trRosetta:theta_constraint_prob_cutoff commandline option.
  • phi_constraint_prob_cutoff: Set the probability cutoff below which we omit a phi angle constraint. Default 0.65, or whatever is set on the commandline with the -trRosetta:phi_constraint_prob_cutoff commandline option.
  • distance_constraint_weight: Set the weight for trRosetta-generated distance constraints. Defaults to 1.0, or whatever was set on the commandline with the -trRosetta:distance_constraint_weight commandline option.
  • omega_constraint_weight: Set the weight for trRosetta-generated omega dihedral constraints. Defaults to 1.0, or whatever was set on the commandline with the -trRosetta:omega_constraint_weight commandline option.
  • theta_constraint_weight: Set the weight for trRosetta-generated theta dihedral constraints. Defaults to 1.0, or whatever was set on the commandline with the -trRosetta:theta_constraint_weight commandline option.
  • phi_constraint_weight: Set the weight for trRosetta-generated phi angle constraints. Defaults to 1.0, or whatever was set on the commandline with the -trRosetta:phi_constraint_weight commandline option.
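As a sketch of how these options combine, the fragment below defines two generators: one that omits msa_file and therefore relies on the -trRosetta:msa_file commandline option, and one that keeps only distance and omega constraints, raises the distance probability cutoff, and halves the omega weight. The generator names, the MSA path, and the numerical values are hypothetical and purely illustrative, not tuned recommendations.

```xml
<CONSTRAINT_GENERATORS>
	<!-- No msa_file attribute: the MSA is taken from the -trRosetta:msa_file commandline option -->
	<trRosettaConstraintGenerator name="csts_from_cmdline" />

	<!-- Illustrative values only: distance and omega constraints, with theta and phi disabled.
	     A higher distance cutoff omits more low-probability distance constraints,
	     and omega constraints are given half the default weight. -->
	<trRosettaConstraintGenerator name="dist_omega_only"
		msa_file="inputs/my_protein.a3m"
		generate_theta_constraints="false"
		generate_phi_constraints="false"
		distance_constraint_prob_cutoff="0.10"
		omega_constraint_weight="0.5"
	/>
</CONSTRAINT_GENERATORS>
```

Either generator would then be referenced by name in an AddConstraints mover's constraint_generators attribute, as in the example protocol further down.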

Best practices

At the time of writing, it is recommended to leave all options at their defaults unless there is a specific reason to change them.

Example

The following example roughly reproduces the protocol used by the trRosettaProtocol mover and trRosetta application. Note that (a) this is somewhat simplified, limiting its accuracy as a structure prediction protocol, and (b) it is not necessary to manually script the full structure prediction protocol, since the mover or the application can be used instead. This is only for demonstration purposes to show how the trRosettaConstraintGenerator can be scripted.

<ROSETTASCRIPTS>
	<SCOREFXNS>
		<ScoreFunction name="cen" weights="score0.wts" >
			<Reweight scoretype="atom_pair_constraint" weight="5.0" />
			<Reweight scoretype="angle_constraint" weight="1.0" />
			<Reweight scoretype="dihedral_constraint" weight="1.0" />
		</ScoreFunction>
		<ScoreFunction name="r15" weights="ref2015.wts" />
		<ScoreFunction name="r15_cst" weights="ref2015_cst.wts" />
	</SCOREFXNS>
	<SIMPLE_METRICS>
		<RMSDMetric name="measure_rmsd"
			use_native="true"
			super="true"
			custom_type="RMSD_after_centroid_phase_"
			rmsd_type="rmsd_protein_bb_heavy"
		/>
		<RMSDMetric name="measure_rmsd2"
			use_native="true"
			super="true"
			custom_type="RMSD_after_fullatom_phase_"
			rmsd_type="rmsd_protein_bb_heavy"
		/>

	</SIMPLE_METRICS>
	<CONSTRAINT_GENERATORS>
		<trRosettaConstraintGenerator name="gen_csts"
			msa_file="inputs/1r6j_msa.a3m"
		/> 
	</CONSTRAINT_GENERATORS>
	<MOVERS>
		<InitializeByBins name="randomize_bb"
			bin_params_file="ABBA"
		/>
		<AddConstraints name="gen_csts_mover"
			constraint_generators="gen_csts"
		/>
		<MinMover name="minimize"
			scorefxn="cen"
			tolerance="0.0000001"
			bb="true" chi="false" jump="0"
		/>
		<ClearConstraintsMover name="remove_csts" />
		<SwitchResidueTypeSetMover name="make_fullatom" set="fa_standard"/>
		<FastRelax name="frlx" repeats="3" scorefxn="r15_cst" />
	</MOVERS>
	<PROTOCOLS>
		<Add mover="randomize_bb" />
		<Add mover="gen_csts_mover" />
		<Add mover="minimize" />
		<Add metrics="measure_rmsd" />
		<Add mover="remove_csts" />
		<Add mover="make_fullatom" />
		<Add mover="gen_csts_mover" />
		<Add mover="frlx" />
		<Add metrics="measure_rmsd2" />
	</PROTOCOLS>
	<OUTPUT scorefxn="r15" />
</ROSETTASCRIPTS>

The input multiple sequence alignment file can be generated with HHblits or other software. See the trRosettaProtocol mover documentation for more details and an example of the .a3m file format.

Code organization

Please see the trRosetta application documentation for information about the trRosetta code organization.

References

  • The trRosetta neural network is described in Yang et al. (2020) Proc Natl Acad Sci USA 117(3):1496-1503 (doi 10.1073/pnas.1914677117).
  • The trRosettaProtocol mover, trRosettaConstraintGenerator, trRosetta application, and other C++ infrastructure were written by Vikram K. Mulligan (vmulligan@flatironinstitute.org), and are currently unpublished.

See Also