The RosettaCarbohydrate Framework was created by Dr. Jason W Labonte (JWLabonte@jhu.edu), in collaboration with Dr. Jared Adolf-Bryfogle (jadolfbr@gmail.com) and Dr. Sebastian Rämisch (raemisch@scripps.edu).

The PIs are Dr. Jeff Gray of JHU (jgray@jhu.edu) and Dr. William Schief of Scripps (schief@scripps.edu).

The framework is still in development. The tips below cover current usage; more will be added.

## Structure Input

• All Rosetta runs with carbohydrate-containing structures must use the following option, which makes Rosetta carbohydrate-aware. Rosetta throws an error if it is absent:

```
-include_sugars
```


### RCSB - .pdb files

PDB files from the RCSB should be readable by default. However, a PDB file can only be loaded if it contains LINK records. Rosetta builds the glycans using its internal residue names, based on their connectivity.
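A quick way to confirm that a file carries LINK records is to scan for them. The Python sketch below extracts LINK records using the fixed column positions of the wwPDB LINK record format; `find_link_records` and `Link` are hypothetical names for illustration, not part of Rosetta or PyRosetta.

```python
from typing import List, NamedTuple


class Link(NamedTuple):
    """One LINK record: two bonded atoms, each identified by residue name, chain and number."""
    atom1: str
    res1: str
    chain1: str
    resseq1: int
    atom2: str
    res2: str
    chain2: str
    resseq2: int


def find_link_records(pdb_text: str) -> List[Link]:
    """Return every LINK record in the text of a PDB file (hypothetical helper)."""
    links = []
    for line in pdb_text.splitlines():
        # Match "LINK  " exactly so that Refmac-style "LINKR" records are skipped.
        if line[:6] != "LINK  ":
            continue
        line = line.ljust(80)  # pad short lines so every column slice exists
        # 0-based slices of the 1-based wwPDB columns:
        # name1 13-16, resName1 18-20, chainID1 22, resSeq1 23-26,
        # name2 43-46, resName2 48-50, chainID2 52, resSeq2 53-56.
        links.append(Link(
            atom1=line[12:16].strip(),
            res1=line[17:20].strip(),
            chain1=line[21],
            resseq1=int(line[22:26]),
            atom2=line[42:46].strip(),
            res2=line[47:50].strip(),
            chain2=line[51],
            resseq2=int(line[52:56]),
        ))
    return links
```

For example, `find_link_records(open("5t3x.pdb").read())` lists the bonds a file declares. An empty list means the file has no LINK records; in that case the `-auto_detect_glycan_connections` option may be worth trying.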

• Reading most PDB files also requires an option that maps the non-specific three-letter HETNAM codes used by the PDB to chemically accurate identifiers:

```
-alternate_3_letter_codes pdb_sugar
```

• When loading a file from the PDB, the order of the HETATM and LINK records matters. Because PDB files are usually not formatted for Rosetta compatibility, the following option makes Rosetta determine connections internally, ignoring the record order and using interatomic distances to identify protein-sugar and sugar-sugar connections:

```
-auto_detect_glycan_connections
```

• By default, a connection is detected only if the bond length lies between 1.3 Å (minimum) and 1.6 Å (maximum). Because many structures are chemically incorrect, these limits can be changed so that unphysical bonds are detected as well:

```
-min_bond_length < Real >
-max_bond_length < Real >
```

• If automatic detection fails, all bond calculations and connections can be monitored with `-out::level 999`.
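The idea behind the minimum and maximum bond lengths can be illustrated with a small distance scan: any pair of candidate atoms whose separation falls inside the window is treated as bonded. The Python sketch below uses the default 1.3 and 1.6 Å limits; `detect_bonds` and its flat atom dictionary are hypothetical simplifications for illustration, not Rosetta's actual implementation.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def detect_bonds(
    atoms: Dict[str, Coord],
    min_bond_length: float = 1.3,
    max_bond_length: float = 1.6,
) -> List[Tuple[str, str, float]]:
    """Return (atom_a, atom_b, distance) for every pair inside the bond-length window.

    `atoms` maps a label such as "ASN88:ND2" to Cartesian coordinates in angstroms.
    This is an illustrative O(n^2) scan over all pairs, not Rosetta's algorithm.
    """
    bonds = []
    for (name_a, xyz_a), (name_b, xyz_b) in combinations(atoms.items(), 2):
        distance = math.dist(xyz_a, xyz_b)
        if min_bond_length <= distance <= max_bond_length:
            bonds.append((name_a, name_b, round(distance, 3)))
    return bonds


atoms = {
    "ASN88:ND2": (0.0, 0.0, 0.0),
    "NAG501:C1": (1.45, 0.0, 0.0),   # a typical C-N bond length from ND2
    "NAG502:C1": (1.45, 1.75, 0.0),  # 1.75 A from NAG501:C1, too long by default
}
print(detect_bonds(atoms))                        # only the ASN-NAG pair
print(detect_bonds(atoms, max_bond_length=1.8))   # the stretched NAG-NAG pair too
```

Widening the window, as `-max_bond_length` does, lets the stretched 1.75 Å contact count as a connection, at the cost of possibly accepting contacts that are not real bonds.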

### GLYCAM

To load structures written in the GLYCAM PDB format, pass the option `-glycam_pdb_format`.

## Structure Output

To write structures out correctly, PDB LINK records must be output. The following option does this and is now on by default:

```
-write_pdb_link_records
```


## Full example

```bash
score.default.macosclangrelease \
-include_sugars \
-alternate_3_letter_codes pdb_sugar \
-auto_detect_glycan_connections \
-min_bond_length 1.1 \
-max_bond_length 1.7 \
-ignore_zero_occupancy false \
-ignore_unrecognized_res \
-out:output \
-s 5t3x.pdb
```


# Nomenclature

Glycans are usually specified by their IUPAC names. The glycan 'root', as the term is used in Rosetta, is the residue through which the glycan is attached to the protein. Some components, such as the GlycanResidueSelector, use 'glycan positions' to specify glycan residues easily. These run from 1 to N, where 1 is the first glycan residue and N is the last. To find the glycan position of a residue of interest, use the GlycanInfo application.


# Applications

GlycanTreeRelax - Model glycan trees from the roots out to the foliage. Works for full de novo modeling or refinement.

GlycanRelax - Basic sampling for glycan residues.

GlycanInfo - Get information on all glycan trees within a pose.

GlycanClashCheck - Obtain data on model clashes with and between glycans, or between glycans and other protein chains.

# RosettaScript Components

GlycanRelaxMover - Model glycan trees using known carbohydrate information. Works for full de novo modeling or refinement.

SimpleGlycosylateMover - Glycosylate poses with glycan trees.

GlycanTreeSelector - Select individual glycan trees or all of them.

GlycanResidueSelector - Select specific residues of each glycan tree of interest.

# Glycosylating Structures

Structures can be glycosylated either through a function accessible to PyRosetta or via RosettaScripts.

## RosettaScripts

See the SimpleGlycosylateMover documentation.

## PyRosetta

The examples below add a Man5 glycan to a pose. This can be done in two ways in PyRosetta: via the core function or via the class wrapper.

### Base Function

The core function glycosylates a pose from the IUPAC name of the glycan. Its C++ declaration is:

```cpp
/// @brief  Glycosylate the Pose at the given sequence position using an IUPAC sequence.
void glycosylate_pose(
	Pose & pose,
	uint const sequence_position,
	std::string const & iupac_sequence,
	bool const idealize_linkages = true );
```

The following example uses this function to glycosylate residue 10 of a pose with Man5, a commonly found glycan in biology:

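```python
from pyrosetta import init, pose_from_pdb
from pyrosetta.rosetta.core.pose.carbohydrates import glycosylate_pose

# Rosetta must be made carbohydrate-aware before any sugars are built.
init("-include_sugars")

p = pose_from_pdb("my_pose.pdb")

# Attach Man5 to residue 10; the final argument idealizes the new linkages.
glycosylate_pose(p, 10, "a-D-Manp-(1->3)-[a-D-Manp-(1->3)-[a-D-Manp-(1->6)]-a-D-Manp-(1->6)]-b-D-Manp-(1->4)-b-D-GlcpNAc-(1->4)-b-D-GlcpNAc-", True)

print(p)
print(p.residue(3))
print(p.chain_sequence(1))
```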
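IUPAC strings like the Man5 sequence above are easy to mistype. Before passing one to `glycosylate_pose`, a quick check can confirm that the branch brackets balance and that the residue count is what you expect. The Python sketch below is a hypothetical helper (`check_iupac` is not part of PyRosetta) and assumes every residue is written in the short form used above, such as `a-D-Manp`.

```python
import re

# Assumed residue pattern: anomer (a/b), enantiomer (D/L), sentence-case
# 3-letter code and ring form (p/f), e.g. "a-D-Manp", or "b-D-Glcp" in "b-D-GlcpNAc".
RESIDUE = re.compile(r"\b[ab]-[DL]-[A-Z][a-z]{2}[pf]")


def check_iupac(sequence: str) -> int:
    """Return the number of residues, raising ValueError on unbalanced brackets."""
    depth = 0
    for char in sequence:
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
            if depth < 0:
                raise ValueError("']' without a matching '['")
    if depth != 0:
        raise ValueError("unclosed '[' branch")
    return len(RESIDUE.findall(sequence))


MAN5 = ("a-D-Manp-(1->3)-[a-D-Manp-(1->3)-[a-D-Manp-(1->6)]-a-D-Manp-(1->6)]-"
        "b-D-Manp-(1->4)-b-D-GlcpNAc-(1->4)-b-D-GlcpNAc-")
print(check_iupac(MAN5))  # five mannoses plus the two core GlcNAcs
```

Counting by regular expression keeps the check independent of Rosetta; it only catches typos that break the assumed pattern, not chemically wrong linkages.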

### SimpleGlycosylateMover

This mover is accessible both in PyRosetta and RosettaScripts. It was written by Jared Adolf-Bryfogle.

See the SimpleGlycosylateMover documentation for a full description.

Example using Man5:

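```python
from pyrosetta import init, pose_from_pdb
from pyrosetta.rosetta.protocols.carbohydrates import SimpleGlycosylateMover

# Rosetta must be made carbohydrate-aware before any sugars are built.
init("-include_sugars")

p = pose_from_pdb("my_pose.pdb")

# Glycosylate residue 10 with the glycan known by the common name 'man5'.
glycosylator = SimpleGlycosylateMover()
glycosylator.set_glycosylation('man5')
glycosylator.set_positions(10)
glycosylator.apply(p)

print(p)
print(p.residue(3))
print(p.chain_sequence(1))
```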

# Building Glycans

Glycans can also be built on their own using PyRosetta; there is currently no way to do this in RosettaScripts. Glycans are created from their IUPAC names.

To properly build an oligosaccharide, Rosetta must know the following details about each sugar residue being created, in this order:

• Main-chain connectivity: →2), →4), →6), etc., typed as `->2)`, `->4)`, `->6)`; the default value is `->4)-`

• Anomeric form: α (`a` or `alpha`) or β (`b` or `beta`); the default value is alpha

• Enantiomeric form: L or D; the default value is D

• 3-letter code: required; written in sentence case (e.g., `Man`, not `MAN`)

• Ring-form code: `f` for a furanose (5-membered ring) or `p` for a pyranose (6-membered ring); required

Residues must be separated by hyphens. Glycosidic linkages can be specified with full IUPAC notation, e.g., `-(1->4)-` for "-(1→4)-". Otherwise, Rosetta assumes `-(1->` for aldoses and `-(2->` for ketoses. Note that, by convention, the IUPAC sequence of a saccharide chain is written in the reverse order from how its residues are numbered.
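To make the field order and defaults above concrete, here is a minimal Python sketch that splits a single residue token into its parts and fills in the documented defaults. `parse_residue` and `SugarSpec` are hypothetical names for illustration, not Rosetta's parser. It assumes the main-chain prefix is written like `->6)` followed by an optional hyphen, accepts trailing modifications such as `NAc`, and ignores branches and explicit glycosidic linkages.

```python
import re
from typing import NamedTuple


class SugarSpec(NamedTuple):
    mainchain: int      # main-chain connection position, e.g. 4 for ->4)
    anomer: str         # "alpha" or "beta"
    enantiomer: str     # "D" or "L"
    code: str           # sentence-case 3-letter code, e.g. "Glc"
    ring: str           # "p" (pyranose) or "f" (furanose)
    modifications: str  # anything after the ring code, e.g. "NAc"; may be empty


# Assumed token layout: [->N)-][a-|b-|alpha-|beta-][D-|L-]Xxx(p|f)[modifications]
TOKEN = re.compile(
    r"^(?:->(?P<mainchain>\d)\)-?)?"
    r"(?:(?P<anomer>alpha|beta|a|b)-)?"
    r"(?:(?P<enantiomer>[DL])-)?"
    r"(?P<code>[A-Z][a-z]{2})"
    r"(?P<ring>[fp])"
    r"(?P<modifications>[A-Za-z0-9]*)$"
)


def parse_residue(token: str) -> SugarSpec:
    """Split one residue token into its fields, applying the documented defaults."""
    match = TOKEN.match(token)
    if match is None:
        raise ValueError(f"not a valid residue token: {token!r}")
    anomer = match["anomer"] or "a"                       # default: alpha
    return SugarSpec(
        mainchain=int(match["mainchain"] or 4),           # default: ->4)
        anomer="alpha" if anomer in ("a", "alpha") else "beta",
        enantiomer=match["enantiomer"] or "D",            # default: D
        code=match["code"],
        ring=match["ring"],
        modifications=match["modifications"],
    )


print(parse_residue("Glcp"))           # every optional field takes its default
print(parse_residue("->6)-b-L-Fucp"))  # every field given explicitly
```

The 3-letter code and ring form are the only mandatory groups in the pattern, mirroring the "required" entries in the list above.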

The following example creates a pose from the IUPAC saccharide name:

from rosetta import *
from rosetta.core.pose import pose_from_saccharide_sequence