This glossary collects many Rosetta terms with short (sentence-to-paragraph) definitions. You'll see definitions of objects in the code, biophysics concepts, and administrivia. Many of these are terms of art in structural biology with the particular nuances that apply in Rosetta.

#### ABEGO

Designation that indicates a residue's position in Ramachandran space (A = right-handed alpha or 3₁₀ helix; B = right-handed beta strands and extended conformations; E = left-handed beta strands; G = left-handed helices) and cis omega angles (O). See citation here.

#### ab initio structure prediction

Prediction of molecular structure given only its sequence. Known also as de novo modeling.

In Rosetta, ab initio modeling uses statistical information from the PDB such as fragments and statistical potentials.

#### all atom

See fullatom.

#### alpha helix

A common motif in the secondary structure of proteins, the alpha helix (α-helix) is a right-handed coiled or spiral conformation, in which every backbone N-H group donates a hydrogen bond to the backbone C=O group of the amino acid four residues earlier (i+4 → i hydrogen bonding). Among types of local structure in proteins, the α-helix is the most regular and the most predictable from sequence, as well as the most prevalent.

#### analogue

Rosetta homology modeling doesn't actually need strict evolutionary relationships, and can use analogues as templates.

#### annotated sequence

Rosetta will often record the sequence of a protein as the one letter amino acid codes, expanding when necessary with square brackets to indicate patches like post-translational modifications.
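As a rough illustration of the bracket notation, the Python sketch below splits an annotated sequence into its plain one-letter sequence and any bracketed annotations. The helper name `parse_annotated_sequence` and the example string are hypothetical and chosen only for illustration; this is not Rosetta's own parser.

```python
import re

# One residue letter, optionally followed by a [bracketed] annotation.
_TOKEN = re.compile(r"([A-Za-z])(?:\[([^\]]*)\])?")


def parse_annotated_sequence(annotated):
    """Return (plain_sequence, annotations).

    annotations maps the 1-based residue position to the text found
    inside the square brackets that follow that residue's letter.
    """
    letters = []
    annotations = {}
    for position, match in enumerate(_TOKEN.finditer(annotated), start=1):
        letter, note = match.groups()
        letters.append(letter)
        if note is not None:
            annotations[position] = note
    return "".join(letters), annotations


# Hypothetical example string; the bracket contents are illustrative only.
plain, notes = parse_annotated_sequence("AS[SER:phosphorylated]GK")
print(plain)
print(notes)
```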

#### Atom

A class storing the Cartesian position of an atom in a Residue.

#### atom tree

The atom tree connects atoms in the pose, and is used to convert internal coordinates into Cartesian coordinates. It is normally derived from the fold tree.

#### AtomTree

The core::kinematics class for defining atomic connectivity.

#### AtomType

A class which stores the properties of a particular kind of atom (e.g. a carboxylate oxygen). See Rosetta AtomTypes for more details.

#### B factor

The "temperature factor" from crystallography, as seen in PDB files. The larger the value, the more "flexible" the atom is.

#### backbone

In biopolymers, the backbone is those atoms which form the polymeric chain. In proteins these are the N, CA, C, and O atoms and their hydrogens. In nucleic acids it is the phosphate and sugars.

#### base

The non-backbone portion of a nucleotide. Analogous to a protein's side chain.

#### benchmark study

Tests done to confirm the performance of a new algorithm or method. Results are compared to previous results obtained using the same starting data.

#### beta sheet

A common motif in the secondary structure of proteins, the beta sheet (β-sheet) is a mostly flat extended structure made up of individual (possibly non-consecutive) extended chains (β-strands) held together by alternating hydrogen bonds.

#### binding affinity or binding energy

How strongly two molecules are associated with one another.

#### binding interface

The point of contact between two molecules.

A common definition in Rosetta is any residue with the C-beta atom or any heavy atom within 6 Angstroms of the other binding partner.
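The distance-based definition above can be sketched in a few lines of Python. This is a minimal illustration, not Rosetta code: the function name `interface_residues` and the input layout (a dictionary from residue id to a list of heavy-atom coordinates) are assumptions made for the example.

```python
import math


def interface_residues(partner_a, partner_b, cutoff=6.0):
    """Return the sorted ids of residues in partner_a that have any atom
    within `cutoff` Angstroms of any atom in partner_b.

    Both partners are dicts mapping a residue id to a list of (x, y, z)
    heavy-atom coordinates (a hypothetical layout for this sketch).
    """
    atoms_b = [xyz for atoms in partner_b.values() for xyz in atoms]
    hits = []
    for res_id, atoms in partner_a.items():
        if any(math.dist(a, b) <= cutoff for a in atoms for b in atoms_b):
            hits.append(res_id)
    return sorted(hits)


# Toy coordinates: residue 1 is 5 A from the partner, residue 2 is 15 A away.
side_a = {1: [(0.0, 0.0, 0.0)], 2: [(20.0, 0.0, 0.0)]}
side_b = {10: [(5.0, 0.0, 0.0)]}
print(interface_residues(side_a, side_b))
```

The brute-force all-pairs loop is fine for small examples; a real implementation would more likely use a neighbor list or spatial grid.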

#### biopolymers

Polymeric molecules important for biological systems. Typical biopolymers are proteins, RNA, DNA and carbohydrates.

#### blind docking

Docking where the structure of the docked complex is unknown.

#### bootcamp

An intense week-long Rosetta training session for new developers.

#### bound docking

Docking in which the complex structure used as the reference for docking and RMSD calculations has been determined experimentally (by X-ray crystallography or NMR).

See trunk.

#### Cartesian coordinates

Coordinates with spatial positions specified by xyz coordinates. Contrast this with internal coordinates. The conversion between the two is kinematics.

#### Cartesian minimization

Gradient minimization based on moving atoms in xyz Cartesian space, rather than with internal coordinates.

This requires an extra term (cart_bonded) to maintain bond lengths and angles to their near-ideal values.

#### CCD

Cyclic coordinate descent. A loop closure protocol where backbone dihedrals are progressively adjusted to minimize the gap in the loop backbone.

#### centroid

A reduced representation mode, used for simplifying the representation of the system, to permit faster sampling and scoring.

For proteins, each residue is represented by five backbone atoms (N, CA, C, O and the polar hydrogen on N) and one pseudo-atom, the “centroid,” to represent the side chain.

[Explain further how centroid is calculated.]

Gray et al., J. Mol. Biol. (2003) 331, 281–299

#### chain

In Rosetta, a chain is a single, covalently connected molecule.

In the PDB format, a chain is all residues which share a chain identification label.

#### chainbreak

A gap in connectivity (in the AtomTree) between chemically connected / sequentially adjacent residues. These are used in CCD (Cyclic Coordinate Descent) loop closure.

#### ChemicalManager

A singleton class in Rosetta which keeps track of things like ResidueTypeSets.

#### chi angles

Chi angles are the dihedral angles which control the heavy atom positions of residue side chains.

In nucleic acids, one of the

#### chi1, chi2, chi3, & chi4

Specific sidechain chi angles of protein residues. They are enumerated from the C-alpha atom outward, so chi1 would be the dihedral between N-Ca-Cb-Cg.

#### clash

Two (or more) atoms being too close to be energetically favorable (essentially an overlap of vdW radii).

#### cluster

Clustering of structures involves grouping structures with "similar" structures. These groups of similar structures are called "clusters". The measure of structural similarity is typically either RMSD or GDT.

#### coarse grain

Initial modeling, where all atoms or energy terms may not be represented.

#### commit

This is a term related to how version control is used. A commit is when you upload your changes from your computer to the common code source.

#### comparative modeling

Prediction of protein structure based on sequence and the structures of closely related proteins. Also called "homology modeling".

#### conformation

The three dimensional organization of atoms in a structure.

#### Conformation

A class which contains Residue objects and FoldTree. This is the part of the Pose which keeps track of coordinates. This is linked by the kinematic layer to describe internal-coordinate folding.

#### conformer

One of a set of three-dimensional conformations of a ligand, small molecule or amino acid side chain. Sometimes referred to in Rosetta as a "rotamer".

#### constraint

When used with Rosetta, actually a "restraint": an adjustment to the score function to take into account additional geometric information.

#### contact order

Taken from "Contact order and ab initio protein structure prediction," Bonneau et al., Protein Science (2002), 11:1937-1944: "The relative CO is the average sequence separation of residues that form contacts in the three-dimensional structure divided by the length of the protein."
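The quoted definition translates directly into a short calculation. The Python sketch below is one possible reading of it, under stated assumptions: a "contact" is taken to be any residue pair whose C-alpha atoms lie within a cutoff distance, and the 8 Å default is a placeholder. The cited paper may define contacts differently, and the function name is hypothetical.

```python
import math


def relative_contact_order(ca_coords, cutoff=8.0, min_separation=1):
    """Average sequence separation of contacting residue pairs, divided
    by the number of residues.

    ca_coords is a list of (x, y, z) C-alpha positions in sequence order.
    The contact definition (C-alpha distance <= cutoff) is an assumption
    made for this sketch.
    """
    n_res = len(ca_coords)
    total_separation = 0
    n_contacts = 0
    for i in range(n_res):
        for j in range(i + min_separation, n_res):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                total_separation += j - i
                n_contacts += 1
    if n_contacts == 0:
        return 0.0
    return total_separation / (n_contacts * n_res)


# Toy chain: four residues on a line, 5 A apart, so only neighbors touch.
chain = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
print(relative_contact_order(chain))
```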

#### Critical Assessment of PRediction of Interactions (CAPRI)

Protein-protein interactions and other interactions between macromolecules are essential to all aspects of biology and medical sciences, and a number of methods have been developed to predict them. CAPRI is a community wide experiment designed to assess those that are based on structure. Since CAPRI began in 2001, the experiment has had two to four prediction rounds each year, with one or a few targets per round.

#### Critical Assessment of Techniques for Protein Structure Prediction (CASP)

The Critical Assessment of protein Structure Prediction (CASP) experiments aim at establishing the current state of the art in protein structure prediction, identifying what progress has been made, and highlighting where future effort may be most productively focused. CASP has been held every two years starting in 1994. Rosetta has participated in several CASP experiments.

#### crystal neighbors

Is a crystal structure truly a "native" structure? Protein crystals are approximately 40-70% water, which gives crystallographers confidence that proteins in the crystal lattice can represent proteins in their biological environment, especially since proteins in a crystal lattice are often still able to carry out the reactions they perform in cells. However, the crystal lattice introduces an unavoidable artifact: regions where adjacent protein molecules touch, forming so-called crystal contacts. Conformations in these contact regions are altered to some extent. Rosetta is sometimes able to sample conformations that are 3 or 4 Å RMSD away from the "native" crystal structure but have lower energies, as a result of variations in a section of loop that happens to lie where a crystal contact occurs. This has prompted a rethinking of the definition of "native structure", which is supposed to be the conformation of the protein as it exists in the cell.

#### crystallography phasing

The critical step in solving a crystal structure is obtaining the phases, either by molecular replacement or by experimental methods. Molecular replacement is technically easier: crystallographers use existing structures with high structural similarity to guide the phase search. In hard cases, where no structurally similar structure exists or the available structures have too low a sequence identity (below 15-20%), crystallographers have to obtain the phases experimentally, which is much more tedious and difficult than molecular replacement. Rosetta can generate or refine models using a physically realistic full-atom force field, which can sometimes produce more accurate comparative models. For some of those hard cases, Rosetta is therefore able to provide better initial search models for molecular replacement.

ref: Qian et al., High-resolution structure prediction and the crystallographic phase problem. Nature 450, 259-264 (2007)

#### CxxTest

This is the framework we use for unit tests. See also http://cxxtest.com.

#### database

The Rosetta database directory contains key parameters for Rosetta. Examples of stored information include the force field, definitions of monomers (see: residue types), representations of the model, fundamental constants, etc.

#### ddG

Also known as ΔΔG. The change in free energy (ΔG), for example binding free energy, upon a mutation.

#### de novo modeling

Prediction of molecular structure given only its sequence. Known also as ab initio structure prediction.

#### decoy

A model produced by a computational protocol.

#### density map

Experimental data showing where the electrons (and thus the atoms) are.

#### design

Optimization of the amino acid sequence of a protein.

#### devel

devel is one of the libraries within the Rosetta project. It contains code that is documented and tested but not necessarily scientifically validated to work well: code still under development. It is not available in the released version.

#### dihedral angle

A four-body angle encoding the respective orientation of two atoms around the axis connecting two other atoms. Also known as a torsion.
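To make the four-body definition concrete, here is a minimal Python sketch that computes a dihedral from four Cartesian points using the standard atan2 formulation. The function and helper names are illustrative and not taken from Rosetta. The same calculation applies to phi, psi, omega, and the chi angles, given the appropriate four atoms.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p1, p2, p3, p4):
    """Dihedral angle (degrees, in (-180, 180]) defined by four points.

    The angle describes the rotation of p1 relative to p4 about the
    p2-p3 axis: 0 is cis (eclipsed), 180 is trans.
    """
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    n2 = _cross(b2, b3)  # normal of the plane through p2, p3, p4
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy geometry: p1 and p4 on the same side of the central bond (cis).
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
```

Using `atan2` rather than `acos` keeps the sign of the angle and avoids numerical trouble near 0 and 180 degrees.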

#### disulfides

The covalent attachment of two cysteine residues in close proximity. This depends on the protein being present in an oxidizing environment (like outside of the cell), rather than a reducing environment (like the inside of the cell).

This covalent attachment can greatly stabilize the folding of a protein.

#### docking

Assembling two separate proteins (or protein-ligand, protein-surface) into their biologically relevant structure and finding the lowest free energy of the complex.

[Explain further: blind docking, bound docking, unbound docking]

#### docking funnel

An energy funnel (score versus RMSD) for docking runs.

#### Dunbrack library

A sidechain rotamer library compiled by the Dunbrack laboratory; the standard rotamer library of Rosetta.

#### Energies

A class in Pose which stores the energies computed by the ScoreFunction.

#### energy function

Also called a "score function". The prediction of structural energy over which Rosetta operates.

#### energy funnel

An attempt at representing the energy landscape of the protein. A plot which (ideally) shows low rmsd structures having lower energies than high rmsd structures.

#### ensemble

A group of closely related structures.

#### EnergyMethod

The class which implements the scoring of a particular score term for the ScoreFunction.

#### explicit water

Water modeled as atoms, rather than implicitly.

#### ex1/ex2

Options that specify the size (extra sampling) of the rotamer library being used.

#### fasta

Text-based format describing the peptide sequence of a protein, using single-letter amino acid codes.

#### filter

A pass/fail check on structure quality during the middle of a run. Filters are applied to avoid wasting computational time on trajectories which are unlikely to result in successful results. Relevant metrics are calculated and those structures with poor values are discarded.

#### fixbb

A Rosetta application which does fixed backbone design.

#### fixed backbone design

Design of a protein where the backbone is not moved during the redesign.

#### fixed backbone packing

Optimization (packing) of the side chain conformations, done without moving the backbone.

#### flag file

Also options file: a file that contains a set of flags (possibly with their respective parameters) to control the program or protocol. You can load it as an option when you start the program, instead of typing all options at the command line.

#### flexible backbone design

Design of a protein where the backbone is allowed to move during design.

#### flexible backbone packing

Optimization (packing) of the side chain conformations, where the backbone is allowed to move during optimization.

#### fold tree

A directed, acyclic graph (tree) connecting all the residues in the pose. The fold tree is the residue-level description of how internal coordinates and Cartesian coordinates interconvert, and how changes propagate between residues. Changing the dihedral of one residue will change the Cartesian coordinates of all residues "downstream" in the fold tree due to lever arm effects. By changing the fold tree you can limit the propagation of these effects, keeping portions of the protein backbone fixed which would normally move.

#### force field

See scorefunction.

#### fragment

A section of a protein. Typical Rosetta usage is for 3- and 9-mer backbone fragments selected from PDB structures.

#### fragment insertion

Placing backbone dihedrals from a fragment into the structure. Used frequently for loop modeling and ab initio.

#### fragment picker

A Rosetta application used to pick fragments.

#### fullatom

Also "all atom": A representation of the protein where all physical atoms (including hydrogens) are present during modeling, in contrast to reduced representations like centroid mode.

#### full-atom energy function

[This definition should be checked by someone else] The energy terms and interactions are calculated in the atomistic scale (atom-atom pairwise).

#### GDT

Global Distance Test. A metric used in CASP instead of RMSD; it is less sensitive than RMSD to regions of unaligned structure.

[Insert reference here]

#### GDTMM

A Rosetta-specific name for GDT.

#### Git

The most widely used distributed version control system, used to control the Rosetta code. We use GitHub for hosting.

#### Gollum

Gollum is a Git-based wiki, used to create this wiki you are reading.

#### global minimum

The three-dimensional conformation of a protein which corresponds to the lowest possible energy state. This is (usually) the conformation found in nature.

#### hard rep

The normal Lennard-Jones repulsive term; used in contrast to soft_rep.

#### heavy atom

All atoms except hydrogens.

#### homologue

Evolutionarily related proteins. They usually have similar structure and sequences, but don't necessarily have to. Within Rosetta however we are only interested in homologues that are similar in structure. Ones that are similar in sequence but not in structure are not necessarily useful, though proteins that share more than 20-25% of their sequence are usually structurally similar. (The 20-25% region is called the "twilight zone" of homology.)

A protein that is structurally similar but not evolutionarily related is an analogue.

#### homology modeling

Homology modeling of a protein refers to constructing an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental three-dimensional structure of a related homologous protein (the "template"). Homology modeling relies on the identification of one or more known protein structures likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. Related to threading.

#### idealization

Rosetta normally works only with changing dihedral angles. The idealize application loads a PDB file and replaces all bond lengths and plane angles with the values defined in the Rosetta database. The result of this simulation is non-deterministic, so many runs may be attempted.

See also Cartesian minimization which works with non-ideal bond lengths and angles.

#### interaction graph

A representation of protein interactions during packing; the choice can affect simulation speed.

#### interface

The region of a structure where two chains interact.

#### internal coordinates

Storage of the positions of atoms based on bond lengths, angles and dihedrals, rather than Cartesian coordinates (xyz coordinates). The conversion between the two is kinematics.

#### jump

A portion of the fold tree representing a rigid body (non-covalent) movement.

#### knowledge-based potentials

Also "statistical potentials": energy function terms based on the probability of occurrence in a data set.

#### Lennard-Jones

Also "LJ": A function that approximates the non-bonded interactions of neutral atoms, combining Pauli repulsion and the van der Waals attractive term (also known as the Lennard-Jones 6-12 potential).
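For reference, the textbook 6-12 form can be written as a one-line function. The Python sketch below uses the common 4ε[(σ/r)¹² − (σ/r)⁶] parameterization with placeholder parameter values. It is meant only to show the shape of the potential, and is not a description of how Rosetta's own attractive and repulsive score terms are implemented.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Textbook 12-6 Lennard-Jones energy at separation r.

    epsilon is the well depth and sigma the separation at which the
    energy crosses zero; both defaults are placeholders.
    """
    if r <= 0:
        raise ValueError("separation must be positive")
    sr6 = (sigma / r) ** 6
    # The r^-12 term is the repulsion, the r^-6 term the attraction.
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# The minimum lies at r = 2**(1/6) * sigma, where the energy is -epsilon.
r_min = 2 ** (1 / 6)
print(lennard_jones(r_min))
```

At short separations the r⁻¹² term dominates and the energy rises steeply, which is what makes clashes so expensive under a "hard" repulsive term.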

#### ligand

A molecule which binds a protein; for Rosetta this is specifically a non-polymeric small molecule.

#### local minimum

The lowest-energy three-dimensional state of a protein within a neighborhood of similar conformations. A protein may have many local minima, but only one global minimum.

#### loop

Structurally a loop region is a combination of phi-psi angles which is in a certain area of the Ramachandran plot. Loops are very loosely defined: a working definition is secondary structure that isn't defined as either an alpha helix or a beta sheet.

[XXX: A picture would be good here]

In Rosetta code a loop is anything between two fixed ends that you want to model. This usually corresponds to the structural definition of loops, but can also refer to regions which aren't.

#### low energy

A 3 dimensional model of a protein is low energy if it has good packing, satisfied polar or charged residues, appropriately placed small molecules or ligands, etc.

#### low resolution

An experimentally determined structure of a protein is low resolution if individual atoms are not distinct; typically this equates to a crystal structure resolution worse than (numerically above) 3-4 angstroms.

#### main chain

Used interchangeably with backbone atoms.

#### Metropolis criterion

Used by Monte Carlo methods, this equation determines whether to accept or reject a random move.
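The criterion itself is short enough to sketch in Python. In the standard formulation, a move that lowers the energy is always accepted, and a move that raises it by ΔE is accepted with probability exp(−ΔE / kT). The function name and the use of a single `temperature` argument standing in for kT (in the same units as the energy) are assumptions of this sketch rather than Rosetta's API.

```python
import math
import random


def metropolis_accept(delta_energy, temperature, rng=random):
    """Decide whether to accept a trial move under the Metropolis criterion.

    delta_energy is (new energy - old energy); temperature plays the
    role of kT and must be positive.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if delta_energy <= 0:
        return True  # downhill (or neutral) moves are always accepted
    # Uphill moves are accepted with Boltzmann probability.
    return rng.random() < math.exp(-delta_energy / temperature)


# A downhill move is always accepted; an uphill one only sometimes.
print(metropolis_accept(-2.5, temperature=1.0))
```

Lowering `temperature` over the course of a run makes uphill moves progressively rarer, which is the idea behind simulated annealing.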

#### MiniCON

This was the winter Rosetta developer's meeting, which moved around the country to be hosted by different RosettaCommons labs. We discussed code issues of wide interest and narrow. The name has changed to Winter RosettaCON.

#### minimization

Optimizing the protein structure by making small movements toward lower-energy conformations.

#### minirosetta

The name of the Rosetta3 project during initial development.

Also, the name of a wrapper program which exposes multiple protocols, mainly used for Rosetta@Home.

See MiniCON.

#### mmCIF

Macromolecular Crystallographic Information File: a file format used to describe the three-dimensional structure of a protein.

#### model

A representation of the 3 dimensional structure of a protein

"All models are wrong, but some are useful" - George Box

#### MOL format

A file type that contains information about the structure of a chemical; same as "SDF format"

#### MolProbity

MolProbity is a general-purpose web server offering quality validation for 3D structures of proteins, nucleic acids and complexes. It provides detailed all-atom contact analysis of any steric problems within the molecules as well as updated dihedral-angle diagnostics and it can calculate and display the H-bond and van der Waals contacts in the interfaces between components.

MolProbity: all-atom contacts and structure validation for proteins and nucleic acids, Davis et al., Nucleic Acids Res. 2007 July; 35(Web Server issue): W375–W383.

#### Monte Carlo method

Monte Carlo methods are a class of computational algorithms that rely on repeated random sampling to compute their results. In Rosetta, Monte Carlo methods are used frequently to sample rotamers in repacking, amino acids in design, fragments in folding, and numerous other chemical changes.

(Discuss specifics of Monte Carlo in Rosetta)

#### MoveMap

A class in Rosetta which contains lists of mobile and immobile degrees of freedom. Normally used during minimization to specify which parts of the Pose can be minimized. (e.g. for fixed backbone minimization)

#### Mover

An abstract class and parent of all protocols. Every protocol in Rosetta has to inherit from this class and implement the apply function, which then alters the Pose and implements the protocol.

#### native structure

The structure of a protein, ligand, etc. that is found in nature. Usually refers to the crystal or NMR structure of a protein.

#### NNMAKE

An earlier version of the fragment picker application.

#### nstruct

The number of models that Rosetta will output.

#### options

User-specified directions given to Rosetta, either through the command line or through the options file; sometimes called "flags".

#### PackerTask

A class which sets up what is allowed in packing.

#### packing

In Rosetta, optimizing the conformation (and identity) of protein sidechains. The Rosetta Packer uses Metropolis Monte Carlo Simulated Annealing to optimize rotamers.

#### packing density

How close atoms are to each other; closer is better, up to a point.

#### params file

A file which tells Rosetta how a residue behaves.

#### Parser

Another name for RosettaScripts.

#### patch files

A file which makes a small adjustment to a score function.

#### PDB

Can refer either to the Protein Data Bank, a website that contains structural information on proteins, usually determined by X-ray crystallography or NMR, or to the file format used by the Protein Data Bank to represent the three-dimensional structure of a protein.

#### phi

The dihedral angle describing the position of the C-N-Calpha-C atoms.

#### pilot apps

Rosetta applications written by the community that have not yet been officially released.

#### Pose

Represents a molecular structure in Rosetta (of proteins, RNA, etc.) and contains all of its properties such as Energies, FoldTree, Conformation and more. Each and every Mover in Rosetta operates on a pose through its apply function.

#### protocol

Workflow to do specific calculations in Rosetta; sometimes a protocol uses movers.

#### psi

The dihedral angle describing the position of the N-Calpha-C-N atoms.

#### ReferenceCount

ReferenceCount was the core class in the smart pointer system that Rosetta3 used up until 2015. Nearly every class in Rosetta ultimately inherits from this class. The class remains as an empty class, because it was too hard to move after Luki Goldschmidt's transition to our newer smart pointers, and because having a base class for nearly all classes is useful for the Pose DataCache.

#### refinement

Starting from a low-resolution model, use the full-atom energy function to modify the conformation so it is closer to an experimentally determined structure.

#### relax

A protocol in Rosetta which optimizes the structure of the protein.

#### release

Releases are when we make Rosetta code available to academic and industrial users. The code in trunk is copied into a branch in git, cleaned up to remove unreleasable code (usually devel and pilot_apps), then posted for wider use. We are currently on a "weekly release" schedule, where a new release is produced more-or-less each week. (It is not every week, as certain weeks the code does not pass our quality control measures.)

#### repack

Determine the conformation of sidechains which minimizes the energy.

#### representation

How Rosetta sees a protein molecule. Rosetta supports two representations:

1. fullatom - full atom representation; slow but accurate.
2. centroid - a reduced representation; faster, but less precise.

#### repulsive term

fa_rep: The part of the Lennard-Jones equation which describes the effects of overlapping electron orbitals. The energy will be positive.

#### resfile

The resfile is a file format used to manually pass complex instructions to the packer / PackerTask.

#### residue

Each Pose/Conformation is broken down into small units called "Residues", which could be amino acids, nucleic acids or any group of atoms with certain rules of what they are and how they are connected, such as a small chemical ligand moiety. The chemical content of a Residue is stored in an object called "ResidueType". Aside from that, each Residue has other data storing the actual coordinates of each atom it contains, as well as coordinate-related data such as mainchain/sidechain torsion angles, sequence position, etc. For example, in a protein there might be multiple leucine residues, each of which will be an individual "Residue" object. Each Leu Residue has its own coordinate data, but all Leu residues will have the same Leu ResidueType, which contains information on what the atoms are, their names, chemical elements and connectivity. This setup also allows a sidechain rotamer to be represented just as a Residue.

#### residue types

A set of atoms defined for each residue known to Rosetta. The set also defines bonds and local geometry. The data are stored in the database. Each kind of residue normally has distinct ResidueType objects for each of the different Rosetta representations.

#### Residue

A class in Rosetta which stores the coordinates and details about a specific residue in a Pose.

#### ResidueType

A class in Rosetta specifying how a particular residue behaves chemically. It does not contain the coordinates of the residue (that is stored in a Residue object), but rather things like chemical connectivity and atom properties.

#### ResidueTypeSet

A class containing a collection of ResidueTypes all of the same type. The standard ResidueTypeSets are centroid and fullatom.

#### restraints

Adjustments to the energy function; often called "constraints" in Rosetta.

#### REU

Rosetta Energy Units - Rosetta's arbitrary energy unit, which does not correspond directly to physical energy measurements.

#### rigid-body

A treatment in which there is no intramolecular flexibility between the protein backbone atoms; that is, bonds and angles are frozen for the backbone.

#### Rohl review

This term refers to Rohl et al., 2004, Protein structure prediction using Rosetta, the earliest review paper of Rosetta. See its entry in the Rosetta Canon.

#### root mean square deviation (RMSD)

[It's not enough to just explain how RMSD is calculated: it's also important to discuss what significance it plays in Rosetta, and what values are to be expected calculating it in various places.]

#### Robetta

An online, automated tool for protein structure prediction and analysis.

#### Rosetta

Best software ever? Or merely the easiest to use? You decide!

#### Rosetta++

Rosetta++ was the 2.x edition of Rosetta. It is so-named because it was in C++, as a human-assisted machine translation of the original FORTRAN Rosetta.

#### Rosetta3 paper

ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules, which was the paper that described the transition from C++-but-monolithic Rosetta++ to object-oriented-C++ Rosetta3.

#### RosettaCommons

This is the organization that manages the intellectual property of the Rosetta code.

#### RosettaCON

This is a summer convention held every year, usually around the last week of July-first week of August, usually at the Sleeping Lady in Leavenworth, WA. It's a scientific conference just for Rosetta developers and users in industry or RosettaCommons labs, along with a few invited speakers.

#### Rosetta Developer's Meeting

This is a one-day addendum to RosettaCON, usually held at the University of Washington in Seattle the day before RosettaCON. It's used to handle Rosetta code issues of wide interest that are too technical for the RosettaCON audience.

#### RosettaScripts

An XML-based interface for controlling Rosetta. It allows the user greater control of methods, score functions, etc., without requiring changes to the Rosetta source code.

#### rotamer

Rotamers, rotameric isomers, represent the most stable sidechain configurations, which are commonly observed in crystal structures. Using rotamers allows Rosetta to efficiently consider many discrete side chain conformations, where continuous side chain motion would be expensive.

#### rotamer trial minimization

The optimal combination of rotamers (sidechains) is found using a simulated-annealing Monte Carlo search. Minimization techniques are applied afterwards to optimize sidechains and rigid-body displacements simultaneously.

#### SASA

Solvent accessible surface area – the area of a protein that can be reached by water or another solvent.

#### scorefile

A flat-text file produced by Rosetta applications that contains all energy component values. Each row provides values for a single pose (structure). An equivalent file can also be made from a silent file with the following grep command:

    grep SCORE silentfile.out > scorefile.sc
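For readers who would rather post-process in Python, the sketch below mirrors the grep filter and then turns the extracted lines into records. It assumes that the first `SCORE` line is a header of column names and that later `SCORE` lines hold the matching values; the function names and the toy input lines are hypothetical.

```python
def extract_score_lines(silent_lines):
    """Keep only the lines containing SCORE, like `grep SCORE`."""
    return [line for line in silent_lines if "SCORE" in line]


def parse_score_table(score_lines):
    """Turn SCORE lines into a list of dicts keyed by column name.

    Assumes the first line is the header and the rest are value rows
    (an assumption of this sketch). Values are left as strings.
    """
    rows = [line.split()[1:] for line in score_lines]  # drop the SCORE tag
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    return [dict(zip(header, record)) for record in records]


# Toy input standing in for the contents of a silent file.
toy_silent = [
    "SEQUENCE: ACDE",
    "SCORE: score fa_atr description",
    "SCORE: -10.5 -20.1 S_0001",
    "REMARK not a score line",
    "SCORE: -9.2 -18.7 S_0002",
]
for record in parse_score_table(extract_score_lines(toy_silent)):
    print(record["description"], record["score"])
```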


#### ScoreFunction

The class in Rosetta which handles scoring the pose. A particular Rosetta run can use multiple different ScoreFunctions, each with their own weights files and settings.

#### scoring grid

A rapid pre-calculation of scoring for ligand docking.

#### secondary structure

Secondary structures describe classes of local conformations of a molecule (usually a nucleic acid or protein). The most basic formulation of protein secondary structure classes are alpha helices, beta sheets, and loops.

In ab-initio projects, Rosetta uses secondary structure prediction programs to predict the secondary structure of the target protein. The predicted secondary structure is then used to select fragments from the Vall database of fragments.

Secondary structure prediction in Rosetta is currently achieved by a combination of the following three methods:

• PsiPred. D.T. Jones, J. Mol. Biol. 292, 195 (1999).
• SAM-T99. K. Karplus, R. Karchin, C. Barrett, S. Tu, M. Cline, M. Diekhans, L. Grate, J. Casper, R. Hughey, Protein Struct. Funct. Genet. S5, 86 (2001).
• JUFO. J. Meiler, M. Muller, A. Zeidler, F. Schmaschke, J. Mol. Model. 7, 360 (2001).

#### sequence

Peptide sequence or amino acid sequence is the order in which amino acid residues, connected by peptide bonds, lie in the chain in peptides and proteins. The sequence is generally reported from the N-terminal end, containing the free amino group, to the C-terminal end, containing the free carboxyl group. A peptide sequence is often called a protein sequence if it represents the primary structure of a protein. In Rosetta, a sequence is input in the *.fasta file format.

#### SDF format

A file format that describes the structure and connectivity of a molecule. It is used primarily for small molecules, not for proteins. Also known as MOL format.

#### Shultzy's

Shultzy's is a favorite bar and sausage grill of the Rosetta community when in Seattle for RosettaCON. It's on the east side of The Ave.

#### side chain

The 20 amino acids each contain an amino group (NH2), a carboxylic acid group (COOH), and one of various side chains (R), and have the basic formula NH2-CH(R)-COOH.

#### silent file

A flat-text file that stores poses (structures) computed with Rosetta along with the relevant scores (energies). By default, the file name is default.out, but it may be changed with the -out:file:silent flag.

The file contains only the internal degrees of freedom of a pose (phi, psi, omega, and chi angles). Cartesian coordinates must be restored with the extract_pdbs application.

#### solvent accessible surface area (SASA)

The area of a protein that can be reached by water or another solvent. See SASA.

#### small molecule

For Rosetta, anything that's not a polymeric biomacromolecule

#### soft_rep

An energy function in which the Lennard-Jones potential is adjusted so that clashes aren't scored as badly; contrast "hard_rep".

#### ss2

File format used to store secondary structure information. Originally introduced by the PsiPred program (by D. Jones).

#### symmetry definitions

symdef files tell Rosetta how to treat a symmetric protein

#### target sequence

The sequence of the protein of unknown structure you're trying to model

#### TaskFactory

A class to set up new PackerTasks as needed, by applying a number of TaskOperations.

#### TaskOperation

A specification in RosettaScripts which tells the Packer how to optimize rotamers

#### test servers

The Gray lab maintains a testing server which runs a set of standardized tests on each commit of the code to trunk. The tests ensure that:

• the code compiles
• the unit tests pass
• the integration tests are correct
• and many other things

#### Thai Tom

Thai Tom is one of the two 'Rosetta restaurants' that many developers like to visit in Seattle before/after RosettaCON. 4543 University Way NE, Seattle, WA 98105 (it's on the west side of The Ave). Excellent Thai food, can be very spicy. Wait times are a problem if 40 Rosetta people show up at once.

#### theozyme

A theozyme, or "theoretical enzyme," is a concept used in enzyme design. Unsurprisingly, it's a good idea to generate a geometrically ideal active site to stabilize the desired transition state conformation; once you set that up, you can thread it onto a pose.

#### threading

Protein threading, also known as fold recognition, is a method of protein modeling (i.e. computational protein structure prediction) which is used to model those proteins which have the same fold as proteins of known structure, but do not have homologous proteins with known structure. Threading is the process of placing the amino acids of a target protein onto the 3D structure of a template according to a sequence alignment. A comparative model can then be built of the target protein sequence.
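The placement step of threading can be illustrated with a short Python sketch: walk the two aligned sequences column by column and give each target residue the coordinates of the template residue it is aligned with. This is a minimal sketch under stated assumptions, not Rosetta's implementation. The function name `thread_by_alignment`, the gap character `-`, the example alignment, and the one-coordinate-per-residue representation are all hypothetical.

```python
def thread_by_alignment(aligned_target, aligned_template, template_coords):
    """Assign template coordinates to target residues through a pairwise alignment.

    Returns a list of (target position, residue letter, coordinates or None).
    None marks target residues aligned to a gap in the template.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    placed = []
    target_pos = 0      # index into the ungapped target sequence
    template_pos = 0    # index into template_coords
    for tgt, tmpl in zip(aligned_target, aligned_template):
        if tgt != "-":
            coords = template_coords[template_pos] if tmpl != "-" else None
            placed.append((target_pos + 1, tgt, coords))
            target_pos += 1
        if tmpl != "-":
            template_pos += 1
    return placed


# Hypothetical alignment and one made-up coordinate per template residue.
template_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
                   (11.4, 0.0, 0.0), (15.2, 0.0, 0.0)]
model = thread_by_alignment("MK-VLA", "MRTV-A", template_coords)
for position, residue, coords in model:
    print(position, residue, coords)
```

Target residues left with `None` have no template counterpart and would have to be built some other way before a complete model exists.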

#### Top7 / Top7 paper

Top7 is the name of a protein de novo designed with Rosetta. Its paper, Design of a novel globular protein fold with atomic-level accuracy, is also of broad interest for its description of the early energy function.

#### torsion angle

aka dihedral; the degree of freedom of rotating around a bond
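A torsion angle is defined by four consecutive atoms and measures the rotation about the bond between the middle two. The following Python sketch computes it from Cartesian coordinates using a standard vector formula; the function name `dihedral_degrees` and the test points are illustrative and are not taken from the Rosetta code.

```python
import math


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle (degrees) about the p1-p2 bond for four 3D points."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    length = math.sqrt(dot(b1, b1))
    b1 = tuple(x / length for x in b1)  # unit vector along the central bond

    # Remove the components along the central bond, leaving the parts
    # of the outer bonds that lie in the plane perpendicular to it.
    v = tuple(x - dot(b0, b1) * y for x, y in zip(b0, b1))
    w = tuple(x - dot(b2, b1) * y for x, y in zip(b2, b1))

    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
quarter = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(cis, quarter)
```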

#### torsion space

Internal coordinates; torsion space minimization optimizes the protein by rotating dihedrals

#### trunk

trunk is a name for where the developers' current version of Rosetta lives. It's called trunk because it's the main line of the code; side development projects are in branches. Also known as master.

#### unbound docking

Docking in which the crystal (PDB) structures of the two proteins were determined separately and are then combined into one complex.

#### Vall

Pronounced "V-all". The Vall database is a condensed representation of the entire PDB for the purpose of fragment picking. The fragment picker filters the Vall database based on the sequence and secondary structure predictions (and other information) to pull out those backbone conformations which represent the desired fragments.

#### van der Waals

Describes the interactions between neutral, non-bonded atoms. In protein structure prediction, the term is often used interchangeably with the Lennard-Jones potential.
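For reference, the textbook 12-6 Lennard-Jones form can be written as a short Python function. This is a sketch of the generic potential only; Rosetta's own score terms differ in detail and are not reproduced here. The parameter names `epsilon` (well depth) and `sigma` (distance at which the energy crosses zero) and the example values are assumptions chosen for illustration.

```python
def lennard_jones(r, epsilon, sigma):
    """Textbook 12-6 Lennard-Jones energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


sigma = 3.5      # illustrative distance, in angstroms
epsilon = 0.1    # illustrative well depth, arbitrary energy units
r_min = 2 ** (1 / 6) * sigma  # separation at the bottom of the well

for r in (3.0, sigma, r_min, 6.0):
    print(round(r, 3), lennard_jones(r, epsilon, sigma))
```

The energy is strongly positive below `sigma` (a clash), reaches its minimum of `-epsilon` at `r_min`, and decays toward zero at long range.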

#### weights file

The file which specifies the coefficients to use when linearly combining score terms into a scoring function.
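The linear combination can be sketched in Python as follows. This is a minimal sketch, assuming a simple layout of one `term weight` pair per line with `#` comments; lines that do not fit that pattern are skipped. The weight values, the raw term values, and the function names are hypothetical.

```python
def parse_weights(text):
    """Read 'term weight' pairs, ignoring comments and lines of any other shape."""
    weights = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments
        parts = line.split()
        if len(parts) != 2:
            continue  # blank line or something other than a term/weight pair
        try:
            weights[parts[0]] = float(parts[1])
        except ValueError:
            continue
    return weights


def total_score(weights, raw_terms):
    """Weighted sum of raw score terms; terms missing from raw_terms count as 0."""
    return sum(w * raw_terms.get(term, 0.0) for term, w in weights.items())


# Illustrative weights and unweighted term values, not a real Rosetta weight set.
example_weights = """# hypothetical weights
fa_atr 1.0
fa_rep 0.5
fa_sol 1.0
"""
raw = {"fa_atr": -20.0, "fa_rep": 8.0, "fa_sol": 5.0}

weights = parse_weights(example_weights)
print(total_score(weights, raw))
```

Halving the `fa_rep` weight in this example halves the penalty that term contributes to the total, which is the kind of adjustment a weights file exists to express.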

#### Winter RosettaCON

This is the winter Rosetta developers' meeting, which moves around the country to be hosted by different RosettaCommons labs. We discuss code issues of both wide and narrow interest.

#### XML

A hierarchical data format; a custom version is used by RosettaScripts.