This document was written by Steven Lewis.
Because Rosetta is an object-oriented library project, it is relatively straightforward to write your own protocols and executables. This document highlights some of the tools you need to do so and explains how to go about it.
First, if you're going to write new Rosetta protocols, you need to know C++. You also need to be familiar with object-oriented design and know how classes, inheritance, virtual functions, etc. work.
Rosetta is separated into multiple libraries, and within each library the code is split across hundreds of separate files. For new application development, you will probably write code in the applications layer and possibly in the devel layer. The utility libraries provide nonspecific utility functions (math functions, etc.). The core library provides the classes and functions for the integral shared parts of Rosetta: poses, scoring, and the packer, as well as the options and tracer systems (see Tracers below). The protocols library provides almost everything else, such as protocols like abinitio or docking and the movers (see below). The devel library is like protocols but intended for new development, and the applications layer is not a library at all; it just holds the executables (files with an int main() function).
So, to start your own application, carve out some space in the applications layer at src/apps/pilot/yourname. To get SCons to recognize the new file, modify src/pilot_apps.src.settings.all (or pilot_apps.src.settings.my) to include your new executables. That file uses Python syntax; just copy the pattern used for other executables. The next time you compile, SCons will find your new file and attempt to compile it into an executable. The binary will land deep in the build folder, and a link to it should appear in the bin folder. The executable is named after the new file you added.
If your application is particularly large and complex, or you prefer compartmentalized code, make some space in the devel folder and repeat the file-adding process with src/devel.src.settings and/or src/devel.src.settings.my. Whatever code you put there is compiled as part of the devel library: the classes and functions compile into callable units, but you can't put executables there.
We have tried to keep the physical location of files and their namespacing synchronized. This means that if you see a class named "core::pack::task::PackerTask", you know the code for it will live in core/pack/task somewhere.
When developing a protocol, you will compile very frequently: not just one "scons bin mode=release" after you download the code, but several times an hour as you check that your new code's syntax is correct. SCons is smart enough to recompile only the code that requires it (code that is new or has changed). You may want to learn the "my" build commands (documented with the build system); if you are changing code in the libraries, they can streamline compiling by rebuilding only the binaries you want.
You will want to familiarize yourself with "debug" builds (omit mode=release from the command line); code built this way can be run under standard debuggers like GDB to track down problems.
The most important class in Rosetta is the Pose (src/core/pose/Pose.hh). A Pose is essentially a digital representation of a protein, although the class also handles DNA, RNA, ligands, etc. A Pose contains a Conformation object (src/core/conformation/Conformation.hh), which records which atoms are present and their placement in internal and 3-D coordinates. A Pose also contains an Energies object (src/core/scoring/Energies.hh), which remembers the energies assigned to the pose the last time it was scored. Pretty much everything you'll want to do involves modifying a pose and then re-scoring it to see what the change did.
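To make the "modify, then re-score" idea concrete, here is a toy model in plain C++. ToyPose, ToyConformation, ToyEnergies, and score() are hypothetical stand-ins invented for this sketch, not Rosetta classes, and the "score" is just a sum of squared coordinates. The point is only that the cached energies go stale when the conformation changes and are refreshed by re-scoring.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified stand-ins for Pose, Conformation, and Energies.
// Real Rosetta classes are far richer; these only show the workflow.
struct ToyConformation {
    std::vector< double > coords;  // one number per "atom", for illustration only
};

struct ToyEnergies {
    double total_score = 0.0;  // score cached from the most recent scoring call
};

struct ToyPose {
    ToyConformation conformation;
    ToyEnergies energies;
};

// Hypothetical toy score function: sum of squared coordinates.
// Like scoring a real pose, it stores the result in the pose's cached energies.
double score( ToyPose & pose ) {
    double total = 0.0;
    for ( double x : pose.conformation.coords ) total += x * x;
    pose.energies.total_score = total;
    return total;
}
```

Changing pose.conformation.coords leaves pose.energies.total_score untouched until score() is called again, which mirrors why a real pose has to be re-scored after each change before its energies mean anything.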
Movers
Most of the work in Rosetta is done by Movers. Mover (protocols/moves/Mover.hh) is the interface class for a very large hierarchy, collectively referred to as the "movers". Most of them, including the generic, widely used ones, are in protocols/moves/. Movers found elsewhere in the protocols or devel libraries are usually more complex, protocol-specific classes. A Mover has a function of the following form:
void apply( core::pose::Pose & pose )
This is the apply function. Calling a Mover's apply function causes it to do whatever it does to the Pose (packing its sidechains, remodeling the backbone, converting it from fullatom to centroid mode, and so on). If you want to know what a mover does, look at its apply function. Movers usually have complex constructors and setters that determine exactly what they do (for example, a mover that packs needs a PackerTask to tell the packer what to do). Remember that Movers can create and call other Movers, so an entire protocol can ultimately be packaged into a single Mover that calls many others. Also remember that a Mover is expected to receive its Pose at apply(), not at construction.
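The Mover pattern can be sketched without any Rosetta headers. All class names below (ToyPose, ToyMover, ToyMutateMover, ToyCentroidMover, ToySequenceMover) are hypothetical; they mirror the shape described above (a virtual apply() that receives the pose, configuration through the constructor, and a mover that runs other movers) but are not Rosetta's actual classes. std::shared_ptr stands in for Rosetta's owning pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy pose: a sequence string plus a fullatom/centroid flag.
struct ToyPose {
    std::string sequence;
    bool fullatom = true;
};

// Toy Mover interface: all work happens in apply(), and the pose
// arrives at apply() time rather than at construction.
class ToyMover {
public:
    virtual ~ToyMover() = default;
    virtual void apply( ToyPose & pose ) = 0;
};

// Hypothetical leaf mover: switches the pose to centroid mode.
class ToyCentroidMover : public ToyMover {
public:
    void apply( ToyPose & pose ) override { pose.fullatom = false; }
};

// Hypothetical leaf mover configured through its constructor,
// much as a packing mover is handed a PackerTask.
class ToyMutateMover : public ToyMover {
public:
    ToyMutateMover( std::size_t position, char residue )
        : position_( position ), residue_( residue ) {}
    void apply( ToyPose & pose ) override {
        // position_ is 1-based, following Rosetta's residue numbering convention
        pose.sequence.at( position_ - 1 ) = residue_;
    }
private:
    std::size_t position_;
    char residue_;
};

// A mover that runs other movers in order, so a whole protocol
// can be packaged as a single mover.
class ToySequenceMover : public ToyMover {
public:
    void add_mover( std::shared_ptr< ToyMover > mover ) { movers_.push_back( mover ); }
    void apply( ToyPose & pose ) override {
        for ( auto const & mover : movers_ ) mover->apply( pose );
    }
private:
    std::vector< std::shared_ptr< ToyMover > > movers_;
};
```

Because every mover exposes the same apply() signature, the composite mover does not need to know what its children do, which is what lets a whole protocol be packaged as one Mover.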
Job Distribution
With Rosetta you will almost always want sampling (running the protocol many times to get a series of results). The Job Distributor is responsible for helping with this. If your protocol is packaged into a Mover, you can simply hand the Mover to the Job Distributor, and it will run the Mover as many times as requested against an input pose and number the outputs.
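A job distributor's core loop can be sketched as follows. This is a hypothetical illustration, not Rosetta's Job Distributor interface: run_jobs(), ToyPose, and the input_0001-style output names are assumptions made for the sketch, and the mover is modelled as any callable taking a pose. The key design point is that each trajectory starts from a fresh copy of the input pose.

```cpp
#include <cassert>
#include <cstdio>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy pose for this sketch.
struct ToyPose {
    std::string sequence;
    int trajectory = 0;  // which job produced this pose
};

// Hypothetical mini job distributor: runs 'mover' nstruct times, each time on a
// fresh copy of 'input', and labels the outputs <input_name>_0001, _0002, ...
std::vector< std::pair< std::string, ToyPose > >
run_jobs( ToyPose const & input, std::string const & input_name, int nstruct,
          std::function< void( ToyPose & ) > const & mover ) {
    std::vector< std::pair< std::string, ToyPose > > outputs;
    for ( int i = 1; i <= nstruct; ++i ) {
        ToyPose pose = input;  // copy, so every job starts from the same input
        pose.trajectory = i;
        mover( pose );
        char suffix[ 16 ];
        std::snprintf( suffix, sizeof( suffix ), "_%04d", i );  // zero-padded job number
        outputs.emplace_back( input_name + suffix, pose );
    }
    return outputs;
}
```

Copying the input for each job means one trajectory's changes never leak into the next, so the outputs are independent samples from the same starting point.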
Rosetta has a few features (rules, really) that are slightly out of sync with standard C++ usage, as well as some extremely important workhorse classes.
C++ containers index from zero (a std::vector with 8 elements numbers them 0 to 7). Rosetta, however, is usually numbering residues within a protein or atoms within a residue, cases where 0 is nonsense. So most of Rosetta uses 1-based rather than 0-based indexing via the utility::vector1 class. This class's operator[] takes indices from 1 up through n for an n-element container. It also uses assert() statements in debug mode to check for vector overruns and underruns.
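The idea behind 1-based indexing can be shown with a small wrapper around std::vector. OneBasedVector is a hypothetical class written for this sketch; it is not the implementation of utility::vector1, only an illustration of the index shift and of assert()-based bounds checks that disappear when NDEBUG is defined.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical minimal 1-based container (NOT Rosetta's utility::vector1).
template< typename T >
class OneBasedVector {
public:
    OneBasedVector() = default;
    OneBasedVector( std::initializer_list< T > values ) : data_( values ) {}

    std::size_t size() const { return data_.size(); }
    void push_back( T const & value ) { data_.push_back( value ); }

    // Valid indices are 1..size(). The asserts catch under/overruns in debug
    // builds and compile away when NDEBUG is defined.
    T & operator[]( std::size_t i ) {
        assert( i >= 1 && i <= data_.size() );
        return data_[ i - 1 ];
    }
    T const & operator[]( std::size_t i ) const {
        assert( i >= 1 && i <= data_.size() );
        return data_[ i - 1 ];
    }

private:
    std::vector< T > data_;  // stored 0-based internally
};
```

Loops over such a container run from 1 through size() inclusive, for example for ( std::size_t i = 1; i <= v.size(); ++i ).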
C++ provides only a small number of output channels, such as std::cout and std::cerr. Rosetta implements an extensive Tracer system (core/util/tracer), which provides multiple cout-style channels. Each Tracer has a string name associated with it, and this name is printed at the left of each line of output. This provides two key benefits. First, each line of output is labeled with the part of the code it came from. Second, you can use the mute command-line option to silence any Tracers whose output does not interest you. Tracers are traditionally named after the namespacing, so a Tracer named "core.pack.task.PackerTask" reports things from inside the PackerTask class.
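The two benefits of named channels can be sketched with a small class. ToyTracer is hypothetical and is not Rosetta's Tracer API: it prefixes each line with the channel name and prints nothing if that exact name appears in a mute set. The sketch only does exact-name matching; how the real mute option matches names is not modelled here.

```cpp
#include <cassert>
#include <iostream>
#include <set>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical named output channel with muting (not Rosetta's Tracer).
class ToyTracer {
public:
    ToyTracer( std::string name, std::set< std::string > const & muted,
               std::ostream & out = std::cout )
        : name_( std::move( name ) ),
          muted_( muted.count( name_ ) > 0 ),  // name_ is initialized first
          out_( out ) {}

    void print( std::string const & message ) const {
        if ( muted_ ) return;  // muted channels print nothing
        out_ << name_ << ": " << message << '\n';  // label each line with its source
    }

private:
    std::string name_;
    bool muted_;
    std::ostream & out_;
};
```

Naming each channel after its namespace and class means a glance at the left edge of the output tells you which code produced a line.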
C++ provides raw pointers, but the Rosetta community decided they were too messy and dangerous. Instead, raw pointers have been replaced with "owning pointers" wherever possible. If you see a typename that ends in OP or COP, the object is an owning pointer (the C in COP stands for constant) to an object of the underlying type. OPs behave like pointers as far as creation and dereferencing are concerned; the big change is that the OP class prevents the underlying memory from being deallocated until all copies of the OP have been destroyed. Pretend these are raw pointers and you'll be fine; just don't try to use the delete keyword on them. Using the new keyword when constructing the underlying object is fine:
MoverOP mymover = new SuperDuperMover( /* constructor arguments */ );
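Shared ownership of this kind can be demonstrated with std::shared_ptr, used here as a standard-library stand-in for Rosetta's owning pointers (the sketch assumes nothing about how Rosetta implements them). ToyMover, ToyMoverOP, ToyMoverCOP, and share_with_protocol() are hypothetical names that follow the OP/COP naming convention described above.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical class standing in for a Mover.
struct ToyMover {
    std::string name;
    explicit ToyMover( std::string n ) : name( std::move( n ) ) {}
};

// OP/COP-style aliases built on std::shared_ptr (an assumption for this sketch).
using ToyMoverOP  = std::shared_ptr< ToyMover >;
using ToyMoverCOP = std::shared_ptr< ToyMover const >;  // pointer to a const object

// Hands a copy of the pointer to a "protocol" slot and reports how many
// owners the object now has. The OP converts implicitly to a COP.
long share_with_protocol( ToyMoverOP const & mover, ToyMoverCOP & protocol_slot ) {
    protocol_slot = mover;     // both pointers now co-own the same object
    return mover.use_count();  // number of owners sharing the object
}
```

One difference worth knowing: std::shared_ptr's constructor from a raw pointer is explicit, so with a std::shared_ptr-based alias the form MoverOP mymover = new SuperDuperMover(...); does not compile, and std::make_shared (or direct initialization) is used instead.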