Rosetta supports the SQLite3, MySQL, and PostgreSQL database backends. This page describes the backends, how to get started with them, and what has already been implemented. The SQLite3 backend is tested extensively in the integration_tests with every commit; PostgreSQL and MySQL support is tested through the BuildBot framework (PostgreSQL, MySQL).
To build Rosetta with MySQL support:
libmysqlclient_r.so into your path by creating a
You will also need to install libmysqlclient in the LD_LIBRARY_PATH environment variable (and at runtime).
library_path dictionary, add a string containing the path to the directory containing the libmysql client libraries (e.g.
Symlink or place mysql.h (and other required headers) into
Ubuntu-specific (and maybe other Linux) instructions to build Rosetta with MySQL support:
libmysqlclient by installing the libmysqlclient-dev client from your distro's package manager (e.g. apt-get install libmysqlclient-dev)
Symlink header files into external/dbio/mysql by running ln -s /usr/include/mysql/* . from inside that directory (the trailing . is the link destination).
libpq and install it in the LD_LIBRARY_PATH environment variable (note: make sure to use the same client library version as the database server).
Database connection information can be specified with these RosettaScripts and/or command line options.
For many applications, one would like to store and retrieve information about a set of structures. For example, in a protein interface design project the set might consist of all the structures from various rounds of prediction, and it may be relevant to store the atomic coordinates, how similar each structure is to the native, and the predicted binding energy. We have developed a modular database schema, where each FeaturesReporter is responsible for a set of tables in the database. Using a particular schema, features for a set of structures are stored as a batch in the database.
Structures can be read from, and written to, a relational database when using the JD2 job distributor. Advantages over PDB files or silent files include:
Database IO is implemented simply as a fixed set of FeaturesReporters:
Whole Structure Features:
Per Residue Features:
Possible issues for cluster-based jobs:
Databases can be merged because the feature tables have composite primary keys that include the structure primary key, struct_id, which is at least partially randomized. To merge SQLite3 databases, consider using the merge script at main/tests/features/sample_sources/merge.sh.
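As a rough illustration of why keys built around struct_id make merging straightforward, the sketch below uses Python's standard sqlite3 module to copy rows from one database into another through ATTACH DATABASE. The table names (structures, residues), their columns, and the helper functions are hypothetical stand-ins rather than Rosetta's actual schema, and this is not a description of what merge.sh does internally; it only shows the idea under those assumptions.

```python
import os
import sqlite3
import tempfile


def make_db(path, struct_ids):
    """Create a toy database. The two-table schema here is hypothetical."""
    conn = sqlite3.connect(path)
    with conn:
        conn.execute(
            "CREATE TABLE structures (struct_id INTEGER PRIMARY KEY, tag TEXT)"
        )
        conn.execute(
            "CREATE TABLE residues ("
            " struct_id INTEGER, resNum INTEGER, name3 TEXT,"
            " PRIMARY KEY (struct_id, resNum))"  # composite key includes struct_id
        )
        for sid in struct_ids:
            conn.execute("INSERT INTO structures VALUES (?, ?)", (sid, f"pose_{sid}"))
            conn.execute("INSERT INTO residues VALUES (?, 1, 'ALA')", (sid,))
    conn.close()


def merge_into(target_path, source_path, tables):
    """Append every row of the named tables from source into target."""
    conn = sqlite3.connect(target_path)
    try:
        conn.execute("ATTACH DATABASE ? AS src", (source_path,))
        with conn:  # one transaction: commit on success, roll back on error
            for table in tables:
                conn.execute(
                    f'INSERT INTO main."{table}" SELECT * FROM src."{table}"'
                )
        conn.execute("DETACH DATABASE src")
    finally:
        conn.close()


def demo():
    """Merge two toy databases and return (structure rows, residue rows)."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.db3")
        b = os.path.join(tmp, "b.db3")
        make_db(a, [101, 202])
        make_db(b, [303, 404])  # distinct struct_ids, so no key collisions
        merge_into(a, b, ["structures", "residues"])
        conn = sqlite3.connect(a)
        n_structs = conn.execute("SELECT COUNT(*) FROM structures").fetchone()[0]
        n_residues = conn.execute("SELECT COUNT(*) FROM residues").fetchone()[0]
        conn.close()
        return n_structs, n_residues
```

If the two databases shared a struct_id, the INSERT would raise sqlite3.IntegrityError and the transaction would be rolled back, which is why a randomized struct_id matters for merging.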
Rosetta can input poses from a database and output poses to a database. This behavior is supported in any application that uses the JD2 job distributor. The DatabaseJobOutputter is compatible with both serial and parallel jobs, and it automatically detects non-ideal poses and handles their output properly.
Multiple executions of Rosetta can be stored in the same database. Each execution will have a separate protocol_id. If -out:database_protocol_id is not specified, the protocol_id field auto-increments. The Rosetta SVN version, command line, XML script (if available) and flags are stored in the database.
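The auto-increment behavior can be pictured with a small SQLite sketch in Python. The protocols table, its command_line column, and the record_protocol helper are assumptions made for illustration only; Rosetta's real table layout may differ. Passing no protocol_id mimics omitting -out:database_protocol_id, in which case SQLite assigns the next integer key.

```python
import sqlite3


def open_protocol_db():
    """Open an in-memory database with a hypothetical protocols table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE protocols ("
        " protocol_id INTEGER PRIMARY KEY,"  # NULL on insert -> next free integer
        " command_line TEXT)"
    )
    return conn


def record_protocol(conn, command_line, protocol_id=None):
    """Store one execution; return the protocol_id that was used."""
    cur = conn.execute(
        "INSERT INTO protocols (protocol_id, command_line) VALUES (?, ?)",
        (protocol_id, command_line),
    )
    conn.commit()
    return cur.lastrowid


def demo():
    """Record three executions, the second with an explicit id."""
    conn = open_protocol_db()
    ids = [
        record_protocol(conn, "run A"),                  # id chosen by SQLite
        record_protocol(conn, "run B", protocol_id=10),  # id chosen by the user
        record_protocol(conn, "run C"),                  # continues after the largest id
    ]
    conn.close()
    return ids
```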
Poses can be extracted from the database into PDB or silent files using the score_jd2 application. MySQL and SQLite3 interfaces are also available for Perl, Python, R, and other scripting languages, making it possible to parse and analyze the data directly without extracting it. Poses can be extracted from a database in code using the protocols::features::ProteinSilentReport::load_pose() function.
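As a minimal sketch of analyzing data in place from a scripting language, the Python example below lists the tables in a SQLite database and pulls the lowest-scoring entries. The structure_scores table, its total_score column, and both helper functions are hypothetical; inspect the actual database (for example with list_tables) to find the real table and column names before adapting the query.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def lowest_scoring(conn, limit):
    """Return (struct_id, total_score) pairs, lowest score first.

    The structure_scores table is a hypothetical example, not Rosetta's schema.
    """
    rows = conn.execute(
        "SELECT struct_id, total_score FROM structure_scores"
        " ORDER BY total_score ASC LIMIT ?",
        (limit,),
    )
    return rows.fetchall()


def demo():
    """Build a toy in-memory database and query it."""
    # A real analysis would pass the path of the database file instead.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE structure_scores ("
        " struct_id INTEGER PRIMARY KEY, total_score REAL)"
    )
    conn.executemany(
        "INSERT INTO structure_scores VALUES (?, ?)",
        [(11, -120.5), (22, -98.0), (33, -131.25)],
    )
    tables = list_tables(conn)
    best = lowest_scoring(conn, 2)
    conn.close()
    return tables, best
```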
Database filters allow you to output only those poses that meet some criteria based on the poses already in the database. Database filters are invoked from the command line with the following syntax:
-out:database_filter <database filter name> <list of database filter options>
At present, four database filters are implemented: