KEYWORDS: CORE_CONCEPTS LIGANDS
Tutorial by Parisa Hosseinzadeh (email@example.com), prepared in June 2016.
Rosetta is residue-based. That is, Rosetta doesn't treat atoms individually, but rather as a complete chemical entity. For this to work, bond lengths, bond angles, and other properties of the residue must be known to Rosetta. This is true for both the standard amino acid residues as well as other small molecules and residues it models. Rosetta encodes the properties of a residue in params files. The Rosetta database contains params files for the common amino acids, as well as for a selection of other small molecules. In order to add a new ligand - one Rosetta does not recognize - you will need to generate a new params file.
A params file is where the pre-computed information about the geometry and chemical features of residues and ligands are stored. In order for Rosetta to understand how to treat the residue during any run, there should be a params file associated with that residue. You can find examples of params file for many types of molecules such as 20 canonical amino acids, some non-canonical amino acids, water, metal ions, nucleic acids, etc. in Rosetta/main/database/chemical/residue_type_sets/fa_standard/residue_types. More detailed information about what each line means can be found in the Rosetta documentation. In this tutorial, we go over how to prepare a ligand a generate a params file for it.
Let's start from obtaining the ligand. The ligand we are working with today is going to be "serotonin", the molecule of happiness! There are many ways to download the PDB structure of a ligand. The easiest way is through the PDB site itself. Go to http://www.rcsb.org/pdb/home/home.do and choose search -> Advanced Search. Start advanced search. In choose a query type, scroll down to chemical components and choose chemical name. Type Serotonin and submit query. You can see that some options will show up. Click on "SRO" with formula C10 H12 N2 O. You can see that "SRO is found in 9 entries". Click on 9 entries. It will take you to a new page, showing all the PDB entries with serotonin in them.
Click on Download Files tab on the top. Scroll down to Download Ligands. In Download Options choose : "Single SDF file". In Instance Type choose: "All instances from each PDB entry". Leave the rest of the filters at their default values. Make sure you only chose SRO in the ligand box. Then Launch the Download Application. A file named "Ligands_noHydrogens_noMissing_27_Instances.sdf" or something similar should be downloaded. Please refer to More Advanced Preparation for information on how to generate conformers from a single PDB.
Navigate to the prepare_ligand directory running this command:
> cd <path-to-Rosetta>/Rosetta/demos/tutorials/prepare_ligand/inputs
We have already provided you with the same file. You can open them and compare them, but it is renamed to serotonin.sdf. You can rename the file running this command:
> mv <file_name>.sdf serotonin.sdf
Now, at this step, we need to add hydrogens to this molecule. There are multiple ways to achieve this goal. If installed on your computer, you can use babel to add hydrogens with the GUI interface or using command line:
> babel -h serotonin.sdf serotonin_withH.sdf
You can also use Avogadro and use it to add hydrogens and minimize the PDB.
We have provided a serotonin_withH.sdf file for you in the inputs directory if you cannot get it.
Now we are at the stage of generating the params file. Rosetta provides a python script that will take an sdf file and generate a params file from it.
cd ../ to get out of the inputs directory and run this command:
$> ../../../main/source/scripts/python/public/molfile_to_params.py -n SRO -p SRO --conformers-in-one-file inputs/serotonin_withH.sdf
-n is a option that says what is the three letter name you want to give to the ligand and -p defines the name given to the params file. After you are done with this command you should have a file named SRO.params like the one provided in the outputs directory. Also, 27 PDB files (or any number you saw during download) are generated based on each conformer found from PDB.
Note1. To learn better what the molfile_to_params.py works and what are the other options, you can run:
> ../../../main/source/scripts/python/public/molfile_to_params.py --help
Note2. If you have multiple files that you want to generate the params file for, you can run a batch molfile_to_params command as follows:
> <path-to-Rosetta>/Rosetta/main/source/scripts/python/public/batch_molfile_to_params.py --script_path=<path-to-molfile_to_params.py> list_of_molfiles.txt
Now let's take a look at the generated params file:
NAME SRO IO_STRING SRO Z TYPE LIGAND AA UNK
These lines are basic information about the ligand. We are telling Rosetta that this residue is a
LIGAND, its name is
SRO and it is shown by character
Z in one letter code. This residue was
UNKown to Rosetta prior to this.
ATOM C6 aroC X -0.10 ATOM C5 aroC X -0.10 ATOM N1 Ntrp X -0.59 ATOM C4 aroC X -0.10 ATOM C3 aroC X -0.10 ATOM C2 aroC X -0.10 ATOM C1 aroC X -0.10 ATOM O1 OH X -0.64 ATOM H1 Hpol X 0.45 ATOM C8 aroC X -0.10 ATOM C7 aroC X -0.10 [...]
These lines show the atoms in the residue. The first column are the atom naming that are going to be used in the PDB. The second column shows the Rosetta atom types for each atom. For example,
Hpol means a polar hydrogen that can be involved in hydrogen bonding interactions whereas
Haro is a hydrogen that is attached to an aromatic ring. The last column shows partial charges. Go through these columns carefully and check whether the definitions are correct. If you have calculated partial charges from some molecular dynamics sources, change these with those. Make sure atom types are correctly defined.
Next lines start with
BOND_TYPE and define what atoms are connected to each other and whether this is a single bond or a double bond.
CHI column defines a rotatable chi angle. A chi angle number is specified, followed by the names of the four atoms defining the angle.
PROTON_CHI 1 shows how much sampling the proton in that chi is going to go through.
Next two lines are
NBR_ATOM C6 NBR_RADIUS 5.878178
NBR_ATOM is "neighbor atom" in Rosetta. For a ligand, it is the closest atom to geometric center of mass by default.
NBR_RADIUS is a measure of the size of the ligand. It's an estimate of the furthest possible distance from the neighbor atom to any other heavy atom in the residue.
The final lines are
ICOOR_INTERNAL, and define the internal coordinates of the atoms (see here for more information.
At the end of your file, add this line:
This line tells Rosetta where the conformers are stored.
Now that you generated the params file, you should inform Rosetta where it is. This can be done by adding this option to the your command line:
Often times, you need to generate conformers for a ligand (i.e. different conformations a ligand can take that are energetically favorable). Rosetta cannot do this, but there are different software packages that can perform this function. After the conformers are generated, the rest of the process is the same and you can continue from Generating the Params File.
Rosetta can also be used with D-amino acids, non-canonical amino acids, non-canonical backbones, RNA and metal ions. Here is documentation describing how to prepare them.