I am trying to dock a D- peptide on a protein dimer using the FlexPepDocking application. After collecting all the information I could find from tutorials and old posts I followed the steps described below:
1. I uncommented all the lines under "D-CAA TYPES" in file: rosetta_database/chemical/residue_type_sets/fa_standard/residue_types.txt
2. Downloaded the rotamer libraries from http://carl.bio.nyu.edu/%7Erenfrew/ncaa/ncaa_rotamer_libraries.tar.gz and placed the files inside
rosetta_database/rotamer/ncaa_rotlibs. I did not modify the residue parameter files (*.params) since I noticed that they already included "NCAA_ROTLIB_PATH" and "NCAA_ROTLIB_NUM_ROTAMER_BINS" directives at the bottom.
3. Placed the D-peptide near the binding cavity, assigned chain A to the receptor dimer and chain B to the D-peptide and named the residues of the D-peptide as DILE, DGLN, etc. Here 's a line from the .pdb:
ATOM 1 N DILE B 1 24.948 37.276 -1.105 1.00 0.00 N
4. Added the flag "-score::weights mm_std" in the pre-pack and refinement steps to use the pure molecular mechanics scoring function.
Below are the prepack flags:
And the refinement flags:
Both stages were completed successfuly, yet I am not sure if Rosetta used the D-CAA rotamers and parameter files. Does anyone know how I can find this out? I have also attached the output.
Is there anything wrong with my workflow? Any alternative scoring function I could use?
I would appreciate your comments. Thanks in advance.
Residue names in PDB files are exactly three characters long. You need to use the three letter string specified by the IO_STRING line of the params file, not the name of the params file itself.
For example, rosetta_database/chemical/residue_type_sets/fa_standard/d-caa/DILE.params has an IO_STRING line of
"IO_STRING DIL X", so you'll want to use the three letter code "DIL" to specify d-isoleucine:
ATOM 1 N DIL B 1 24.948 37.276 -1.105 1.00 0.00 N
Keep in mind, too, that PDB lines are laid out by (charachter) columns, so all the items need to line up. (e.g. all the decimal points need to be in the same place, and the same place as they would be for a PDB downloaded from the RCSB.)
To check that Rosetta correctly read the residue, you want to examine the output PDB, and make sure the residue type name and the atom names in the output match what you expect them too. (Also glance through the tracer log, and make sure that there's no warning lines about your residue - either by position or by name.)
"Both stages were completed successfuly, yet I am not sure if Rosetta used the D-CAA rotamers and parameter files. Does anyone know how I can find this out? I have also attached the output."
You should be able to see clearly in the output structures whether the peptide has D or L amino acids.
I agree with Rocco that you probably need "DIL" not "DILE" to get Rosetta to read them in.
Yes I did verify that my output models mentain the D-amino acid stereochemistry.
Nice! Now that I fixed the residue names I get in the output:
Reading in rot lib /home/thomas/Programs/Rosetta3.4/rosetta_database//rotamer/ncaa_rotlibs/dile.rotlib...done!
Reading in rot lib /home/thomas/Programs/Rosetta3.4/rosetta_database//rotamer/ncaa_rotlibs/dgln.rotlib...done!
Reading in rot lib /home/thomas/Programs/Rosetta3.4/rosetta_database//rotamer/ncaa_rotlibs/dasp.rotlib...done!
Reading in rot lib /home/thomas/Programs/Rosetta3.4/rosetta_database//rotamer/ncaa_rotlibs/dleu.rotlib...done!
Reading in rot lib /home/thomas/Programs/Rosetta3.4/rosetta_database//rotamer/ncaa_rotlibs/darg.rotlib...done!
Reading in rot lib /home/thomas/Programs/Rosetta3.4/rosetta_database//rotamer/ncaa_rotlibs/dlys.rotlib...done!
Reading in rot lib /home/thomas/Programs/Rosetta3.4/rosetta_database//rotamer/ncaa_rotlibs/dphe.rotlib...done!
Reading in rot lib /home/thomas/Programs/Rosetta3.4/rosetta_database//rotamer/ncaa_rotlibs/dglu.rotlib...done!
It seems that Rosetta is reading the correct rotamer libraries now.
Is there any alternative scoring function I could use? If I remember correctly in the publication P. Douglas Renfrew, Eun Jung Choi, Brian Kuhlman, "Using Noncanonical Amino Acids in Computational Protein-Peptide Interface Design" (2011) PLoS One, the authors have derived a scoring function for NCAA. Has that scoring function been tested for D-amino acids too?
The mm scorefunction works on mixtures of noncanonical and regular amino acids; it should work equally well on D.
Theoretically, you could use all the normal knowledge-based terms, if they were reversed for D instead of L where necessary, but I don't think Doug ever did so. It would not be easy.
The mm_std scoring function was tested with both L and D CAAs and NCAAs primarily in both a homo-chiral context (L-NCAA peptide + L-CAA protein) and hetero-chiral context but the hetero-chirality was at the chain level (peptide with D-NCAAs and D-CAAs + L-CAA protein, these results were not published). I have not explicitly tested a pose with mixed L and D amino acids in the same chain but it should be able to score poses like that and I suspect that it would behave well.
It is being currently used on a few different peptidomimetics/foldamers with very good results in some cases, so I feel it is pretty general.
Thanks! If I need to do ab initio flexible peptide docking how can I create fragment libraries for D-amino acids?
Make fragments for L and mirror-reverse them? I guess since fragments only use backbone torsions that will work? I can't remember how the secondarily-chiral beta-branched residues are L/D related anyway. You'd probably need to write a fair bit of your own code to do this...I'm not sure how one does mirror-reversal in backbone torsion space.
If you multiple ALL the torsion angles by -1 in an internal-coordinate system, you should get a mirror image isomer. This will also invert the chiral centers of ILE and THR. There are L and D versions of allo-THR and allo-ILE in the NCAA set.
I tried to enamble the -lowres_preoptimize flag to increase sampling during refinement (without ab initio) but I got the following error:
core.scoring.ScoreFunctionFactory: (0) SCOREFUNCTION: mm_std
core.scoring.methods.UnfoldedStateEnergy: (0) instantiating class with weights: fa_atr: 0.8 fa_rep: 0.634454 fa_sol: 1.16497 mm_lj_intra_rep: 0.324341 mm_lj_intra_atr: 0.537815 mm_twist: 0.2662 pro_close: 1.44777 hbond_sr_bb: 0.656728 hbond_lr_bb: 1.50186 hbond_bb_sc: 1.45367 hbond_sc: 1.18477
core.chemical.ResidueTypeSet: (0) Finished initializing centroid residue type set. Created 1980 residue types
can not find a residue type that matches the residue DILE_p:NtermProteinFullat position 163
ERROR: core::util::switch_to_residue_type_set fails
ERROR:: Exit from: src/core/util/SwitchResidueTypeSet.cc line: 143
The following line matched "DILE_P" and is in the prepacked .pdb file:
DILE_p:NtermProteinFull_163 -3.89009 7.67639 5.48479 2.14729 -1.52152 1.99291 0 0 0 0 0 0 0 0 0 -1.3936 10.4962
I have attached the full output from refinement. Any idea how to fix it? Recall that I use the -score::weights mm_std due to my D-amino acids.
You've only got L centroid residues in your database, not D centroids. I guess you can convert your D fullatoms to D centroids by comparing them to L fullatom/centroid of the same residues? You shouldn't need to worry about the NtermProteinFull part - that patch will apply automatically once the underlying centroid DILE exists.
Could you please be more specific. Where are these files? Do I need to convert them using my chemical intuition or just copy them somewhere?
Sure! I assumed you were familiar with working with the database files since you got noncanonicals working at all.
Your D fullatom stuff is in rosetta_database/chemical/residue_type_sets/fa_standard/residue_types/d-caa. If you look in the centroid parameter file set, which is rosetta_database/chemical/residue_type_sets/centroid/residue_types, you'll find comparatively few residue types at all, and none for D amino acids (only the Ls).
Rosetta is probably failing because it can't find here the centroid D amino acids it needs to run centroid modeling on your D amino acid peptide.
Compare the L fullatom/centroid parameter files. You'll find (I hope; I haven't checked) that for the backbone atoms they're identical, for CB they're identical, and for the centroid file all atoms past CB have been removed and replaced with a "CEN" atom (usually defined at the bottom of the file).
To get D amino acid centroids, you need to do the same thing, starting from the D files. The CEN atom size/distance parameters for the Ls will work for the Ds. I think the whole ICOOR_INTERNAL line won't, due to the centroid now pointing off in a different direction (probably a different torsion, which I think is the first numeric column). You may be able to use the centroid option of molfile_to_params.py to do this as well. If they work, we can port them back into the main releases if you wish.
There is no need to derive the ICOOR_INTERNAL lines from the L-amino acids cause I already have .params files for full atom D-amino acids. So after looking carefully the differences between ASN centroid (ASN.cen_.params.txt in the attachments) and fa .params (ASN.fa_.params.txt) I created a centroid .params file for DASN (editted_DASN.cen_.params.txt) from the fa file (DASN.fa_.params.txt). My only problem is the last 3 lines saying (in the L-ASN):
ATOM CEN CEN_ASN H 0.0
BOND CA CEN
ICOOR_INTERNAL CEN -141.599884 68.787140 2.403841 CA N C
I guess the first two lines will remain the same, although I am not sure if I must replace CEN_ASN with CEN_DAN. Yet, I don't know how to edit the last line, I can't find something equivalent in the fa .params file for D-ASN neither in the centroid file I created with molfile_to_params.py.
The "CEN" atom is centroid atom - basically, modeling the sidechain as a single atom, rather than multiple ones.
The "CEN_ASN" is the atom type, and you'll probably want to leave that as is, as the d-Asn sidechain will have similar properties to the l-Asn sidechain. (In particular, changing it to "CEN_DAN" *won't* work, because "CEN_DAN" isn't an atom type Rosetta recognizes - you'd have to come up with the relevant parameters for it and add it.)
Regarding the ICOORD line, this specifies the geometry of where to put the CEN atom. The "CA N C" bit means put it a given distance from CA, with a given angle for CEN-CA-N and a given CEN-CA-N-C dihedral. The numbers, as you might guess, are the values for the dihedral, angle and distance, respectively. I would suggest leaving the distance and the angle the same, and simply inverting the sign of the dihedral to flip the chirality.
Thanks, very clear explanation! I created centroid .params files for ten D-amino acids (attached) and passed them to FlexPepDocking with the -extra_res_cen option. Unfortunately the -lowres_preoptimize flag moves the peptide far from the binding site thus adopting unrealistic poses. I will soon attach the other 10 D-amino acids to port them in the main release. I hope they are all correct.