# Can't make NCAA's and D-aminoacids work


Hi,

I'm trying to incorporate D-amino acids in some designs.
I followed the steps as in the recent thread on D-amino acids here:
http://www.rosettacommons.org/content/steps-use-d-amino-acids

1) I uncommented the amino acids I need in residue_types.txt
3) I use the -score:weights mm_std in my docking protocol
4) I found the 3 letter codes in fa_standard/residue_types/d-caa and modified the input file accordingly.

When I run low resolution docking now, what I get is the following error message:

can not find a residue type that matches the residue DSERat position 2
ERROR: core:util:switch_to_residue_type_set fails
ERROR: Exit from: src/core/util/SwitchResidueTypeSet.cc line: 143

Is there some step that is not mentioned in the above thread that I skipped,
to have the D-amino acids activated?

Thanks for help

Jarek

Fri, 2012-10-26 09:10
jarek

It's failing because you're trying to do the centroid docking phase on D-amino acids for which there are no centroid parameters. This is what is discussed in the latter part of that thread (post #12 on). Did you see that part and put in D-SER in the centroid parameter set files?

Fri, 2012-10-26 09:22
smlewis

Thanks for the quick reply. I have now created all the centroid params files in chemical/residue_type_sets/centroid/residue_types.
I have also modified the chemical/residue_type_sets/centroid/residue_types.txt file to tell Rosetta where they are.
All files are read without problems except for DPRO (attached).
The problem does not occur with the L-proline PRO.params centroid file, on which the DPR file is based.
Here is the exact error message:

core.chemical.ResidueType: atom name : H not available in residue DPR
core.chemical.ResidueType: N
core.chemical.ResidueType: CA
core.chemical.ResidueType: C
core.chemical.ResidueType: O
core.chemical.ResidueType: CB
core.chemical.ResidueType: CEN
core.chemical.ResidueType:

ERROR: unknown atom_name: DPR H
ERROR:: Exit from: src/core/chemical/ResidueType.cc line: 1692

Any idea on how to solve this issue and why I get the message for DPRO but not for PRO?

Thanks for help

Jarek

Mon, 2012-10-29 03:54
jarek

Ok, this one I can fix.

The problem is in the patching system. Rosetta has parameters for the residue types, and then "patches" to represent special cases. Instead of 20 residue types, then 20 N-terminal residue types (with extra H atoms), and 20 C-terminal residue types (with the extra OXT), etc, there's just one "patch" file for the N-terminus, C-terminus, etc.

Many patches have to special-case proline because of the lack of the amide hydrogen - if you look in the patches you'll see the cases like so:

BEGIN_CASE ### PROLINE

BEGIN_SELECTOR
AA PRO
END_SELECTOR

SET_POLYMER_CONNECT LOWER NONE

END_CASE

BEGIN_CASE ### THE GENERAL CASE

SET_POLYMER_CONNECT LOWER NONE

## totally making this up:
SET_ICOOR H 120 60 1 N CA C

END_CASE

Anyway, you need to similarly special-case your DPRO in centroid mode (you can look and see it's already special-cased in fullatom mode). Here's how, for centroid/patches/NtermProtein.txt (which is the one that is crashing):

BEGIN_CASE ### DPROLINE

BEGIN_SELECTOR
NAME3 DPR
END_SELECTOR

SET_POLYMER_CONNECT LOWER NONE

END_CASE

Some of the other patches need to be special-cased as well. You can either copy the PRO case and replace "AA PRO" with "NAME3 DPR", or just comment the patches you don't need out of patches.txt. For docking, you can probably do this:

patches/CtermProtein.txt
patches/NtermProtein.txt
#patches/protein_cutpoint_upper.txt
#patches/protein_cutpoint_lower.txt
#patches/VirtualBB.txt
#patches/ShoveBB.txt
patches/protein_centroid_with_HA.txt
#patches/VirtualNterm.txt
#patches/N_acetylated.txt
#patches/C_methylamidated.txt
#patches/RepulsiveOnly_centroid.txt
patches/ser_phosphorylated.txt
patches/thr_phosphorylated.txt
patches/LowerDNA.txt
patches/UpperDNA.txt
patches/VirtualDNAPhosphate.txt

You would need the cutpoint ones for loop modeling, and you won't need any of the other commented-out ones.

Mon, 2012-10-29 08:19
smlewis

Thanks! It works now!

Mon, 2012-10-29 09:08
jarek

In parallel to the low res docking I have also tested the full atom docking.
The rotamer libraries are read in the order of appearance of the amino acids in the PDB file.
11 libraries are read, ending on histidine. Before reading methionine (next in sequence)
I get a segmentation fault:

./run: line 1: 10567 Segmentation fault /disk1/programs/rosetta-3.4-D/rosetta-3.4-bundles/rosetta_source/bin/docking_protocol.default.linuxgccrelease
@flags -database /disk1/programs/rosetta-3.4-D/rosetta_database -run:constant_seed -nodelay

Does 10567 stand for something? I have no idea where to look for the reason.

Jarek

Mon, 2012-10-29 04:38
jarek

10567 is just the process ID of the crashed job, so it doesn't tell you anything about the cause; there may be a core dump file named core.10567.

Can you recompile in debug mode and run that again to see if/how it fails? Usually debug mode gives more informative error messages.

I'm going to guess that the error is that the D-rotamer libraries were made a long time ago and the underlying code drifted somehow, but I'm not sure.

Mon, 2012-10-29 06:55
smlewis

I will recompile in debug, thanks! Any thoughts on the issue with the DPRO? (previous post)

Mon, 2012-10-29 07:07
jarek

I think it's because proline is missing the amide hydrogen in the backbone, and Rosetta is having a hissy fit about it for some reason. I'm trying to duplicate that one with my copy of 3.4. (I'll get around to trying the other one too, but it's more work to track the libraries down).

EDIT: this is fixed, but the fix is in reply to the original post.

Mon, 2012-10-29 07:17
smlewis

I have recompiled rosetta 3.4 in debug mode and now this is what I get:

core.pack.dunbrack: Dunbrack library took 1.23 seconds to load from binary
docking_protocol.default.linuxgccdebug: src/utility/vectorL.hh:352: typename std::vector::const_reference utility::vectorL<, T, A>::operator[](typename utility::vectorL_IndexSelector<(L >= 0)>::index_type) const [with long int L = 1l, T = long unsigned int, A = std::allocator]: Assertion `static_cast< size_type >( i - l_ ) < super::size()' failed.
Got some signal... It is:6
Process was aborted!

Interestingly, when I run the release executable it gets through arginine and crashes after histidine.

EDIT
-------
Running the same with Rosetta 3.3 gives the same result, except for the "Got some signal" line:
docking_protocol.default.linuxgccdebug: src/utility/vectorL.hh:323: typename std::vector::const_reference utility::vectorL<, T, A>::operator[](typename utility::vectorL_IndexSelector<(L >= 0)>::index_type) const [with long int L = 1l, T = long unsigned int, A = std::allocator]: Assertion `static_cast< size_type >( i - l_ ) < super::size()' failed.
./run: line 3: 12289 Aborted

Tue, 2012-10-30 06:47
jarek

We got a bit unlucky and that message is not all that illuminating. It's saying that Rosetta tried to access more items in a vector than there actually are in the vector - but it's not saying where/why it's trying to access past the end of the vector.

In these sorts of situations, it's helpful to run the program under a debugger and look at a backtrace. Typically I run under gdb. (gdb docking_protocol.default.linuxgccdebug; run ; backtrace; q; ). The full backtrace should tell where in the code it's erroring out, and looking at the surrounding code may give hints as to what's going wrong.

Tue, 2012-10-30 10:49
rmoretti

This is the output of gdb:
-----------------------------

docking_protocol.default.linuxgccdebug: src/utility/vectorL.hh:323: typename std::vector::const_reference utility::vectorL<, T, A>::operator[](typename utility::vectorL_IndexSelector<(L >= 0)>::index_type) const [with long int L = 1l, T = long unsigned int, A = std::allocator]: Assertion `static_cast< size_type >( i - l_ ) < super::size()' failed.

0x00000032f18305b5 in raise () from /lib64/libc.so.6
(gdb) backtrace
#0 0x00000032f18305b5 in raise () from /lib64/libc.so.6
#1 0x00000032f1832060 in abort () from /lib64/libc.so.6
#2 0x00000032f18299ff in __assert_fail () from /lib64/libc.so.6
#3 0x0000000000406e94 in utility::vectorL<1l, unsigned long, std::allocator >::operator[] (this=0x16987f88, i=18446744073709551604) at src/utility/vectorL.hh:323
#4 0x00002b503d4a3969 in core::pack::dunbrack::SingleResidueDunbrackLibrary::rotno_2_packed_rotno (this=0x16987ed0, rotno=18446744073709551604) at src/core/pack/dunbrack/SingleResidueDunbrackLibrary.cc:367
#5 0x00002b503d4a3a80 in core::pack::dunbrack::SingleResidueDunbrackLibrary::rotwell_2_packed_rotno (this=0x16987ed0, rotwell=@0x7fff711dfa68) at src/core/pack/dunbrack/SingleResidueDunbrackLibrary.cc:386
#6 0x00002b503d4872c2 in core::pack::dunbrack::RotamericSingleResidueDunbrackLibrary<4ul>::eval_rotameric_energy_deriv (this=0x16987ed0, rsd=@0x16244ef0, scratch=@0x7fff711dfa50, eval_deriv=false) at src/core/pack/dunbrack/RotamericSingleResidueDunbrackLibrary.tmpl.hh:382
#7 0x00002b503d487c08 in core::pack::dunbrack::RotamericSingleResidueDunbrackLibrary<4ul>::rotamer_energy (this=0x16987ed0, rsd=@0x16244ef0, scratch=@0x7fff711dfa50) at src/core/pack/dunbrack/RotamericSingleResidueDunbrackLibrary.tmpl.hh:276
#8 0x00002b503d42fe0a in core::pack::dunbrack::DunbrackEnergy::residue_energy (this=0x85b5140, rsd=@0x16244ef0, emap=@0x1630e910) at src/core/pack/dunbrack/DunbrackEnergy.cc:101
#9 0x00002b503e48c612 in core::scoring::ScoreFunction::eval_ci_1b (this=0x85bbde0, rsd=@0x16244ef0, pose=@0x7fff711e0500, emap=@0x1630e910) at src/core/scoring/ScoreFunction.cc:1223
#10 0x00002b503e4923bc in core::scoring::ScoreFunction::eval_onebody_energies (this=0x85bbde0, pose=@0x7fff711e0500) at src/core/scoring/ScoreFunction.cc:1103
#11 0x00002b503e493ac9 in core::scoring::ScoreFunction::operator() (this=0x85bbde0, pose=@0x7fff711e0500) at src/core/scoring/ScoreFunction.cc:567
#12 0x00002b503b542726 in protocols::docking::DockMCMCycle::init_mc (this=0x161df7c0, pose=@0x7fff711e0500) at src/protocols/docking/DockMCMCycle.cc:222
#13 0x00002b503b544b8b in protocols::docking::DockMCMProtocol::apply (this=0x866b2d0, pose=@0x7fff711e0500) at src/protocols/docking/DockMCMProtocol.cc:226
#14 0x00002b503b529048 in protocols::docking::DockingProtocol::apply (this=0x85b6d10, pose=@0x7fff711e0500) at src/protocols/docking/DockingProtocol.cc:781
#15 0x00002b503b25afa7 in protocols::jd2::JobDistributor::go_main (this=0x8615830, mover={p_ = 0x7fff711e08f0}) at src/protocols/jd2/JobDistributor.cc:375
#16 0x00002b503b25bb81 in protocols::jd2::JobDistributor::go (this=0x8615830, mover={p_ = 0x7fff711e09a0}) at src/protocols/jd2/JobDistributor.cc:200
#17 0x0000000000405a78 in main (argc=6, argv=0x7fff711e0aa8) at src/apps/public/docking/docking_protocol.cc:64

Wed, 2012-10-31 04:51
jarek

It looks like it's failing in trying to figure out which rotamer bin a particular residue is in (see the rotwell_2_packed_rotno() and rotno_2_packed_rotno() functions which occur right before/above the vector[] operator). Most likely there's some mismatch between the rotamer library you're using and the residue type specification. It's hard to say more without a small, self contained example which can be replicated and poked-around-with on a local machine.

Wed, 2012-10-31 11:46
rmoretti

I have tried to make it work on a number of examples. Here's one example from PDB.
I am trying to locally redock the D-peptide from 2Q3I.pdb on its L-peptide target.
Attached the pdb and the flags file I'm using.
I had to change some of the amino-acid names: DPN was substituted with DPH,
CSY with DCD and DGL with DGU.

The sequence of the peptide is GacGlGneewftlcaa (I use lower case for D-amino acids)
If I just run it as it is, it fails while reading the dgln rotlib.
I tried to truncate the peptide starting from the N-terminus:
eewftlcaa fails after dtrp.
ftlcaa fails after c but returns "error seqpos >= 1"
aa works fine.

It seems to me that the problem is not related to a particular rotamer library.
For other peptides, dgln and dcys libraries work just fine, and the program fails on others.
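
In case it helps anyone reproduce the renaming step, here is a rough Python sketch. It assumes standard fixed-column PDB records, where the residue name sits in columns 18-20 of ATOM and HETATM lines. The mapping is the one listed above; the function name rename_residues is just illustrative.

```python
# Mapping taken from the substitutions listed in this post; extend as needed.
RENAME = {"DPN": "DPH", "CSY": "DCD", "DGL": "DGU"}

def rename_residues(pdb_lines, mapping=RENAME):
    """Yield PDB lines with the residue name (columns 18-20) replaced per mapping."""
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 20:
            name = line[17:20]  # columns 18-20, zero-based slice
            if name in mapping:
                line = line[:17] + mapping[name] + line[20:]
        yield line

# Example use (file names are placeholders):
# with open("input.pdb") as src, open("renamed.pdb", "w") as dst:
#     dst.writelines(rename_residues(src))
```

Only the residue name is touched; record types, atom names and coordinates are left as they are.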

Thu, 2012-11-01 06:25
jarek

I'm not quite sure why it's happening, but it looks like the proximate cause is the fact that the code is calling core::pack::dunbrack::rotamer_from_chi_02() for the non-canonical amino acids. This sort-of works for residues with only one chi, but for multi-chi residues it fails because rotamer_from_chi_02() is specialized for canonical amino acids. (The end result is that the rotwell vector gets '0's, which then turn negative in core::pack::dunbrack::SingleResidueDunbrackLibrary::rotwell_2_rotno(), resulting in an out-of-range index.)
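
The huge index in the backtrace (rotno=18446744073709551604 in frame #4) is consistent with this: it is what a small negative number looks like once it is stored in an unsigned 64-bit integer. A quick Python check of the arithmetic (the helper as_signed_64 is only for illustration):

```python
def as_signed_64(value):
    """Reinterpret an unsigned 64-bit value as a signed two's-complement integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value

rotno_from_backtrace = 18446744073709551604  # value printed by gdb in frames #3 and #4
print(as_signed_64(rotno_from_backtrace))  # prints -12
```

So the rotamer number came out as -12, and indexing a vector with that trips the bounds assertion.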

I'm not sure where things are going off the track, but I'll email our NCAA expert - but keep in mind that he's in NYC, so he might be busy with other things for a while.

Mon, 2012-11-05 18:53
rmoretti

Thanks for the update.
It's good news that you were able to reproduce the bug.

Tue, 2012-11-06 01:41
jarek

Doug pointed out that the core of the issue is that you shouldn't be using fa_dun with NCAA's to begin with. While you have set "-score:weights mm_std" appropriately, it looks like the docking protocol isn't obeying that completely, so it's using fa_dun, even though it shouldn't be.

For a short term fix, it looks like it may be sufficient to change the call to DockingProtocol() in rosetta_source/src/apps/public/docking/docking_protocol.cc to

DockingProtocolOP dp = new DockingProtocol(utility::tools::make_vector1(1), false, false, true, NULL, core::scoring::getScoreFunction ());

You'll also need to add two headers to the list of includes at the top of the file:

#include <core/scoring/ScoreFunctionFactory.hh>
#include <utility/tools/make_vector1.hh>

If you then recompile, that solves the immediate problem, though additional ones will likely crop up (e.g. the test case now crashes on missing centroid params files).

Tue, 2012-11-06 12:07
rmoretti

Clarification: Changing the source and recompiling is only necessary if you're running Rosetta3.3

If you're running Rosetta3.4, all you need to do is add "-score::pack_weights mm_std" in addition to "-score:weights mm_std"

Tue, 2012-11-06 12:20
rmoretti

I started to get the same error message when running a Rosetta script
that uses the HBondsToResidue filter for complexes composed of NCAA's.
(attached)

Since there is an energy_cutoff used in the filter, can I somehow make the
filter aware that it needs to use the mm_std weights, like I did for docking?
Passing -score:weights mm_std and -score:pack_weights mm_std in the flags file
doesn't do the job here.

Jarek

Fri, 2012-11-16 03:59
jarek

It looks like HbondsToResidueFilter hard-codes standard.wts/score12.wts_patch as the scorefunction to use, so there's no command-line or tag option which will fix things.

There are several ways around it, though. The easiest one is to massage standard.wts and score12.wts_patch to remove the offending terms. You might even be able to avoid touching the database if you put the changed copies of standard.wts and score12.wts_patch in the current directory (the one you're running the program in).
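
As a rough sketch of the "massage" step in Python: this assumes the weights files are plain text with the score term name as the first token on each line, and that fa_dun is the term to remove (based on the earlier fa_dun discussion in this thread; adjust the list if other terms cause trouble). The function name strip_terms and the database path in the comment are assumptions, not something Rosetta provides.

```python
def strip_terms(weights_text, terms=("fa_dun",)):
    """Return weights_text without the lines whose first token is one of terms."""
    kept = []
    for line in weights_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0] in terms:
            continue  # drop this score term
        kept.append(line)
    return "\n".join(kept) + "\n"

# Example use: write an edited copy into the run directory.
# The source path is an assumption; check where your database keeps the file.
# from pathlib import Path
# src = Path("rosetta_database/scoring/weights/standard.wts")
# Path("standard.wts").write_text(strip_terms(src.read_text()))
```

Comment lines and everything else pass through unchanged, so the copy can be compared against the original to see what was dropped.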

Fri, 2012-11-16 11:37
rmoretti

It works great now! Thanks!
One more thing I've noticed is that the D-cysteine disulfide bridges get messed up during docking.
Is there some kind of patch I should apply to keep them fixed?

Wed, 2012-11-07 02:32
jarek

D-disulfides are probably unhandled. Casual inspection of the disulfide code unsurprisingly finds insistence on residues being "aa_cys" in the aa enum. DCYD is set up for disulfides, but it's aa_unk, not aa_cys, so I suspect the disulfide score never runs. Assuming you have no L-disulfides in your structure, do the disulfide terms all come out as 0 (indicating that they aren't running)?

Wed, 2012-11-07 07:07
smlewis

Hi, I'm sorry, but being a Rosetta newbie I have no idea how to check whether the disulfide terms come out as 0. Where do I look?

Thu, 2012-11-08 01:46
jarek

You should probably have a scorefile (probably score.sc) that is basically a whitespace-delimited spreadsheet of energy terms and total score for each result. The scores are also likely to be at the bottom of the file in the .pdb outputs. There will be four terms that start "dslf" that are the disulfide terms:
dslf_ca_dih
dslf_cs_ang
dslf_ss_dih
dslf_ss_dst

If those terms are not present, disulfides are not being scored at all (they are in mm_std, so they ought to be present). If those terms have all zero values in each of your pdb structures, then none of your d-cysteine disulfides are being recognized as such, which explains why they aren't being maintained properly. If you can confirm that you have a d-cysteine disulfide that comes up with zero score (meaning it's unrecognized) then you can file a bug report that D-cysteine disulfides don't work, but it's not a bug I expect to be fixed soon.
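
If you'd rather not read the file by eye, a small Python sketch like the following can do the check. It assumes the usual scorefile layout in which every data line starts with "SCORE:" and the first such line holds the column names; the function name disulfide_report is made up for this example.

```python
DSLF_TERMS = ("dslf_ca_dih", "dslf_cs_ang", "dslf_ss_dih", "dslf_ss_dst")

def disulfide_report(scorefile_text):
    """Summarize whether the four dslf terms are present and nonzero in a scorefile."""
    header = None
    rows = []
    for line in scorefile_text.splitlines():
        if not line.startswith("SCORE:"):
            continue
        fields = line.split()[1:]
        if header is None:
            header = fields  # first SCORE: line is assumed to be the header
        else:
            rows.append(dict(zip(header, fields)))
    if header is None:
        return "no SCORE: lines found"
    missing = [t for t in DSLF_TERMS if t not in header]
    if missing:
        return "missing terms: " + ", ".join(missing)
    if not rows:
        return "no score rows found"
    if all(float(row[t]) == 0.0 for row in rows for t in DSLF_TERMS):
        return "all dslf terms are zero"
    return "nonzero dslf terms present"

# Example use (file name is a placeholder):
# print(disulfide_report(open("score.sc").read()))
```

"missing terms" corresponds to the first situation described above (disulfides not scored at all), and "all dslf terms are zero" to the second (disulfides not recognized).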

Thu, 2012-11-08 07:20
smlewis

I don't have any of these terms in my output score file.
How do I prevent the application from repacking and breaking the bonds?

Thu, 2012-11-08 07:48
jarek

A) Show me what terms you DO have in your output file

B) We probably can't. Assuming the local backbone context is fixed, if you are using a resfile, you may be able to pass NATRO for those residues to prevent packing. I forget what executable you're using so that might not work either.

Thu, 2012-11-08 07:50
smlewis

If you're using a protocol that's constraint file aware, you can pass in a manually-made constraint file which enforces the disulfide bond geometry. (You would also need to make sure your weights files have the appropriate constraint term turned on as well.)
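
As a hedged illustration of what such a file could contain, the Python sketch below writes one AtomPair line per disulfide, using a harmonic restraint on the S-S distance. The "AtomPair atom res atom res HARMONIC x0 sd" layout is my understanding of the constraint file format; the SG atom name for the D-cysteine type, the 2.02 Å target, the 0.1 Å width and the residue numbers are all assumptions to check against your params files and structure. A distance restraint alone does not enforce the disulfide dihedrals.

```python
def disulfide_constraints(pairs, distance=2.02, sd=0.1, atom="SG"):
    """Return constraint-file text with one harmonic AtomPair line per residue pair."""
    lines = []
    for res_a, res_b in pairs:
        lines.append(
            f"AtomPair {atom} {res_a} {atom} {res_b} HARMONIC {distance:.2f} {sd:.2f}"
        )
    return "\n".join(lines) + "\n"

# Hypothetical residue numbers; use the numbering of your own structure.
print(disulfide_constraints([(3, 14)]), end="")
# prints: AtomPair SG 3 SG 14 HARMONIC 2.02 0.10
```

The matching score term (atom_pair_constraint, as far as I know) needs a nonzero weight in the weights file for the restraint to have any effect.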

Thu, 2012-11-08 10:29
rmoretti