I'm Ben, I'm new here. I'm working on re-designing the binding pocket of an amino-acid binding protein for it to bind a new ligand. For this, I use enzdes. I want to compare the designs Rosetta makes with the wt protein in terms of binding energy (i.e. ligand-protein surface interaction energy) as a control. My question is: Is ligand docking the right application for this?
My plan in detail:
-Dock the wt ligand in the wt protein using water molecules as cofactors (trying to emulate the real binding mode, which includes water mediated contacts). I would use the ligand docking application with the relaxed (repacked) wt protein and no extra ligand rotamers (other than the crystallographic).
-Dock my new ligand in the wt protein using the water molecules that fit in there (not displaced by the new ligand) using the repacked wt protein and some ligand rotamers.
-Take my best design with the new ligand in it and check it's energy.
-Dock the wt ligand in the designed protein using some rotamers for the wt ligand.
Ideally, I will see that the new ligand in the designed protein has a similar energy to the wt ligand in the wt protein and the swapped structures are similarly disfavored.
Maybe there is a better way to do this that I'm not aware of, so I hear suggestions.
Sounds like a reasonable plan to me. Some comments though:
I would try to match the flexibility of the native and designed ligands. If you're allowing rotameric flexibility on the designed ligand, I'd use a similar level of flexibility on the native ligand. Otherwise you might run into a situation where you'll find a lower-energy binding site for the designed ligand versus the native, simply because it can flex more. I'd also match the way you prepared the structures. For example, you'd want to compare the scores of the docked design with those of the (re)docked native, rather a scored, undocked from-PDB structure.
Docking the designed ligand to native protein is an okay way to find a starting pose for design, but may not work all that well, especially if the new ligand might be stericly limited. Other possibilities are to overlay the ligands on common motifs, or to use the matcher to find starting structures which have the desired interactions. (You can match with "existing" interaction as well as introduced interactions.)
Dealing with water is hard. We tend to ignore/eliminate explicit water as much as possible. I don't think the ligand docking program (or enzdes or even the multiple ligand docking protocols) can handle the full range of movement of the water (e.g. it won't be able to recognize that flipping the water will allow it to make a better hydrogen bonding network). I might recommend, in addition to the water-included design/dock (which probably is worth trying), you also do a design/redock without the explicit waters. If you're able to make a binding mode which avoids the water-mediated contacts, you may be able to get better discrimination.
As a final note, keep in mind that by default the ligand docking application and enzdes applications use slightly different scoring functions/weights (both of which are slightly different from the default scoring function used with protein structure prediction). The general trends should be the same, but not numerically identical, even for exactly the same conformation. (If for some reason you need consistency, there's ways to change the scoring functions used, either through the command line, or by using RosettaScripts to do docking/design.)
Best of luck,
Thanks for the fast reply. I will observe your recommendations. Now, I was running enzdes with ligand rotamers for the first time and, after one day of calculations, results came back and only one rotamer was used, the one with the starting coordinates. I'm quite sure Rosetta found the rotamers I gave it, because I had to do a couple of tries before getting the format right and Rosetta would give me errors about the rotamer atoms. I'm not sure why it didn't use the other rotamers, I really doubt it's because the first one is the best.
Here I paste my flag for enzdes:
Hmm, that's interesting. One possibility is that you're only getting the starting ligand conformation because that's the one which is energetically most favorable. One thing to realize is that when swapping ligand rotamers, Rosetta superimposes the NBR_ATOM and the atoms bonded to it. If molfile_to_params happened to place the neighbor atom on a "sidechain" rather than the "core" of the ligand, by superimposing the neighbor atom your "core" may be waving around, resulting in very poor ligand energies. You should be able to specify which atom you want molfile_to_params.py to use for the neighbor atom by adding a "M NBR" line to your .mol file, or by specifying them for .mol2 inputs with the --m-ctrl flag. (run "molfile_to_params.py -h" for more info.) I find that outputting the kinamage file and looking at it is helpful in determining how molfile_to_params is constructing the ligand.
The other possibility is that you're not actually adding the extra ligand rotamers. The tracer output when running the program should say at some point how many ligand rotamers were added. By default molfile_to_params.py will output a params file that only has a single ligand rotamer. You need edit the params file to manually add the PDB_ROTAMERS line to get the additional rotamers.
Regarding Rosetta not using other rotamers, I made sure it's reading the rotamer file, it prints "50 ligand rotamers found!" (or something like that) at the beginning. I tried just changing the NBR atom in the .params file, but that didn't work, anyway, I'll try making the whole thing again specifying the NBR atom when running molfile_to_params.py.
About the ligand docking: I'm getting a segmentation fault after a list of "Unsatisfied interface H-bond: *AA* *ID* *ATOM*" during the first structure calculation.
I'm sorry I can't give you more information, but no error message is displayed, it just says "segmentation fault" and then exits. The silent file output is empty.
The flagfile I'm using might help:
Ugh. In theory, the release version should never see a segmentation fault, regardless of what you throw at it. In practice, often something as simple as a slightly misformatted input file will cause one.
In order to figure out what's causing it, the run needs to be redone in debug mode. When compiling, specify mode=debug instead of mode=release. Then when running, use the *.linuxgccdebug application instead of the *.linuxgccrelease one. (Note that debug and release modes can coexist in the same directory tree.) Occasionally the error message given when running in debug mode is enough to figure out what's wrong, but frequently it needs to be run in a debugger such as gdb in order to get a backtrace to figure out exactly where it's crashing.
You can either do the debugging yourself, or if you can put together all your input files, you can contact me off-forum (click the envelope to the left), and I'll be willing to check them myself.
I found what was wrong. I just deleted -in:file:native line. I used an apo structure as an input and maybe the program was trying to find the ligand in there. Anyway, I haven't tried to input any native structure after that.
Going back to the rotamer issue, I now choose a NBR atom that does not change place in any of my rotamers and checked again that Rosetta finds the list. I have tried to see if Rosetta uses more rotamers by doing short runs (10 structures), do you think this is the right way to check for that? I don't know how Rosetta manages rotamers, but if it is using them in the order they are in the PDB input I would try a longer run.
The -in:file:native is only being used for ligand RMSD calculations, not for any sampling or input purposes. If you have a "known" ligand pose you want to compare results to, that's your native. If not, it should be alright to omit the -in:file:native. (In that case, I believe the input structure is then used as the comparison for calculating RMSD.) So I'm not surprised it choked on an apo native (it really should have given a sensible error message instead of dying, but at least there's a work-around.)
The ligand rotamers should be treated near identically to that of the protein sidechain rotamers. That is, all of the rotamers should be available to be put in as a replacement during packing, and which rotamer gets chosen is stochastic, but biased by its energy in the packer's simulated annealing protocol.
As the process is stochastic, running more -nstructs should get you more coverage of the field of possible rotamer combinations, but there's no one-by-one allocation (e.g. the first nstruct is completely equivalent to the hundredth nstruct, except for which random numbers are being pulled). If the tracer output is saying you've loaded all the ligand rotamers, and you're seeing rotamer variation in your output structures, you can be confident that all the rotamers are being used during packing. (There's ways to iterate through individual ligand rotamers, but not with the ligand docking application, as far as I know.)
Note that Rosetta packing is quite efficient - if you were only doing packing, a couple of dozen nstruct would probably be more than sufficient to be confident you've likely sampled the lowest energy structure that's accessible during packing. Where you need to kick up the sampling is when you allow backbone movement or rigid body movement (both of which happen during standard ligand docking). This inflates the degrees of freedom significantly. The recommendation for ligand docking is to do somewhere around 5000 output structures to get sufficient sampling of the ligand positions. (Depending on your system this may be overkill or not enough, so if you're concerned, you can do some tests if you want.)
Thanks for all your help.
About the ligand docking: I'm just trying to reproduce the binding mode and binding energy order* of the wild type ligands. So I have used no extra rotamers other than the crystallographic. I expect other ligands similar to those normally bound by my protein to have a similar binding mode/orientation to the wild-type ligands, so I will also use one rotamer for them. I just expect their binding energies to be worse.
So far, by making 500 structures I have been able to get the best energies for each of my wt ligands in an order* that is in agreement with the experimental data. Also, the binding mode is the same as in the crystal. All this was done including the crystallographic water molecules that mediate protein-ligand contacts.
About the enzyme design: I have tried to use rotamers for this but, although I see that Rosetta finds all the rotamers, they don't come out in any of the 10 structures it outputs, not even after redefining the NBR atom. Is there any other way to check why they are not being included? some kind of logfile?
*:When I say "binding energy order" I don't mean that this energies are in the numeric order of the experimental DG, but rather that the ligands with most affinity have the most negative energy.
The log file for Rosetta is the tracer output that's printed to stdout. As mentioned before, there should be a line in there saying how many ligand rotamers were loaded.
Assuming that the rotamers are loaded appropriately, the most likely reason for not getting any different rotamers is that none of the other rotamers are as energetically favorable (which would be reasonable, especially if you're starting with a native-like conformation). Have you tried superimposing some of the other ligand rotamers into the protein? My guess is that most will have clashes with the backbone. Depending on your active site and how finely you've sampled your rotamers, there might not be a rotamer in your library that doesn't have a significant clash with the backbone.
If you have a rotamer which you think should fit, you can try playing around with the input structure to favor the other rotamer/disfavor the input rotamer. (For example, use something like the mutation feature of pymol to manually make a clash with the starting ligand which would be alleviated by the alternate ligand.) You might also want to try omitting the -packing:use_input_sc and -packing:unboundrot flags in your test runs, as these cause Rosetta to favor the input structure. (So if the two rotamers are almost equivalent, Rosetta would be biased toward the one in the input.) You might also want to try increasing the number of trials - the other rotamer might be selected, but only at a low level that isn't being picked up with your 10 runs.
Another suggestion for debugging is to use RosettaScripts, and specifically the TryRotamers Mover. In contrast to most other movers in Rosetta, TryRotamers isn't stochastic in picking rotamers, but will sequentially move through each rotamer, so successive nstructs will put in another rotamer. If you set up a script with TryRotamers as the only mover (i.e. so there's no filtering on energy), the output files should show how Rosetta is placing the rotamers for your ligand into the structure.
I haven't gone back to the rotamer issue yet, I will let you know how that goes.
In the mean time, the ligand docking strategy for validating the designs didn't work. I want to try other thing now, is there a way to ask Rosetta what is the total energy of a crystal structure according to its force field? (Given that I can provide *.param files for anything that is not part of the protein)
Depends on what you mean by "total energy". Pretty much anytime you pass a structure through Rosetta, the output results will include a notation of the "total energy" of that structure according to the energy function that's being used at the time. There are applications (score or score_jd2: http://www.rosettacommons.org/manuals/archive/rosetta3.3_user_guide/app_...) which are specifically intended to take input structures and output reports detailing what Rosetta thinks of the structure, energy wise.
Note that this isn't particular to score and score_jd2 - both enzdes and ligand docking output scoring information with each result structure, with additional details not provided by score/score_jd2.
I still can't fine what's wrong with the rotamers. I have tried all to "make" Rosetta use them, disfavoring the initial conformation. I could try to check the stdout, but I don't know the flag to create it, would you tell me how that works?
Alternatively, I could send you the files I'm using through PM.
Sorry for the programmer jargon - stdout is the "standard output". It's what gets printed to the screen when you run the program, assuming you don't use the shell to redirect it to a file or pipe it to another program.
At this point I'm not sure what else to suggest, so feel free to send me the files.
Oh, I see, then we can discard that, I see the ligands are loaded but I don't know what it does with them. I'll send you the files through a PM right away.
You mentioned you were using a resfile to specify designable/repackable positions, but for some reason I didn't catch it. In the resfile you sent me, you don't include the ligand position in the explicit list of positions in the resfile body, so the ligand falls under the default behavior, which was set to NATRO. You're not getting rotamer substitution in your protocol because the resfile says not to substitute rotamers.
If you add the ligand specifically to the resfile, e.g.
1 X NATAA
you should then get rotamer substitution at that ligand position.
Alternatively, for the enzyme_design application, you can change the default behavior from NATRO to AUTO. Then all of the residues which aren't specified will be controlled by the standard enzdes -cut1/-cut2/-cut3/-cut4 flags (which allow a repackable ligand).
Another thing is that all your rotamers involve the positioning of a terminal amine, which is making great (native-like) contacts with the sidechain and backbone in the input PDB. a priori, I would have doubted you'd be able to find a lower energy conformation for the amine, so most of your output structures will likely be close to the input structure. Turns out that's not quite the case, as you do find a few different rotamers when the resfile is changed, but it does explain the fact that almost all of the runs converge on the same few rotamers. (For future reference, you can test if this is an issue by deleting the interacting residues completely, so you're doing a test run on a ligand that's flopping around in solvent.)
Sorry for the late reply, I finally could include the rotamer library. Sadly, I couldn't reproduce the binding energy order of my wt ligands. All the rest worked just fine.
Thanks for your help,