You are here

Modeling phosphate ion binding site in protein.

20 posts / 0 new
Last post
Modeling phosphate ion binding site in protein.

I want to model and predict (together with the binding affinity) of phosphate ion on my protein structure. Does rosetta have any protocol for this .. or how can this be done using rosetta ??


Post Situation: 
Tue, 2011-05-17 19:08

Ligand docking can be used to predict WHERE a phosphate might bind. I don't think Rosetta is sufficiently calibrated to give you a good estimate of binding affinity.

Phosphate binding is going to be an extremely challenging prediction because it is all polar/h-bond interactions.

Wed, 2011-05-18 08:06

The ligand docking protocol has been calibrated against binding affinities, although as with most docking protocols, there are rather large error bars. See Jens Meiler and David Baker (2006). "ROSETTALIGAND: Protein-Small Molecule Docking with Full Side-Chain Flexibility" Proteins 65, 538-548. for details. Davis et al. (2009) "Blind docking of pharmaceutically relevant compounds using RosettaLigand." Protein Sci. 18(9), 1998-2002 is probably also worth looking at.

As Steven said, phosphate is probably going to be a stretch. Compared to the compounds tested in the paper, it's rather small, so there's a lot more binding modes that need to be tested (making the search harder), and the interactions are going to be dominated by hydrogen bonding, electrostatics and polar desolvation. Rosetta tends to work best with steric and hydrophobic interactions (i.e. those interactions typically found in protein cores), with electrostatics and polar desolvation being the weak areas.

Wed, 2011-05-18 08:32

SO , how can this be approached ?? ...

Wed, 2011-05-18 17:12

The documentation I linked gives a pretty thorough example of what to do. The largest differences here are:

A) you don't have ligand flexibility to worry about, with just a phosphate,

B) you'll want extra modeling and lots of luck to account for it being a tougher problem.

Wed, 2011-05-18 17:32

I you have a good idea as to the general area where the phosphate should dock (e.g. from mutational studies or homology considerations), it might be helpful to limit the docking runs to that region. You can specify the areas to sample with the -docking:ligand:start_from and -docking:uniform_trans flags.

For a really thorough study, you may want to locate crystal structures of protein-phosphate (or small-molecule phosphates) complexes with known affinities. You can then run these controls through the protocol to benchmark how well Rosetta does in predicting phosphate position, and also to get a more phosphate-specific calibration curve (and potentially error bars) for the REU to Kd conversion.

Thu, 2011-05-19 08:12

Thanks for the suggestion...

There are crystal structures of proteins with phosphate... I didn't understand about running protocol runs.. Do I have to dock the crystal structure with phosphate again to look for the binding affinities...

Also, I want to graft the metal binding sites and then I have to check for the binding site affinity for the grafted region. Can rosetta be used in such a way that it can automatically change the amino acid in the grafted region to increase the binding affinity..

Thu, 2011-05-19 23:08

By "run the protocol" I mean dock the known structures with the ligand docking application, in the same way you're docking your unknown structure. The thought is, if Rosetta does well on the known structures there is a reasonable chance it'll do well for similar unknown structures. Conversely, if you can't get good results with a known structure, you're unlikely to get good results with a similar unknown structure.

The ligand docking application is intended for fixed sequence problems. The amino acid sequence of the protein you put in is the amino acid sequence of the protein you get out. If you want to change the sequence to improve affinity, you'll probably want to use the enzyme design application. ( -- while it's called "enzyme design", it also works for designing ligand binding proteins.) Internally, it's much the same as the ligand docking application, but with extra features to allow for design.

Fri, 2011-05-20 09:07

As you said for croos docking experiment.. I have started doing that but I am having some problems as its new for me..
1) As per the tutorial , I have created a parameter file for my phosphate ion using molfile_to_paramfile.
2) Next, I prepared the receptor for docking by suing the lingand_rpkmin command . Here after generating the 10 structures how can I select the best one for further docking ..
3) Okay say I ahev selected one structure , the next step is to provide the native structure.. I have it for my protein but phosphate ion is present in the crystal structure. So in this case shall I provide the native structure without phosphate ion one or not ??..
4) Finally, coming to actual docking how to give the number of steps the dockign should continue ... and how large should this number be in mycase ?

After running the docking command I am getting the following errors :-

ERROR: Pose has no jumps!
ERROR:: Exit from: src/protocols/ligand_docking/ line: 272

Here is the list of parameters that I used :-
-rng mt19937
-s 1GFL_mutate2_native_0002.pdb

-nstruct 500

-uniform_trans 5
-start_from 0.000 0.000 0.000 (as I want rosetta to search for binding site, like in case of blind docking)

-mute ## dont show timing info

Pls help ...

Wed, 2011-06-01 19:20

Specifying the native is optional. The docking should still work without it, you'll just not get some of the diagnostic information about how close the predicted conformation is to the native conformation. Supplying a native should have absolutely no effect on which structures (or their energies) you get out from the prediction.

For the jump issue, each (covalently connected) molecule should be it's own chain. My guess is that your phosphate is listed as being part of the same chain as the protein in the input PDB. Usually we set the protein to be chain A, and the ligand (phosphate in your case) to be chain X - although any combination where the ligand is the last chain (alphabetically) in the file should work.

Regarding flags, the ones listed in the documentation ( are the recommended ones to start with. I'm not quite sure what you're asking about with regard to cycles. If you're talking about the total number of decoys to produce, 5000 trajectories (e.g. ten runs each with -nstruct 500, although you can mix and match based on how many processors you have available) is somewhat of a standard, for tricky cases (possibly your phosphate) you may want to increase that. What I'd recommend is to do the 5000 trajectories for the controls, then plot the predicted binding energy against the rmsd of the output ligand to native. Ideally you should get a clear "funnel" like plot, where the spout of the funnel points to the low energy, low rmsd corner. If not, you'll have to change your docking proceedure, say by increasing the sampling (the number of trajectories you run).

P.S. I have access to a CONDOR cluster, so when I do docking I tend to use the setup script. Even if you don't use condor, you can still use to set things up, and then massage the condor script to work with your cluster or even just the command line.

Thu, 2011-06-02 08:37

After generating some 500 structure using nstruct command and looking the top 5% out of those structures revealed that for all the structures the phosphate ion binds to the same location . I found this quite strange, as far as I know docking searches the entire protein for interaction. To my surprise I found that in all the 500 structures generated , phosphate ion binds to the same location ...

is it due to the following parametrs that I gave during docking :-
-uniform_trans 5
-start_from -1.731 32.589 -5.039

start_from position I copied from the tutorial. Exactly what should be this location when we don't know the binding site, as in case of blind docking .

You said that I need to generate "ten runs each with -nstruct 500,", what does this means. As I gave -nstruct 500 which generated the same no. of structures. Pls clarify this ??

How do I have to use the plot_funnels.R . Do I have to install the R package for this ??

Pls reply

Thu, 2011-06-02 19:24

The ligand docking application is somewhat pocket-based. You specify one or more potential starting locations (pockets) and then the ligand docking application explores various binding poses in the region of that pocket. From the flags you give, it's set up to search a 5 Angstrom sphere (-uniform_trans) with a random rotational orientation (-randomize2) around the -start_from location. It won't look at possible binding sites outside of that 5 Ang sphere. To search other sites, add them with additional -start_from lines. Especially with something like phosphate, you'll need to narrow down the potential binding sites in order to get decent results. To do this I'd suggest examining the starting structure, and applying some chemical intuition about where the ligand might bind.

Note that having the top 5% all being in the same location doesn't necessarily mean that you're getting poor sampling - it just might be that there's a very tight binding funnel and that spot is the best binding location. You can tell the difference by looking at some of the higher energy structures.

To that end, you do need to have R installed in order to use the plot_funnels.R application. If installing R is an issue for you, note that you can create similar plots with other graphing programs, by plotting the "interface_delta" values against the "ligand_auto_rms_no_super" values in the output score file.

Fri, 2011-06-03 11:35


Here's what I did during my third trial:-
1. First, I docked phosphate ion with autodock , to search for potential binding site and I got some 3 residues on the surface of my protein binding to phosphate ion.
2. Next, I wanted to check the binding affinity + other residues (apart from those three tat I got during autodcok run) that can interact with phosphate ion in that region. So I took the search space coordinated from autodock run and docked the ion with rosetta ligand.
3. I generated some 5000 structures and after taking the best 5% , sorted using total score I was left with only 10 structures.
4. After visulazing the docked complez in pymol, I found that ion shows interaction with only 1 amino acids (rather it should show interaction with those three amino acids which I got after autodock run). So in this case what has to be done..

Am I going wrong some where or do I have to generate more number of structures in order to get better results..

For plotting the funnel I need two parameters interface_delta and ligand_auto_rms_no_super , but I cannot use the second coz my native structure doesnot have an already docked ion in it. So in that case what shall I do ??


I want to know what all parameters are good for analysing the result is it total_score or interface_delta ?? as after checking my score file I found that the total_score for all the structure lies somewhere between 227 and 229

Mon, 2011-06-06 18:18

Right, you can only get the ligand_auto_rms_no_super (and thus plot a docking funnel) for those complexes where you have a known native. For figuring out what sort of parameters you need to run (how many structures to generate, etc.) it's good to have positive controls of similar proteins with similar ligands, run those positive controls through the protocol and confirm that under the conditions you're using you're able to adequately recapitulate the appropriate results for the positive controls.

The total_score is more-or-less a measure of the (free) energy of the entire system. So if you have two similar structures, Rosetta predicts that it is more likely to be in the one with the lower total_score. (As with free energies, the absolute numbers aren't all that meaningful; it's more the difference between two states that's relevant.) The interface_delta, on the other hand, specifically measures the energy of association of the ligand with the protein. It's a measure corresponding (crudely) to the delta-G of the ligand-protein association reaction. So the protein-ligand complex will most likely be found (by Rosetta's estimation) in those conformations which have low total_scores. Once you've identified those most probable conformations, you can gauge the strength of the ligand binding by looking at the interface_delta.

I'm not sure why Rosetta is predicting association with only one of the amino acids, rather than all three of the ones Autodock predicted. More interactions should be favored over fewer interactions, so my off-the-cuff guess is that for the two other residues Rosetta can find more favorable interactions with other parts of the protein, instead of with the ligand. As mentioned above, Rosetta really isn't currently parametrized well for phosphate binding. What you're seeing may be a result of that. My guess is that Autodock, being somewhat pharmaceutically biased, is going to be relatively better parametrized on phospho-compounds and phospho-binding proteins. Frankly speaking, if you like the structures and the binding energies coming out of Autodock, I'd go with them. This is not really a situation where Rosetta has a leg up on Autodock.

Mon, 2011-06-06 22:00

thanks for that detailed discussion. But my objective was to carry out enzyme design for my protein. Binding of phosphate is very important for that. So, if in that case if Rosetta is not showing the interactions that are shown by auto. So, does it mean that the enzyme design experiment may also go wrong??

Tue, 2011-06-07 17:12

Certainly if you're attempting to design a phosphate binding site, Rosetta not being parametrized well on phosphate binding sites might be a limitation, as it may mispredict interactions. However, for designing a binding site from scratch, you're probably more limited by the number of interactions you can make to the phosphate. Native phosphate binding sites tend to have numerous interactions to the phosphate, and it's currently difficult for Rosetta to put in all of the interactions, due to geometry limitations.

If you already know where the phosphate will bind, your better bet is to use the "catalytic constraints" facility of enzyme design. That is, put in the phosphate binding interactions/geometry with an enzdes constraint file. If you do it this way, Rosetta will keep those interactions (somewhat) fixed, and then will spend its time optimizing the other interactions to the protein (e.g. the other molecules/moieties you're putting in with your design).

By the way, it might help to give a broad overview of the general task you're trying to accomplish. No guarantees, but we may be able to suggest a different route to the same goal, as opposed to treating each stumbling block du jour one at a time.

P.S. If you're doing enzyme design with a phosphate, you may want to add a "hack_elec 0.25" to your weights file. hack_elec is a coulombic electrostatic term, which would likely help with your charged substrate. It's turned on for ligand docking, but isn't currently included on in the default enzyme design weights file.

Wed, 2011-06-08 09:36

Alright.. So I have completed collecting information about the catalytic mechanism and active site residues. So, I will be starting with the enzyme designing part. I am considering the phosphate binding region which I got after docking with autodock as the residues that I have found goes well with the phosphate binding motif..

I am working on low molecular weight phosphotyrosine phosphatase . Here two amino acids are important :-
cys 12- nucleophile attacking the phosphorus atom
asp129 - activates a water molecule which hydrolyses the phosphorylated cysteine.

I think I am having constraints for cys and asp129 but how to put a constraint for that water molecule which is very important for catalysis..

I have attached a file depicting mechanism .

Looking for forward for further suggestions from your side..

Thanks a lot

Wed, 2011-06-08 23:03

So I'm assuming that you're doing the enzyme design in a scaffold protein that already has the cysteine and aspartate amino acids in the appropriate sequence location, and that your docking trials from Autodock have given you a reasonable location for the location of phosphate.

One issue you're going to run up against is that your reaction involves a change in geometry around the phosphate. Although the substrate, covalent intermediate and product are tetrahedral (although with a Walden inversion between the substrate/product and covalent intermediate), the transition state is trigonal bipyramidal. If you go by the transition state theory, the role of an enzyme is to maximize stabilization of the transition state, so when doing enzyme design, you'll want to account for the transition state geometry (Published enzyme designs with Rosetta have mainly worked off of the "theozyme" concept from the Ken Houk lab - this includes not just the transition state itself, but also the geometry of the surrounding amino acids.) ... Although you don't want to only model the transition state, as the enzyme also has to accommodate the substrates, the products, and the intermediates. Usually this is handled by designing toward a multi-state hybrid molecule.

When water has been an important part of the design (e.g. the retroaldolases), the water has been explicitly incorporated into the transition state hybrid with "dummy" bonds. In your case, I'm not sure if you'll need to model a separate water molecule, as the water in the second step is just performing the reverse of the tyrosine oxygen in the first step (that is, the tyrosine oxygen and the water oxygen will probably occupy the same position in the two halves).

My recommendation is to make a transition state/substrate/product hybrid structure which incorporates all the key features you want to accommodate/stabilize in your enzyme, and then decide what the permissible geometries of the interactions to the cys & asp are. From that you'll create your ligand structure and constraint files (don't forget to mark the cys-phosphate interaction as covalent) to use in the enzyme design application. You then need to prep your starting structure - align your hybrid molecule in the appropriate starting location (the key thing is to get the constrained geometry correct - don't worry too much about clashes with sidechains, as the enzyme design application should fix those) and place the appropriate remark lines at the top of the PDB, telling which amino acids correspond to which constraints. From there, run the enzyme design application and tweak as necessary to emphasize what you think is important.

By the way, I get the impression that you may be starting from a structure which already catalyzes this or a related reaction. If that's the case, I would recommend minimizing the amount of alterations you do to the catalytic machinery. Although you should do all of the transition state and constraint modeling to make sure all the appropriate geometry is preserved, focus your attention on the parts of the structure which are involved with the differences between your reaction and the native reaction. (Look into resfiles for limiting which amino acids can change.) One of the things we've found is that you tend to get better results in enzyme design if you minimize the number of changes from the native scaffold.

Fri, 2011-06-10 11:49

Actually, the scaffold is entirely different from the native scaffold.. I am trying to bring the native scaffold topology to my protein, which is Green Fluorescent Protein. So, far I have modelled the structure with minor changes of 2 or 3 residues in the scaffold, keeping in mind the topology of the original active site.

I did not understand the concept about designing substrate/product hybrid structure, coz in the tutorial only TS modelling has been described.

Also , I did not understand much about - "You then need to prep your starting structure - align your hybrid molecule in the appropriate starting location (the key thing is to get the constrained geometry correct - don't worry too much about clashes with sidechains, as the enzyme design application should fix those) and place the appropriate remark lines at the top of the PDB, telling which amino acids correspond to which constraints." Can you explain it a bit ??

For your information , I already have the predicted and published constraints for bond length, dihedral angles. So, now I want to know that what all constraints are there if I simple apply will that be ok, say for eg. in the native protein cys17 makes h-bond with phosphate of p-Tyr, but after docking I found that the h-bond is not present. So, can that be modelled as a constraint or not ??

Sun, 2011-06-12 22:49

What ligand you design to depends on what you think is important for catalysis. Probably the main feature is transition state stabilization (hence the tutorial). Depending on your reaction, you can probably get decent results with just modeling the transition state - I believe that the published Kemp eliminase designs used that technique. Sometimes more is needed. For example, while early Diels Alderase designs had great transition state binding, the pocket appeared to be too closed to permit substrate binding. Designing to a ligand with bond lengths intermediate between the (non-covalent) substrate distances and the transition state opened up the pocket and gave better results. An extreme example of a "hybrid" structure was used for some of the retroaldolase designs. For those, a number of structures (substrate, product, and two transition state mimics from different steps in the reaction) were simply superimposed into a single structure, so that each atom was present in three to four different locations. It wasn't a molecule that could ever exist in nature, but being able to use such theoretical hybrids is a benefit of computational design. -- My suggestion is to look at your reaction, and include those features you believe are important. If you think TS stabilization alone is sufficient, then just go with the TS mimic. If you want to include substrate/product binding, add atom locations for the interactions you want to stabilize/favor. Keep in mind you can do several runs with slight different parameters each time.

To use the enzyme_design application (see documentation you need to have a starting structure which not only has the scaffold protein, but also includes the ligand/TS mimic/hybrid you're designing to. This needs to be in approximately the same location that you want it to end up in. (Particularly, you want to place it such that your constraints are all satisfied.) As you have a model for where the phosphate of the ligand should end up, my recommendation is to open up both the template model and the ligand pdb (specifically the ligand PDB that is the output of, as that program renames the atoms) in something like PyMol. You can then use the pair fitting wizard (or equivalent) to move the ligand pdb structure such that the phosphate atoms line up with the locations where you want to put them. Save the translated/rotated molecule, and then use a text editor to create a PDB file with the template protein and the translated ligand. Also, as you have constraints, you'll need to specify how the constraints in the .cst file match up to residues in the PDB (see the info about the "REMARK" line in the enzyme_design documentation). This is the PDB file you want to use for input to the enzyme_design application.

As for which constraints to use, use the constraints that you want to be included in the output structures. From the perspective of the enzyme_design application, constraints are just a way of saying "I want this general geometry to hold in the output structure". The enzyme_design application will then try to keep the constraints valid throughout the design. So you can't make a constraint to a non-existent residue, but you can make a constraint to a residue that's slightly out of range, although it's recommended to have the constraints be (almost) satisfied going into the enzyme_design application.

Mon, 2011-06-13 14:08