Why do non-protein ligands get a very high score in kinematic loop modeling (or in general)?

Hi all,

I am trying to model a loop (using kinematic loop modeling) that is in close proximity to a ligand. To include the ligand, I converted the HETATMs into Rosetta atom types and supplied centroid (for remodel) and all-atom (for refine) parameter files via the -extra_res_cen and -extra_res_fa command-line options. I used the script molfile_to_params.py to create the parameter files (both centroid and all-atom) and the ligand.pdb file.
[I made them from a .mol file that I generated from the ligand's .pdb file using openbabel... maybe there is a problem here.]

The final score of the design is ~700, of which ~400 is the score of the ligand alone (scored using .../bin/score_jd2.linuxrelease -s design.pdb -database path -extra_res_fa fa.params). Is it normal for a ligand to score this high, or am I doing something wrong?

I also tried scoring the input PDB (into which I had inserted the ligand.pdb generated by the script), and the ligand residue by itself again scored ~400.

Any help is very much appreciated.

Thank you,
a_s_a

Wed, 2011-06-15 00:03
a_s_a

High scores are usually the result of clashes. I'm guessing you have a high fa_intra_rep score for the ligand. Rosetta is almost certainly not relaxing the ligand, so any small problems in the input that Rosetta doesn't like will stay that way.

Does your score output have a per-residue, per-term section like the one below? You can use it to diagnose which term is causing the large score (fa_intra_rep is the 5th column in this case).

# All scores below are weighted scores, not raw scores.
#BEGIN_POSE_ENERGIES_TABLE UBQ_E2_0001.pdb
label fa_atr fa_rep fa_sol fa_intra_rep pro_close fa_pair hbond_sr_bb hbond_lr_bb hbond_bb_sc hbond_sc dslf_ss_dst dslf_cs_ang dslf_ss_dih dslf_ca_dih atom_pair_constraint rama omega fa_dun p_aa_pp ref total
weights 0.8 0.44 0.65 0.004 1 0.49 0.585 1.17 1.17 1.1 0.5 2 5 5 50 0.2 0.5 0.56 0.32 1 NA
pose -820.344 239.487 403.65 2.10127 27.0666 -12.5863 -47.5282 -58.9692 -16.5835 -4.65349 0 0 0 0 376.642 -3.95454 87.7306 445.125 -19.2129 -52.33 545.64
SER_p:NtermProteinFull_1 -2.37325 0.158463 1.98733 0.00286597 0 0.00173929 0 0 0 0 0 0 0 0 0 0 0.303073 4.67488 0 -0.37 4.38511
GLN_2 -2.83504 1.43855 1.52157 0.00454749 0 -0.00514894 0 0 0 0 0 0 0 0 0 -0.204038 0.0608883 4.71246 -0.122122 -0.97 3.60167
LYS_3 -2.70773 0.480922 1.32955 0.00670788 0 0.00688822 0 0 0 0 0 0 0 0 0 -0.14769 2.02297e-05 3.46405 -0.041184 -0.65 1.74154
ALA_4 -3.03849 0.119461 1.42765 0.000817063 0 0 0 0 0 0 0 0 0 0 0 -0.128825 0.000990234 0 -0.149645 0.16 -1.60804
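For a long table, the diagnosis can be automated. The sketch below is a hypothetical helper (plain Python, not part of Rosetta) that assumes the whitespace-separated layout shown above: a `label` header row, a `weights` row, then a `pose` row and one row per residue, with `total` as the last column and `#END_POSE_ENERGIES_TABLE` closing the section. For each row it reports the term with the largest weighted value.

```python
def parse_energies_table(text):
    """Map each row label (pose and residues) to {term: weighted value}."""
    terms = None
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("#END_POSE_ENERGIES_TABLE"):
            break  # stop at the end of the energies section
        # Skip blank lines and '#' lines (comment, BEGIN marker).
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == "label":
            terms = fields[1:]  # column names; the last one is 'total'
        elif terms is not None and fields[0] != "weights":
            rows[fields[0]] = dict(zip(terms, map(float, fields[1:])))
    return rows


def largest_term(row):
    """Return the (term, value) pair with the largest value, ignoring 'total'."""
    return max(((t, v) for t, v in row.items() if t != "total"),
               key=lambda item: item[1])


# Excerpt of the table above: header, weights, pose, GLN_2 and ALA_4 rows.
TABLE = """\
# All scores below are weighted scores, not raw scores.
#BEGIN_POSE_ENERGIES_TABLE UBQ_E2_0001.pdb
label fa_atr fa_rep fa_sol fa_intra_rep pro_close fa_pair hbond_sr_bb hbond_lr_bb hbond_bb_sc hbond_sc dslf_ss_dst dslf_cs_ang dslf_ss_dih dslf_ca_dih atom_pair_constraint rama omega fa_dun p_aa_pp ref total
weights 0.8 0.44 0.65 0.004 1 0.49 0.585 1.17 1.17 1.1 0.5 2 5 5 50 0.2 0.5 0.56 0.32 1 NA
pose -820.344 239.487 403.65 2.10127 27.0666 -12.5863 -47.5282 -58.9692 -16.5835 -4.65349 0 0 0 0 376.642 -3.95454 87.7306 445.125 -19.2129 -52.33 545.64
GLN_2 -2.83504 1.43855 1.52157 0.00454749 0 -0.00514894 0 0 0 0 0 0 0 0 0 -0.204038 0.0608883 4.71246 -0.122122 -0.97 3.60167
ALA_4 -3.03849 0.119461 1.42765 0.000817063 0 0 0 0 0 0 0 0 0 0 0 -0.128825 0.000990234 0 -0.149645 0.16 -1.60804
"""

rows = parse_energies_table(TABLE)
# Largest totals first, so the most problematic rows come out on top.
for label, row in sorted(rows.items(), key=lambda kv: kv[1]["total"], reverse=True):
    term, value = largest_term(row)
    print(f"{label:6s} total={row['total']:9.3f}  largest term: {term} ({value:.3f})")
```

Running this on a ligand row with a clash should point straight at the offending term.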

Wed, 2011-06-15 07:29
smlewis

hi smlewis,

Actually, it is the fa_rep term that is high (please see below). In this case I am simply trying to score green fluorescent protein (1EMA.pdb). The LG1 score is 245, while the whole structure scores 643.

I attached the .mol2 file for the chromophore (generated using openbabel) and the .params file (generated using molfile_to_params.py).
I also uploaded the input PDB used for scoring (it contains LG1 instead of CRO; the LG1 coordinates are taken from the PDB file generated together with LG.params).

Thanks a lot for your help

# All scores below are weighted scores, not raw scores.
#BEGIN_POSE_ENERGIES_TABLE 1EM1_0001.pdb
label fa_atr fa_rep fa_sol fa_intra_rep pro_close fa_pair hbond_sr_bb hbond_lr_bb hbond_bb_sc hbond_sc dslf_ss_dst dslf_cs_ang dslf_ss_dih dslf_ca_dih rama omega fa_dun p_aa_pp ref linear_chainbreak overlap_chainbreak total
weights 0.8 0.44 0.65 0.004 1 0.49 0.585 1.17 1.17 1.1 0.5 2 5 5 0.2 0.5 0.56 0.32 1 1.33333 1 NA
pose -863.018 789.902 433.068 2.50203 61.6793 -20.4745 -17.9809 -145.897 -16.7595 -18.4937 0 0 0 0 2.35112 41.8995 458.407 -19.677 -43.73 0 0 643.778
SER_p:NtermProteinFull_1 -1.41494 0.134777 1.2167 0.00901941 0 -0.0354852 0 0 0 0 0 0 0 0 0 0.0078096 1.24065 0 -0.37 0 0 0.78854
LYS_2 -3.49071 4.74493 3.08823 0.00542008 0 -0.484169 0 0 -0.551881 -0.253388 0 0 0 0 0.972774 0.120974 9.20326 -0.058382 -0.65 0 0 12.6471
.
.
GLY_224 -1.0332 0.148119 0.513931 1.58146e-05 0 0 0 0 0 0 0 0 0 0 -0.355812 0.00998178 0 -0.803254 -0.17 0 0 -1.69022
ILE_p:CtermProteinFull_225 -3.25505 15.5117 1.38385 0.018914 0 0 0 0 -0.213718 0 0 0 0 0 0 0 4.90175 0 0.24 0 0 18.5875
LG1_226 -13.0798 250.61 8.83978 0.028532 0 0 0 0 0 -0.996807 0 0 0 0 0 0 0 0 0 0 0 245.402
#END_POSE_ENERGIES_TABLE 1EMA_CRO_to_LG1_0001.pdb
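To put numbers on this, the short sketch below (plain Python, not a Rosetta tool) copies the LG1_226 row from the table above, checks that its weighted terms add up to the reported total, and compares fa_rep with everything else. It assumes the column order of the `label` row in this particular table.

```python
# Column names and the LG1_226 row copied from the 1EMA energies table above.
TERMS = ("fa_atr fa_rep fa_sol fa_intra_rep pro_close fa_pair hbond_sr_bb "
         "hbond_lr_bb hbond_bb_sc hbond_sc dslf_ss_dst dslf_cs_ang dslf_ss_dih "
         "dslf_ca_dih rama omega fa_dun p_aa_pp ref linear_chainbreak "
         "overlap_chainbreak total").split()
LG1_ROW = ("-13.0798 250.61 8.83978 0.028532 0 0 0 0 0 -0.996807 0 0 0 0 "
           "0 0 0 0 0 0 0 245.402").split()
POSE_TOTAL = 643.778  # 'total' column of the pose row

assert len(TERMS) == len(LG1_ROW)  # one value per column
lg1 = dict(zip(TERMS, map(float, LG1_ROW)))

# The table holds weighted scores, so the terms should add up to 'total'.
term_sum = sum(v for t, v in lg1.items() if t != "total")
others = term_sum - lg1["fa_rep"]

print(f"sum of terms: {term_sum:.3f}  (reported total: {lg1['total']})")
print(f"fa_rep: {lg1['fa_rep']:.3f}  all other terms: {others:.3f}")
print(f"LG1 share of the pose total: {lg1['total'] / POSE_TOTAL:.0%}")
```

Here fa_rep on its own is larger than the whole LG1 total (the remaining terms net out slightly negative), and LG1 accounts for roughly 38% of the pose total.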

Wed, 2011-06-15 13:04
a_s_a

From the params file, I see you're modeling the chromophore as an independent ligand rather than as a non-canonical amino acid. You're therefore likely to get phenomenally high VDW repulsive energies where the chromophore atoms make covalent bonds with the protein backbone atoms, because Rosetta doesn't see those as covalent bonds.
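Why this blows up can be seen with a generic 12-6 Lennard-Jones potential. The sketch below is only an illustration: Rosetta's actual fa_rep term uses its own modified potential and atom-type parameters, and SIGMA and EPSILON here are hypothetical values, so only the trend matters. Covalently bonded heavy atoms sit about 1.3 to 1.5 Å apart, far inside the distance where a non-bonded pair is happiest.

```python
def lennard_jones(r, sigma, epsilon):
    """Generic 12-6 Lennard-Jones energy between two atoms at distance r (Å)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# Hypothetical parameters for a heavy-atom pair (NOT Rosetta's values).
SIGMA = 3.4     # Å, distance at which the energy crosses zero
EPSILON = 0.1   # well depth, arbitrary energy units

# ~1.4 Å: covalently bonded atoms; ~3.8 Å: near the minimum at 2**(1/6) * sigma.
for r in (1.4, 2.0, 3.0, 3.8):
    print(f"r = {r:.1f} Å  E = {lennard_jones(r, SIGMA, EPSILON):12.3f}")
```

Because the r^-12 term dominates at short range, a pair that is really a bond but is scored as a non-bonded contact contributes an energy thousands of times larger than the well depth.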

BTW, I notice you don't have any hydrogens on your chromophore. Rosetta is parametrized on full-atom (with hydrogen) structures, so the results you get won't be as good as they would be if you protonated the mol2 file before running molfile_to_params, especially when you're missing polar hydrogens. (This doesn't have anything to do with your fa_rep issue, though.)

Thu, 2011-06-16 08:22
rmoretti

Hi rmoretti,

That makes perfect sense. I have no idea how to model the chromophore as a non-canonical amino acid though. Can I please ask how I can do this?

I appreciate your help

Thu, 2011-06-16 16:16
a_s_a

Briefly, you need to run molfile_to_params in the same fashion you already have, except your input needs to be the whole residue (N/CA/C/O of backbone, sidechain, and then ligand).

This thread covers a lot more of the details, including a link to a tarball with patches to the code you'll need: http://www.rosettacommons.org/content/some-application-troubles-when-usi...

Fri, 2011-06-17 13:06
smlewis

Thanks smlewis and rmoretti for your help and for getting me this far. I had already looked at the thread you mentioned above yesterday, after rmoretti's reply.

I am facing a problem, though: I am trying to generate a params file for the chromophore in GFP (1EMA.pdb), and the carbonyl carbon and the "alpha carbon" are not directly linked. There is a C-N-C in between, so the "backbone" looks like N-C-C-N-C-C(O). In the readme file in the tarball with the code to generate a polymer-type residue, I noticed this line: "Note: You will need to add some additional code to work with your additional backbone atom." I have no idea where to start.

Also, the chromophore doesn't have a single "sidechain"; instead, several "sidechains" come off the backbone, which I anticipate will be another problem I will run into.

I would very much appreciate your help

Fri, 2011-06-17 15:52
a_s_a

Noncanonical backbones are getting into some pretty heavy lifting. Rosetta can do this, but something this complicated needs hands-on developer tweaking to get it done. Release 3.2.1 can't do it; current developer trunk can't do it, although I think there's a branch that can.

If the clash is in a constant region of the protein, you can (and probably should) just ignore it. You say you are modifying a loop near the ligand. If the residues that are directly attached to the GFP fluorophore are not being moved, then their clashes can't be relaxed and just add a constant large score. A clash between the ligand and the neighboring residues will not affect kinematic modeling of a loop that is proximal in space (so long as the loop is not next to the ligand in primary sequence). Your total scores will be ugly, but they'll be fine once you subtract off the large constant clash. Does this make sense (and does it apply to your case)?
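Subtracting a constant from every decoy changes the numbers but never their order, which is why the ugly totals are still usable for picking loop models. The sketch below assumes the ligand's contribution is identical in every decoy (worth confirming by reading the LG1 row of each decoy's energies table); the offset is the LG1_226 total from earlier in the thread, and the decoy names and totals are made-up numbers for illustration only.

```python
# LG1_226 total reported earlier in the thread; assumed identical in every
# decoy because neither the ligand nor the residues it clashes with move.
LIGAND_OFFSET = 245.402

# Hypothetical decoy totals (made-up numbers for illustration only).
decoy_totals = {"decoy_0001": 643.8, "decoy_0002": 612.5, "decoy_0003": 655.1}


def ranked(scores):
    """Decoy names ordered from best (lowest) to worst (highest) score."""
    return sorted(scores, key=scores.get)


adjusted = {name: total - LIGAND_OFFSET for name, total in decoy_totals.items()}

print("raw:     ", ranked(decoy_totals))
print("adjusted:", ranked(adjusted))
for name in ranked(adjusted):
    print(f"{name}: {decoy_totals[name]:.1f} -> {adjusted[name]:.1f}")
```

If the ligand row does differ between decoys, the clash is no longer constant and this shortcut stops being valid.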

Sun, 2011-06-19 11:03
smlewis

Yes, you are right, smlewis. The scores will be ugly, but the offset is always there; I have been looking at my output over the past few days. I will just ignore this large value for the chromophore.

Thank you very much for all your help

Wed, 2011-06-22 10:39
a_s_a