# Efficient local energy computation

11 posts / 0 new
Efficient local energy computation
#1

Dear experts,

I need the relative energy of several thousands of conformations of the same 10-residues loop including its static environment. However, I've only found the way to get the absolute energy of the complete structure. This seems a waste of time since most of the inter-residue energy terms are just the same. Is it possible to efficiently compute the energy of a small loop while taking into account its environment in PyRosetta? How?

Thanks a lot!
- Mon

Category:
Post Situation:
Mon, 2014-11-17 08:57
Mon

Rosetta is already very efficient at not recalculating those residues/atoms/energies that do not need to be recalculated. It uses a few methods to do this, but generally it works very well. So scoring a 10 residue loop is fine - Rosetta will not attempte recalculation where it does not need it and will instead use those internal or residue-pair energies it already has.

Mon, 2014-11-17 09:50

Hi,

Sorry, I'm newbie in PyRosetta. I'm repeatedly moving just a small loop of the structure by directly using PyRosetta's set_phi, set_psi, set_bond_angle, and set_bond_length functions (see code below). Note that the new Internal Coordinates of the loop are already loop-closured by some external method, so only the loop region is perturbed. Since I'm not using any sofisticated PyRosetta method, are you sure that "scorefxn" is able to efficiently handle each energy calculation in my simple implementation? Should I use any other functions?

Thanks again!
- Mon

# --------------------------------------------------------
# MY LOOP ENERGY COMPUTATION SCRIPT:
# --------------------------------------------------------

# state-of-the-art Rosetta's energy
scorefxn = create_score_function('talaris2013')

# Screen all my loops
for l in range(0,nloops):

# Full copy of "pose" into "pose2"
pose2 = Pose()
pose2.assign(pose)

# These functions set new Internal Coordinates for the 10-residues loop backbone.
# They use PyRosetta methods: set_phi, set_psi, set_bond_angle, and set_bond_length
# "rri" is the start residue index of the loop within the whole protein.
update_dihedrals(pose2, dihedrals[l], rri)
update_valences(pose2, valences[l], rri)
update_lengths(pose2, lengths[l], rri)

# Re-packing loop side chains
loop_packer(pose2,scorefxn,resfile_name)

# Computing Rosetta Energy (full protein)
fullE = scorefxn(pose2)

Tue, 2014-11-18 06:11
Mon

If you to try to limit score calculations, you can look into using the eval_ci_1b(), eval_cd_1b(), eval_intrares_energy(), eval_ci_2b(), and eval_cd_2b() methods of ScoreFunction. (eval_intrares_energy() sums the intraresidue contribution of the two body energies) There's even bb-bb bb-sc and sc-sc versions of the twobody methods, if your protocol can be split out that far.

But in general, I would avoid trying that unless you can clearly demonstrate that it's the rescoring which is reducing your speed. (Note that scoring may induce internal<->cartesian coordinate updating, which can take a not-insignificant amount of time itself, but is needed independent of how you get the scores.) Off hand, I'd guess that the overhead of doing the loop over residues in Python will probably remove any time savings you'd get from only calculating scores over the residues of interest. (Although that might change if it's a really big protein and a really small loop.)

Also keep in mind the FoldTree of your pose - unless the FoldTree is set up appropriately, the coordinate changes involved with updating bond lengths, angles and dihedrals may propagate to other portions of your protein. (Which would affect scoring.)

Tue, 2014-11-18 08:29
rmoretti

Adding to Rocco's comments about the FoldTree - you will need the loop foldtree here probably so that changes do not propagate to other regions. That is likely why it seems scoring is taking a while. You will need to set a loop FoldTree, add cutpoint variants to your pose, and set the 'chainbreak' term to your scorefunction with a weight of at least 20 if you really don't want chainbreaks. You will probably need to use CCD as well after your purturbations to reconnect the loops.

Here are some commands for that:
fold_tree_from_loops function is exposed by default import
add_cutpoint_variants is also a function exposed by default import

Cutpoint Variants are basically virtual atoms that allow overlap. Take a look at the CCD paper for more info on the theory behind them. They measure how well the loop is closed by overlap - having the chainbreak energy term allows you to compute how well this overlap is done, and also to help close the loop through energy minimization of the structure. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2323867/

Tue, 2014-11-18 08:58

Ok, so the overhead of looping over residues to evaluate all individual pairwise energy terms in Python would very likely remove any time savings.

When I update loop's internal coordinates using set_phi, set_psi, set_bond_angle, and set_bond_length functions the structure looks nice, i.e. no structural distortions are detected outside the moving loop. Then, energy calculations should be ok. Am I right?

Perhaps what is reducing my speed is the internal coordinates (IC) update. I use set_phi, set_psi, set_bond_angle, and set_bond_length functions sequentially to update all loop ICs. Are cartesian coordinates (CC) being updated every time I call each of these functions? If so, how would this be avoided?

Wed, 2014-11-19 09:00
Mon

Sorry Jadolfbr, I did not see your post (we posted at the same time!).

The new ICs already generate closed loops, they have been obtained using an external method. If the loop is closed (no chainbreaks), the energy calculations are correct. Am I right? Or should I still use a foldtree?

Thanks!

Wed, 2014-11-19 09:09
Mon

You should be fine then. But, I would add the foldtree and the chainbreak term just to double check the energy there. You may also get a speed up with the correct foldtree when you change phi/psi - because although you may be OK at the end, each time you change phi and psi you are changing the rest of the protein according to the starting foldtree. You could use the PyMol Mover to observe this. This may be what is making the computation slower if it needs to update the cartesian coordinates of more than just the loop residues. The foldtree should help you alleviate this so that downstream effects are not done. Even changing the rest of the protein by an insignificant amount will cause it to need to reupdate everything.

Wed, 2014-11-19 09:34

Assuming you're using function on the pose, and aren't trying to access the Conformation object directly, the internal<->Cartesian conversion should be optimized such that you'll only convert between the two when you need it. That is, if you're setting the dihedrals without accessing the Cartesian coordinates in between, you shouldn't trigger a refold with each assignment. It's only when you start to access the Cartesian coordinates that the refold should happen. -- Though Cartesian access can happen indirectly - for example scoring needs Cartesian values, so it will trigger a refold, as will displaying intermediate structures in a PyMol observer and the like. Even pulling Residue object out of the pose will trigger a refold, as the Residue object needs updated coordinates.

Regarding the FoldTree, it doesn't really matter if you know externally that the coordinates should move back to where they should be - Rosetta isn't aware of this, and will mark the downstream coordinates as "dirty" and recalculate them all, even if the recalculated coordinates are in the same location. If you set up the FoldTree appropriately you avoid this, as Rosetta knows that any coordinate changes won't propagate outside the loop, and can avoid recalculating coordinates for portions of the protein which aren't moving.

Thu, 2014-11-20 10:24
rmoretti

Hi, following your indications, and after checking the loop modeling workshop (http://graylab.jhu.edu/pyrosetta/downloads/documentation/Workshop8_PyRos...), I've added the following code to set up a FoldTree for a 12 residues long loop of some 448 residues protein.

ft = FoldTree()
print ft
ft.check_fold_tree()
pose2.fold_tree(ft)

# Note: "rri" and "rrf" are the rosetta residue indices for initial and final loop residues, i.e. the anchors.

Unfortunatelly, PyRosetta dumps this error:

FOLD_TREE EDGE 1 356 -1 EDGE 356 367 1 EDGE 367 448 -1
core.kinematics.FoldTree: FOLD_TREE EDGE 1 356 -1 EDGE 356 367 1 EDGE 367 448 -1
core.kinematics.FoldTree:
Segmentation fault (core dumped)

The loop scheme is very simple:
-Region 1: from 1 to 356 (Static)
-Region 2: from 357 to 366 (Mobile)
-Region 3: from 367 to 448 (Static)

What am I doing wrong?

Thanks a lot!

Wed, 2014-12-10 08:28
Mon

Your edge from 356 to 367 is a jump edge, not a "mobile" edge. This edge means that there is a rigid-body transform connecting residue 356 to 367 -- it does not specify what happens to the residues between the two.

You're missing any mention of what happens to residues 357 to 366 - they aren't in the fold tree, and that's what the "bad fold tree3" is telling you. (I agree - it could be more clear what the error is.) You need to add one or more peptide edges for your mobile region to the fold tree.

Fri, 2015-01-02 09:24
rmoretti