PyRosetta AbInitio Folding protocol

The first thing I’m interested in within the PyRosetta package is ab initio prediction of protein structure. Starting with standard Rosetta, I was able to get predictions of reasonable accuracy for three test cases: the villin headpiece, ubiquitin, and barstar. With the AbinitioRelax application of Rosetta I got models with moderate RMSD from native, and I observed some correlation between RMSD and Rosetta energy. Moreover, when I put “refined native” models on the same plot, I see the “energy gap”.
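One way to put a number on the "correlation between RMSD and energy" and on the "energy gap" is sketched below. This is only an illustration: the function names are hypothetical, the gap definition (lowest decoy score minus lowest refined-native score) is an assumption rather than something stated in this post, and the numbers are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def energy_gap(decoy_scores, refined_native_scores):
    """Assumed definition: lowest decoy score minus lowest refined-native score.

    A positive value means the refined natives score better than any decoy.
    """
    return min(decoy_scores) - min(refined_native_scores)

# Made-up values for illustration only (not results from this thread).
rmsd = [1.0, 2.0, 3.0, 4.0]
score = [-50.0, -45.0, -40.0, -35.0]
refined_native = [-60.0, -58.0]

print(pearson(rmsd, score))
print(energy_gap(score, refined_native))
```

A correlation close to 1 means low-RMSD decoys also have low scores, which is the funnel-like behaviour one hopes to see on the RMSD vs. score plot.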

After that I worked through the PyRosetta tutorial, trying to get similar results by writing an ab initio folding protocol myself. The tutorial has no program listings, but I tried to implement all of its recommendations accurately. The most recent protocol I developed (listed below) performs successive low-resolution 9-mer and 3-mer fragment insertions under Metropolis control, followed by high-resolution small and shear moves with periodic side-chain repacking and energy minimization. I also applied a simulated-annealing schedule and sometimes “ramping” of the VdW energy.
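For readers unfamiliar with the annealing part, here is a minimal stand-alone sketch (plain Python, no PyRosetta) of a geometric cooling schedule and the Metropolis acceptance test. The helper names are hypothetical; in the listing below the acceptance step is delegated to MonteCarlo.boltzmann, so this only illustrates the idea.

```python
import math
import random

def geometric_cooling(init_temp, final_temp, n_steps):
    """Return n_steps temperatures that decay geometrically to final_temp."""
    gamma = math.pow(final_temp / init_temp, 1.0 / n_steps)
    temps = []
    kT = init_temp
    for _ in range(n_steps):
        kT *= gamma          # same update as "kT = kT * gamma" in the listing
        temps.append(kT)
    return temps

def metropolis_accept(delta_e, kT, rng=random):
    """Metropolis criterion: accept downhill moves always,
    uphill moves with probability exp(-delta_e / kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)

# 20 * 300 steps from kT = 2.0 down to kT = 0.8, matching the values in the listing.
temps = geometric_cooling(2.0, 0.8, 20 * 300)
print(temps[0], temps[-1])
```

With this schedule the last temperature equals the final temperature (up to rounding), so the run ends at the intended kT rather than somewhere above it.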

Unfortunately, with this protocol I get a few low-RMSD models only for villin (36 aa), and even there RMSD shows no clear correlation with the score. For bigger proteins I don’t get close-to-native models at all, which means that my protocol is inadequate.

Could anyone please point out the errors that make my program inefficient, or share an example of an ab initio folding algorithm that performs at least as well as the “native” Rosetta AbinitioRelax program?

---
import math
from rosetta import *

init()

pdb_to_fold = "1UBI_ideal.pdb"
pdb_native = "1UBI.pdb"
frag3_file = "aa1ubi_03_05.200_v1_3"
frag9_file = "aa1ubi_09_05.200_v1_3"

# Annealing temperatures (kT)
initTemp = 2.0; finalTemp = 0.8

# Initial set up
p = Pose(); start = Pose(); start_c = Pose(); native = Pose(); native_c = Pose()
pose_from_pdb(p, pdb_to_fold); pose_from_pdb(native, pdb_native)
native_c.assign(native); start.assign(p)

# Scoring functions (centroid and full-atom)
sc_c = create_score_function('cen_std')
sc_f = create_score_function('standard')

# Packer mover (repacking only, no design)
task_pack = TaskFactory.create_packer_task(start)
task_pack.restrict_to_repacking()
pack = PackRotamersMover(sc_f, task_pack)

# Centroid/Fullatom conversion
switch_c = SwitchResidueTypeSetMover('centroid')
switch_f = SwitchResidueTypeSetMover('fa_standard')
switch_c.apply(p); switch_c.apply(native_c); start_c.assign(p)

# Fragment movers
movemap = MoveMap()
movemap.set_bb(True)
fragset9 = ConstantLengthFragSet(9, frag9_file)
fragset3 = ConstantLengthFragSet(3, frag3_file)
mover_9mer = ClassicFragmentMover(fragset9, movemap)
mover_3mer = ClassicFragmentMover(fragset3, movemap)

# Small & shear movers, picked at random on each apply()
smallmover = SmallMover(movemap, initTemp, 3)
shearmover = ShearMover(movemap, initTemp, 3)
small_random = RandomMover()
small_random.add_mover(smallmover)
small_random.add_mover(shearmover)

# Minimizer mover (note: this name shadows Python's built-in min)
min = MinMover(movemap, sc_f, 'dfpmin', 0.5, True)

def frag_insert(pose, scoreFunction, frag_mover):
    # Uses the module-level MonteCarlo object mc_c created in the main loop.
    N1 = 20
    N2 = 300
    mc_c.reset(pose)
    kT = initTemp
    # Geometric cooling from initTemp to finalTemp over N1 * N2 steps
    gamma = math.pow(finalTemp / initTemp, 1.0 / (N1 * N2))
    for i in range(1, N1 + 1):
        mc_c.recover_low(pose)
        print("Low-resolution energy %s" % scoreFunction(pose))
        for j in range(1, N2 + 1):
            kT = kT * gamma
            mc_c.set_temperature(kT)
            frag_mover.apply(pose)
            mc_c.boltzmann(pose)
    mc_c.recover_low(pose)
    return pose

def small_moves_centroid(pose, mc, scoreFunction):
    # Small/shear moves at constant temperature (not called in the main loop below).
    mc.reset(pose)
    mc.set_temperature(finalTemp)
    for i in range(1, 10000):
        small_random.apply(pose)
        mc.boltzmann(pose)
        if i % 1000 == 0:
            mc.recover_low(pose)
    mc.recover_low(pose)
    return pose

# Job parallelization & main algorithm
jd = PyJobDistributor("ubi", 5000, sc_f)
jd.native_pose = native

while not jd.job_complete:
    # Low-resolution modeling
    p.assign(start_c)
    mc_c = MonteCarlo(p, sc_c, initTemp)  # read as a global by frag_insert
    frag_insert(p, sc_c, mover_9mer)
    frag_insert(p, sc_c, mover_3mer)

    # High-resolution modeling
    switch_f.apply(p)
    pack.apply(p)
    min.apply(p)
    kT = initTemp
    gamma = math.pow(finalTemp / initTemp, 1.0 / 10000)
    mc_f = MonteCarlo(p, sc_f, kT)
    for i in range(1, 10000):
        kT = kT * gamma
        mc_f.set_temperature(kT)
        small_random.apply(p)
        mc_f.boltzmann(p)
        if i % 1000 == 0:
            mc_f.recover_low(p)
        # Periodic side-chain repacking and minimization
        if i % 100 == 0:
            pack.apply(p)
            min.apply(p)
    mc_f.recover_low(p)
    jd.output_decoy(p)

Tue, 2009-12-15 11:09
batch2k

What is your starting structure? 5000 decoys seems small for folding a 36 aa protein. Did you try increasing the number of decoys significantly, to 20,000 or 50,000?

Mon, 2010-01-04 12:38
sid

I use an extended conformation to avoid any bias. Of course 5000 may not be enough, but this is certainly not the main fault: the classic "AbinitioRelax" program from the Rosetta 3.1 package uses only 1000 decoys in my case, and the results are clearly better.
So it seems that the protocol listed above is inefficient for some reason. Do you have any clues?

Mon, 2010-01-11 01:13
batch2k

Hey Batch. I know this is a little late, but I went my own way for a while trying to develop an ab initio folder, and had much the same luck. While I did some interesting things, I was told to stick with what works, so I am now trying to reproduce the abinitio program found in Rosetta within PyRosetta.

Sounds easier than it is, but I have made some strides. This is the paper that describes the low-resolution ab initio program (as used in all CASP experiments from 04 to present):

Rohl et al., 'Protein Structure Prediction Using Rosetta', Methods in Enzymology, vol. 383, 2004.

I have been able to implement every step correctly except the Gunn approach to fragment insertion (last step). I may be able to program it, but I am doubtful. As soon as I am done, I will post it here.

The other paper that further defines how Rosetta is used within the CASP experiments is as follows:

Bradley et al., 'Toward High-Resolution de Novo Structure Prediction for Small Proteins', Science, vol. 309, pp. 1868-1871, 2005.

This is a little stranger, as it uses homologies in a unique way, but I am fairly sure this was done in CASP8.

Wish you well.

-Jared

Tue, 2010-02-23 18:46
jadolfbr

Thanks Jared, I'd like to see your algorithm once it starts working :-) Actually, I read the Bradley paper, and my program was similar to what I found there, except for the realistic results :-( I might have been doing something wrong, so it'll be interesting to see what you've got.

Wed, 2010-02-24 06:33
batch2k

The topic is very useful.

Jared, excellent! I think many PyRosetta users will be interested in your programs.

Sid, here is my suggestion: since PyRosetta provides a more interactive interface for most Rosetta tasks and protocols, would you write some scripts that re-implement the protocols described in Baker's famous papers listed above? I think many users would love working in PyRosetta if they knew exactly how to make the transition from the traditional Rosetta command-line options.

cheers.

-Jarod

Sun, 2010-03-21 00:15
jarod

Hello everyone,

I'm a newbie in the field of PyRosetta and I've already started using the script written by batch2k.

However, even after increasing the number of cycles, the script does not reach a conformation close to the target for a protein with an experimentally known structure.

I was wondering whether the script with the newer implementation that Jared mentioned earlier was ever finished. Is it accessible anywhere?

Thanks in advance!

Thu, 2015-02-05 08:58
jseco

Hi Jseco,

I never ended up finishing that script. I basically decided to use vanilla Rosetta abinitio, which worked pretty well in a few predictions after that.

However, it looks as though a lot of the code that didn't seem to be accessible in PyRosetta now is. You'll have to figure it out, but try this in IPython to look around:

from rosetta import *
rosetta.init()
import rosetta.protocols.abinitio as ab

It looks like the full C++-level application simply calls the AbRelaxApplication class, which reads everything from the command line. So that's awesome. If it works, you would be able to run the full protocol in Python, but you will need to control pretty much everything through the options system (rosetta.init("-my string -of options")). In addition, it's unclear to me how to get a pose out after the fold through the run() function, which is what is called at the app level.
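As a rough illustration of "controlling everything through the options system", the sketch below assembles an option string from the usual AbinitioRelax command-line flags. The helper name and the fasta and silent-file names are hypothetical; the fragment file names are the ones from the first post. Whether AbRelaxApplication then runs cleanly from Python is untested here.

```python
def abinitio_relax_flags(fasta, frag3, frag9, nstruct, silent_out, native=None):
    """Build a Rosetta option string for an AbinitioRelax-style run."""
    flags = [
        "-in:file:fasta", fasta,          # target sequence
        "-in:file:frag3", frag3,          # 3-mer fragment library
        "-in:file:frag9", frag9,          # 9-mer fragment library
        "-abinitio:relax",                # full-atom relax after centroid folding
        "-nstruct", str(nstruct),         # number of decoys
        "-out:file:silent", silent_out,   # silent-file output
    ]
    if native is not None:
        # Optional native structure, used for RMSD reporting.
        flags += ["-in:file:native", native]
    return " ".join(flags)

options = abinitio_relax_flags(
    "1ubi.fasta",                 # hypothetical file name
    "aa1ubi_03_05.200_v1_3",
    "aa1ubi_09_05.200_v1_3",
    1000,
    "1ubi_silent.out",            # hypothetical file name
)
print(options)

# The string would then be handed to PyRosetta as described in the post:
#   rosetta.init(options)
```

Keeping the flags in one helper makes it easy to compare a PyRosetta run against the same flags passed to the command-line AbinitioRelax application.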

The run function calls setup(), then creates a new empty pose and an empty protocol class, then calls setup_fold(pose, protocol) and fold(pose, protocol). Really not sure if it's worth it to get this to work in PyRosetta.

What you probably want to try is the ClassicAbinitio class. It accepts fragsets, movemaps, and a pose (I assume it's an extended pose, but I really don't know). There is also currently a lot of work going into the Environment/Topology Broker framework. I believe that the paper for this should come out soon: https://www.rosettacommons.org/docs/latest/EnvironmentFramework.html

Here is a nice description from the ClassicAbinitio code:

//@ brief The Classic Abinitio protocol from rosetta++
/*!
@ detail
general usage:
ClassicAbinitio abinitio;
abinitio.init( pose );
...
while(nstruct) {
abinitio.apply( pose );
}

call ClassicAbinitio::register_options() before core::init::init to add relevant options to the applications help

, with the following
stages, all of which uses a different ScoreFunction based on the cen_std.wts in minirosetta_database:

- Stage 1: large (usually 9mer) randomly selected fragment insertions, only VDW term turned on.
Uses score0.wts_patch and runs for either a maximum of 2000 cycles or until all moveable phi/psi values
have been changed.

- Stage 2: large randomly selected fragment insertions, more score terms turned on. Uses score1.wts_patch
and runs for 2000 cycles.

- Stage 3: uses large randomly selected fragment insertions, although the size of the fragment insertions
is tunable via the set_apply_large_frags( bool ) method. Alternates between score2.wts_patch and score5.wts_patch,
running tunable numbers of 2000-cycle iterations between the two scoring functions.

- Stage 4: uses small (usually 3mer) fragment insertions with the fragment selection based on the Gunn cost for
finding local fragment moves. Runs for 4000-cycles and uses score3.wts_patch.

The class implements the basic abinito approach as known from rosetta++. We tried to set this up, such that
behaviour of the protocol can be changed in many different ways ( see, e.g., FoldConstraints ). To be able to change the
behaviour of the protocol easily the class-apply function and methods called therein (e.g., prepare_XXX() / do_XXX_cycles() ) should
not directly change moves or trials. A reference to the currently used score-function should be obtained by
mc().score_function() ...

Behaviour can be changed in the following ways:

use non-classic FragmentMover --> eg. not uniformly sampled fragments, but using some weighting
--> large and small moves doesn't have to be 3mers and 9mers... use other movers...
---> or other fragets for the "convenience constructor"
use custom trial classes --> overload update_moves()

change sampling behaviour:
overload prepare_XXX() methods: these are called before the cycling for a certain stage begins
overload do_stageX_cycles() : the actual loops over trial-moves ...

change scoring functions:
overload set_default_scores()
weight-changes effective for all stages: set_score_weight()
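Following the "general usage" pattern in the comment above, a PyRosetta call might look roughly like the sketch below. Treat it as untested: the import paths follow the rosetta namespace used in this thread and may differ between builds, the argument order of the convenience constructor (small fragments, large fragments, movemap) is taken from the C++ header, and the wrapper function name is hypothetical. It assumes rosetta.init() has already been called and that the pose is an extended full-atom pose such as the one loaded from 1UBI_ideal.pdb in the first post.

```python
def fold_with_classic_abinitio(pose, frag3_file, frag9_file):
    """Fold `pose` in place with ClassicAbinitio and return it (sketch only)."""
    # Lazy imports so this file can be loaded without PyRosetta installed.
    # ASSUMPTION: these names are exposed under the `rosetta` package.
    from rosetta import MoveMap, SwitchResidueTypeSetMover, ConstantLengthFragSet
    from rosetta.protocols.abinitio import ClassicAbinitio

    # The stages use centroid score functions, so convert the pose first.
    SwitchResidueTypeSetMover("centroid").apply(pose)

    # Allow every backbone torsion to move.
    movemap = MoveMap()
    movemap.set_bb(True)

    # Same fragment-set construction as in the first post's listing.
    fragset3 = ConstantLengthFragSet(3, frag3_file)
    fragset9 = ConstantLengthFragSet(9, frag9_file)

    # Convenience constructor: small fragments, large fragments, movemap.
    abinitio = ClassicAbinitio(fragset3, fragset9, movemap)
    abinitio.init(pose)    # as in the "general usage" comment
    abinitio.apply(pose)   # one pass through stages 1-4
    return pose

# Hypothetical usage (needs PyRosetta, an initialized session, and real files):
#   fold_with_classic_abinitio(pose, "aa1ubi_03_05.200_v1_3", "aa1ubi_09_05.200_v1_3")
```

The result is a centroid model, so a full-atom refinement step (repacking and minimization, as in the first post) would still be needed afterwards.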

Good luck!!!

Thu, 2015-02-05 10:32
jadolfbr