homology modeling with end extension

Hi there, I'm running the "homology modeling with end extension" protocol to model the region between two subdomains of a protein. The first subdomain has ~100 aa and its structure has been solved by NMR; the second subdomain has ~200 aa and has been solved by X-ray crystallography. The region between them is about 120 aa long and has a poor SS prediction. How many aa can this script deal with?

Mon, 2012-10-08 14:19
fred

Generally speaking, ab initio works best on 100 residues or fewer. 120 is an accessible number, but there's not much guarantee that you'll be able to detect a plausible conformation (by score) from the sea of implausible conformations. I guess give it a shot and see how well it works.

I forget the details of the demo, but I think that Robetta may impose a hard limit of 400 residues for a fragments file, so you may need to generate fragments yourself rather than using Robetta.

You can also use FloppyTail for this sort of domain-assembly work. It won't be of any value without some constraints, though.

Mon, 2012-10-08 14:32
smlewis

Hi smlewis. Thanks for helping. What parameters should I look at to check whether Rosetta has succeeded? The number of models per cluster?

Tue, 2012-10-09 09:00
fred

I always go with biophysical intuition first: does the model make sense, and is it compatible with my experimental data about the system?

Clustering is useful in that you want to see that most of the low-energy models group together in one relatively large cluster - that is the strongest signal of success you can expect without experimental verification. If all low-energy models are distinct by examination, or fall into different clusters, then none of them can be interpreted as "correct" and Rosetta more or less failed.
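To make the clustering check concrete, here is a minimal sketch in Python. It is not part of Rosetta: the function name, the (score, cluster_id) input format, and the 10% "low-energy" cutoff are all assumptions for illustration, and you would have to fill in the cluster assignments from whatever clustering tool you use.

```python
from collections import Counter


def cluster_summary(models, top_n=10, low_energy_fraction=0.1):
    """Summarize how strongly the low-energy models agree with each other.

    models: list of (score, cluster_id) tuples, one per model (hypothetical
    input format; cluster_id comes from your own clustering step).
    """
    if not models:
        raise ValueError("no models given")
    ranked = sorted(models, key=lambda m: m[0])  # lowest score first
    n_low = max(1, int(len(ranked) * low_energy_fraction))
    low = ranked[:n_low]
    # Which cluster holds the most low-energy models?
    counts = Counter(cluster_id for _, cluster_id in low)
    biggest, size = counts.most_common(1)[0]
    top = ranked[:top_n]
    top_in_biggest = sum(1 for _, cluster_id in top if cluster_id == biggest)
    return {
        "largest_cluster": biggest,
        "fraction_of_low_energy_in_largest": size / len(low),
        "top_n_in_largest": top_in_biggest,
        "top_n": len(top),
    }
```

A large fraction together with most of the top-N models in the same cluster is the kind of signal described above; a fraction near 1/(number of clusters) means the low-energy models are scattered.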

Tue, 2012-10-09 09:14
smlewis

Hi smlewis, is there any problem with extending both the N- and C-termini at the same time? I would like to extend ~50 aa at the N-term and ~100 aa at the C-term.

Wed, 2012-10-17 10:11
fred

If you expect the two regions won't interact, you can just do it in subsequent runs.

Having never done this, it looks to me like the rigid chunk claimer file only specifies what regions AREN'T moving. Therefore, if you only claim the middle domain as fixed, and do the rest as the demo says, it will try to fold both ends simultaneously. If they truly interact significantly then Rosetta isn't very likely to get it right.

Wed, 2012-10-17 12:16
smlewis

Is there a rule of thumb to evaluate how well a Rosetta run went? In recent results, I've obtained things like 30% of the low-energy models in the most populated cluster and also 7 of the 10 lowest-energy models in the same cluster. Can I use these findings to evaluate success? Does it make sense?

Wed, 2012-10-17 15:22
fred

It's going to vary from system to system, but I think that those numbers indicate a pretty strong result: Rosetta finds this conformation frequently and favors it, so the scorefunction thinks it's the right answer.

Wed, 2012-10-17 20:29
smlewis

I tried to fold a 27 aa N-terminus of a ~340 aa protein with both the FloppyTail and "homology modeling with end extension" protocols. The "homology modeling with end extension" protocol takes about 50 min to generate one model on one core of an Intel Core2 Quad Q6600 @ 2.40GHz computer. The total scores of all models are negative and easy to analyze.

The FloppyTail method needs about 20 min to generate one model on one core of an AMD Phenom II x6 1090T computer. All models have positive total scores. The following are the flags used.

################
-s Rad57_490.pdb
-use_input_sc
-packing:repack_only
-out:file:scorefile Rad57_silent.out
-run:min_type dfpmin_armijo_nonmonotone
-flexible_start_resnum 1
-flexible_stop_resnum 27
-flexible_chain A
-short_tail_off 0
-short_tail_fraction 1.0
-C_root
-shear_on .3333333333333333333
-AnchoredDesign:perturb_show true
-AnchoredDesign:debug
-AnchoredDesign:perturb_temp 0.8
-AnchoredDesign:refine_temp 0.8
-AnchoredDesign:refine_repack_cycles 100
-AnchoredDesign:perturb_cycles 5000
-AnchoredDesign:refine_cycles 3000
-nstruct 5000
#################

The models from both methods are similar, but both methods seem quite expensive for such a short tail/extension. Could anyone help me optimize the options?

Thanks!

Wed, 2012-11-21 10:01
xpzhang

That sounds about right for FloppyTail. It's not really intended for well-folded tails; it's for sampling envelopes of possible solutions for flexible tails. You can speed FloppyTail up from the command line by decreasing the cycle counts in perturb_cycles and refine_cycles, but it may negatively affect model quality. I guess it could be optimized in-code by reducing the frequency of minimization... runtime has never really been an issue for me, so I haven't tried to optimize the minimization schedule.

I'm not as familiar with the other protocol but I am going to guess you can most rapidly decrease the runtime by removing the relax flags near the end of the flags list (I assume you are using the ones in the demo). Again, this will negatively affect model quality.

I assume you did compile with mode=release?

Wed, 2012-11-21 17:37
smlewis

Thank you for your answers, Steven.
To select the best model after FloppyTail, should I still choose the model with the lowest total energy (even if it is still positive)?
I am using the programs compiled with mode=release. Do both FloppyTail and Minirosetta support true parallel computing?
Thanks.

Mon, 2012-11-26 09:44
xpzhang

" To select the best model after FloppyTail, should I still choose the model with the lowest total energy (even if it is still positive)?"

In a very strict sense, yes. In all cases where I've actually used FloppyTail, I had significant experimental data to assist in interpretation of the results, so total_score was only one factor out of many in interpreting models.

"Do both FloppyTail and Minirosetta support true parallel computing?"
Both support MPI. Rosetta's Monte Carlo algorithms require many independent trajectories to reach solutions, so they only parallelize at the whole-trajectory level: each processor handles one trajectory independently of all others. The only interprocess communication is to arrange which processor does which job. You cannot speed up a single result this way; it's still 30 min/structure/processor, but you can get many structures per minute with many processors.
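As a rough worked example of what trajectory-level parallelism buys you, the sketch below estimates wall-clock time in Python. The function and its parameter names are made up for illustration, and it assumes that every process does modeling work and that each model takes the same time; it ignores any process that is reserved for handing out jobs.

```python
import math


def wall_clock_hours(nstruct, minutes_per_model, n_processors):
    """Estimate total run time when each processor runs whole trajectories.

    A single model is never faster; only the number of models produced
    at the same time goes up with the processor count.
    """
    if nstruct < 0 or minutes_per_model <= 0 or n_processors < 1:
        raise ValueError("invalid run parameters")
    # Each "round" produces up to n_processors models at once.
    rounds = math.ceil(nstruct / n_processors)
    return rounds * minutes_per_model / 60.0


# 5000 models at 30 min each: one processor versus one hundred.
print(wall_clock_hours(5000, 30, 1))
print(wall_clock_hours(5000, 30, 100))
```

With these numbers the per-model cost stays at 30 min in both cases; only the number of rounds changes.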

Mon, 2012-11-26 12:28
smlewis

Thank you.
"Both support MPI..."
I have never tried MPI. I built the MPI version of the executables and tried FloppyTail.mpi just now with three cores. I used "jd2::mpi_work_partition_job_distributor". As discussed previously for AbinitioRelax (http://www.rosettacommons.org/node/2292), the pdb files were overwritten (n pdbs were generated) and the silent file contains three copies of each structural entry (3xn lines total). "-run:protocol broker" does not help. What are the correct flags for the MPI run? I used MPICH2. Thanks!

Fri, 2012-11-30 11:22
xpzhang

My mpi command lines look like this:

mpirun -n # /home/smlewis/rosetta/build/src/release/linux/2.6/64/x86/gcc/4.1/mpi/rosetta_scripts.mpi.linuxgccrelease @options

You prepend with mpirun, or the appropriate MPI launcher for your MPI installation. You also have to be careful to use the MPI-built Rosetta (the one you built by adding extras=mpi to the scons command line). Other than that, it's automatic: Rosetta detects MPI and runs accordingly.

The flag jd2::mpi_work_partition_job_distributor changes how jobs are distributed but won't have any effect if MPI isn't running (as it appears not to be running here). It's a good flag to use with only three processors, though.

Did you get one log file or three? What does it/they look like? They get a processor ID added to the front part of the line if they're running in MPI:

MPI:
core.init: (99) 'RNG device' seed mode, using '/dev/urandom', seed=1935596790 seed_offset=0 real_seed=551966022

not MPI:
core.init: 'RNG device' seed mode, using '/dev/urandom', seed=1935596790 seed_offset=0 real_seed=551966022

There isn't much in the Rosetta flags file that interacts with MPI; it's enabled just by compiling it that way and using mpirun. Usually the problem of not-running-in-MPI that you have is a problem in compilation or in the MPI command line. If you paste your MPI command line in I guess I can see if I spot any errors.

Fri, 2012-11-30 13:00
smlewis

This is the command line I used:
mpirun -n 3 /home/zhang/local_programs/rosetta3.4_mpi/rosetta_source/bin/FloppyTail.mpi.linuxgccrelease @options
I did not get any log file.
I checked the sizes of the programs compiled by "scons bin mode=release" and "scons bin mode=release extras=mpi".
They are different (346,971 and 348,795). Compilation went smoothly and no errors were found. Why does it not run?
Is it an MPI compatibility problem? I use MPICH2 (mpich2-1.4.1p1, Argonne National Laboratory, http://phase.hpcc.jp/mirrors/mpi/mpich2/). Should I try openmpi (openmpi-1.6.3.tar.gz, http://www.open-mpi.org/software/ompi/v1.6/)?
Thank you.

Fri, 2012-11-30 16:22
xpzhang

I've used MPI on two systems. One has installed:
libopenmpi1.5-2
libopenmpi1.5-dev
openmpi1.5-bin
openmpi1.5-common

The other uses a module:
mvapich_gcc/4.1.2

I would guess we should be compatible with mpich2, but I'm not sure.

Can you post the first few lines of whatever log file Rosetta gets you? It's somewhat diagnostic of whether MPI is running.

You can also re-do your run with "-constant_seed -jran 1939394" on. If all three processors produce the SAME line in the scorefile, it's not MPI at all for some reason (they used the same seed 1939394, instead of incrementing by the processor ID as MPI will). If they produce different results, something else is wrong.
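If you'd rather not eyeball the scorefile, a small Python sketch like the one below can flag repeated rows. It is only an illustration: the function name is made up, and it assumes the usual scorefile layout where each data line starts with "SCORE:", the header line has "total_score" as its second field, and the last column is the description tag.

```python
from collections import defaultdict


def duplicated_score_rows(lines):
    """Return score rows that occur more than once, with their tags.

    lines: any iterable of scorefile lines (for example an open file).
    Assumed layout: "SCORE: <score columns...> <description>".
    """
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] != "SCORE:":
            continue  # not a score line
        if fields[1] == "total_score":
            continue  # header row
        values = tuple(fields[1:-1])  # every score column, no tag
        groups[values].append(fields[-1])
    return {v: tags for v, tags in groups.items() if len(tags) > 1}
```

Passing the open scorefile to this function and getting back groups of three identical rows would match the "same seed on all three processors" case described above; an empty result means the processors produced different numbers.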

Fri, 2012-11-30 18:40
smlewis

Thank you.
Adding the "-constant_seed -jran 1939394" flag to the option file gives the following error (three lines):
ERROR: Multiple values specified for option -run:constant_seed
ERROR: Multiple values specified for option -run:constant_seed
ERROR: Multiple values specified for option -run:constant_seed
I attach the score file and screen output from the run without the above flag. Hopefully, you can figure out the problem.

Mon, 2012-12-03 10:32
xpzhang

Either you have two constant_seeds in your option file, or more likely you put "-constant_seed -jran 1939394" all on one line. They're technically two options ("-constant_seed" and "-jran 1939394") so you want to put them on separate lines in an option file. (Though all one line on the command line.)

If neither of those apply, you'll need to show us your option file.

Tue, 2012-12-04 09:32
rmoretti

The output implies that the MPI build is running, because it has the (0) inserted into the tracer string...but all three processors think they are processor 0, so there is no actual MPI communication.

What happens if you run FloppyTail.mpi.linuxgccrelease without mpirun? It should fail immediately with an MPI related error message.

Is there some sort of MPI test job that your sysadmin can suggest to you to test that your MPI environment works right?
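For what it's worth, the processor-ID check on the log can be scripted. This is a minimal Python sketch, not a Rosetta tool: the function name is made up, and the regular expression assumes the tracer format shown earlier in this thread, where an MPI build puts the processor ID in parentheses right after the channel name.

```python
import re

# Matches e.g. "core.init: (99) 'RNG device' ..." and captures the 99.
RANK_PREFIX = re.compile(r"^\S+: \((\d+)\) ")


def ranks_seen(log_lines):
    """Collect the distinct processor IDs found in tracer output lines."""
    ranks = set()
    for line in log_lines:
        match = RANK_PREFIX.match(line)
        if match:
            ranks.add(int(match.group(1)))
    return ranks
```

With "mpirun -n 3" you would hope to see three different IDs. Seeing only 0 in output that clearly came from three processes points to the situation above: an MPI build where the processes are not actually talking to each other. An empty set suggests a non-MPI build.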

Tue, 2012-12-04 10:25
smlewis

Thanks to both of you.
My system (Fedora 13) has openmpi and mpich2 installed from the Fedora release. I installed MPICH2 again from source code. Compiling (with extras=mpi) was OK. I followed the prompt (from mpirun -n 3 ..) to start MPI using "mpd &", which gave me the errors.

I removed all three packages and reinstalled openmpi and mpich2 from Fedora. I used "module load openmpi-i386" to start MPI and compiled Rosetta with extras=mpi. The openmpi library files were linked (mpirun cannot find them). Finally, "mpirun -n 3 FloppyTail.mpi.linuxgccrelease ..." gives the correct result, I think.
Thank you very much for your help.

Fri, 2012-12-07 18:06
xpzhang