
FloppyTail bugs for N-terminal tails


Hi all. Recently I've been using the FloppyTail application to try to model the floppy N-terminal region of my protein. I ran into (and solved) two issues: one that I'm sure is a genuine bug, and another that may be known but is nonetheless very serious. I am running Rosetta 3.3 on a machine running Mac OS X Server 10.5. All tests used the '-C_root' option and had my protein listed first in the PDB.

The first issue involves the option '-packing:repack_only'. While this option is on in the integration demo, it is not described in the documentation, and I couldn't find any information on it in any other application or context. Without this option, FloppyTail randomly changes the sequence of about 90% of the defined flexible region and introduces sporadic point mutations throughout the rest of the protein. The application also takes much longer to run, finishing about 17 times slower. I would assume this behaviour doesn't happen with a C-terminal tail, but I haven't tested it. With the option turned on, the application runs just fine and outputs the expected results.

The second issue is with the numbering of the flexible region, set with '-flexible_start_resnum' and '-flexible_stop_resnum'. My region is roughly the first 40 amino acids, so I had the start and stop as 1 and 40, respectively. This worked fine when I was modelling the protein on its own. But my protein has a docking partner that the N-terminal tail may be interacting with, and when I included this docking partner in the PDB things got strange. The docking partner was being held still, as was the first amino acid of the tail, but the rest of my protein was moving and flying around that first amino acid. This is obviously an issue, as it completely destroys the docking conformation between my two proteins. Also, the tail was arbitrarily placed, so I certainly don't want it constrained by that starting position. This behaviour disappeared when I changed the start and stop numbers to 0 and 40; both proteins stayed in their pre-defined docking conformation, and the tail was the only thing moving around.

These two things took me a couple days to solve, and I wasn't able to find any other previous cases or information on them. I wanted to post so hopefully someone else having those issues could find a fix, and so the developer could see these and either fix or add them to the documentation.

Thu, 2012-03-01 13:06
Dave C

I also had one suggestion/feature request. I use several different Rosetta applications for my work, and generally run and manage them via Python scripts. The biggest reason is to make use of this machine's 8 processors. Running several instances of the docking application isn't an issue, as each process creates an empty file 'A_B_docking_0001.inprogress' while working on that particular decoy. However, FloppyTail doesn't do this, and running several separate processes results in each working on decoy #3 at the same time and overwriting each other, so that only 1 remains. I used a workaround where each process is run off of a differently named starting PDB, so that there is no overwriting or conflict. Would it be possible for the process to create an empty placeholder file when working on a decoy, so that a second instance moves on to decoy #4 instead of also generating decoy #3? Just a thought that would simplify parallelization.
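
For anyone who wants to copy the workaround, here is a minimal sketch of the idea, not my actual script. The function name `plan_jobs`, the `_p1`, `_p2`, ... suffixes, and the executable name in the usage line are illustrative assumptions; only '-in:file:s' and the '@options' flags-file convention come from this thread.

```python
from pathlib import Path


def plan_jobs(start_pdb, n_procs, executable, flags_file="options"):
    """Plan one differently named input PDB and one command per process.

    Hypothetical helper: it only builds names and argument lists. Copying
    the starting PDB and launching the processes is left to the caller.
    """
    start = Path(start_pdb)
    jobs = []
    for i in range(1, n_procs + 1):
        # A distinct input name per process means the output decoys get
        # distinct names too, so processes cannot overwrite each other.
        copy = start.with_name(f"{start.stem}_p{i}{start.suffix}")
        command = [executable, "-in:file:s", str(copy), f"@{flags_file}"]
        jobs.append((copy, command))
    return jobs


# Example: eight jobs for an 8-processor machine (executable name assumed).
for copy, command in plan_jobs("tail.pdb", 8, "FloppyTail.macosgccrelease"):
    print(copy, " ".join(command))
```

Each planned copy can then be created with `shutil.copyfile` and each command started with `subprocess.Popen`, one per processor.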

Thu, 2012-03-01 13:15
Dave C

A) repack_only: this is a general packing flag that should be used through most of Rosetta. Many applications are silently forcing you to use repack_only without telling you about it (it's a little more complex than that), but FloppyTail refuses to make that assumption, so you have to remember to tell it. I can update the documentation to reflect that. When FloppyTail was originally written, it forced this flag; someone then wanted design, so I turned the forcing off, so now if you don't want design you do need the flag. (Design behaves the way you describe by default - that's just how design is.) Design is what's causing your slowdown here. Your memory usage is through the roof, too, I bet.

C) in_progress: Are you using MPI or not? MPI is the preferred method for parallelization. If you aren't using MPI, try the flag run::multiple_processes_writing_to_one_directory, which will get you the docking-type in_progress behavior. In most situations, the in_progress system is a terrible thing to do, but it is a reasonable solution for certain cluster environments (including the docking team's). (With only 8 procs, use the -mpi_work_partition_job_distributor flag to not lose one processor as a head node; try to schedule a number of jobs evenly divisible by the number of processors.)

B) To get FloppyTail to work with multiple chains, you have to carefully consider the order of the chains in the PDB versus where the flexible region is. Anything upstream of the tail will move, anything downstream will not (unless you use C_root, which reverses that.) Moving the start from 1 to 0 triggered the C_root option, which leaves your N-terminal tail at a loose end of the fold tree. The 1-40 selection left residue 1 rigid, I guess. I'll try to fiddle with the documentation to express inclusivity/exclusivity in the options. I don't know how to write foolproof code to generate the right fold tree from any input, but I do know how to help you rearrange your input to get the fold tree generated the way you want.

D) Ask questions faster for FloppyTail. I wrote it and I do most of the message board support so you can get answers more quickly.

Thu, 2012-03-01 19:37

Thanks for the quick responses. That makes sense about the repack flag; I just couldn't find any information on it and so was rather confused by the behaviour. As for B), when I saw the first amino acid being held constant I assumed it was because computers count from 0 and PDBs count from 1, but I wanted to mention it in case you weren't aware (it seems most people use FloppyTail for C-terminal tails, and this is only evident if you have an N-terminal tail in a PDB with multiple chains, so I didn't know if it had ever come up before). I am not using MPI, honestly because I wasn't aware of it, but I have been achieving similar functionality with my own scripts (which are already written and working, so I'm hesitant to change all of that).
Thanks for writing and making available FloppyTail, and for the community involvement here; it really makes using software like Rosetta a much better experience.

Fri, 2012-03-02 11:39
Dave C

FloppyTail was supposed to be a one-shot app for the paper for which it was written - I had no idea it would be so popular when it was written. It is (decreasingly) plagued with assumptions about that original paper (including a C-terminal tail) that are slowly being fixed as I get suggestions and collaborators' requests.

You're welcome!

Sat, 2012-03-03 18:50


I have two more questions about modeling one protein's N-terminus in a complex consisting of two partners.

A) Following the instructions for Rosetta 3.4, I commented out some unnecessary lines and set the options as shown below (I also added C_root and placed the N-terminal tail first in the PDB). In the result, the N-terminus folded as expected and the other parts retained their previous conformation, but the two partners, which were packed together in the input, ended up far apart. I attached two figures showing the complex before and after the FloppyTail simulation.

-in:file:s before.pdb
-out:file:scorefile floppytail_silent.out
-flexible_chain A
-flexible_start_resnum 1
-flexible_stop_resnum 46
-AnchoredDesign:perturb_temp 0.8
-AnchoredDesign:refine_temp 0.8
-AnchoredDesign:refine_repack_cycles 20
-AnchoredDesign:perturb_cycles 500
-AnchoredDesign:refine_cycles 200
-nstruct 1

B) As of Rosetta 3.3, it is possible to simulate multiple linkers together. So could I model the N-terminus and one linker together? Do I need to add C_root if that linker is near the C-terminus of the same protein?

Thanks for any reply.

Wed, 2012-04-25 06:45

A) FloppyTail is very sensitive to how your fold tree is set up. If I understand correctly, you have a complex of two chains at the start and a non-connecting tail between them. After FloppyTail, the tail is folded but the partner interface is broken. If this is the problem, what you need to do is move the complex units around in the input PDB so that the flexible region does not occur _between_ the two chains - either have the flexible region first, and use C_root, or have it last, and do not.

B) You never *need* C_root; it's a question of efficiency. You want the root of the fold tree to be in the largest internally rigid portion of the structure; this makes the code run much faster. If you want multiple flexible regions, use the movemap input options instead of -flexible_chain/start/stop. I'm not sure what the question was here.

Wed, 2012-04-25 07:22

Thanks for your quick reply. I am sorry that I did not state my questions clearly.

A) Actually, the input PDB was arranged as `(FFF...FFFFAAAA....AAAAA)BBBB.....BBBBBB`. (FFF...FFFFAAAA....AAAAA) is protein A, with its flexible N-terminus FFF...FFFF placed first. BBBB.....BBBBBB is protein B. In the input model, proteins A and B were packed together well, but after modeling the complex was broken and they were far apart in space, as shown in `after.png` in my last comment. So I am not sure where the problem is.

B) Another question is what to do if there is a linker in protein A, like (FFF...FFFFAAAA....LLLLLLLLAA)BBBB.....BBBBBB, where L means linker, and I want to model both FFF...FFFF and LLLLLLLL. From your explanation above, I guess C_root is unnecessary because of the linker. When I tried modeling both regions with a movemap, it stopped at

ERROR: Error reading movemap at line: 1 46 BBCHI
ERROR:: Exit from: src/core/kinematics/ line: 499

movemap is
1 46 BBCHI
139 146 BBCHI

I tried many times but couldn't find the reason for these errors. Could you give me any hints? Thanks!

Wed, 2012-04-25 20:24

A) For the PDB setup you've described, using C_root is definitely the right idea. You want the moving region to be as far as possible from the tail.

B) You will still want C-root in this case to keep B in place. If your protein is like this:

Is the A-B interface in 1 or 2? If it's in 2, then C-root and this movemap will work great. If it's in 1, then we'll need to hack FloppyTail a bit to get it to work. If it's both 1 and 2, then LLLLLLLLLL is a loop, not a linker, and you need something else entirely (I already have an unreleased floppytail+loop hack you can use).

The MoveMap file format, taken from the MoveMap C++ file:
/// @brief reads lines of format and set movemap
/// RESIDUE * CHI # set all residues chi movable
/// RESIDUE 36 48 BBCHI # set res 36-48 bb & chi movable
/// RESIDUE 89 NO # set res 89 unmovable
/// JUMP * NO # set all jumps unmovable
/// JUMP 1 YES # set jump 1 movable
/// If a residue/default is not specified, mm defaults to current value.
/// If a value for a jum is not given (e.g. "JUMP 4\n"), it defaults to movable (YES)
/// Setting 'CHI' implies BB not movable, thus don't do:
/// Instead:

So, I think what you need is to add RESIDUE at the start of each line of your movemap:
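
The corrected file did not survive in this thread. Going by the format comment quoted above and the two ranges from the failing movemap, it would presumably look like the output of this small sketch. The helper name `movemap_lines` is made up for illustration and is not part of Rosetta.

```python
def movemap_lines(regions, setting="BBCHI"):
    """Format (start, stop) residue ranges as MoveMap RESIDUE lines.

    Hypothetical helper following the "RESIDUE 36 48 BBCHI" pattern from
    the quoted format comment.
    """
    lines = []
    for start, stop in regions:
        if start > stop:
            raise ValueError(f"range {start}-{stop} is reversed")
        lines.append(f"RESIDUE {start} {stop} {setting}")
    return lines


# The two flexible regions from the movemap that failed to parse.
print("\n".join(movemap_lines([(1, 46), (139, 146)])))
```

This prints `RESIDUE 1 46 BBCHI` and `RESIDUE 139 146 BBCHI`, one per line.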

Thu, 2012-04-26 06:39

Thanks so much for your advice.

For B), it is exactly the third case: the interface between A and B involves both 1 and 2. After adding RESIDUE to the movemap, the loop could fold well. But the current problem is similar to A): 22222222222222 (2 belongs to protein A) moves away from its previous position.

To put it simply, I want to model FFFFFFFF and LLLLLLLLLL of A in the complex A-B. Following your advice, both of them can fold now, but I do NOT want the other parts of complex A-B to leave their input positions. I am not sure why A and B separate from each other after modeling, that is, why the interfaces (1 and 2) between A and B disappear. Is it a result of my input PDB or of something else, like my Rosetta 3.4 settings?

Fri, 2012-04-27 05:45

If your LLLLL region occurs between two regions you do not want moving relative to one another, then LLLL is a loop, not a flexible region for floppytail. FloppyTail sets up FoldTrees that allow changes to propagate; loop modeling sets up fold trees that prevent changes from propagating. The modeling isn't working because you're asking FloppyTail to do something it doesn't do.

I already have developed a hack to FloppyTail that allows use of loops with flexible regions - so FFFFFF is a "floppy tail" while LLLLL is simultaneously a loop. I'm willing to let you use it, but be warned that it's not really ready for public release. I wrote it for a coworker but she hasn't used it yet to my knowledge, so I don't know if it's got bugs to shake out. You'll probably have to fiddle with it a bit to get it to compile against 3.4 by removing uses of devel/init and devel::init and replacing with protocols/init/init and protocols::init::init (notice two inits). Put the attached file next to FloppyTail (src/apps/public/scenarios) and add it to src/apps.src.settings (next to FloppyTail, again). If it won't compile (and it probably won't) let me know what error it spits and I'll tell you how to fix it.
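
If you'd rather not edit by hand, the two renames can be done mechanically. This is only a sketch of those two substitutions; the sample include and call in the usage lines are an assumed illustration of what such lines typically look like, not a quote from the attached file.

```python
def port_init_calls(source_text):
    """Apply the two 3.4 renames described above to C++ source text."""
    # Header path: devel/init -> protocols/init/init
    text = source_text.replace("devel/init", "protocols/init/init")
    # Namespace-qualified call: devel::init -> protocols::init::init
    text = text.replace("devel::init", "protocols::init::init")
    return text


# Assumed sample lines for illustration only.
sample = "#include <devel/init.hh>\ndevel::init(argc, argv);"
print(port_init_calls(sample))
```

Running it twice is harmless, since neither replacement text contains the pattern it replaces.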

If your PDB is set up FFFFFFFFFFF111111111111111LLLLLLLLLL22222222222BBBBBBB, then you'll want to use C_root. Don't put the loop into the MoveMap style input, use the flags input for locating the tail instead. Pass a normal loops file by -loop_file as for loop modeling. This does ONLY KIC modeling so you won't need fragments like for CCD.

Fri, 2012-04-27 06:33

Thanks for your generous sharing.

No error message was reported while it ran.

But 1) A and B are still separated, and
2) 2222222222 (which belongs to protein A) is held fixed ONLY relative to A, and LLLLLLLLLLLLLLLL can NOT fold now.

please see the attached figure.

proteinB: blue

Fri, 2012-04-27 07:57

Can you show me your command line...? I'm kind of surprised it compiled without errors. Did you remember to use FloppyTail_loop instead of FloppyTail...?

Fri, 2012-04-27 07:58

Sorry! I did forget to use FloppyTail_loop.

It compiles fine now, but the problem is still as described in my last comment.

The command line is

/home/rosetta/rosetta_source/bin/FloppyTail_loop.linuxgccrelease -database /home/rosetta/rosetta_database @options

options are

-in:file:s revise.pdb
-out:file:scorefile floppytail_silent.out
-loops:loop_file revise.loop_file
-loops::input_pdb revise.pdb


-flexible_chain A
-flexible_start_resnum 1
-flexible_stop_resnum 46
-shear_on .33333333333333333333

-AnchoredDesign:perturb_temp 0.8
-AnchoredDesign:refine_temp 0.8
-AnchoredDesign:refine_repack_cycles 20

-AnchoredDesign:perturb_cycles 500
-AnchoredDesign:refine_cycles 200
-nstruct 1

Fri, 2012-04-27 08:32

I have an update.

After aligning several output models with the input, I found that the loop residues do show slight folding and conformational changes.

So the only problem left is how to keep proteins A and B in their original complex instead of separating.

Fri, 2012-04-27 09:13

I feel like this should be working by now. Can you post the inputs (revise.pdb and the loops file) for me to take a deeper look at it? I wrote floppytail so if there's a bug I want to track it down.

Fri, 2012-04-27 19:33

The loop file is quite simple

LOOP 139 150

and I think the loop modeling is actually working now, as my last comment said, although I am not sure whether I need to add `0 0` for the cut point residue number and skip rate, as in the test file for loop modeling.
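
To show what I mean by the extra two columns, here is a small sketch that writes the longer form of the line. The helper name `loop_line` is made up, and I am assuming the column order start, stop, cut point, skip rate from the loop modeling test file. Whether FloppyTail_loop needs these columns is exactly what I am unsure about.

```python
def loop_line(start, stop, cut_point=0, skip_rate=0.0):
    """Format one LOOP line with explicit cut point and skip rate columns."""
    if start >= stop:
        raise ValueError("loop start must come before loop stop")
    # ':g' drops the trailing ".0" so a zero skip rate is written as "0".
    return f"LOOP {start} {stop} {cut_point} {skip_rate:g}"


print(loop_line(139, 150))
```

For my loop this gives `LOOP 139 150 0 0` instead of the two-column `LOOP 139 150`.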

So the only thing left is the PDB, but the forum would not let me attach a .pdb file, so I renamed it with a .jpg extension. When you open it, please change the extension back to .pdb.

Thanks so much.

Fri, 2012-04-27 20:02

Yeah, the problem is that the Jump between chains goes from 1 to 163, not 162 to 163. I wrote a hack for you. The first and last two lines are for context so you'll know where in the file to insert the new lines. Put it in FloppyTail_loop and give it a go.

Sat, 2012-04-28 13:01

With your careful guidance, it finally works well.

Thanks so much.

Sun, 2012-04-29 07:25