
Fragment Picking in PyRosetta


Dear All,


Is there a tutorial that shows how to perform fragment picking in PyRosetta? I want to generate fragments for many structures and I do not want to overwhelm the Robetta server, so I want to perform it in-house.


Any help on how or where to start generating fragments using PyRosetta? I found that there is a mover FragmentPicker() in pyrosetta.rosetta.protocols.frag_picker, but I do not know how to use it.


This is my code so far:

mover = pyrosetta.rosetta.protocols.frag_picker.FragmentPicker()


Thu, 2017-08-31 04:48

The protocols.frag_picker.FragmentPicker object is the core of the fragment_picker application - you can see the documentation for the command line version here:

That uses the command-line options to set up the FragmentPicker application; many of these are also available through setters on the object itself.

But in any case, be sure to call the parse_command_line() method on the FragmentPicker object after you've constructed it (but before you set any parameters with setters) to set the default values.
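A minimal sketch of that ordering (construct, then parse_command_line(), then setters). It is written against any object that has a parse_command_line() method, so it can be tried without PyRosetta installed. The make_picker name and the configure callback are hypothetical; which setters actually exist depends on your PyRosetta build.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def make_picker(factory: Callable[[], T],
                configure: Optional[Callable[[T], None]] = None) -> T:
    """Build a picker in the recommended order: construct, parse options, then set.

    factory:   e.g. pyrosetta.rosetta.protocols.frag_picker.FragmentPicker
    configure: hypothetical callback that applies setter calls to the picker
    """
    picker = factory()
    # Load the defaults from the options passed to init() first...
    picker.parse_command_line()
    # ...then apply any setter-based customization on top of them.
    if configure is not None:
        configure(picker)
    return picker
```

In PyRosetta you would call pyrosetta.init(...) first and then pass pyrosetta.rosetta.protocols.frag_picker.FragmentPicker as the factory.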

Thu, 2017-08-31 11:19

Sorry rmoretti,

I went through the documentation, but I am unable to understand how to set up the FragmentPicker mover in PyRosetta. There seem to be large differences.

In PyRosetta there seems to be no method for setting frag_sizes, n_candidates or n_frags, nor for reading in the .pdb and .fasta files. Or are these methods named differently?


My goal is to generate the 3-mer and 9-mer fragments for Abinitio without using the Robetta server, since I will be testing a large number of structures and I do not want to overwhelm it. Am I using the correct mover?

Thu, 2017-08-31 12:14

That's the class you want to be using (it's not actually a Mover).

It wasn't written with PyRosetta use in mind, though, so much of the setup and usage is built around Rosetta's command-line facilities. It is probably the case that a fair number of the parameters you're interested in setting aren't going to be directly available, but will need to be set through the command-line options. (That is, being passed to init: see )
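Since most settings travel through init(), it can help to keep them in a Python dict and flatten them into the single options string that init() takes. This is a hypothetical helper (to_init_options is not part of PyRosetta); the flag names in the example are the ones used later in this thread.

```python
from typing import Any, Dict


def to_init_options(options: Dict[str, Any]) -> str:
    """Flatten {"-flag": value} into the single string pyrosetta.init() takes.

    True          -> bare flag (e.g. -frags:write_ca_coordinates)
    False / None  -> flag omitted
    list / tuple  -> space-separated values (e.g. -frags:frag_sizes 3 9)
    """
    parts = []
    for flag, value in options.items():
        if value is True:
            parts.append(flag)
        elif value is False or value is None:
            continue
        elif isinstance(value, (list, tuple)):
            parts.append(" ".join([flag, *(str(v) for v in value)]))
        else:
            parts.append(f"{flag} {value}")
    return " ".join(parts)
```

For example, to_init_options({"-in:file:fasta": "structure.fasta", "-frags:frag_sizes": [3, 9]}) gives "-in:file:fasta structure.fasta -frags:frag_sizes 3 9", which can be passed straight to pyrosetta.init(). Paths containing spaces are not quoted by this sketch.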

The actual fragment_picker application that's run from the command line is a rather thin wrapper around the FragmentPicker object, though, so aside from the options-related hassle, it should be relatively easy to convert a command-line example into a PyRosetta run. The application basically just creates a FragmentPicker object, calls its parse_command_line() method, and then launches one of quota_protocol(), keep_all_protocol() or bounded_protocol(), depending on which one is selected on the command line (bounded_protocol() is the default).
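The application flow described above can be sketched in Python as follows. The function name and the protocol keys are hypothetical; the three method names are the ones mentioned in this post. As a simplification, the protocol is chosen by an explicit argument rather than read from the options, and the picker is only used through duck typing, so a stub object works for testing.

```python
from typing import Any, Callable

# Method names as mentioned above; bounded_protocol() is the application's default.
PROTOCOLS = {
    "bounded": "bounded_protocol",
    "quota": "quota_protocol",
    "keep_all": "keep_all_protocol",
}


def run_like_the_app(factory: Callable[[], Any], protocol: str = "bounded") -> Any:
    """Mimic the fragment_picker application: create, parse options, run a protocol."""
    if protocol not in PROTOCOLS:
        raise ValueError(f"unknown protocol {protocol!r}; choose from {sorted(PROTOCOLS)}")
    picker = factory()                      # e.g. FragmentPicker()
    picker.parse_command_line()             # settings come from the init() options
    getattr(picker, PROTOCOLS[protocol])()  # launch the selected picking protocol
    return picker
```

With PyRosetta, that would be pyrosetta.init(...) followed by run_like_the_app(pyrosetta.rosetta.protocols.frag_picker.FragmentPicker).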

If you do want to use PyRosetta for this (rather than the command-line version of Rosetta), you would basically do the same thing, relying on the command-line options passed to init() to set most of the parameters you need for the protocol. Some customization is available via setters on the object, but other parameters need to be set via the command-line options and the call to parse_command_line().

Thu, 2017-08-31 15:43

Ok, I got it working. My script:

init('-in::file::fasta structure.fasta -in::file::s structure.pdb -frags::frag_sizes 3 -frags::n_candidates 1000 -frags::n_frags 200 -frags:write_ca_coordinates')
mover = pyrosetta.rosetta.protocols.frag_picker.FragmentPicker()


Is this what is required for Abinitio?

Do I need the .checkpoint and .psipred.ss2 files?

I will attach a sample of the 3-mer output. Is this what is expected?

Fri, 2017-09-01 02:21

That does indeed look like a properly formatted 3-mer fragment file, the type you would use for abinitio.

However, I can't necessarily tell you anything about the quality of the fragments.
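Quality can't be judged this way, but a quick structural sanity check is possible: count the fragments listed under each position and compare them with the n_frags and fragment size you asked for. This sketch assumes the usual text layout of these files (a "position: N neighbors: M" header line, then fragments separated by blank lines, one residue per line); summarize_frag_file is a hypothetical helper.

```python
import re
from typing import Dict

# Header line such as " position:            1 neighbors:          200"
_HEADER = re.compile(r"^\s*position:\s*(\d+)\s+neighbors:\s*(\d+)")


def summarize_frag_file(text: str) -> Dict[int, dict]:
    """For each position, report the declared neighbor count and the line count
    of every fragment block found under it (blocks are separated by blank lines)."""
    summary: Dict[int, dict] = {}
    entry = None
    in_block = False
    for line in text.splitlines():
        header = _HEADER.match(line)
        if header:
            entry = {"declared": int(header.group(2)), "lengths": []}
            summary[int(header.group(1))] = entry
            in_block = False
        elif not line.strip():
            in_block = False                # a blank line ends the current fragment
        elif entry is not None:
            if not in_block:
                entry["lengths"].append(0)  # start of a new fragment
                in_block = True
            entry["lengths"][-1] += 1       # one residue line per fragment position
    return summary
```

For a 3-mer file picked with -frags::n_frags 200, you would normally expect each position to declare 200 neighbors, list 200 blocks, and have every block length equal to 3.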

You probably are going to want the checkpoint and psipred files. The reason is that the fragment selection process takes into account various information about the sequence you're building fragments for, in order to predict the likely structure of those fragments. It uses the sequence itself, of course, but it can also take advantage of the mutational propensities seen in homologous sequences (encoded in the BLAST checkpoint file) as well as secondary structure predictions (encoded in the psipred file). Adding this information about the target sequence results in better fragment quality, which ultimately leads to better abinitio predictions.
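Once the checkpoint and PSIPRED files exist, they are passed in the same way as the other inputs, through the options given to init(). The flag names below (-in:file:checkpoint, and -frags:ss_pred followed by the file and a label) are assumed from the fragment picker documentation examples, so check them against your Rosetta version; with_profile_inputs is a hypothetical helper.

```python
def with_profile_inputs(options: str, checkpoint: str, psipred_ss2: str) -> str:
    """Append profile and secondary-structure inputs to a pyrosetta.init() string.

    Flag names are assumptions taken from fragment picker documentation examples;
    check them against your Rosetta version.
    """
    extra = [
        f"-in:file:checkpoint {checkpoint}",      # PSI-BLAST-derived sequence profile
        f"-frags:ss_pred {psipred_ss2} psipred",  # prediction file followed by a label
    ]
    return " ".join([options.strip(), *extra]).strip()
```

The result is still a single string, so it can be handed to pyrosetta.init() before the picker is constructed.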

Mon, 2017-09-04 09:38


I have spent the last several days trying to get PSIPRED working, and I finally managed to. I compiled it, set up the UniProt90 database as the developers advise, and got the .ss2 PSIPRED secondary structure file. (Does Robetta use the UniProt90 database?)


My question is how to get the checkpoint file. The Rosetta docs say it comes from PSI-BLAST, but when I run PSI-BLAST I do not get a checkpoint file. How do I generate it? I tried searching online but could not work out where it comes from.

Fri, 2017-09-08 09:21

After talking with someone who runs fragment picking locally, it looks like the commands to get the checkpoint file are something like:

blastpgp -b 0 -j 3 -h 0.001 -d ${dbname} -i ${protein}.fasta -C ${protein}.chk -Q ${protein}.ascii >& ${protein}.psipred_blast

## then

blastpgp -b 0 -j 4 -h 0.001 -d ${dbname} -i ${protein}.fasta -R ${protein}.chk -Q ${protein}.ascii >& ${protein}.psipred_blast_2

Where ${dbname} is the path to the `filtnr` database, ${protein}.fasta is your input fasta, and ${protein}.chk, ${protein}.ascii and ${protein}.psipred_blast are files generated by the runs.
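If you are driving everything from Python, the same two blastpgp runs can be launched with subprocess. This sketch assumes the legacy blastpgp 2.2.18 binary is on your PATH; the helper names are hypothetical and the flags are copied unchanged from the commands above.

```python
import subprocess
from typing import List


def blastpgp_commands(protein: str, dbname: str) -> List[List[str]]:
    """The two legacy blastpgp (2.2.18) runs from the post, as argument lists."""
    common = ["blastpgp", "-b", "0", "-h", "0.001", "-d", dbname,
              "-i", f"{protein}.fasta", "-Q", f"{protein}.ascii"]
    first = common + ["-j", "3", "-C", f"{protein}.chk"]   # -C writes the checkpoint
    second = common + ["-j", "4", "-R", f"{protein}.chk"]  # -R restarts from it
    return [first, second]


def run_blastpgp(protein: str, dbname: str) -> None:
    """Run both commands; like '>&' in the shell, stdout and stderr go to one log."""
    logs = [f"{protein}.psipred_blast", f"{protein}.psipred_blast_2"]
    for cmd, log in zip(blastpgp_commands(protein, dbname), logs):
        with open(log, "w") as fh:
            subprocess.run(cmd, stdout=fh, stderr=subprocess.STDOUT, check=True)
```

Unlike the shell version, check=True stops with an exception if the first run fails, so the second run never tries to restart from a missing checkpoint.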

The version of blastpgp they use is 2.2.18. I do know that there are some formatting changes between different versions of BLAST, so I don't know if another version will work exactly with the checkpoint format.

Fri, 2017-09-08 15:09

OK. Four questions:

1. Which database should I use? PSIPRED recommends the UniProt90 database, but you seem to mention NCBI's nr database. Is nr what I should be using to generate the PSIPRED and checkpoint files for Rosetta Abinitio?

2. The -R option in the second command gives the error "ERROR: recovery from C toolkit checkpoint file format not supported", and I am not sure how to fix it. I am using BLAST+, not version 2.2.18.

3. I have attached the checkpoint file I get (the output from the first command). Is this what is expected (the correct format)? It looks different from the one in the demos.

4. The other file (.ascii) is not used for fragment generation, correct?


If I can get the checkpoint file (and thus generate fragments locally), I think I will be done with this forum topic.

Fri, 2017-09-08 18:26

1. I think NCBI's nr database is the one traditionally used. I'm not sure whether the choice of database has been benchmarked. I probably *wouldn't* use the UniProt90 database, though: as it has been filtered at 90% sequence identity, it misses some of the sequence variability that would contribute to the sequence acceptability profile.

2. The interface for BLAST has gone through some variations. If you're not using version 2.2.18, you may need to figure out what the corresponding option is for your version. 

3. Again, the interface for BLAST has changed between versions, and this includes the checkpoint file format. You may need to adjust the options to get the format that the Rosetta fragment picker expects.

4. As far as I'm aware, the ascii file is not used.
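For point 2, the blastpgp flags used above can be mapped to their BLAST+ psiblast counterparts. The mapping below is an assumption based on the BLAST+ psiblast option names (verify with psiblast -help); in particular, psiblast is assumed to reject -query together with -in_pssm, so the sketch drops the query when restarting. Per point 3, this does not solve the format problem: psiblast's -out_pssm writes an ASN.1 PSSM rather than the blastpgp checkpoint the fragment picker expects, so a conversion step may still be needed.

```python
from typing import List

# Assumed correspondence between legacy blastpgp and BLAST+ psiblast options.
LEGACY_TO_PSIBLAST = {
    "-b": "-num_alignments",
    "-j": "-num_iterations",
    "-h": "-inclusion_ethresh",
    "-d": "-db",
    "-i": "-query",
    "-C": "-out_pssm",
    "-R": "-in_pssm",
    "-Q": "-out_ascii_pssm",
}


def to_psiblast(legacy: List[str]) -> List[str]:
    """Translate a blastpgp argument list into a psiblast one (sketch)."""
    if not legacy or legacy[0] != "blastpgp":
        raise ValueError("expected a blastpgp command")
    args = legacy[1:]
    if len(args) % 2:
        raise ValueError("expected flag/value pairs")
    pairs = list(zip(args[::2], args[1::2]))
    restarting = any(flag == "-R" for flag, _ in pairs)
    out = ["psiblast"]
    for flag, value in pairs:
        if flag not in LEGACY_TO_PSIBLAST:
            raise ValueError(f"no psiblast mapping for {flag}")
        if restarting and flag == "-i":
            continue  # assumed: psiblast rejects -query together with -in_pssm
        out += [LEGACY_TO_PSIBLAST[flag], value]
    return out
```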

Tue, 2017-10-10 09:16