How to obtain alignment file for comparative modeling under Rosetta 3.1?

Although Rosetta is best known for its de novo prediction protocol, the Rosetta 3.1 bundle does include a comparative modeling package, but without any documentation. Rosetta-CM obviously needs an alignment file as input, and judging from the code it accepts only two alignment formats: "grishin" or "general".

So my question is: what are the "grishin" and "general" alignment formats, and how do I obtain such files? I would guess that "grishin" refers to Dr. Grishin at SWMED, but his website offers so many programs...

thanks,
len

Tue, 2010-05-25 01:29
lennylv

I solved it myself, thanks to the code being open source.

The "general" alignment file format is unbelievably simple:

score nnn.nnn
queryID startPos seq
templID startPos seq
queryID startPos seq
templID startPos seq
--    (a "--" line starts another alignment)
score nnn.nnn
queryID startPos seq
templID startPos seq
queryID startPos seq
templID startPos seq
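To make the layout concrete, here is a minimal Python sketch that renders alignments in the "general" layout as described above. The `Block`/`Alignment` classes and `format_general` are hypothetical names, not Rosetta code, and whether `startPos` is 0- or 1-based is an assumption you should check against the Rosetta source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """One aligned segment: IDs, start positions and sequences for query and template."""
    query_id: str
    query_start: int
    query_seq: str
    templ_id: str
    templ_start: int
    templ_seq: str


@dataclass
class Alignment:
    """One alignment: a score line followed by one or more query/template line pairs."""
    score: float
    blocks: List[Block] = field(default_factory=list)


def format_general(alignments: List[Alignment]) -> str:
    """Render alignments in the "general" layout (hypothetical helper, not Rosetta code)."""
    chunks = []
    for aln in alignments:
        lines = [f"score {aln.score:.3f}"]
        for b in aln.blocks:
            lines.append(f"{b.query_id} {b.query_start} {b.query_seq}")
            lines.append(f"{b.templ_id} {b.templ_start} {b.templ_seq}")
        chunks.append("\n".join(lines))
    # A line containing only "--" separates consecutive alignments.
    return "\n--\n".join(chunks) + "\n"
```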

I do not know which aligner generates this "general" format, so we need a separate utility to convert other aligners' output into it. Rosetta-CM seems to calculate identities and gaps in its own code, although I think every aligner reports that information along with the alignment.
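If you want those numbers yourself, identity and gap count can be computed directly from a gapped pairwise alignment. A minimal sketch (`identity_and_gaps` is a hypothetical helper); identity here means identical residues divided by gap-free aligned columns, which is only one of several conventions and not necessarily what Rosetta-CM computes.

```python
from typing import Tuple


def identity_and_gaps(query: str, templ: str, gap: str = "-") -> Tuple[float, int]:
    """Return (fraction identical over gap-free columns, number of gapped columns)."""
    if len(query) != len(templ):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(query, templ))
    # Columns where both sequences have a residue.
    aligned = [(q, t) for q, t in columns if q != gap and t != gap]
    identical = sum(1 for q, t in aligned if q == t)
    # Columns where at least one sequence has a gap.
    gapped = len(columns) - len(aligned)
    identity = identical / len(aligned) if aligned else 0.0
    return identity, gapped
```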

Anyway, thanks to the Rosetta contributors.

len

Thu, 2010-05-27 18:39
lennylv

Hi len,

I was just reading your post. If I understand you correctly, I can use any available tool to align two sequences, and then manually enter every part of the two sequences that aligns into this alignment file. For example, let's say residues 2-43 of sequence A (seqA_) align with residues 12-53 of sequence B (seqB1), and residues 48-62 of sequence A align with residues 56-70 of sequence B. Then my file should look like this:

seqA_ 2 XXXXXXXXXXXX
seqB1 12 XXXXXXXXXXXXX
seqA_ 48 YYYYYYYYYYYYYYYY
seqB1 56 YYYYYYYYYYYYYYYY
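Splitting a gapped pairwise alignment (e.g. from ClustalW) into ungapped segments like the ones above can be automated. A sketch, assuming 1-based residue numbering and "-" as the gap character; `ungapped_segments` is a hypothetical helper, not part of Rosetta.

```python
from typing import List, Tuple


def ungapped_segments(query: str, templ: str, q_start: int = 1, t_start: int = 1,
                      gap: str = "-") -> List[Tuple[int, str, int, str]]:
    """Split a gapped pairwise alignment into (q_pos, q_seg, t_pos, t_seg) ungapped segments."""
    if len(query) != len(templ):
        raise ValueError("aligned sequences must have the same length")
    segments = []
    q_pos, t_pos = q_start, t_start  # residue number of the next residue in each sequence
    current = None  # [q_begin, q_chars, t_begin, t_chars] of the open segment
    for q, t in zip(query, templ):
        if q != gap and t != gap:
            if current is None:
                current = [q_pos, [], t_pos, []]
            current[1].append(q)
            current[3].append(t)
        elif current is not None:
            # A gap in either sequence closes the open segment.
            segments.append((current[0], "".join(current[1]), current[2], "".join(current[3])))
            current = None
        if q != gap:
            q_pos += 1
        if t != gap:
            t_pos += 1
    if current is not None:
        segments.append((current[0], "".join(current[1]), current[2], "".join(current[3])))
    return segments
```

Each tuple gives the query start, query segment, template start and template segment, which can be printed as one pair of lines per segment, e.g. `print(f"seqA_ {qs} {qseg}")` followed by `print(f"seqB1 {ts} {tseg}")`.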

So what the algorithm basically does is use the known crystal structure of sequence B for the aligned regions and run an ab initio prediction for the unaligned rest?

Thanks for your help.

Max

Sun, 2011-03-20 10:46
maxebert

Hi lennylv,

Take a look at the Meiler Lab documentation for homology modeling. It explains how to create alignments and thread them using Rosetta, and it provides scripts that automate much of the work; look under the homology modeling tutorial. I believe most people use ClustalW for alignments.

http://www.meilerlab.org/index.php/jobs/resources

Steven C.

Mon, 2011-03-21 08:34
scombs

Hi,

I have a ClustalW alignment file and I'm trying to reformat it into the correct input for Rosetta 3.4, but I can't figure out what format it wants.
Could you please help? I'll attach the ClustalW file. If you could post a description of how to generate the desired input, including a sample input, that would be great.

Thanks,
Sabine

Wed, 2013-12-11 06:43
sabine

The clustal format, more-or-less like you have, is typically the correct format to use, though the details depend on how exactly you're intending to use it.

I'm not entirely sure how you're planning to use it with Rosetta 3.4, but if you're intending to use the rosetta_tools/protein_tools/scripts/thread_pdb_from_alignment.py script to thread a PDB with the alignment, I think the file should work as-is. (Although you may want to edit the names of the sequences to match the names of the PDBs you're using.)

An example commandline for that threading script (using the weekly releases), with a corresponding alignment file is as follows:

python2.7 ~/Rosetta/tools/protein_tools/scripts/thread_pdb_from_alignment.py --template=2rh1A --target=1u19A --chain=A --align_format=clustal 1u19A.2rh1A.aln 2rh1A.pdb 1u19A_on_2rh1A.pdb
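Should you need the gapped sequences from a .aln file for some other conversion, a two-sequence ClustalW file can be read with a few lines of Python. A minimal sketch (`parse_clustal` is a hypothetical helper), assuming the standard layout: a header line starting with "CLUSTAL", blocks of `name  sequence [count]` lines, and conservation lines that begin with whitespace. Biopython's `Bio.AlignIO.read(handle, "clustal")` is a more robust alternative.

```python
from typing import Dict


def parse_clustal(text: str) -> Dict[str, str]:
    """Return {sequence name: gapped sequence} from ClustalW .aln text, in file order."""
    seqs: Dict[str, str] = {}
    for line in text.splitlines():
        # Skip blank lines and the "CLUSTAL ..." header.
        if not line.strip() or line.startswith("CLUSTAL"):
            continue
        # Conservation lines (*, :, .) start with whitespace.
        if line[0].isspace():
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        name, chunk = parts[0], parts[1]  # an optional trailing residue count is ignored
        seqs[name] = seqs.get(name, "") + chunk
    return seqs
```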

If you have another application you want to use the alignment file with, let me know the details and we can work out the format for that one.

Wed, 2013-12-11 07:44
rmoretti

I'm actually trying to build a homology model using the minirosetta application. I'll attach my flags file and the template file as well. I have one alignment file that I got to work; it's in a different format, which I created following the example in the documentation. But when I use it, I get an error message saying it can't find the template PDB file (attached). Could you please take a look at that as well? (Not all the names in the flags file are correct, because I renamed the files to send them to you.) The working_aln file is the one I made according to the documentation, and with it Rosetta doesn't complain about a sequence length mismatch in the alignment. However, if I start the alignment with the 3 residues that are missing from the query sequence (the template PDB actually has 3 residues at the beginning, which I just deleted to make it work), I get the length mismatch error again.
Also, is there a particular naming convention between the tags in the alignment files and the file names?
Thanks!

Wed, 2013-12-11 08:02
sabine

Right, sorry about pointing you to the Clustal format. I'd forgotten that Rosetta also likes the "grishin" file format (as mentioned in the documentation for the comparative modeling application https://www.rosettacommons.org/manuals/archive/rosetta3.4_user_guide/d5/...). I think there are some scripts around to convert between them, but I can't seem to find them at the moment.
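Until such a script turns up, a pairwise alignment can be written out in a grishin-style layout with a few lines of Python. The layout below is my reading of the grishin example in the comparative modeling documentation: a "##" header naming target and template, a comment line, a scores_from_program line, then a start offset plus gapped sequence for the target and for the template, closed by "--". Treat the exact header and score-line contents, and the 0-based start offsets, as assumptions and compare against the example in the docs for your release; `format_grishin` is a hypothetical helper.

```python
def format_grishin(target_id: str, template_id: str, target_aln: str, template_aln: str,
                   target_start: int = 0, template_start: int = 0,
                   program: str = "clustalw") -> str:
    """Write one target/template pair in an assumed grishin layout (check the docs)."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    lines = [
        f"## {target_id} {template_id}",     # header: target ID, then template ID
        f"# {program}",                       # free-text comment line
        "scores_from_program: 0",             # placeholder score line (assumed)
        f"{target_start} {target_aln}",       # start offset + gapped target sequence
        f"{template_start} {template_aln}",   # start offset + gapped template sequence
        "--",                                 # end of this alignment
    ]
    return "\n".join(lines) + "\n"
```

For example, `format_grishin("query", "1abcA", "ACDE-G", "ACNEFG")` uses made-up IDs; in practice the template ID should match how your template PDB is named.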

Regarding your length mismatch, it looks like there are a number of residues in your alignment which are missing from your template PDB. There are five residues on the C-terminus and a significant chunk missing in the middle (DVPTLCDSACGHNEGSARENS). I'd recommend matching the sequence in the alignment with the sequence in the PDB. (You probably ran into the situation where the reported sequence of the PDB doesn't actually match what's structurally present. The FASTA files the PDB provides are for the sequence of what was crystallized. If there's missing density, that won't show up structurally, but will show up in the FASTA.) Note that deleting the superfluous residues like you did doesn't hurt anything, and can possibly help. If you're still experiencing length problems, try removing the extra C-terminal residues from the template structure and the alignment.
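One way to catch this kind of mismatch before running Rosetta is to read the sequence that is actually present in the template PDB (one residue per CA atom in the ATOM records) and compare it with the template line of the alignment after removing gaps. A minimal stdlib sketch; `pdb_chain_sequence` and `alignment_matches_pdb` are hypothetical helpers. It uses the fixed PDB column positions, maps only the 20 standard residues (others become "X"), and skips HETATM records, so modified residues such as MSE are ignored.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def pdb_chain_sequence(pdb_text: str, chain: str) -> str:
    """One-letter sequence of residues with a CA atom in ATOM records of one chain."""
    seq = []
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        # Column 22 is the chain ID; columns 13-16 are the atom name.
        if line[21] != chain or line[12:16].strip() != "CA":
            continue
        key = line[22:27]  # residue number plus insertion code
        if key in seen:
            continue  # skip alternate locations of the same CA atom
        seen.add(key)
        seq.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(seq)


def alignment_matches_pdb(aligned_template: str, pdb_seq: str, gap: str = "-") -> bool:
    """True if the gapped template line, with gaps removed, equals the structural sequence."""
    return aligned_template.replace(gap, "") == pdb_seq
```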

Wed, 2013-12-11 16:19
rmoretti

Thanks. I figured out the length mismatch, but I still don't know how to create a grishin alignment file other than by hand. Could you please check again whether you can find something? Also, I started a new thread, because after I solved the length mismatch and created an alignment file by hand, I got a segmentation fault without any additional information. I posted all my input files except the fragment files. Could you please take a look? I've been fighting with this for a while now and would like to get it going. Thanks for your help!
Sabine

Thu, 2013-12-12 05:22
sabine

Thanks!

Wed, 2013-12-11 08:34
sabine

I'm trying to use minirosetta, as described in my following post. Thanks for your help.

Wed, 2013-12-11 08:36
sabine

I'm also attaching a file showing the error message about no template being provided, even though I clearly have one in my directory and specified it in the input flags file. This still refers to running minirosetta with the files from my previous post.
Thanks.

Wed, 2013-12-11 11:32
sabine