Syntax for the alignment file in "minirosetta comparative modeling"

Hi friends,
I am confused about the syntax for the alignment file for homology modeling.

Can I ask
1) If "-cm:aln_format grishin" is specified, in the ".aln" file:

(line 1) ## t288_ 1be9A_4
(line 2) # hhsearch_3 33
(line 3) scores_from_program: 0.000000 0.998400

Are the first two lines really necessary?
Why is "_4" used after "1be9A"?
What is the meaning of "hhsearch_3 33"?
How do I get "scores_from_program: 0.000000 0.998400"?
Why are "2" and "9" used before the 4th and 5th lines?
What is the benefit of using the "grishin" format?

2) If "-cm:aln_format grishin" is NOT specified, in the ".aln" file, I am using:

(line 1) Target 1 EVQLVESGGG......
(line 2) Template 1 EVQLVQSGAE......

What does the "1" between the PDB name and the sequence mean?

When I run it, Rosetta reports (log attached):

protocols.evaluation.ChiWellRmsdEvaluatorCreator: Evaluation Creator active ...
protocols.jd2.JobDistributor: no more batches to process...
protocols.jd2.JobDistributor: 0 jobs considered, 0 jobs attempted in 0 seconds
protocols.jd2.JobDistributor: no jobs were attempted, did you forget to pass -overwrite?

I am pretty sure the name of my template PDB is the same as the one shown in the .aln file. How can I solve this? Thank you.

3) Also, I could not find "rosetta/rosetta_tests/integration/tests/threading". The only folders that start with "t" are "test_idealize" and "torsion_restricted_sampling". I would like to take a look at an example .aln file.

Yours sincerely

Attachment: comparative_model_HC.log (2.19 KB)
Thu, 2014-10-16 15:29

The lines starting with ## and # are part of the file format, but as far as I can tell, they're not actually used by Rosetta itself. I do believe they're used in other parts of the modeling pipeline, though. My understanding is that they encode information about where the alignment prediction comes from, along with various scores for how good the alignment is. The reason is that in the fully automated comparative modeling pipeline, a number of different sequence alignment protocols are used, and the different predictions are combined and filtered by pre-processing scripts based on various quality heuristics. Once you get to the Rosetta modeling stage, however, you're down to a single alignment, and the various annotations don't really matter much anymore.

The 2 and 9 (as well as the two 1's in the "general" alignment format) specify the start location. This allows compact specification of leading/trailing overhangs and unaligned regions. (Note the missing amino acids at the start of the aligned sequences.)
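
To illustrate what a start location buys you, here is a minimal Python sketch (not Rosetta code) that splits a "name start sequence" line and maps each alignment column to a residue number. The helper names `parse_general_line` and `residue_numbers`, the example line, and the use of "-" for gaps are all hypothetical. It also assumes the number is the residue number of the first aligned residue; whether a given format counts from 0 or from 1 is something to check against the Rosetta documentation.

```python
from typing import List, NamedTuple, Optional


class AlignedSeq(NamedTuple):
    name: str
    start: int      # assumed: residue number of the first aligned residue
    sequence: str   # aligned sequence; '-' is assumed to mark a gap


def parse_general_line(line: str) -> AlignedSeq:
    """Split a whitespace-separated 'name start sequence' line."""
    name, start, sequence = line.split()
    return AlignedSeq(name, int(start), sequence)


def residue_numbers(entry: AlignedSeq) -> List[Optional[int]]:
    """Give each alignment column its residue number, or None for a gap."""
    numbers: List[Optional[int]] = []
    position = entry.start
    for symbol in entry.sequence:
        if symbol == "-":
            numbers.append(None)   # gap: no residue consumed
        else:
            numbers.append(position)
            position += 1
    return numbers


# Made-up line: the alignment begins at residue 3, so residues 1-2 are an overhang.
entry = parse_general_line("Template 3 EVQ-LV")
print(residue_numbers(entry))  # [3, 4, 5, None, 6, 7]
```

With a start of 3, the first two residues never have to be written out as gap columns, which is the compactness mentioned above.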

It's a bit difficult to see why you're not getting any jobs attempted without knowing the contents of your option file. What flags are you using to specify your template, your input sequence and your input alignment?

Fri, 2014-10-17 09:45

Hi R Moretti,
Thank you very much.

1) I will use the general format, as it is much simpler.

2) The following is my options file (I have simplified the paths to "/path/to/"; I hope that is fine):

-run:protocol threading
-in:file:fasta /path/to/Target.fasta
-in:file:template_pdb /path/to/Template.pdb
-in:file:psipred_ss2 /path/to/Target.psipred_ss2
-in:file:alignment /path/to/Target_Template.aln
-loops:frag_sizes 9 3 1
-loops:frag_files /path/to/aaTarget09_05.200_v1_3 /path/to/aaTarget03_05.200_v1_3 none
-loops:remodel quick_ccd
-loops:extended true
-loops:build_initial true
-loops:relax fastrelax
-cm:min_loop_size 4
-out:nstruct 2
-out:file:silent_struct_type binary
-out:file:silent /path/to/Target.out
-out:path /path/to/output
-relax:fastrelax_repeats 8
-nstruct 1

Yours sincerely

Fri, 2014-10-17 10:31

I found the demo files now. I think they should be:

Thu, 2014-10-23 03:26