Simple comparative modeling (threading) example is failing with "unknown atom name: CA CB"

I am trying to run a very simple threading scenario (comparative modeling - but as basic as it gets).

I am executing the command:

partial_thread.static.linuxgccrelease -database $ROSETTA_DATABASE -in:file:fasta /inputs/2yhd_model.fasta -in:file:alignment /inputs/2yhd_model.grishin.txt -in:file:template_pdb /inputs/2yhd.pdb 

But I am getting the error:

ERROR: unknown atom_name: CA  CB

I've been stuck on this for a few days now and need to ask for pointers. I saw another question on this forum with the same issue, but it was left unresolved, so I figured I'd try to give as much detail here as I can.

My template PDB (/inputs/2yhd.pdb) has been run through clean_pdb.py, with no apparent issues. The PDB is attached.

My Grishin-formatted alignment was created manually, but I'm pretty sure the formatting is correct since I followed the documentation. The alignment file is attached.

My target FASTA is also attached in the zip.

Does anyone know why I am getting this error? You'll see in the alignment file, there is only one difference - a single mutation from AALLSSL to AALHSSL.
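A quick way to confirm that the target and template sequences differ only where intended is to compare them position by position. This is a minimal Python sketch, not part of Rosetta; the function name sequence_differences is made up for illustration, and it assumes a gap-free alignment of two equal-length sequences, as in this thread.

```python
def sequence_differences(target, template):
    """Return (1-based position, target residue, template residue) for each mismatch.

    Assumes a gap-free alignment, so both sequences must have the same length.
    """
    if len(target) != len(template):
        raise ValueError(
            f"length mismatch: target has {len(target)} residues, "
            f"template has {len(template)}"
        )
    return [
        (pos, t, p)
        for pos, (t, p) in enumerate(zip(target, template), start=1)
        if t != p
    ]


# The 7-residue window quoted above, target first and template second.
print(sequence_differences("AALHSSL", "AALLSSL"))
```

A length mismatch raises an error instead of being silently truncated, since a target and template of different lengths are usually a sign that the two sequences came from different versions of the structure.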

 

Full output:

core.init: Rosetta version unknown:cbe8723f7038f0b9e5d24fca9c3728b2fc952a37 2016-08-02 10:58:29 -0400 from /scratch/local-benchmark/release/rosetta/git/release/rosetta.binary.linux.release.git
core.init: command: /rosetta/bin/partial_thread.static.linuxgccrelease -database /databases/rosetta/ -in:file:fasta /inputs/2yhd_model.fasta -in:file:alignment /inputs/2yhd_model.grishin.aln -in:file:template_pdb /inputs/2yhd.pdb -ignore_unrecognized_res
core.init: 'RNG device' seed mode, using '/dev/urandom', seed=-519025839 seed_offset=0 real_seed=-519025839
core.init.random: RandomGenerator:init: Normal mode, seed=-519025839 RG_type=mt19937
core.chemical.ResidueTypeSet: Finished initializing fa_standard residue type set.  Created 414 residue types
core.chemical.ResidueTypeSet: Total time to initialize 0.34 seconds.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] skipping pdb residue b/c it's missing too many mainchain atoms: 1920 A TES TES
core.io.pose_from_sfr.PoseFromSFRBuilder: missing:  N  
core.io.pose_from_sfr.PoseFromSFRBuilder: missing:  CA 
core.io.pose_from_sfr.PoseFromSFRBuilder: missing:  C  
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue GLN 23
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CD  on residue GLN 23
core.conformation.Conformation: [ WARNING ] missing heavyatom:  OE1 on residue GLN 23
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NE2 on residue GLN 23
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue ARG 90
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CD  on residue ARG 90
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NE  on residue ARG 90
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CZ  on residue ARG 90
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NH1 on residue ARG 90
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NH2 on residue ARG 90
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue LYS 155
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CD  on residue LYS 155
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CE  on residue LYS 155
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NZ  on residue LYS 155
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue LYS 166
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CD  on residue LYS 166
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CE  on residue LYS 166
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NZ  on residue LYS 166
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue LYS 177
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CD  on residue LYS 177
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CE  on residue LYS 177
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NZ  on residue LYS 177
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue ASN 178
core.conformation.Conformation: [ WARNING ] missing heavyatom:  OD1 on residue ASN 178
core.conformation.Conformation: [ WARNING ] missing heavyatom:  ND2 on residue ASN 178
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue ARG 184
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CD  on residue ARG 184
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NE  on residue ARG 184
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CZ  on residue ARG 184
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NH1 on residue ARG 184
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NH2 on residue ARG 184
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue GLU 223
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CD  on residue GLU 223
core.conformation.Conformation: [ WARNING ] missing heavyatom:  OE1 on residue GLU 223
core.conformation.Conformation: [ WARNING ] missing heavyatom:  OE2 on residue GLU 223
core.pack.pack_missing_sidechains: packing residue number 23 because of missing atom number 6 atom name  CG 
core.pack.pack_missing_sidechains: packing residue number 90 because of missing atom number 6 atom name  CG 
core.pack.pack_missing_sidechains: packing residue number 155 because of missing atom number 6 atom name  CG 
core.pack.pack_missing_sidechains: packing residue number 166 because of missing atom number 6 atom name  CG 
core.pack.pack_missing_sidechains: packing residue number 177 because of missing atom number 6 atom name  CG 
core.pack.pack_missing_sidechains: packing residue number 178 because of missing atom number 6 atom name  CG 
core.pack.pack_missing_sidechains: packing residue number 184 because of missing atom number 6 atom name  CG 
core.pack.pack_missing_sidechains: packing residue number 223 because of missing atom number 6 atom name  CG 
core.pack.task: Packer task: initialize from command line() 
core.scoring.ScoreFunctionFactory: SCOREFUNCTION: talaris2014
core.scoring.etable: Starting energy table calculation
core.scoring.etable: smooth_etable: changing atr/rep split to bottom of energy well
core.scoring.etable: smooth_etable: spline smoothing lj etables (maxdis = 6)
core.scoring.etable: smooth_etable: spline smoothing solvation etables (max_dis = 6)
core.scoring.etable: Finished calculating energy tables.
basic.io.database: Database file opened: scoring/score_functions/hbonds/sp2_elec_params/HBPoly1D.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/sp2_elec_params/HBFadeIntervals.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/sp2_elec_params/HBEval.csv
basic.io.database: Database file opened: scoring/score_functions/rama/Rama_smooth_dyn.dat_ss_6.4
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/P_AA
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/P_AA_n
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/P_AA_pp
core.pack.dunbrack.RotamerLibrary: Using Dunbrack library binary file '/databases/rosetta/rotamer/ExtendedOpt1-5/Dunbrack10.lib.bin'.
core.pack.dunbrack.RotamerLibrary: Dunbrack 2010 library took 0.16 seconds to load from binary
core.pack.pack_rotamers: built 155 rotamers at 8 positions.
core.pack.interaction_graph.interaction_graph_factory: Instantiating DensePDInteractionGraph
core.pack.interaction_graph.interaction_graph_factory: IG: 5396 bytes
partial_thread: score: 0 identities: 247/248 gaps: 0/248
partial_thread:           2yhd_model       1 PIFLNVLEAIEPGVVCAGHDNNQPDSFAALHSSLNELGERQLVHVVKWAKALPGFRNLHVDDQMAVIQYSWMGLMVFAMGWRSFTNVNSRMLYFAPDLVFNEYRMHKSRMYSQCVRMRHLSQEFGWLQITPQEFLCMKALLLFSIIPVDGLKNQKFFDELRMNYIKELDRIIACKRKNPTSCSRRFYQLTKLLDSVQPIARELHQFTFDLLIKSHMVSVDFPEMMAEIISVQVPKILSGKVKPIYFHT
partial_thread:             2yhd.pdb       1 PIFLNVLEAIEPGVVCAGHDNNQPDSFAALLSSLNELGERQLVHVVKWAKALPGFRNLHVDDQMAVIQYSWMGLMVFAMGWRSFTNVNSRMLYFAPDLVFNEYRMHKSRMYSQCVRMRHLSQEFGWLQITPQEFLCMKALLLFSIIPVDGLKNQKFFDELRMNYIKELDRIIACKRKNPTSCSRRFYQLTKLLDSVQPIARELHQFTFDLLIKSHMVSVDFPEMMAEIISVQVPKILSGKVKPIYFHT
partial_thread: 
partial_thread: id 2yhd.pdb => 2yhd.
core.chemical.ResidueType: atom name : CB not available in residue  CA
core.chemical.ResidueType: 'CA  ' 0x7ecb4a8
core.chemical.ResidueType: ' V1 ' 0x7ecb718
core.chemical.ResidueType: ' V2 ' 0x7ecc498
core.chemical.ResidueType: ' V3 ' 0x7ecc668
core.chemical.ResidueType: ' V4 ' 0x7eb5ba8
core.chemical.ResidueType: 

ERROR: unknown atom_name: CA  CB
ERROR:: Exit from: src/core/chemical/ResidueType.cc line: 3540
[0x3130f71]
[0x492061f]
[0x43c7d57]
[0x18560e4]
[0x184cd74]
[0x412f7c]
[0x4c2e6f4]
[0x953d6d]
caught exception 

[ERROR] EXCN_utility_exit has been thrown from: src/core/chemical/ResidueType.cc line: 3540
ERROR: unknown atom_name: CA  CB

 

Fri, 2017-03-24 10:52
brspurri

clean_pdb.py does not overwrite the original PDB. Instead, it writes a new PDB file as the cleaned version.

Right now you're using the pre-cleaned PDB as your template input, which will fail in threading for two reasons: 1) it still contains non-protein residues like calcium (hence the failure to find the C-beta atom on residue CA), and 2) the sequence of the template PDB (pre-cleaning) probably does not match the post-cleaning sequence you're using for your alignment.

Try using the cleaned PDB output (something like inputs/2yhd_A.pdb, but the exact name will depend on how you ran the clean_pdb.py script) as the input to the threading run.
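To check whether a given PDB file still contains non-protein residues before using it as a template, something like the following Python sketch may help. It is not part of Rosetta: the function name find_nonprotein_residues is made up for illustration, it relies on the fixed-column layout of ATOM/HETATM records, and it treats anything outside the 20 standard amino acids as non-protein (so modified residues such as MSE would be flagged too).

```python
STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def find_nonprotein_residues(pdb_text):
    """List (residue name, chain, residue number) for each non-standard residue.

    Reads ATOM/HETATM records using the fixed PDB columns:
    residue name in columns 18-20, chain in column 22, number in columns 23-26.
    """
    found = []
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if len(line) < 26:
            continue  # too short to hold the residue fields
        key = (line[17:20].strip(), line[21:22], line[22:26].strip())
        if key[0] not in STANDARD_RESIDUES and key not in seen:
            seen.add(key)
            found.append(key)
    return found
```

Calling it on the contents of the template file, for example find_nonprotein_residues(open("/inputs/2yhd.pdb").read()), should return an empty list for a properly cleaned PDB. An entry with residue name CA would correspond to the calcium ion behind the "unknown atom_name: CA  CB" error.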

Fri, 2017-03-24 11:14
rmoretti

Thanks for your reply! 

Two things:

(1) I had uploaded the uncleaned PDB instead of the cleaned one, but the cleaned one was indeed being used in the partial_thread command. HOWEVER, because I am only making a single amino-acid change, I was extracting the FASTA sequence from the uncleaned template PDB. This was a no-no: I should have cleaned it as the very first step. Thank you for clearing that up.

(2) Fixing (1) did not fully resolve the problem; afterwards I was getting "length" errors. It turns out my target FASTA was originally formatted with line breaks:

>/inputs/2yhd_model
PIFLNVLEAIEPGVVCAGHDNNQPDSFAALHSSLNELGERQLVHVVKWAKALPGFRNLHV
DDQMAVIQYSWMGLMVFAMGWRSFTNVNSRMLYFAPDLVFNEYRMHKSRMYSQCVRMRHL
SQEFGWLQITPQEFLCMKALLLFSIIPVDGLKNQKFFDELRMNYIKELDRIIACKRKNPT
SCSRRFYQLTKLLDSVQPIARELHQFTFDLLIKSHMVSVDFPEMMAEIISVQVPKILSGK
VKPIYFHTQ

and it needed to be on a single line (without line breaks):

>/inputs/2yhd_model
PIFLNVLEAIEPGVVCAGHDNNQPDSFAALHSSLNELGERQLVHVVKWAKALPGFRNLHVDDQMAVIQYSWMGLMVFAMGWRSFTNVNSRMLYFAPDLVFNEYRMHKSRMYSQCVRMRHLSQEFGWLQITPQEFLCMKALLLFSIIPVDGLKNQKFFDELRMNYIKELDRIIACKRKNPTSCSRRFYQLTKLLDSVQPIARELHQFTFDLLIKSHMVSVDFPEMMAEIISVQVPKILSGKVKPIYFHTQ

 

So, I'm all good now. But this FASTA line-break weirdness really threw me for a loop. I hope this post can help somebody else.
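For anyone who hits the same thing, the unwrapping described above can be scripted. This is a minimal Python sketch, assuming an ordinary FASTA layout where header lines start with ">"; the function name unwrap_fasta is made up for illustration and is not a Rosetta tool.

```python
def unwrap_fasta(text):
    """Rewrite FASTA text so that every record's sequence sits on one line."""
    records = []
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence line belonging to the current record
    if header is not None:
        records.append((header, "".join(chunks)))
    return "".join(f"{h}\n{s}\n" for h, s in records)


wrapped = ">/inputs/2yhd_model\nPIFLNVLEAI\nEPGVVCAGHD\n"
print(unwrap_fasta(wrapped), end="")
```

Writing the result back to a new file (rather than overwriting the original) keeps the wrapped version around for comparison.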

Fri, 2017-03-24 11:34
brspurri

Is there a way to mark this as "solved"? Or do we just let the thread dangle? For anyone reading this: it's SOLVED.

Fri, 2017-03-24 11:35
brspurri