antibody.py

Dear all,

I am using the antibody.py script that comes with PyRosetta (rosetta_src_2016.13.58602_bundle, v3.6) and associated software to generate structural models for VH/VL sequences. For most of the sequences that I am using as input, the script terminates at the antibody_graft step with an error (pasted below):

Error: ERROR: Exception caught by JobDistributor while trying to get pose from job 'FR0_0001'
Error: Cannot normalize xyzVector of length() zero
Error: Treating failure as bad input; canceling similar jobs protocols.jd2.FileSystemJobDistributor: job failed, reporting bad input; other jobs of same input will be canceled: FR0_0001 protocols.jd2.JobDistributor: no more batches to process... protocols.jd2.JobDistributor: 1 jobs considered, 1 jobs attempted in 2 seconds caught exception 1 jobs failed; check output for error messages

When I examine the input file (generated in previous steps and called FR0.pdb), the only thing that all the failing input sequences have in common is that main-chain atoms have been inserted with atom numbers and coordinates like those in this excerpt:

ATOM      0  CA  THR L  30A    189.954-268.030-229.397  1.00 25.00
ATOM      0  CA  SER L  30B    191.164-270.358-231.659  1.00 25.00
ATOM      0  CA  ALA L  30C    192.374-272.686-233.921  1.00 25.00
ATOM      0  CA  TYR L  30D    193.584-275.014-236.183  1.00 25.00
ATOM      0  CA  PHE H 100B    544.800-310.561 442.751  1.00 25.00

These atoms lie on straight lines far away from the protein chain and represent "inserted" residues into the CDR loops.
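
For anyone who wants to check their own FR0.pdb for the same pattern, here is a minimal sketch that flags runs of CA atoms lying on a straight line. It assumes standard fixed-column PDB ATOM records; the function names and the tolerance are illustrative choices, not part of antibody.py. Two parallel bond vectors have a zero cross product, which is one conceivable route to a "Cannot normalize xyzVector of length() zero" error, though nothing in this thread confirms that this is what antibody_graft trips over.

```python
import math


def parse_ca_atoms(pdb_text):
    """Return (chain, residue id, (x, y, z)) for each CA ATOM record.

    Assumes standard fixed-column PDB formatting.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            res_id = line[22:27].strip()  # residue number plus insertion code
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((chain, res_id, xyz))
    return atoms


def is_collinear(p, q, r, tol=1e-3):
    """True if the cross product of (q - p) and (r - q) is (nearly) zero."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - q[i] for i in range(3)]
    cross = (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )
    return math.sqrt(sum(c * c for c in cross)) < tol


def collinear_ca_runs(atoms):
    """Return (chain, residue id) for CA atoms in any collinear triple."""
    flagged = []
    for a, b, c in zip(atoms, atoms[1:], atoms[2:]):
        same_chain = a[0] == b[0] == c[0]
        if same_chain and is_collinear(a[2], b[2], c[2]):
            for atom in (a, b, c):
                if atom[:2] not in flagged:
                    flagged.append(atom[:2])
    return flagged
```

Calling collinear_ca_runs(parse_ca_atoms(open("FR0.pdb").read())) on a file would list the suspicious residues. A lone placeholder such as H 100B in the excerpt cannot be caught this way, since at least three consecutive atoms in the same chain are needed.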

When I remove those lines from the FR0.pdb, save the file as FR0_test.pdb, and run it through the procedure that fails above (antibody_graft), the procedure runs normally with a successful exit state.
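
The manual edit described above can be reproduced with a small filter. This is only a sketch of that diagnostic step: it assumes the placeholder atoms are exactly the ATOM records whose serial number is 0, as in the excerpt, and the function name is made up for illustration.

```python
def drop_placeholder_atoms(pdb_text):
    """Return pdb_text without ATOM records whose serial number is 0.

    Assumption: placeholder atoms are identified by serial number 0,
    as in the excerpt from FR0.pdb; real files may need another test.
    """
    kept = []
    for line in pdb_text.splitlines():
        # Columns 7-11 of a PDB ATOM record hold the atom serial number.
        if line.startswith("ATOM") and line[6:11].strip() == "0":
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


# Usage (file names as in the post above):
#   with open("FR0.pdb") as src, open("FR0_test.pdb", "w") as dst:
#       dst.write(drop_placeholder_atoms(src.read()))
```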

Despite the normal run, the fact is that the removed amino acids do belong in the CDR loops and removing them is wrong.

I would appreciate any help!

Best regards, thanks in advance,

-- Fran

Mon, 2016-08-01 08:12
franfdez

Hi Fran,

Can you post full command-lines and options used? 

Also, is it possible to try this using the ROSIE server?  http://rosie.rosettacommons.org/antibody

Lastly, I would try a newer weekly version as well.  There have been some antibody-related fixes in the past few months.  They don't specifically address your problem, but they might help.

-Jared

Mon, 2016-08-01 09:22
jadolfbr

Hi Jared,

I am sorry, but using the ROSIE server is not possible at this stage.

The full command is given below and run without any additional flags:

# stage_1_2.sh

source /Model/ROSETTA/rosetta_latest/main/source/build/PyRosetta/linux/namespace/release/SetPyRosettaEnvironment.sh
/Model/ROSETTA/rosetta_latest/tools/antibody/antibody.py \
    antibody:numbering_scheme IMGT_Scheme \
    antibody:light_chain kappa \
    antibody:check_cdr_chainbreaks true \
    antibody:check_cdr_pep_bond_geom true \
    --light-chain input_l.fasta --heavy-chain input_h.fasta \
    --superimpose-profit /Model/ProFit/ProFitV3.1/profit \
    --blast /Model/BLAST/ncbi-blast-2.4.0+/bin/blastp \
    --blast-database /Model/ROSETTA/rosetta_latest/tools/antibody/blast_database \
    --antibody-database /Model/ROSETTA/rosetta_latest/tools/antibody/antibody_database \
    --rosetta-bin /Model/ROSETTA/rosetta_latest/main/source/bin \
    --rosetta-database /Model/ROSETTA/rosetta_latest/main/database \
    --rosetta-platform linuxgccrelease \
    --verbose \
    | tee antibody.log 2>&1

Thanks also for the pointer to the changes that have occurred in the more recent versions of Rosetta, I am going to install the latest version right now.

Best,

-- Fran

Mon, 2016-08-01 10:06
franfdez

Ok.  This should do it.  These options should both be false, as your input PDB is going to be wonky and have chainbreaks.  Does the antibody script pass these as true (I have never had any success using it), or did you add them?  I can change the documentation, wherever it is.

-antibody:check_cdr_pep_bond_geom 

-antibody:check_cdr_chainbreaks

Mon, 2016-08-01 10:51
jadolfbr

Hello jadolfbr,

Thank you for the pointer, this is getting interesting.

The two options mentioned (-antibody:check_cdr_pep_bond_geom and/or -antibody:check_cdr_chainbreaks) were taken from the current documentation (either the web page or the program's own messages). The only time they were passed was with VH/VL sequences taken straight from a germline gene sequence, not with any of our rearranged antibody sequences.

Following Jared's earlier comments, I installed the newest available binary distribution of Rosetta (2016.20.58704) and built PyRosetta. Based on your post, I have now re-run the script above (stage_1_2.sh), setting the two options in your post to true/true (control case), true/false, false/true, and false/false.
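
For reference, the four combinations can be generated rather than typed by hand. This is a hedged sketch: the option spellings are copied from stage_1_2.sh as posted (without a leading dash), the function name is hypothetical, and how the tokens get appended to the antibody.py command line is left to the reader.

```python
from itertools import product

# Option names exactly as they appear in stage_1_2.sh above.
OPTIONS = (
    "antibody:check_cdr_chainbreaks",
    "antibody:check_cdr_pep_bond_geom",
)


def option_grid(options=OPTIONS, values=("true", "false")):
    """Yield one list of command-line tokens per true/false combination."""
    for combo in product(values, repeat=len(options)):
        tokens = []
        for name, value in zip(options, combo):
            tokens += [name, value]
        yield tokens


if __name__ == "__main__":
    for tokens in option_grid():
        print(" ".join(tokens))
```

The combinations come out in the same order as listed above: true/true, true/false, false/true, false/false.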

In all cases the antibody.py script fails at the same point, namely when the antibody_graft program is called.

However, in contrast to what I was seeing before, I can now move to grafting/details/ and re-run the antibody_graft part as a standalone script (below), and it runs past the previous error and nearly to the end.

# This script is what antibody.py says it has tried to run! (except for --verbose)
antibody_graft.linuxgccrelease -database /Model/ROSETTA/rosetta_latest/main/database \
    -overwrite  -s FR0.pdb  -antibody::h3_no_stem_graft \
    -scorefile score-graft.sf -check_cdr_chainbreaks false --verbose

In my particular case this version of the script fails with the informative error message below:

protocols.antibody.cluster.CDRClusterMatcher: *** No known cluster of CDR length 9 omega of TTTTTTTTC found. ***
protocols.antibody.cluster.CDRClusterMatcher: *** Consider using the command-line option -allow_omega_mismatches_for_north_clusters to find the closest cluster! ***

And when I follow the instructions, the script finally finishes successfully.

# This option makes the script finish successfully
-allow_omega_mismatches_for_north_clusters
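
For readers unfamiliar with the omega string in the CDRClusterMatcher message: it appears to record, residue by residue, whether the peptide bond is trans (T) or cis (C). The sketch below shows how such a string could be derived from omega dihedral angles. The 90-degree cutoff and the function name are assumptions made for illustration; the cutoff Rosetta actually uses is not stated in this thread.

```python
def omega_string(omegas_deg, cis_cutoff=90.0):
    """Map omega dihedrals (degrees) to a string of 'T' (trans) / 'C' (cis).

    Assumption: a bond counts as cis when |omega| < cis_cutoff after
    wrapping the angle into the range [-180, 180).
    """
    letters = []
    for omega in omegas_deg:
        wrapped = (omega + 180.0) % 360.0 - 180.0
        letters.append("C" if abs(wrapped) < cis_cutoff else "T")
    return "".join(letters)
```

Under these assumptions, eight trans bonds followed by one cis bond give the string from the message above, i.e. a length-9 CDR whose last omega is cis.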

Sadly, the results.json file is not written out nor are the final grafted and grafted-relaxed files copied over to their final destination, but those are minor points given that I can do that myself and run stage_3.sh from there.

My feeling is that the problem as such is not yet solved, but at least we are coming to a point where I can push the calculations forward anyway. Somehow the antibody.py script bails out too soon, and the error it produces only points to the culprit without really clarifying what's going on.

Thanks again, best,

-- Fran

Tue, 2016-08-02 01:24
franfdez

Hey Fran,

Thank you for your awesome response and thorough tests.  The good and bad news (for me) is that this is my code.  So we can fix this for the next release.  

First, I'm glad that adding the option made antibody graft work.  I'll update the documentation accordingly so people won't add it to the antibody script.  It is useful when doing individual antibody work (SnugDock, Model_H3, etc.), using the antibody FeaturesReporter for bioinformatic analysis, and when doing antibody design.  Actually doing the CDR grafting part through antibody.py is currently a bit strange in terms of the linear coordinate values, as you have seen.  The Gray lab is currently working on making the full antibody.py basically part of Rosetta proper, and once that is done this error will go away.

The second error is strange, because that code doesn't actually result in an error.  That message is just a warning, and Rosetta does not crash.  I'm not sure if this is some interpretation of the output that antibody.py is doing, or if the error is somewhere after the cluster is set to 'NA' internally (which is what it does if there are no cis/trans matches).  So, although that message is output, this isn't what is causing the failure.  It is hard for me to debug this without input, but I understand the sensitive nature of these things.  So, would you be able to put the full log (or at least the end) on here so I can see what can be done?

 

-Jared 

Tue, 2016-08-02 14:07
jadolfbr

Hi Jared,

Thank you for dealing with this! It's a great piece of code, BTW, and extremely useful for us.

I should say that looking into the antibody_graft output files I discovered that there may be some additional steps in antibody.py (besides the results.json and the copying over of files) after antibody_graft. Specifically, there must be, at least, (1) a relax/re-scoring program (since ./grafting/score-relax.sf is clearly different from ./grafting/details/score-graft.sf; actually, I should have noticed from the filenames already); and (2) a C-ter constraints step. How can I re-create those in the same way I did for the antibody_graft stage?

As for the second error, I can post the end of the logfile. I left out the -allow_omega_mismatches_for_north_clusters option so that it does not produce a "normal" termination, even though you are absolutely right that Rosetta didn't really crash without it. Apologies also for the selective deletion of sensitive parts.

While copy-pasting it here I also noticed that there was an earlier warning about a "residue not found in pose", and I don't know whether it is relevant.

First, the script, then the end part of the logfile:

# test.sh
antibody_graft.linuxgccrelease -database database -overwrite -s FR0.pdb \
    -antibody::h3_no_stem_graft -scorefile score-graft.sf \
    -check_cdr_chainbreaks false --verbose | tee antibody_graft_test.log 2>&1
# logfile (last lines)
core.pack.pack_rotamers: built 2110 rotamers at 142 positions.
core.pack.interaction_graph.interaction_graph_factory: Instantiating DensePDInteractionGraph
core.pack.interaction_graph.interaction_graph_factory: IG: 1473832 bytes
basic.io.database: Database file opened: sampling/antibodies/cluster_center_dihedrals.txt
protocols.antibody.AntibodyNumberingParser: Antibody numbering scheme definitions read successfully
protocols.antibody.AntibodyNumberingParser: Antibody CDR definition read successfully
antibody.AntibodyInfo: Successfully finished the CDR definition
antibody.AntibodyInfo: (Landmark) residue not found in pose: 105 L  
(Landmark) residue not found in pose: 105 L  
AC Detecting Regular CDR H3 Stem Type
antibody.AntibodyInfo: <deleted>
antibody.AntibodyInfo: AC Finished Detecting Regular CDR H3 Stem Type: KINKED
antibody.AntibodyInfo: AC Finished Detecting Regular CDR H3 Stem Type: Kink: 1 Extended: 0
antibody.AntibodyInfo: Setting up CDR Cluster for H1
protocols.antibody.cluster.CDRClusterMatcher: Length: 13 Omega: TTTTTTTTTTTTT
antibody.AntibodyInfo: Setting up CDR Cluster for H2
protocols.antibody.cluster.CDRClusterMatcher: Length: 10 Omega: TTTTTTTTTT
antibody.AntibodyInfo: Setting up CDR Cluster for H3
protocols.antibody.cluster.CDRClusterMatcher: Length: 12 Omega: TTTTTTTTTTTT
antibody.AntibodyInfo: Setting up CDR Cluster for L1
protocols.antibody.cluster.CDRClusterMatcher: Length: 15 Omega: TTTTTTTTTTTTTTT
antibody.AntibodyInfo: Setting up CDR Cluster for L2
protocols.antibody.cluster.CDRClusterMatcher: Length: 8 Omega: TTTTTTTT
antibody.AntibodyInfo: Setting up CDR Cluster for L3
protocols.antibody.cluster.CDRClusterMatcher: Length: 9 Omega: TTTTTTTTC
protocols.antibody.cluster.CDRClusterMatcher: 
protocols.antibody.cluster.CDRClusterMatcher: *** No known cluster of CDR length 9 omega of TTTTTTTTC found. ***
protocols.antibody.cluster.CDRClusterMatcher: *** Consider using the command-line option -allow_omega_mismatches_for_north_clusters to find the closest cluster! ***
protocols.antibody.cluster.CDRClusterMatcher: 
protocols.jd2.JobDistributor: FR0_0001 reported success in 55 seconds
protocols.jd2.JobDistributor: no more batches to process... 
protocols.jd2.JobDistributor: 1 jobs considered, 1 jobs attempted in 55 seconds
protocols.jd2.JobDistributor: WARNING: The following options have been set, but have not yet been used:
        -rescore:verbose
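
To separate the cluster warning from an actual job failure in logs like the one above, a short scan can help. This is a rough sketch: the patterns are taken only from the log lines quoted in this thread, other Rosetta versions may word them differently, and the function name is made up.

```python
import re


def summarize_log(log_text):
    """Count successes, failures and cluster warnings in a Rosetta log.

    The patterns mirror the jd2 and CDRClusterMatcher lines quoted in
    this thread; they are not an official log format.
    """
    summary = {"succeeded": [], "failed_jobs": 0, "cluster_warnings": 0}
    for line in log_text.splitlines():
        match = re.search(r"JobDistributor: (\S+) reported success", line)
        if match:
            summary["succeeded"].append(match.group(1))
        match = re.search(r"(\d+) jobs failed", line)
        if match:
            summary["failed_jobs"] += int(match.group(1))
        if "No known cluster of CDR length" in line:
            summary["cluster_warnings"] += 1
    return summary
```

Applied to the lines above, a summary with one succeeded job, no failed jobs and one cluster warning would support the reading that the message is only a warning.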

Many thanks again!

-- Fran

Wed, 2016-08-03 02:12
franfdez

Thanks Fran!  The output here doesn't really show what is actually failing.  Rosetta is reporting success.  How does the structure look?  If you use a non-sensitive structure, is there still an issue?  Perhaps antibody.py is doing some check afterwards that stops it from writing the output files and moving on to the next step.

 

FYI I'm preparing for RosettaCon, which starts on Saturday, so I'll be able to look more into this when I get back next week, just in case you don't hear from me.

-Jared

Wed, 2016-08-03 07:58
jadolfbr

Hi Jadolfbr,

As far as I can tell, the structure looks pretty good -- I am looking at the relaxed, grafted, stem-optimized model.0.pdb structure.  The total score is about -250 to -305; is that a reasonable starting point?

How many decoys is it recommended to generate here for a production run?  I don't mean the roughly 2000 structures produced during stage 3 (antibody_H3), but the number by the end of stages 1 + 2 (grafted, stem minimized, relaxed).

Thanks!

-- Fran

Thu, 2016-08-04 01:57
franfdez