Ensemble Docking in Rosetta3.5?

#1

I apologize if this has been asked before, but I couldn't seem to find any information on it. Is the EnsembleDock protocol available in Rosetta3.5? I don't see a specific executable for it in the rosetta/source/bin directory. Do I just use -l <pdblist> instead of -s <pdbfile> in the normal docking protocol? And can I still use constraints in EnsembleDock?

I specifically want to do antibody-antigen docking, and I realize that SnugDock is available for this purpose (albeit in Rosetta 2.3.1 it seems, which I'll have to install). But I wasn't sure if you could still specify constraints in SnugDock as I have some constraint information that I would like to incorporate into the docking simulation. We potentially would know or have a good idea about what part of the antigen binds to the CDRs, so if that information cannot be incorporated into SnugDock I was thinking of using EnsembleDock instead. Can anybody advise on whether this is a good plan or not? Thank you in advance for your assistance!

Fri, 2015-05-22 06:42
protos_heis

Do you know how to run the EnsembleDock protocol using Rosetta3.4?
If you do, may I ask you a question?
I don't know how to prepare the energy scores in the pdblist file.

Mon, 2015-06-01 19:40
sunlufinal

Sorry for the late response, I thought this thread had died. Thank you for responding! No, but I am trying to figure it out in Rosetta3.5 now and I assume it is the same.

Actually, thank you for asking this question, since I had forgotten that there needs to be a score in the pdblist file. I was putting only a list of PDBs in the pdblist, and it was crashing. After setting up Rosetta++ I decided not to use SnugDock and to use only EnsembleDock. I did find some helpful files in the SnugDock tutorial that can be applied to EnsembleDock. The demo is in Rosetta3 and lives here: /usr/local/rosetta/demos/public/antibody_docking/PrePack_input. It is specific to antibody-antigen docking, but I think it can be generalized to any docking application. If you look at "pdblist1" in that directory, the first 10 entries are the PDB files, followed by 20 numbers. The last 10 numbers look like Rosetta scores. I don't know what the first set of 10 numbers is. One of them is 0.0, so I am guessing that these are RMSD values, but I don't know what the reference PDB is. pdblist2 has only one model, the antigen, and its first number is 0.0, so it probably is an RMSD value. The instructions for this demo claim that this pdblist is modified by the prepacking step (in /usr/local/rosetta/demos/public/antibody_docking/PrePack/prepack.bash), but I have not been successful in getting this script to work.

I don't know if you know this already, but just since the information is useful to post for other users, to use EnsembleDock you have to turn on the -ensemble1 and -ensemble2 flags for the regular docker. Each of these accepts a pdblist as input. You need to have a list for both ensemble1 and ensemble2 even if there is only one model in the list. Then the regular input file (-in:file:s) needs to be the ordinary model that contains all the chains that will be docked. The ensemble1 models I think should only have the receptor chains in it and the ensemble2 models should only have the ligand chain.
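For anyone building the ensemble inputs by hand, here is a minimal sketch of pulling the receptor and ligand chains out of a combined model. It is plain Python and not part of Rosetta. The names split_by_chain and make_atom are hypothetical, and the sketch assumes standard fixed-column PDB records where the chain ID sits in column 22.

```python
def split_by_chain(pdb_lines, chain_ids):
    """Keep ATOM/HETATM/TER records whose chain ID (column 22) is in chain_ids."""
    kept = []
    for line in pdb_lines:
        is_coord = line.startswith(("ATOM", "HETATM", "TER"))
        # Column 22 of a PDB record is index 21 in a Python string.
        if is_coord and len(line) > 21 and line[21] in chain_ids:
            kept.append(line)
    return kept


def make_atom(serial, chain):
    """Hypothetical helper: build a minimal, correctly aligned CA ATOM record."""
    return "ATOM  %5d  CA  ALA %s%4d" % (serial, chain, 1)


# Toy three-chain model standing in for a real antibody-antigen PDB.
model = [make_atom(1, "L"), make_atom(2, "H"), make_atom(3, "C")]
receptor = split_by_chain(model, "LH")  # chains for the ensemble1 models
ligand = split_by_chain(model, "C")     # chain for the ensemble2 models
print(len(receptor), len(ligand))
```

With a real file you would read the lines from the PDB and write each subset to its own file before listing those files in pdblist1 and pdblist2.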

I will post back if I get it to work.

Fri, 2015-06-05 05:27
protos_heis

Okay, I think I finally figured out how to run EnsembleDock without any errors. Let me post the outline in the main thread.

Fri, 2015-06-05 08:53
protos_heis

Got it. It turns out I was wrong in my earlier post: the first set of numbers is the centroid scores for the PDB files in the list, and the second set is the fullatom scores for the same files. Example:

 

pdb_0001.pdb
pdb_0002.pdb
pdb_0003.pdb
0
1.2
1.3
-100.4
-100.23
-102.5

 

The first three lines are the file names, the next three are the centroid scores, and the last three are the fullatom scores. Here is a simple PyRosetta script that will generate the correct format given a list of PDB filenames:

#!/usr/bin/python
# Write an EnsembleDock pdblist: filenames, then centroid scores, then fullatom scores.
import sys
from rosetta import *

if len(sys.argv) != 3:
    print("./gen_ensemble_list.py <pdblist> <output>")
    print("    <pdblist>: A list of the PDBs that will be in the docked ensemble")
    print("    <output>: The name of the file that will contain the modified pdblist")
    sys.exit(1)
init(extra_options="-ignore_unrecognized_res -mute all")
# Create the scoring functions
scorefxn_fa = create_score_function("talaris2013")
scorefxn_cen = create_score_function("cen_std")
# Create a mover to switch poses to centroid mode
sw = SwitchResidueTypeSetMover("centroid")
pdblistfile = sys.argv[1]
pdblist = []
cen_scores = []
fa_scores = []
with open(pdblistfile, "r") as fin:
    for line in fin:
        pdbfile = line.strip()
        # Skip blank lines so a trailing newline does not crash pose loading
        if not pdbfile:
            continue
        # Save the pdbfile name
        pdblist.append(pdbfile)
        pose = pose_from_pdb(pdbfile)
        # Get the fullatom reference score (must happen before the centroid switch)
        fa_scores.append(scorefxn_fa(pose))
        # Switch to centroid mode and get the centroid score
        sw.apply(pose)
        cen_scores.append(scorefxn_cen(pose))
# Output the results: all names, then all centroid scores, then all fullatom scores
with open(sys.argv[2], "w") as fout:
    for pdbfile in pdblist:
        fout.write(pdbfile + "\n")
    for cen_score in cen_scores:
        fout.write(str(cen_score) + "\n")
    for fa_score in fa_scores:
        fout.write(str(fa_score) + "\n")
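To sanity-check a pdblist before handing it to the docker, something like the following plain-Python sketch may help. It reads the three-block layout described above (N file names, N centroid scores, N fullatom scores). The function name parse_ensemble_list is hypothetical. Ignoring blank lines is an assumption of this sketch, not a statement about what Rosetta's own reader accepts.

```python
def parse_ensemble_list(lines):
    """Split pdblist lines into (filenames, centroid_scores, fullatom_scores)."""
    # Assumption of this sketch: blank lines carry no meaning and are dropped.
    entries = [ln.strip() for ln in lines if ln.strip()]
    if not entries or len(entries) % 3 != 0:
        raise ValueError(
            "expected N filenames followed by 2N scores, got %d entries" % len(entries)
        )
    n = len(entries) // 3
    names = entries[:n]
    # float() raises ValueError if a filename ended up in a score block.
    cen_scores = [float(x) for x in entries[n:2 * n]]
    fa_scores = [float(x) for x in entries[2 * n:]]
    return names, cen_scores, fa_scores


# The three-model example from this thread.
example = [
    "pdb_0001.pdb", "pdb_0002.pdb", "pdb_0003.pdb",
    "0", "1.2", "1.3",
    "-100.4", "-100.23", "-102.5",
]
names, cen_scores, fa_scores = parse_ensemble_list(example)
print(names, cen_scores, fa_scores)
```

A bare list of PDB names with no scores (what I had originally) fails this check unless the count happens to be a multiple of three, in which case the float conversion fails instead.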

Fri, 2015-06-05 06:03
protos_heis

1.) First, you have to run the docking_prepack_protocol to get prepacked structures that go into EnsembleDock. Here is what my flags file looks like:

-in:path:database /usr/local/rosetta/main/database
-in:file:s dock_model.pdb
-in:detect_disulf false

-out:nstruct 1

-randomize2
-ensemble1 pdblist1
-ensemble2 pdblist2
-no_filters
-score:weights talaris2013_cst
-ignore_unrecognized_res
-partners HL_C
-dock_pert 3 8
-constraints:cst_file constraints.cst

-out:file:fullatom
-overwrite

"dock_model.pdb" is the input PDB structure that contains all three chains: L, H, and C. pdblist1 contains 10 models of the unbound L+H chains only. pdblist2 contains a single model of chain C.

IMPORTANT: Notice the option "-in:detect_disulf false". This tells Rosetta not to search for disulfide bridges. If you have multiple models, Rosetta may detect disulfide bridges in some of them but not in others. This leads to crashes because the residue connections will differ between models in the ensemble. Save yourself a lot of debugging time by using this option.

The output from this step is a bunch of *.ppk files that are really just PDB files.

Sat, 2015-06-06 16:30
protos_heis

2.) Next, you have to modify the pdblist files so that each contains the list of PDB models, followed by the list of each model's centroid-only score, followed by the list of each model's fullatom score. You can use the script in post #5 to generate this list.

3.) Finally, run docking_protocol. I used the same flags file in post #6 but changed nstruct to 1000. You need the "-ensemble1" and "-ensemble2" options AS WELL AS "-in:detect_disulf false" to avoid disulfide crashing.

I hope this information is useful to people!

Sat, 2015-06-06 16:30
protos_heis

One more thing I learned: apparently the input structure (-in:file:s) needs to have its chains in the same order as they appear in the ensemble models, otherwise docking runs can fail. You'll get an error about two sequences not matching, which is caused by the chains being in a different order. If you are getting that error, check that the input structure and the ensemble models have the same chain ordering. In my case, the input structure had chain H first, then L, then V. The ensemble had L then H, so editing the input PDB to put L before H seemed to fix the problem.
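A quick way to catch this before launching a long run is to compare chain orders directly. The sketch below is plain Python and not part of Rosetta. The names chain_order, same_relative_order, and make_atom are hypothetical, and it assumes standard PDB records with the chain ID in column 22.

```python
def chain_order(pdb_lines):
    """Return chain IDs in order of first appearance in ATOM/HETATM records."""
    order = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chain = line[21]  # column 22 of the PDB record
            if chain not in order:
                order.append(chain)
    return order


def same_relative_order(input_order, ensemble_order):
    """True if the ensemble's chains appear in the input in the same order."""
    shared = [c for c in input_order if c in ensemble_order]
    return shared == list(ensemble_order)


def make_atom(serial, chain):
    """Hypothetical helper: build a minimal, correctly aligned CA ATOM record."""
    return "ATOM  %5d  CA  ALA %s%4d" % (serial, chain, 1)


# Mirrors the case above: input is H, L, V while the ensemble is L, H.
input_model = [make_atom(1, "H"), make_atom(2, "L"), make_atom(3, "V")]
ensemble_model = [make_atom(1, "L"), make_atom(2, "H")]
ok = same_relative_order(chain_order(input_model), chain_order(ensemble_model))
print(ok)
```

Running the same check against each model listed in pdblist1 and pdblist2 would flag a scrambled input before docking_protocol does.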

Mon, 2015-06-08 07:00
protos_heis