ddg_monomer mutations list: How to specify chain ID?

I want to use ddg_monomer with multiple simultaneous mutations. According to the documentation (https://www.rosettacommons.org/docs/latest/ddg-monomer.html) this requires a mutation file, instead of the resfile used by other applications.

However, in the example provided by the documentation, there is no use of a chain ID. My structure has multiple chains.

How can I specify a chain ID in the mutation file?

If this is not possible, how do I know what residue number I have to put in the mutation file, when all I know is the chain ID and the residue number inside the chain?

Tue, 2014-12-09 14:26
cossio

I believe the application recommends that the PDB file be renumbered from residue 1 to n, and that it is this very issue with how it parses the mutation file that motivates the requirement. It's not prohibitively difficult to write a Python script that renumbers PDB files in this way; it is similarly possible to use the inverse mapping to re-process the output.

Looking at the documentation:

All PDBs should be renumbered so that their first residue is residue 1 and number consecutively so that, if there are missing residues in the structure (due maybe to missing density) that these residues are simply skipped in the residue numbering. The numbering of all residues in both the distance-restraint file and the mutation-list file should follow this numbering.

Tue, 2014-12-09 16:51
everyday847

Yes, the issue is the distinction between "PDB numbering" and "pose numbering". In PDB numbering the numbers can start at an arbitrary point and jump around, and there's a chain letter associated with it. With pose numbering, the first residue is residue 1, and the residue numbers increment by one each time, regardless of chain. So if you have 123 residues in the first chain, the first residue of the second chain will be 124.
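To make the conversion concrete, the pose number for a given chain ID and PDB residue number can be looked up by walking the ATOM records in file order. The following is a minimal Python sketch, not part of Rosetta: `atom_line` and `pose_number_map` are hypothetical helper names, and it assumes Rosetta keeps exactly the residues that have ATOM records (HETATM residues, or residues Rosetta discards on loading, would shift the numbers).

```python
def atom_line(chain, resnum, name="CA", resname="ALA"):
    """Build a minimal fixed-column ATOM record for the demo (hypothetical helper)."""
    return f"ATOM  {1:5d} {name:<4} {resname:>3} {chain}{resnum:4d}    "


def pose_number_map(pdb_lines):
    """Map (chain ID, PDB residue number, insertion code) to a pose number.

    Assumption: every residue with ATOM records becomes one pose residue,
    numbered from 1 in file order, regardless of chain.
    """
    mapping = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        chain = line[21]            # PDB column 22: chain ID
        resnum = int(line[22:26])   # PDB columns 23-26: residue number
        icode = line[26].strip()    # PDB column 27: insertion code
        key = (chain, resnum, icode)
        if key not in mapping:      # first atom of a new residue
            mapping[key] = len(mapping) + 1
    return mapping


# Demo: chain A holds PDB residues 5-7, chain B holds PDB residues 1-2.
demo = [atom_line("A", n) for n in (5, 6, 7)] + [atom_line("B", n) for n in (1, 2)]
demo_map = pose_number_map(demo)
print(demo_map[("B", 1, "")])
```

With a real structure, the open file handle for the PDB would be passed in place of `demo`, and the number looked up for the chain and residue of interest is what goes into the mutation file.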

Generally, if a Rosetta input or output doesn't specify a chain letter, it's expecting things to be in pose numbering. The difficulty of converting back and forth between the two numbering systems is why many protocols suggest renumbering the PDB such that pose numbering and PDB numbering match - often that's not a hard requirement, so long as you can keep track of the conversion yourself.

Rosetta can renumber the PDBs on output for you if you provide the -out:file:renumber_pdb option. Alternatively, there are a number of scripts in the tools/ directory which can renumber PDBs for you. I've used tools/protein_tools/scripts/pdb_renumber.py successfully, but it may require you to add the tools/protein_tools/ directory to your PYTHONPATH environment variable.

Tue, 2014-12-16 12:16
rmoretti

Hi R Moretti,
Can I ask

1. So do you mean we cannot use a resfile in ddg_monomer?

I know only "-ddg::mut_file <mutfile>" is used as an example in https://www.rosettacommons.org/docs/latest/ddg-monomer.html.

However, on that page, both resfile and mutfile (i.e. Input File - b-1) have been introduced. So why is that?

2. If we really can only use mutfile, what is the syntax if we want different mutations (NOT multiple mutations in one structure)?

I have tried

total 1
1
D 1 A
total 1
1
D 1 V
total 1
1
D 1 L
total 1
1
D 1 I
total 1
1
D 1 G

However, the run was not successful and I was told

"apps.public.ddg.ddg_monomer: end reading mutations for this" for the majority of ddg.log (491 MB), and

"/cm/local/apps/sge/current/default/spool/node-001/job_scripts/4679102: line 20: 26401 Bus error ddg_monomer.linuxgccrelease @ /home/ucbechz/Scratch/20141215_ddg_renumber_multi_test/input/options_3 > ddg.log" in the error file.

So how to deal with different mutations in one go?

Thank you very much.

Yours sincerely
Cheng

Wed, 2014-12-17 13:52
lanselibai

You can use a resfile (and thus PDB numbering) with ddg_monomer, but if you do you're limited to specifying a set of single point mutations. If you want to have a single structure with multiple mutations, then you need to use the ddg_monomer-specific mutation file format, which means that you need to use pose numbering.

For the format of the mutations file, the "total" line only comes once, at the very top of the file.

Fri, 2015-01-02 10:32
rmoretti

Hi R Moretti,
Thank you very much.

For mutfile, do you mean I should use the following if I have multiple single point mutations in one structure?

total 1
1
D 1 A
1
D 1 V
1
D 1 L
1
D 1 I
1
D 1 G

Thank you.

Yours sincerely
Cheng

Sat, 2015-01-03 08:22
lanselibai

Cheng,

 

The "Total #" at the top of the mutfile indicates the total number of mutations to be made; so for your use here it would read:

total 5
1
D 1 A
1
D 1 V
1
D 1 L
1
D 1 I
1
D 1 G

The "1" on its own line before each mutation indicates the number of mutations to be made in that round, so if you wanted to make two mutations simultaneously it would read:

 

total 6
2
D 1 A
G 3 A
2
D 1 V
G 3 V
2
D 1 L
G 3 L
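If the mutation list is generated by a script, the layout described in this post can be produced mechanically. This is only a sketch under the format as stated here (one "total" line with the overall mutation count, then for each round a count line followed by that many mutations); `format_mutfile` is a hypothetical name, and positions are assumed to already be in pose numbering.

```python
def format_mutfile(mutation_sets):
    """Render mutation sets as mutfile text.

    mutation_sets: list of rounds; each round is a list of
    (wild_type, pose_number, mutant) tuples applied simultaneously.
    """
    # The "total" line counts every individual mutation across all rounds.
    total = sum(len(round_) for round_ in mutation_sets)
    lines = [f"total {total}"]
    for round_ in mutation_sets:
        # Each round starts with the number of mutations it contains.
        lines.append(str(len(round_)))
        for wild_type, pose_number, mutant in round_:
            lines.append(f"{wild_type} {pose_number} {mutant}")
    return "\n".join(lines) + "\n"


# Three rounds of two simultaneous mutations each, as in the example above.
double_mutants = [
    [("D", 1, "A"), ("G", 3, "A")],
    [("D", 1, "V"), ("G", 3, "V")],
    [("D", 1, "L"), ("G", 3, "L")],
]
mutfile_text = format_mutfile(double_mutants)
print(mutfile_text, end="")
```

Passing five single-element rounds instead reproduces the "total 5" file for independent point mutations.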

 

Thanks,

M. Benhaim

Tue, 2015-09-15 13:29
mbenhaim

From the ddg_monomer documentation page (https://www.rosettacommons.org/docs/latest/ddg-monomer.html):

All PDBs should be renumbered so that their first residue is residue 1 and number consecutively so that, if there are missing residues in the structure (due maybe to missing density) that these residues are simply skipped in the residue numbering.

Forget about multiple chains for a moment. Does skipping missing residues mean that I should renumber as 1, 2, 4, 5 (where residue 3 is missing), or as 1, 2, 3, 4, ..., where 3 now refers to the original residue 4?

Fri, 2014-12-26 07:50
cossio

It's the latter. The numbers should match the residues which are present. If the residue "3" is missing, then the number 3 should go to the third residue which *is* there, the original residue "4".
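As a small illustration of that rule, here is a Python sketch (the function name `renumber_present` is made up for this example) that assigns consecutive numbers to whichever residues are present and keeps the inverse mapping for translating results back:

```python
def renumber_present(present_residues):
    """Assign 1..n to the residues that are present, in order.

    Returns (old_to_new, new_to_old) dictionaries.
    """
    old_to_new = {old: new for new, old in enumerate(present_residues, start=1)}
    new_to_old = {new: old for old, new in old_to_new.items()}
    return old_to_new, new_to_old


# Original numbering with residue 3 missing.
old_to_new, new_to_old = renumber_present([1, 2, 4, 5])
print(old_to_new)   # {1: 1, 2: 2, 4: 3, 5: 4}
print(new_to_old[3])  # 4
```

So a mutation at original residue 4 would be written with position 3 in the mutation file.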

Fri, 2015-01-02 10:55
rmoretti

Hi cossio, everyday847 and everyone,
Can I ask whether you have been able to run ddg_monomer successfully? I tried to test it by mutating only the first residue to alanine. I had already renumbered the residues from 1 to the end. However, I got a log file (attached) with the error message

Number of residue types is greater than MAX_RESIDUE_TYPES.
(see attachment for details)

In addition, it takes more than 10 hours to run. So I need to wait a long time before actually knowing it is going wrong.

The other output files are "wt_.out" with 5738 KB and "wt_traj" with nothing in it.

The input options file is also attached.

I would appreciate it very much if someone can help me identify the problem. Please tell me if you need anything else.

Thank you very much.

Yours sincerely
Cheng

Sat, 2014-12-13 04:31
lanselibai

You might want to start your own thread...

Mon, 2014-12-15 06:56
cossio

I will try "-override_rsd_type_limit" as rmoretti said in https://www.rosettacommons.org/node/3828
Sorry about that.

Wed, 2014-12-17 13:33
lanselibai

Hi,

I need to run ddG_monomer protocol on a multimeric protein where some chains are >20 Angstroms apart. From what I understand from the above discussion I need to renumber the PDB file so that all numbering is continuous. But, there is a minimization step involved in ddG_monomer protocol.

1) If I use a structure where say residue 100 and 101 are 20 Angstroms apart, won't that affect the minimization process and generate highly unstable energy values?

2) Is there a way of combining rosetta fold and dock (or some other protocol) with ddG_monomer to run calculations on multimeric proteins?

Thanks in advance.

Wed, 2016-04-20 03:12
shrutikhare

In the standard (non-Cartesian, non-Dualspace) minimization procedure - which should be what the ddg_monomer protocol is using - you're minimizing over torsional angle space in the absence of any energies which penalize bond lengths and angles. (Unlike molecular mechanics, the Rosetta energy function doesn't have bond length/angle penalties - correct geometry is maintained by the sampling method, rather than the scoring function.) So even if Rosetta thinks residues 100 and 101 are chemically bonded, they're not going to be drawn together or even cause an energy penalty.

However, even if numbering is continuous, that doesn't mean that Rosetta thinks the residues are bonded. If you have different chain designations (e.g. 100 A and 101 B) or if you have a TER card between the chains in the input PDB, then Rosetta should make them independent chains.

You can certainly combine fold and dock (ab initio folding of symmetric proteins) and other such structure prediction programs with ddg_monomer. However, the way you would do it would be to do a full fold and dock (or RosettaCM, or protein-protein docking, etc) run to get output structures. Then you'd take one or more of those output structures and use them as inputs to the ddg_monomer protocol. 

Despite the name, "ddg_monomer" should work (that is, run and produce output) even on multimer structures. The "monomer" in the name mostly refers to the fact that its intended use is calculating the ddG of folding, rather than something like the ddG of binding. (The caveat here is that the benchmark was done on monomeric proteins, so the protocol may or may not perform as well for residues in the interface of a protein dimer as for residues in the core of a monomeric protein.)

Thu, 2016-04-28 16:43
rmoretti