Dear all,
I'm a new user of Rosetta. The CryoEM protocol provided by the DiMaio lab is excellent.
My protein contains a peptidyl carrier protein domain. This domain has a phosphopantetheine prosthetic group conjugated to a serine residue. To complicate matters, the phosphopantetheine arm carries a cysteine peptide group via its terminal thioester linkage. I don't know how to model this peptidyl-carrying phosphopantetheine arm into the CryoEM map and then perform the iterative or simple refinement against the map.
I'm wondering:
Should I draw a peptidyl-phosphopantetheine structure first and then convert it to a PDB file? Can I draw the chemical in PyMOL so that it can be saved directly? I understand I should then convert it to a mol2 file using OpenBabel, after which molfile_to_params.py can be used to obtain the two params files.
Since the phosphopantetheine is conjugated to the serine residue, how should I tell Rosetta that it is covalently bound?
Do I also need to link the peptidyl-phosphopantetheine to the serine residue in the peptidyl carrier protein domain model using Coot before running the Rosetta simple relax refinement against the CryoEM map?
Is there anything else I should prepare?
Thanks so much for your help.
Regards,
Zhijun
(Phosphopantetheine was discussed a few months ago in https://www.rosettacommons.org/node/10906 but this is a different question, not a duplicate)
> Should I draw a peptidyl-phosphopantetheine structure first and then convert it to a PDB file? Can I draw the chemical in PyMOL so that it can be saved directly? I understand I should then convert it to a mol2 file using OpenBabel, after which molfile_to_params.py can be used to obtain the two params files.
The big problem with the PDB format is that bond order is poorly encoded (only as repeated CONECT entries for double bonds), and formal charge is not really a thing. So if you draw something in ChemDraw or similar, don't go through the PDB format: export as SDF or copy a SMILES (a string of text). In fact, Wikipedia, PubChem etc. give a SMILES string in the infobox; for example, https://en.wikipedia.org/wiki/Phosphopantetheine gives `O=C(NCCS)CCNC(=O)[C@H](O)C(C)(C)COP(=O)(O)O`. But `OP(=O)(O)O` is wrong charge-wise, and you want two connections: `O=C(NCCS[1:*])CCNC(=O)[C@H](O)C(C)(C)COP(=O)([O-])O[2:*]`, where R1 (okay, that's isotope 1, but shh, everyone does it) is your peptide thioester connection and R2 is the connection to your serine. To make things worse, you really ought to obey the atom names from the PDB database for residue `PNS` (https://www.rcsb.org/ligand/PNS).
I made an RDKit-to-params module for Python 3 that is more flexible than molfile_to_params.py, so I'll spare you the toil:
This works with the default PNS residue and has two connects. CONN1 to S (peptide C-terminus), CONN2 to serine.
I gave it a go and it works if the LINK record for the S is absent.
I tried it with the sequence RESETTA (built with `fab` in PyMOL and aligned with `pair_fit`) and the lines:
I don't know the solution to this issue offhand: neither the flag `-Ctermini false` nor changing the CONNECT to LOWER_CONNECT worked. But someone must have encountered and asked about a C-terminal conjugation in the forums; I'm pretty sure that is how ubiquitin works.
> Since the phosphopantetheine is conjugated to the serine residue, how should I tell Rosetta that it is covalently bound?
Two requirements.
Thank you so much, Matteo Ferla. Your web app "https://direvo.mutanalyst.com/params" is also a great place to make params files.
The information you provided helped me understand SMILES and how to make a params file for a new ligand.
I'm trying to use the params file in Rosetta relax and will let you know the results.
Zhijun
Hey Matteo Ferla, I still have problems proceeding.
Rosetta seems to be able to process the phosphopantetheine ligand on its own now. If I provide "PNS.cen.params" and "PNS.fa.params", or "Phosphopantetheine.params", in the command, the program does not run. If I run it without the params file, the program runs but generates a PDB in which the covalent bond between the Ser residue and the phosphopantetheine phosphate is broken.
Searching the Rosetta forum, it seems that I should use an enzdes constraint. But I don't understand how to write one for my protein or how to use it in the CryoEM protocol.
I attached a homologous PDB file which can also be refined into my CryoEM map.
Below is the command:
#!/bin/bash
relax.static.macosclangrelease \
-database /Users/zhijun/bin/Rosetta/rosetta_bin_mac_2020.08.61146_bundle/main/database/ \
-in::file::s 6mfyA.pdb \
-parser::protocol B_relax_density.xml \
-edensity::mapreso 3.0 \
-edensity::cryoem_scatterers \
-crystal_refine \
-beta \
-out::suffix _params \
-default_max_cycles 200
The xml:
<ROSETTASCRIPTS>
  <SCOREFXNS>
    <ScoreFunction name="dens" weights="beta_cart">
      <Reweight scoretype="elec_dens_fast" weight="35.0"/>
      <Set scale_sc_dens_byres="R:0.76,K:0.76,E:0.76,D:0.76,M:0.76,C:0.81,Q:0.81,H:0.81,N:0.81,T:0.81,S:0.81,Y:0.88,W:0.88,A:0.88,F:0.88,P:0.88,I:0.88,L:0.88,V:0.88"/>
    </ScoreFunction>
  </SCOREFXNS>
  <MOVERS>
    <SetupForDensityScoring name="setupdens"/>
    <LoadDensityMap name="loaddens" mapfile="refine141run_class001_Epi_B.mrc"/>
    <FastRelax name="relaxcart" scorefxn="dens" repeats="2" cartesian="1" />
  </MOVERS>
  <PROTOCOLS>
    <Add mover="setupdens"/>
    <Add mover="loaddens"/>
    <Add mover="relaxcart"/>
  </PROTOCOLS>
  <OUTPUT scorefxn="dens"/>
</ROSETTASCRIPTS>
Hey Matteo Ferla,
I used the params file you generated and the program runs. But with the params file generated by the web app, the program does not work.
In the generated PDB file the phosphopantetheine is still detached, although in the original PDB it is covalently conjugated to the serine residue.
If I should use an enzdes cst file, I cannot figure out how to specify the relationship between the serine residue and the phosphopantetheine in the template below.
CST::BEGIN
TEMPLATE:: ATOM_MAP: 1 atom_name: O2 C6 O4
TEMPLATE:: ATOM_MAP: 1 residue3: D2N
TEMPLATE:: ATOM_MAP: 2 atom_type: Nhis,
TEMPLATE:: ATOM_MAP: 2 residue1: H
CONSTRAINT:: distanceAB: 3.10 0.20 100.00 0
CONSTRAINT:: angle_A: 120.00 5.00 30.00 360.00
CONSTRAINT:: angle_B: 125.90 10.00 20.00 360.00
CONSTRAINT:: torsion_A: -5.00 15.00 0.00 360.00
CONSTRAINT:: torsion_B: -155.0 15.00 25.00 360.00
CONSTRAINT:: torsion_AB: 0.00 0.00 0.00 180.00
CST::END
Sorry, I was pressed for time, so I cut my reply short and could not find the solution.
I tried it in PyRosetta with a short sequence and it segfaulted. I had two guesses at the problem.
A common issue (say in loop closure) is that the terminus (OXT) gets added, but this does not appear to be the case here, as using `-use_truncated_termini true` fails (I tried the wrong command before, but the correct one fails too).
The second is that polymer residues have a different type of connection in params files; for amino acids these are `LOWER_CONNECT N` and `UPPER_CONNECT C`.
What I was hoping was that a CONNECT would pair with an UPPER_CONNECT, but it segfaults when loading the pose.
The terminal caps ACE and NME are not handled like patches, so I just had a look at them (the terminal folder in the database) and they turned out to be interesting: they are defined like regular amino acids. This means that the residue needs to follow the peptide (same chain, resi+1 and no TER line in between). If this is not respected you'll get a message saying "ERROR <name3> <name3> UPPER TERMINUS" or something similar. Modding the topology file by cannibalising NME:
This seems to work, except for an ideal-coordinate error. This happens when the first three internal coordinate lines of a residue aren't the backbone. So the params file needs to be made again with the "backbone" at the front to make the ICOOR block kosher: `*SCCNC(=O)CCNC(=O)[C@H](O)C(C)(C)COP(=O)([O-])O*`.
In PyRosetta this seems to work fine for a PDB with two LINK records and the PNS residue placed after the C-terminal end of the peptide, with resi+1, the same chain and no TER record beforehand.
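The placement rules above (same chain, resi+1, no TER in between) are easy to break when editing a PDB by hand. Here is a minimal Python sketch of a pre-flight check, assuming standard fixed-column PDB records. The name `check_follows_peptide` is made up for illustration and is not part of Rosetta or PyRosetta:

```python
def check_follows_peptide(pdb_lines, resn="PNS"):
    """Return a list of problems with how residue `resn` follows the peptide.

    Hypothetical helper, not a Rosetta function. It only inspects the first
    occurrence of `resn` and the residue written immediately before it.
    """
    problems = []
    prev = None        # (chain, residue number) of the last residue seen
    ter_seen = False   # True if a TER record came right after that residue
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "TER":
            ter_seen = True
            continue
        if record not in ("ATOM", "HETATM"):
            continue
        chain = line[21]            # column 22
        resseq = int(line[22:26])   # columns 23-26
        if line[17:20].strip() == resn:
            if prev is None:
                return ["no residue precedes " + resn]
            if ter_seen:
                problems.append("TER record between the peptide and " + resn)
            if chain != prev[0]:
                problems.append("chain %s differs from peptide chain %s"
                                % (chain, prev[0]))
            if resseq != prev[1] + 1:
                problems.append("residue number %d is not %d"
                                % (resseq, prev[1] + 1))
            return problems
        prev = (chain, resseq)
        ter_seen = False
    return [resn + " not found"]
```

Feed it the lines of the PDB file (for instance `open(path).read().splitlines()`); an empty list means the three placement rules are met. It says nothing about LINK records or atom names.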
Forgot to mention, I used the `-use_truncated_termini true` and `-use_terminal_residues true` flags.
About the other points you raised:
> I used the params file you generated and the program runs. But with the params file generated by the web app, the program does not work.
Yes, I hadn't accounted for yet another corner case, and the web app was several versions behind the GitHub repo. The server does not update on a GitHub webhook trigger because it is piggybacking off another app, out of laziness and indecision, as I am not sure where to host it. I am still waiting on my university's legal team (approx. 7 months) to sign a memorandum of understanding to let me use PyRosetta for a different university web app...
> If I run it without the params file, the program runs but generates a PDB in which the covalent bond between the Ser residue and the phosphopantetheine phosphate is broken.
That is because it is using the PDB component library. This autogenerated set is great, but imperfect. The flags `-ignore_unrecognized_res true -load_PDB_components false` without a params file will give you no PNS at all.
> enzdes constraint
The aim of using a constraint of the regular type is to force the two atoms to be close, even if they have a horrid LJ potential. It is not an ideal strategy. In Gromacs a covalent bond is made by making the sulfur or oxygen atom virtual and adding a constraint to prevent the system from blowing up. That works better, but you'd need to make an alanine with a virtual atom instead of one of the beta hydrogens. The enzdes constraints have a value that disables the LJ clash (two terms in Rosetta) between the two atoms (this is the zero or one at the end of the distanceAB line), but it is still odd and results in some funky scores. It is also a pain to set up, as it requires a resfile too to specify which residues are involved.
Nevertheless, the residue crosslinks discussed previously do not preserve torsion angles, only distance. This is why you often also need a constraint of the regular type (or enzdes type) to keep the two atoms close in a chemically acceptable way. My GitHub repo for params has this, but I did not make a web app page for it.
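For illustration, a regular distance constraint of the kind mentioned above can be written as an `AtomPair` line in a Rosetta constraint file. The Python sketch below only formats such a line. The residue labels (`66A`, `142A`), the 1.6 Å target (a typical P-O single bond length) and the 0.1 Å standard deviation are assumptions, not values from this thread, and the helper name is hypothetical:

```python
def atom_pair_constraint(atom1, res1, atom2, res2, x0, sd):
    """Format one AtomPair line with a HARMONIC function.

    Hypothetical helper. `res1`/`res2` are residue labels as written in a
    constraint file (assumed here: PDB number plus chain, e.g. "66A");
    `x0` is the target distance and `sd` its standard deviation, in Å.
    """
    return "AtomPair %s %s %s %s HARMONIC %.2f %.2f" % (
        atom1, res1, atom2, res2, x0, sd)


# Assumed example: serine OG to the PNS phosphorus P24.
cst_line = atom_pair_constraint("OG", "66A", "P24", "142A", 1.6, 0.1)
print(cst_line)
```

This restrains distance only, so it does not address the torsion problem described above, and the score function still needs a non-zero weight on the constraint term for the line to have any effect.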
It's a great relief that I don't need to study enzdes.
I tried the params file you updated and removed the TER between the C-terminus and PNS in the PDB file. It did not work. The error message is "ERROR: unable to find desired variant residue: PNS PNS LOWER_TERMINUS_VARIANT"
The commands:
relax.static.macosclangrelease \
-database /Users/zhijun/bin/Rosetta/rosetta_bin_mac_2020.08.61146_bundle/main/database/ \
-in::file::s 1-141-PNS.pdb \
-extra_res_fa Phosphopantetheine.params \
-use_truncated_termini true \
-use_terminal_residues true \
-parser::protocol B_relax_density.xml \
-edensity::mapreso 3.0 \
-edensity::cryoem_scatterers \
-crystal_refine \
-beta \
-out::suffix _params \
-default_max_cycles 200
Yes, that is the fiddly error I was warning about. It has something to do with the PNS residue not looking like a proper polymer residue...
Looking at the file, the entry type is `HETATM`, whereas it should be `ATOM`. Sorry, my bad.
I edited the PDB file and replaced "HETATM" with "ATOM". The error message is still "ERROR: unable to find desired variant residue: PNS PNS LOWER_TERMINUS_VARIANT".
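Relabelling by hand is error-prone because PDB is a fixed-column format. A small Python sketch that changes only the record name of the PNS lines (the function name is hypothetical; `HETATM` and `ATOM` padded with two spaces are both six characters, so the remaining columns keep their alignment):

```python
def hetatm_to_atom(pdb_lines, resn="PNS"):
    """Relabel HETATM records of residue `resn` as ATOM; leave other lines alone.

    Hypothetical helper, not part of Rosetta.
    """
    fixed = []
    for line in pdb_lines:
        # Residue name sits in columns 18-20 (Python slice 17:20).
        if line.startswith("HETATM") and line[17:20] == resn:
            line = "ATOM  " + line[6:]
        fixed.append(line)
    return fixed
```

Waters and other genuine heteroatoms keep their `HETATM` label, since only lines whose residue name matches `resn` are touched.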
The pdb file is attached.
Okay. I gave your file a spin. There are three remaining issues.
LINK
It lacks LINK records:
PyMOL is one of those programs that does proximity bonding in addition to obeying CONECT records, so a PDB may look nicely bonded in PyMOL but be wrong in reality.
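LINK records are fixed-column, so a misplaced space can make one unreadable. As a sketch, the Python helper below lays out the atom, residue, chain and number fields in the columns given by the wwPDB format description (atom 1 in columns 13-27, atom 2 in columns 43-57, optional distance in columns 74-78). The helper names, the chain IDs and residue number 142 are hypothetical; OG, P24 and Ser66 are the names used in this thread. It assumes one-letter elements, and the symmetry fields are left blank:

```python
def _pad_atom_name(name):
    # PDB convention for one-letter elements (C, N, O, P, S): names shorter
    # than four characters start in the second column of the name field.
    return name if len(name) == 4 else " " + name.ljust(3)


def format_link(name1, resn1, chain1, resseq1,
                name2, resn2, chain2, resseq2, length=None):
    """Format a PDB LINK record between two atoms (hypothetical helper)."""
    line = "LINK  " + " " * 6                                   # columns 1-12
    line += _pad_atom_name(name1) + " " + resn1.rjust(3) + " "  # columns 13-21
    line += chain1 + str(resseq1).rjust(4) + " "                # columns 22-27
    line += " " * 15                                            # columns 28-42
    line += _pad_atom_name(name2) + " " + resn2.rjust(3) + " "  # columns 43-51
    line += chain2 + str(resseq2).rjust(4) + " "                # columns 52-57
    if length is not None:
        line += " " * 16 + "%5.2f" % length                     # columns 74-78
    return line


# Assumed example: Ser66 OG bonded to P24 of a PNS numbered 142 on chain A.
print(format_link("OG", "SER", "A", 66, "P24", "PNS", "A", 142))
```

The second LINK, for the thioester, would pair the peptide's C-terminal carbon with the PNS sulfur in the same way. Take the sulfur's atom name from the PNS ligand page rather than guessing it.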
Atoms
The second issue is that the serine residue has an OG atom, while PNS lacks O26.
So two options:
From the chemical point of view the PDB is correct, as the OG atom attacked the P24 atom, whereas the topology file, although valid, is conceptually wrong. So attached is yet another params file, for the case where there is no O26 because the Ser66 OG connects to the phosphate.
Distance
To clarify, PNS binds to the C-terminus of a peptide via a thioester bond and to a serine sidechain. In the case you linked, is it binding to the same protein twice? There is a 20 Å bond. Is glycine 141:B really connected to the PNS via a thioester bond?
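A quick way to catch a bond like that before running Rosetta is to measure the distance between the two atoms a LINK record would join. A minimal Python sketch, assuming standard fixed-column ATOM/HETATM records; the function names and the 2.5 Å cut-off are assumptions (a C-S or P-O single bond is well under 2 Å):

```python
import math


def atom_coords(pdb_lines, chain, resseq, atom_name):
    """Return (x, y, z) of the first atom matching chain, number and name."""
    for line in pdb_lines:
        if (line[:6].strip() in ("ATOM", "HETATM")
                and line[21] == chain
                and int(line[22:26]) == resseq
                and line[12:16].strip() == atom_name):
            return (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    raise KeyError("atom %s of residue %s%d not found" % (atom_name, chain, resseq))


def bond_length(pdb_lines, atom_a, atom_b):
    """Distance in Å between two atoms given as (chain, resseq, name) tuples."""
    a = atom_coords(pdb_lines, *atom_a)
    b = atom_coords(pdb_lines, *atom_b)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def is_plausible_bond(distance, max_len=2.5):
    """Rough sanity check; the 2.5 Å cut-off is an assumed, generous limit."""
    return distance <= max_len
```

For example, `bond_length(lines, ("B", 141, "C"), ("B", 142, sulfur_name))` would test the thioester, where `sulfur_name` stands for the PNS sulfur's atom name and 142 is a hypothetical residue number. A result near 20 Å means the two residues are not bonded, whatever the LINK record says.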