The scripts and input files that accompany this demo can be found in the
demos/public directory of the Rosetta weekly releases.
KEYWORDS: UTILITIES GENERAL
For the most part Rosetta does a pretty good job at reading structures, however with unusual chemical modifications, sometimes Rosetta runs into problems. Here are some of the problems with pdb files that we have encountered that currently cause problems with Rosetta:
If you run into problems with processing a structure, it would be a good idea to go in and look the text of the pdb file.
If there is an unrecognized residue,
Go look into the structure and see what it is and what's going on. Some structures in the PDB may be C-alpha only like, 2rdo.pdb or 3htc.pdb.
If it is a HET group that you do not need, such as a non-covalent small molecule, like a NAD, HEME group or glycerol that is included with the structure, then you can include -ignore_unrecognized_res option to the command line.
If there is a covalently linked ligand or modified amino acid or base, either there is a parameter file in the database/chemical/residue_type_sets and it is looked up correctly. If it is important, (for instance if it is part of the covalent chain like the CSP residue in 1qu9.pdb) but not in the database, it is possible to convert a MDL, MOLfile, or SDF file to a Rosetta residue topology file.
Please see documenation on preparing ligands for using molfile_to_params correctly.
If there are unrecognized residues but you don't actually care about them,
Manually remove the HETATM atom records and checking in PyMOL that the structure is still reasonable (no broken chains etc.)
Use a script to clean the PDB appropriately. For example tools/protein_util/scripts/clean_pdb.py can handle some basic cleanup proceedures. This script is a little hacky so use with caution.
-ignore_unrecognized_res or just
ignoring unrecognized residues can coding assumptions that are not
rigerously checked (i.e. expose bugs in Rosetta). So be prepared
Rosetta to fail in unusual ways:
1. if the first residue of a chain is ignored (e.g. with 1mhm.pdb) then Rosetta may fail with this error:2. if the residue type three letter code mis identified, such as SUC (sucrose in 3KB8.pdb) for the SUCK residue type, used in a score term to reduce buried cavities.
ERROR: ( anchor_rsd.is_polymer() && !anchor_rsd.is_upper_terminus() ) && ( new_rsd.is_polymer() && !new_rsd.is_lower_terminus() ) ERROR:: Exit from: src/core/conformation/Conformation.cc line: 428
core.io.pdb.file_data: [ WARNING ] discarding 23 atoms at position 207 in file /home/momeara/scr/data/pdb/kb/pdb3kb8.ent.gz. Best match rsd_type: SUCK core.io.pdb.file_data: [ WARNING ] discarding 23 atoms at position 750 in file /home/momeara/scr/data/pdb/kb/pdb3kb8.ent.gz. Best match rsd_type: SUCK core.io.pdb.file_data: [ WARNING ] discarding 23 atoms at position 980 in file /home/momeara/scr/data/pdb/kb/pdb3kb8.ent.gz. Best match rsd_type: SUCK core.conformation.Conformation: [ WARNING ] missing heavyatom: OXT on residue SER_p:CtermProteinFull 201 core.conformation.Conformation: [ WARNING ] missing heavyatom: ORIG on residue SUCK 202 ... ERROR: too many tries in fill_missing_atoms! ERROR:: Exit from: src/core/conformation/Conformation.cc line: 2590
If you care about the coordinates of hydrogens in the PDB file, you should be careful that they are named correctly. Rosetta does not use the most uptodate hydrogen atom naming convention. If the hydrogen atom names are not recognized, then they will be stripped off and replaced automatically by the -optH protocol when the pdb file is processed.
To make sure H-atoms are named correctly you can use the tools/convert_hatom_names.py script.
python tools/convert_hatm_names.py --data_dir input_files --output_dir prepared_input_files