- Author:
- Steven Lewis smlewi@gmail.com
This document was originally written 6 Apr 2010 by Steven Lewis
Rosetta's PDB reader is fragile, and the PDB itself is not particularly carefully curated. There are lots of PDBs that Rosetta cannot read (several thousand fail to load at all), and many more that cannot be scored for some reason. Unfortunately, these bad PDBs tend to cause Rosetta to crash rather than exit gracefully. This document describes how to "robustify" Rosetta so that it will not crash when it encounters a bad PDB.
You might want to robustify Rosetta if you are going to run a small -nstruct, large -l experiment. In my case, I ran with -nstruct 1 against literally the entire PDB.
The changes described here cause a significant performance hit to Rosetta (particularly the vectorL change), so they should not be left on by default. A better question is: why does no one improve the PDB reader to catch these errors more gracefully? The answer to that is: why don't you do it, and make this page obsolete?
You NEED to make a few changes:
- run your code under jd2
- replace all assert() statements in utility/vectorL with runtime_assert statements. This causes out-of-bounds accesses in the Rosetta workhorse vector1 class to be caught by the assertion instead of segfaulting. You could also compile in debug mode, or otherwise keep the assertions active (not disabled by NDEBUG) in release mode. The point is that bounds checking must be on. (This is what causes the performance hit.)
- replace all assert()s in the Conformation class with runtime_assert, or equivalent.
- ensure that the preprocessor symbol EXIT_THROWS_EXCEPTION is defined. You can do this on the command line, or by modifying user.settings to include: "appends" : { "defines" : ["EXIT_THROWS_EXCEPTION"], },
This combination of changes causes vector overruns to throw exceptions from inside runtime_assert, instead of crashing or segfaulting. jd2 will catch the exception, treat the bad PDB as a failed job, print an error message, and cleanly move on to the next PDB in your list.
There are some other suggested changes:
- pass the -infile::obey_ENDMDL flag. This causes the PDB reader to stop reading multimodel (NMR-derived) PDBs after the first model.
- merge checkin 31910 into JobDistributor.hh, or figure out checkin 35807. These are supposed to reduce memory use in multiple-input runs, so that single-use input PDBs are not retained in memory.
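Putting the flags together, an options file for a whole-PDB run might look like the following sketch. The list filename is a placeholder, and the flag spellings are taken from this page; check them against your Rosetta version's options system before relying on them.

```
# options file for a large -l, small -nstruct robustified run
-l all_pdbs.list         # list of input PDB paths, one per line
-nstruct 1               # one output per input
-infile::obey_ENDMDL     # stop multimodel NMR PDBs after the first model
```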