extract_pdb application parse error with silent file

Hi,

I am getting the following error message when I try to extract pdb files from a silent file (output from ligand docking using RosettaScripts). 

core.io.silent.SilentFileData: Reading 1 structures from silent.out
core.io.silent: parse error:  found 0 RT lines for a fold-tree with 2 for decoy tag complex_mat1_0016
caught exception failure to read decoy complex_mat1_0016 from silent-file silent.out

I'm using the following options: "-in:file:silent silent.out -in:file:tags complex_mat1_0001 -extra_res_fa a.params b.params"

When I looked inside the silent file, I do see that the lines that start with "RT ..." are missing for complex_mat1_0016. First of all, I do not know what this means, or why some structures are missing these RT lines (0016 happened to be the first one, but more structures after it are missing them too).

In the end, this file is not what I am looking to extract the pdb of, so if there is a way to bypass validation of every structure in the silent file, that would be helpful, too.

Thank you in advance!

EDIT: I still haven't figured out what the problem is yet, but I wanted to provide some more information for anyone who could try to help me out. 

1. I've updated rosetta with the latest weekly update, and I still get the same parse error on these silent files. (Now there is a nice "[WARNING]" in front of parse error with the update)

2. All my silent files were prematurely terminated due to a timeout error on the cluster I am using. But I don't think this is the problem, because I went back to an earlier silent file, which extract_pdbs used to process without trouble, and it now throws the same error: some structures have only one RT line, and the extract_pdbs application doesn't like it.

3. As alluded to in the previous point, the extract_pdbs application (from the same old version) worked just fine the last time I used Rosetta, which was about 3 months ago. One of the biggest changes I made between then and now was moving the rosetta directory, but I've rebuilt all the applications numerous times since then.

4. When I manipulated the silent file and got rid of all the lines following the 0016 structure, extract_pdbs application worked just fine and outputted a pdb.

Ultimately, I would like to get down to the root of the problem, as some structures that I would like to extract pdbs of do have only one RT line...

Could someone give me a pointer at least by explaining what the significance of these RT lines is? Thank you!!!

Tue, 2017-08-22 12:39
staciekim

The early truncation of the runs shouldn't matter - it might mess up the very last structure if you were particularly unlucky in when the run was killed, but the other structures should come through okay (barring other issues.)

The "quick fix" is to see if the -silent_read_through_errors option allows you to read the structures you're interested in.

I'm a little concerned about the fact the 0016 structure was the first one in the silent file. How did you create the silent file? Specifically, did you have multiple processes all writing to the same silent file at the same time? That can be dangerous, as the silent file format is not robust to interleaved lines. (That is, if two processes try to write structures to the silent file at the same time, the lines from one structure can get mixed up with the lines from the other structure. This is particularly likely if Rosetta outputs structures frequently, like it does with ligand docking.) If you have multiple Rosetta processes going at the same time, they either need to be writing to separate silent files, or they need to coordinate via MPI.

If this is your issue, there's a somewhat tedious workaround. First, I'd do a `grep SCORE silent.out > scorefile` to get scorefile lines. Using this scorefile, you can pick out the particular structures you're interested in extracting. (I do not suggest extracting all of them.)  Then for each structure tag you want, you can run the following.

head -n 3 silent.out > TAG.out
grep TAG silent.out >> TAG.out

This should give you a silent file for each tag you want to extract. You can then use extract_pdbs on each of them to pull out the PDB formatted structure.
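
The two commands above can be wrapped into small helpers when several tags are needed. This is only a sketch, not part of Rosetta: the function names `list_tags` and `split_tag` are made up for illustration, and it assumes the silent file header is the first 3 lines and that every line of a decoy ends with its tag as the last whitespace-separated field. Matching the last field exactly (rather than a plain `grep TAG`) avoids also picking up a longer tag that merely contains the one you asked for.

```shell
# Hypothetical helpers; assumes a 3-line header and the tag as last field.

# Print the tag of every decoy, taken from the SCORE lines.
# The header SCORE line (which ends in "description") is skipped.
list_tags() {
    awk '$1 == "SCORE:" && $NF != "description" { print $NF }' "$1"
}

# Write the header plus all lines belonging to one tag to <outdir>/<tag>.out.
# $1 = silent file, $2 = tag, $3 = output directory (defaults to ".")
split_tag() {
    head -n 3 "$1" > "${3:-.}/$2.out"
    awk -v tag="$2" 'NR > 3 && $NF == tag' "$1" >> "${3:-.}/$2.out"
}
```

For example, `split_tag silent.out complex_mat1_0001` would write `complex_mat1_0001.out` in the current directory, which can then be passed to extract_pdbs with `-in:file:silent`.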

If that's not the issue, then I'd probably need to see the silent file in order to diagnose it further.

By the way, the RT line is a "rigid transformation" line. That is, it encodes information about how the ligand is oriented with respect to the protein. It's slightly redundant to the coordinate information which is encoded in a binary-format silent file, but it's needed to properly set up these sorts of multi-chain structures.
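
To see which decoys are affected before running extract_pdbs, the RT lines can be counted per tag. The sketch below is an illustration only: `count_rt_lines` is a made-up name, and it assumes that RT lines start with the field `RT`, that each decoy begins with a `SCORE:` line, and that every line of a decoy ends with its tag.

```shell
# Hypothetical helper: print "<tag> <number of RT lines>" for every decoy.
# Assumes RT lines start with "RT" and every decoy line ends with the tag.
count_rt_lines() {
    awk '
        # Register each decoy when its SCORE line is seen (header row skipped).
        $1 == "SCORE:" && $NF != "description" {
            if (!($NF in n)) { n[$NF] = 0; order[++k] = $NF }
        }
        # Count RT lines that belong to an already registered decoy.
        $1 == "RT" && ($NF in n) { n[$NF]++ }
        # Report in the order the decoys appeared in the file.
        END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }
    ' "$1"
}
```

With a fold tree that expects 2 RT lines, as in the error message in this thread, `count_rt_lines silent.out | awk '$2 < 2'` would list the decoys likely to trigger the parse error.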

Wed, 2017-08-23 12:01
rmoretti

Hi rmoretti,

I tried uploading my silent file but it may have been too big. I will post my reply first and then upload a silent file (maybe after truncating 80% of the structures)

Sorry about not being clear, but I meant that the structure 0016 was the first one with zero RT lines. As you will see in the silent file I will be uploading, it's a mixed bag of "good" structures with two RT lines and "bad" ones with 0 or 1 RT lines. My structure does have two ligands, so I am not quite sure why the bad structures were generated in the first place. (The two ligands are NAD and my substrate, and I only move around my substrate (chain X) during docking.) 

I tried the quick fix, but unfortunately a new error was thrown. Maybe this one is easier to troubleshoot? Could you please take a look at the output below:

core.io.silent.SilentFileData: Reading 1 structures from silent.out
core.io.silent: [ WARNING ] parse error:  found 0 RT lines for a fold-tree with 2 for decoy tag complex_mat1_0016
core.kinematics.FoldTree: [ ERROR ] no fold_tree info in this stream.
core.io.silent: [ WARNING ] parse error:  found 1 RT lines for a fold-tree with 2 for decoy tag complex_mat1_0022
core.io.silent: [ WARNING ] parse error:  found 1 RT lines for a fold-tree with 2 for decoy tag complex_mat1_0031
core.kinematics.FoldTree: [ ERROR ] no fold_tree info in this stream.
core.io.silent: [ WARNING ] parse error:  found 0 RT lines for a fold-tree with 2 for decoy tag complex_mat1_0070
core.io.silent: [ WARNING ] parse error:  found 1 RT lines for a fold-tree with 2 for decoy tag complex_mat1_0076
core.io.silent: [ WARNING ] parse error:  found 0 RT lines for a fold-tree with 1 for decoy tag complex_mat1_0079
libc++abi.dylib: terminating with uncaught exception of type boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::bad_lexical_cast> >: bad lexical cast: source type value could not be interpreted as target
Abort trap: 6

Thank you!

EDIT: So my silent file is too big to be uploaded, and if I cut out the structures to fit the size limit, it'll cut out the structure 0016... Is there a better way to share a file that is ~2MB (after reasonable truncation)?

EDIT2: Maybe try the following link?

https://drive.google.com/file/d/0Bw73lC7XQEV_TVRCcHdXSXU1Z1U/view?usp=sharing

Wed, 2017-08-23 12:57
staciekim

Okay, I'm not quite sure why it's happening, but for every structure you have there's a rather long string that's being inserted in the middle of your structure output, replacing what should normally go there. For many of the structures this is somewhat "invisible" as it happens in the middle of the coordinate section and thus doesn't appear (yet).

The problem you're seeing comes when the replacement occurs in the non-coordinate section. For complex_mat1_0016, for example, the insertion starts toward the end of the FOLD_TREE line and overwrites several lines after it (including the RT lines), not stopping until the middle of the coordinate section.

This sort of damage is not recoverable: your silent file is irrevocably corrupted, even for the other structures which are not yet giving you errors.

I'm not quite sure why you're getting this rather consistent replacement.

If you've done something like copy the silent file from a remote cluster system, the first thing I'd recommend is going back to the original silent file (if you still have it) and see if that has the same contents, or if there might have been an issue with the file transfer. (I won't paste the whole replacement string here, but it begins with "R2OnC0JxgZQPffIB5P1" and ends with "QXI1DBRI2aFUJhLaQ" - if the original file has those strings, it's also corrupted.)

The next thing I'd recommend is to redo your docking run, using the same settings as before. If it was a random one-off then the redo shouldn't have that corruption.

If you're getting the same (or similar) corruption in the re-run, then we need to try to track down if it's an issue with your system (e.g. a flaky hard drive) or if you're somehow triggering a bizarre bug in Rosetta silent file output.

P.S. While you're waiting for Rosetta to run, it might be worth searching (e.g. with the grep utility) for the string "R2OnC0JxgZQPffIB5P1" across all your files (even the non-Rosetta-related ones), to see if it's just limited to Rosetta silent files, or if there's a more general filesystem corruption going on.
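
A recursive grep is enough for that search. The wrapper below is a minimal sketch; the name `find_marker` is made up, and the directory to scan is whatever part of the filesystem you want to check.

```shell
# Hypothetical helper: list every file under a directory containing a string.
# $1 = directory to scan, $2 = string to look for
#   -r  recurse into subdirectories
#   -l  print only the names of matching files
#   -F  treat the pattern as a fixed string, not a regular expression
find_marker() {
    grep -rlF -- "$2" "$1"
}
```

For example, `find_marker "$HOME" "R2OnC0JxgZQPffIB5P1"` prints one path per affected file and prints nothing (with a non-zero exit status) if the string is not found anywhere.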

Wed, 2017-08-23 16:03
rmoretti

Hi rmoretti,

Thank you for your response. I had been looking around on my own and came to a similar conclusion.

1) The files in the remote cluster, as they were generated, were already corrupted. I did start another run, thinking that there was something wrong with the way I set up the batch job submission. The new silent files from this run resulted in a "!in_bracket" error, which was coming from incomplete ANNOTATED_SEQUENCEs appearing in repeated decoy outputs (the whole annotation for 0001 was repeated with corrupted ANNOTATED_SEQUENCEs). 

 2) I did find the long, random string you were referring to in all silent files. Since these silent files are generated in the remote cluster, I don't think there's anything wrong with my local computer, but I guess that means it's going to be even harder to troubleshoot this problem on my own. 

I am not sure what I can do from here to troubleshoot. I might go with the brute-forcing method of outputting a pdb file instead of a silent file just to get these runs going. Please let me know if you have any other suggestions to troubleshoot this problem.

Thank you!

Wed, 2017-08-23 16:56
staciekim

One of the first steps I'd recommend in troubleshooting is attempting to recompile Rosetta (the one you're using to produce the silent files). It's a bit of a long shot, but it's theoretically possible that the Rosetta compilation got corrupted in such a way that it's producing garbage in the silent files.

To do a clean recompilation, you would want to delete everything under the `Rosetta/main/source/bin/` and `Rosetta/main/source/build/` directories, and then re-run the scons command: https://www.rosettacommons.org/docs/latest/build_documentation/Build-Documentation Then I'd test to see if you get the same issue when you re-run with the fresh compilation.
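
The cleanup step could look roughly like the sketch below. The function name `clean_rosetta_build` is made up, the path is whatever your own `Rosetta/main/source` directory is, and the job count and build mode in the trailing comment are assumptions to be checked against the build documentation linked above.

```shell
# Hypothetical helper: remove old build products from a Rosetta source tree.
# $1 = path to Rosetta/main/source
clean_rosetta_build() {
    # Refuse to run unless both expected directories exist.
    [ -d "$1/bin" ] && [ -d "$1/build" ] || {
        echo "not a Rosetta source directory: $1" >&2
        return 1
    }
    # Delete the contents of bin/ and build/ but keep the directories.
    # Hidden files (names starting with ".") are left in place.
    rm -rf "${1:?}/bin/"* "${1:?}/build/"*
}

# Afterwards, rebuild from the source directory, for example
# (job count and mode are assumptions; adjust to your system):
#   cd Rosetta/main/source && ./scons.py -j4 mode=release bin
```

The `${1:?}` expansion makes the shell abort if the argument is empty, so a missing path cannot turn the `rm -rf` into a deletion at the filesystem root.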

Thu, 2017-08-24 08:19
rmoretti