extraction of pdbs from silent.out files on Mac

Hi RosettaCommons support group,

I generated silent.out files with Rosetta 3.5 while building homology models. I tried to extract the PDBs with the following command:
~/rosetta-3.5/rosetta_source/bin/extract_pdbs.macosgccrelease -database ~/rosetta-3.5/rosetta_database/ -in:file:silent silent.out -in:file:silent_struct_type binary -out:file:residue_type_set fa_standard
but it failed with the following error:

core.init: Mini-Rosetta version Split from developer trunk at 53488 from http://www.rosettacommons.org
core.init: command: /Users/pramod/rosetta-3.5/rosetta_source/bin/extract_pdbs.macosgccrelease -database /Users/pramod/rosetta-3.5/rosetta_database/ -in:file:silent silent.out -out:file:residue_type_set fa_standard -out:file:silent_struct_type binary
core.init: 'RNG device' seed mode, using '/dev/urandom', seed=-455986711 seed_offset=0 real_seed=-455986711
core.init.random: RandomGenerator:init: Normal mode, seed=-455986711 RG_type=mt19937
core.chemical.ResidueTypeSet: Finished initializing fa_standard residue type set. Created 6242 residue types
core.io.silent: Reading all structures from silent.out
core.io.silent: parse error( 1 L 0.000 -47.102 172.784 -23.724 18.974 60.984 -64.767 -178.209 77.574 0.000 2ou0_threaded_0005) 2ou0_threaded_0005 != empty_tag
core.io.silent: ERROR: did not find coordinates for all sequence positions for empty_tag
core.io.silent: Couldn't read position 2
core.io.silent: Couldn't read position 3
core.io.silent: Couldn't read position 4

I am attaching the silent.out file for your convenience.
Please suggest how to fix this issue.
Thanks,
Pramod

Tue, 2014-01-07 11:43
pramod

First off, the command you think you're running and the command you're actually running (according to the output you posted) are different: one has "-out:file:silent_struct_type binary" and the other has "-in:file:silent_struct_type binary". You probably don't want either in this situation. If you're trying to get PDB files, "-out:file:silent_struct_type binary" might give you a binary silent file instead of PDBs. "-in:file:silent_struct_type binary" isn't what you want either, as it looks like you have a "protein" format silent file. (Just leave off -in:file:silent_struct_type altogether; Rosetta can autodetermine whether a silent file is protein or binary.)
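A minimal sketch of the invocation with both silent_struct_type flags left off, assuming the install paths from the original post. The `extract_all_pdbs` wrapper and the `ROSETTA` variable are hypothetical names used only for illustration; the flags themselves are the ones already quoted above.

```shell
# Hypothetical wrapper; ROSETTA defaults to the install location from the original post.
ROSETTA=${ROSETTA:-$HOME/rosetta-3.5}

extract_all_pdbs() {
    # $1 = silent file to extract PDBs from
    # No -in:file:silent_struct_type or -out:file:silent_struct_type is passed,
    # so the input format is left for Rosetta to detect.
    "$ROSETTA/rosetta_source/bin/extract_pdbs.macosgccrelease" \
        -database "$ROSETTA/rosetta_database/" \
        -in:file:silent "$1" \
        -out:file:residue_type_set fa_standard
}
```

Calling `extract_all_pdbs silent.out` from the directory holding the silent file would then run the extraction.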

Neither of these is causing your issue, though. I can't say for certain, but I'm guessing it's a formatting issue with your silent file. Unfortunately, the silent file didn't attach, so I can't check that. Did you have issues with the size of the file? If so, you can always put the file on an external file hosting service (e.g. Google Drive or Dropbox) and post the public URL here.

Tue, 2014-01-07 12:39
rmoretti

Hi rmoretti,

Thanks!!

As suggested I have posted the link below for the silent.out file.
https://drive.google.com/file/d/0B7ZBhMEQH7fqZHhqMFVsUUdUbW8/edit?usp=sh...

Could you please look into it and help me fix the issue?

Pramod

Tue, 2014-01-07 13:51
pramod

That's interesting. It looks like you have a silent file that is a mixture of structures from different sources (some in binary format and some in protein format). Where did you get the silent file from? This sort of thing can happen when you merge two different silent files, or when two processes write to the same silent file simultaneously, each in a different format.

Normally you should be able to merge two silent files, but there are some caveats to be wary of. One is that all structures in the silent file should be the same length: either the same sequence, or closely related sequences (like mutants). Another is that there is only a single score header line, and the silent file format assumes that all of the SCORE lines follow that header.

The latter is causing you problems. You have 25 score fields for the binary format items, and 27 for the protein format ones. Unfortunately, the file starts with the binary format entries, which have fewer fields, so the reader chokes when it reaches the larger protein format ones.
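One way to spot this kind of mismatch yourself is to tally how many fields each SCORE line has. The sketch below is only an illustration: `count_score_fields` is a made-up helper name, and it counts every whitespace-separated column after "SCORE:" (including the trailing tag), so the absolute numbers may be offset from the ones quoted above. What matters is whether more than one distinct count shows up.

```shell
# Tally the number of columns on each SCORE: line of a silent file.
# Prints one "<columns> <number of lines>" pair per distinct column count.
count_score_fields() {
    # $1 = silent file; NF - 1 drops the leading "SCORE:" token itself
    awk '/^SCORE:/ { n[NF - 1]++ } END { for (k in n) print k, n[k] }' "$1" | sort -n
}
```

Running `count_score_fields silent.out` on a consistent file should print a single line; two or more lines suggest SCORE lines that do not all match the header.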

To fix it, you can simply add two (space separated) headings to the header SCORE line (the second line in the silent file). E.g. instead of
SCORE: score fa_atr fa_rep ...
make it read:
SCORE: score void1 void2 fa_atr fa_rep...

This will mess up the correlations of names to scores, but for extract_pdbs it shouldn't matter. (But be aware of the limitation if using it for other applications.)
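If you'd rather not edit the header by hand, something like the following should make the same change. It is a sketch under assumptions: `add_placeholder_columns` is a hypothetical helper, it assumes the first SCORE: line in the file is the header and that its first column is named "score" (as in the example above), and it writes to a new file so the original stays untouched.

```shell
# Insert the two placeholder headings after "score" on the first SCORE: line only.
add_placeholder_columns() {
    # $1 = original silent file, $2 = patched copy to write
    awk '!patched && /^SCORE:/ {
             sub(/^SCORE: +score/, "& void1 void2")   # "&" is the matched text
             patched = 1
         }
         { print }' "$1" > "$2"
}
```

For example, `add_placeholder_columns silent.out silent_fixed.out`, then point -in:file:silent at silent_fixed.out.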

Tue, 2014-01-07 16:46
rmoretti

Thanks for the suggestion, it worked.
This silent.out file came from a server on campus, where I used the MPI version of Rosetta 3.5 to generate it. I used the script below to produce the silent.out file.

#PBS -S /bin/bash
#PBS -o aaronout.txt
#PBS -e aaronerr.txt
#PBS -l nodes=2:ppn=8
#PBS -l walltime=536:00:00

#cat $PBS_NODEFILE
NP=`wc -l < $PBS_NODEFILE`
val=4
#cd $PBS_O_WORKDIR
#cd ~/atest/

/opt/openmpi/bin/mpirun -np $NP /share/apps/rosetta-mpi-3.5/rosetta_source/bin/loopmodel.mpi.linuxgccrelease -database /share/apps/rosetta-mpi-3.5/rosetta_database/ -loops:remodel quick_ccd -loops:refine refine_kic -in:file:s /home/pramod.akula.bala/rosettaprotocol/2ou0_threaded.pdb -in:file:fullatom -loops:loop_file /home/pramod.akula.bala/rosettaprotocol/2ou0_.loops -loops:frag_files /home/pramod.akula.bala/rosettaprotocol/aa2ou0_09_05.200_v1_3 /home/pramod.akula.bala/rosettaprotocol/aa2ou0_03_05.200_v1_3 none -nstruct 20 -ex1 -ex2 -overwrite -loops:extended true -loops:idealize_after_loop_close -loops:relax fastrelax -loops:fast -out:file:silent /home/pramod.akula.bala/rosettaprotocol/silent.out

I was wondering whether the script I am using is what causes the mixed file formats?

Wed, 2014-01-08 10:15
pramod

I'm not seeing anything obviously off. MPI can be a bit fiddly sometimes, so I might double check that this invocation really gives you a single run across 16 processors, as opposed to, for example, the processors on the two nodes not talking to each other (so you get something like two runs of 8). I don't think that would result in the problem you saw, though, so chances are it's fine.

I'm really not sure why you have the issue. Perhaps if you explicitly set the output format ("-out:file:silent_struct_type binary") you'll get better consistency in the future, as the differences in the two looked to be split on protein/binary file format lines.

===

Two unrelated things I noticed:

You have the "-overwrite" option on the command line, so if you restart the run (e.g. if the program crashes or gets killed by the cluster), Rosetta will discard and redo all the structures you've already completed and saved to the output file. Removing that option will keep all the completed structures written to disk, and only (re-)do the ones that are missing.

Also, you're distributing 20 structures across 16 processors. Given that one of the MPI processors gets taken up as the master input/output node and doesn't process structures, and assuming that all the output structures take about the same amount of time, you're going to have 10 processors idling while the last 5 structures are being generated. This might not be an issue for you (it isn't going to affect the scientific results), but just something to be aware of.
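To make that arithmetic concrete, here is a small sketch. `idle_in_last_round` is a hypothetical helper, and it relies on the same assumptions as above: one MPI rank is reserved as the master, and every structure takes about the same time.

```shell
# Number of worker ranks left idle while the final batch of structures runs.
idle_in_last_round() {
    ranks=$1      # total MPI ranks (e.g. nodes * ppn); must be at least 2
    nstruct=$2    # value passed to -nstruct
    workers=$((ranks - 1))              # one rank acts as master
    remainder=$((nstruct % workers))    # structures left for the last round
    if [ "$remainder" -eq 0 ]; then
        echo 0
    else
        echo $((workers - remainder))
    fi
}

idle_in_last_round 16 20   # prints 10
```

With 15 workers, choosing -nstruct as a multiple of 15 (say 15 or 30) would keep every worker busy until the end under these assumptions.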

Wed, 2014-01-08 12:54
rmoretti