
Divide larger low-res. global run into several smaller runs?


Can one break one large global low-resolution docking run into smaller runs using -run:constant_seed and -run:jran=########, just assigning a different seed to each run?

Specifically, suppose I would like to generate 30,000 low-res. decoys. Rather than doing it as one docking run, and since I assume all decoys are based on a random number generator, why not break the run into three separate runs of 10,000 (on three separate processors) running simultaneously, each assigned a different seed? Would this be equivalent to a single run generating 30,000 decoys? In fact, since I have access to over 1,000 single processors, why not do 1,000 runs of 30 decoy structures each? I guess I would have to come up with 1,000 seeds. Can the seeds be larger than 7 digits? I assume it's probably better to spread the seed values as far apart as possible.

Obviously, the time savings would help. But I'm not sure whether there are differences one should be aware of when choosing such a route. Or maybe I'm completely missing something.

Thanks in Advance

J. Snyder

Mon, 2015-12-21 17:41
jasnyderjr

This is exactly how we use Rosetta: many processors on different RNG seeds.  So, yes, this absolutely works.

I don't know what the seed size limits are, but I can tell you seed neighbors are irrelevant.  The map of "seed space" to "RNG behavior space" is, well, random - that's the point - so I always used seed, seed+1, seed+2, etc.  
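
For concreteness, here's roughly how I would launch the split runs from a shell. Treat it as a sketch: the executable name, the flags file, and -nstruct are placeholders for whatever your real docking command line looks like; only -run:constant_seed and -run:jran are the flags you asked about.

BASE_SEED=1234567
for i in 0 1 2; do
    mkdir -p run_$i
    ( cd run_$i && \
      docking_protocol.linuxgccrelease @../docking_flags \
          -nstruct 10000 \
          -run:constant_seed \
          -run:jran $((BASE_SEED + i)) \
          > log.txt 2>&1 ) &
done
wait

For 1,000 runs of 30 decoys each it's the same idea, except your cluster's scheduler generates the per-run jobs instead of a plain for-loop.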

Caveats:

1) Rosetta does not yet have a shared-memory model. If one running copy of Rosetta uses 1 GB, two copies use 2 GB, even though a lot of that memory is the same constant database data loaded twice. You'll eventually hit diminishing returns. There is also some small per-process overhead that will cost you a little time.

2) To run on 30 processors, you need to either run in 30 different directories and merge the results yourself (you'll have dir1/result_0001.pdb, dir2/result_0001.pdb, etc.), or use MPI. In most circumstances, the only purpose of the MPI job distributors is to let many processes write to one directory. There's a sketch of the directory-per-run bookkeeping at the end of this post.

3) Just a warning on the number 30,000: file systems start to become upset with that many files in one directory. Splitting your output across directories, or using the silent file system (https://www.rosettacommons.org/docs/latest/rosetta_basics/file_types/silent-file), will help; there's a rough example of the silent-file flags at the end of this post as well.

4) Sysadmins generally don't like jobs that schedule 1000 processors for an hour instead of 100 processors for 10 hours each - look into what your scheduler / sysadmin wants.
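
On caveat 2, merging the per-directory results is just a rename-on-copy so the identically numbered files don't clobber each other. Something like this, where the run_* directory layout is just my assumption about how you set things up:

mkdir -p all_results
for d in run_*; do
    for f in "$d"/*.pdb; do
        cp "$f" all_results/"${d}"_"$(basename "$f")"
    done
done

That gives you all_results/run_0_result_0001.pdb, all_results/run_1_result_0001.pdb, and so on, with no collisions.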
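
On caveat 3, if you'd rather each run write one silent file instead of thousands of PDBs, it's an extra pair of output flags per run - I'm spelling them from memory, so double-check against the linked documentation:

-out:file:silent decoys_run_$i.out
-out:file:silent_struct_type binary

You still end up with one silent file per run (separate processes shouldn't all append to the same file), but 3 or even 1,000 silent files are far kinder to the file system than 30,000 loose PDBs.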

Tue, 2015-12-22 08:31
smlewis