
Divide larger low-res. global run into several smaller runs?


Can one break one large global low-resolution docking run into smaller runs using -run:constant_seed and -run:jran=########, just assigning a different seed to each run?

Specifically, suppose I would like to generate 30,000 low-res. decoys. Rather than doing it as one docking run, and since I assume all decoys are based on a random number generator, why not break the run into three separate runs of 10,000 (on three separate processors) running simultaneously, each assigned a different seed? Would this be equivalent to a single run generating 30,000 decoys? In fact, since I have access to over 1,000 single processors, why not do 1,000 runs of 30 decoy structures each? I guess I would have to come up with 1,000 seeds. Can the seeds be larger than 7 digits? I assume it's probably better to spread the seed values as far apart as possible.

Obviously, the time savings would help. But I'm not sure whether there are differences one should be aware of when choosing such a route. Or maybe I'm completely missing something.

Thanks in Advance

J. Snyder

Mon, 2015-12-21 17:41
jasnyderjr

This is exactly how we use Rosetta: many processors on different RNG seeds.  So, yes, this absolutely works.

I don't know what the seed size limits are, but I can tell you seed neighbors are irrelevant.  The map of "seed space" to "RNG behavior space" is, well, random - that's the point - so I always used seed, seed+1, seed+2, etc.  
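
For concreteness, here's roughly how I would launch the split runs from a shell. Treat it as a sketch: the executable name, the flags file, and -nstruct are placeholders for whatever your real docking command line looks like; only -run:constant_seed and -run:jran are the flags you asked about.

BASE_SEED=1234567
for i in 0 1 2; do
    mkdir -p run_$i
    ( cd run_$i && \
      docking_protocol.linuxgccrelease @../docking_flags \
          -nstruct 10000 \
          -run:constant_seed \
          -run:jran $((BASE_SEED + i)) \
          > log.txt 2>&1 ) &
done
wait

For 1,000 runs of 30 decoys each it's the same idea, except your cluster's scheduler generates the per-run jobs instead of a plain for-loop.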

Caveats:

1) Rosetta does not yet have a shared-memory model. If one running copy of Rosetta uses 1 GB, two copies use 2 GB, even though a lot of that memory is the same constant database data loaded twice. You'll eventually hit diminishing returns. There is also some small per-process overhead that will cost you a little time.

2) To run on 30 processors, you need to either run in 30 different directories and merge the results yourself (you'll have dir1/result_0001.pdb, dir2/result_0001.pdb, etc.), or use MPI. In most circumstances, the only purpose of the MPI job distributors is to let many processes write to one directory. There's a sketch of the directory-per-run bookkeeping at the end of this post.

3) Just a warning on the number 30,000: file systems start to become upset with that many files in one directory. Splitting your output across directories, or using the silent file system (https://www.rosettacommons.org/docs/latest/rosetta_basics/file_types/silent-file), will help; there's a rough example of the silent-file flags at the end of this post as well.

4) Sysadmins generally don't like jobs that schedule 1000 processors for an hour instead of 100 processors for 10 hours each - look into what your scheduler / sysadmin wants.
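
On caveat 2, merging the per-directory results is just a rename-on-copy so the identically numbered files don't clobber each other. Something like this, where the run_* directory layout is just my assumption about how you set things up:

mkdir -p all_results
for d in run_*; do
    for f in "$d"/*.pdb; do
        cp "$f" all_results/"${d}"_"$(basename "$f")"
    done
done

That gives you all_results/run_0_result_0001.pdb, all_results/run_1_result_0001.pdb, and so on, with no collisions.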
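
On caveat 3, if you'd rather each run write one silent file instead of thousands of PDBs, it's an extra pair of output flags per run - I'm spelling them from memory, so double-check against the linked documentation:

-out:file:silent decoys_run_$i.out
-out:file:silent_struct_type binary

You still end up with one silent file per run (separate processes shouldn't all append to the same file), but 3 or even 1,000 silent files are far kinder to the file system than 30,000 loose PDBs.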

Tue, 2015-12-22 08:31
smlewis