|Rosetta 3.2 Release Manual|
New job distributor README
It might make your life easier to read this in something with word-wrapping (a document editor, not a text editor).
A JobDistributorFactory class is responsible for determining which JobDistributor (JD) should exist, and for creating it upon the first request (via the singleton interface) for the JD. JobDistributorFactory.cc therefore contains the logic that determines which JobDistributor class actually gets instantiated in any given invocation of an executable. This is easy code to read, and by definition it is up to date, whereas this document might not be (so look at it if these instructions fail!).
One concrete class is FileSystemJobDistributor, which distributes jobs to a single processor. It can coordinate independent executables by marking in-progress jobs in the filesystem. This distributor is compatible with running separate jobs in separate directories, or many instances of one command line (large nstruct) in the same directory, with the option -run:multiple_processes_writing_to_one_directory. This is the vanilla, default distributor.
ShuffleJobDistributor exists; this author is not sure of its purpose. I think it's for randomizing job run order; this is of value for its child BOINCJobDistributor. BOINC is used for the Rosetta project; setting up BOINC to use this distributor is beyond the scope of this documentation.
There are four MPI flavors to choose from.
1) MPIWorkPartitionJobDistributor: this one is MPI in name only. Activate it with jd2::mpi_work_partition_job_distributor. Here, each processor looks at the job list and determines which jobs are its own from how many other processors there are (in other words, divide number of jobs by number of processors; each processor does that many). Its files contain an example. This JD has the best potential efficiency with MPI but isn't useful for big stuff, because you have to load-balance yourself by setting it up so that (njobs % nprocs) is either zero or close to nprocs. It is recommended only for small jobs, or jobs with tightly controlled runtimes. I use it for generating a small number of trajectories (100 jobs on 100 processors) to determine where to set my filters, then run large jobs under the next JD.
2) MPIWorkPoolJobDistributor: this is the default in MPI mode. Here, one node does no Rosetta work and instead controls all the other nodes. All other nodes request jobs from the head node, do them, report, and then get a new one until all the jobs are done. If you only have one processor, then you get only a head node, and no work is done: make sure you're using MPI properly! This one has "blocking" output - only one job can write to disk at a time, to prevent silent files from being corrupted. It's a good choice if the runtime of each individual job is relatively large, so that output happens sparsely (because then the blocking is a non-issue). If your jobs are very short and write to disk often, or you have a huge number of processors and they write often just because there's a ton of them, this job distributor will be inefficient. As of this writing, this is the job distributor of choice when writing PDB files.
3) MPIFileBufJobDistributor: this is the other default for MPI. It allocates two nodes to non-Rosetta work: one head node as before, and one node dedicated to managing output. It is nonblocking, so it is the best choice if job completion occurs often relative to how long filesystem writes take, that is, if you have short job times, a large number (many thousands) of processors, or both (think abinitio). At the moment it works only with silent file output.
4) There's another one called MPIArchiveJobDistributor; maybe Oliver will add documentation for it.
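To make the load-balancing arithmetic for MPIWorkPartitionJobDistributor (flavor 1 above) concrete, here is a minimal sketch. It is not the Rosetta implementation: jobs_for_rank and idle_ranks_in_last_round are hypothetical helpers, and the contiguous-block split is an assumption made only to illustrate why (njobs % nprocs) matters.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Half-open range [first, last) of job indices owned by one rank.
using JobRange = std::pair<std::size_t, std::size_t>;

// Hypothetical helper, not a Rosetta function: split njobs into contiguous
// blocks, giving the first (njobs % nprocs) ranks one extra job each.
JobRange jobs_for_rank(std::size_t njobs, std::size_t nprocs, std::size_t rank) {
    assert(nprocs > 0 && rank < nprocs);
    std::size_t const base  = njobs / nprocs;  // jobs every rank gets
    std::size_t const extra = njobs % nprocs;  // leftover jobs to spread out
    std::size_t const first = rank * base + (rank < extra ? rank : extra);
    std::size_t const count = base + (rank < extra ? 1 : 0);
    return JobRange(first, first + count);
}

// Number of ranks with nothing to do while the leftover jobs run,
// assuming (unrealistically) that every job takes the same time.
std::size_t idle_ranks_in_last_round(std::size_t njobs, std::size_t nprocs) {
    assert(nprocs > 0);
    std::size_t const extra = njobs % nprocs;
    return extra == 0 ? 0 : nprocs - extra;
}
```

With 100 jobs on 100 processors nobody idles; with 101 jobs on 100 processors, 99 processors wait while one finishes the last job. That is the "zero or close to nprocs" rule in numbers.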
Batching - It exists! I don't know how to use it! Maybe Oliver will add documentation for it.
JobInputter is responsible for determining what jobs can exist, and for creating poses from Job objects. The simplest class is PDBJobInputter, which handles PDBs or gzipped PDBs from disk. It determines what jobs ought to exist from -s/-l and -nstruct. Other JobInputters exist: one based on silent files, and one for abinitio, which has no starting structures. (A threading one also exists; I don't know what it does.)
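As a rough picture of "what jobs ought to exist", the sketch below crosses a list of inputs (as -s/-l would supply) with an nstruct count. JobSketch, enumerate_jobs and output_name are hypothetical names, not Rosetta classes or functions, and the tag-plus-four-digit-index naming is an assumption for illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for a job record; not the Rosetta Job class.
struct JobSketch {
    std::string input_tag;      // e.g. the input file name without extension
    std::size_t nstruct_index;  // 1-based trajectory number for this input
};

// One job per (input, trajectory) pair: inputs.size() * nstruct in total.
std::vector<JobSketch> enumerate_jobs(std::vector<std::string> const & inputs,
                                      std::size_t nstruct) {
    std::vector<JobSketch> jobs;
    jobs.reserve(inputs.size() * nstruct);
    for (auto const & tag : inputs) {
        for (std::size_t i = 1; i <= nstruct; ++i) {
            jobs.push_back({tag, i});
        }
    }
    return jobs;
}

// Assumed naming scheme: tag, underscore, zero-padded four-digit index.
std::string output_name(JobSketch const & job) {
    char suffix[32];
    std::snprintf(suffix, sizeof suffix, "_%04zu", job.nstruct_index);
    return job.input_tag + suffix;
}
```

Two inputs with nstruct 3 give six jobs in this sketch, which is the number a distributor would then hand out.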
JobOutputter is responsible for outputting jobs' results. This class interfaces with both the JobDistributor and actual protocols. Its three responsibilities are A) determining what a job's name is and/or where its results are stored, B) determining whether the job already has completed data stored (left over from a previous run, or from another concurrent execution) and thus should not be reattempted, and C) handling the storage of completed protocols' results (poses and other data).
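The three responsibilities can be pictured with a toy class. OutputterSketch and its method names are hypothetical and do not reproduce the real JobOutputter interface; an in-memory map stands in for files on disk.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in mirroring responsibilities A, B and C from the text.
class OutputterSketch {
public:
    // A) Decide the name under which a job's results are stored.
    std::string output_name(std::string const & input_tag, int nstruct_index) const {
        assert(nstruct_index > 0);
        return input_tag + "_" + std::to_string(nstruct_index);
    }

    // B) Report whether results already exist, so the job is not reattempted.
    bool job_has_completed(std::string const & name) const {
        return store_.count(name) > 0;
    }

    // C) Store the results of a finished job.
    void store_results(std::string const & name, std::string const & result) {
        store_[name] = result;
    }

private:
    std::map<std::string, std::string> store_;  // stands in for the filesystem
};
```

A distributor built on this sketch would ask job_has_completed() before starting each job and skip those that return true, which is how leftover output from an earlier or concurrent run avoids being redone.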
JobOutputter classes will output scored poses and scorefiles as you might want. However, these classes do NOT know or care about scorefunctions; they print energies based only on things stored in the Pose (that is, the Energies object). YOUR MOVER is responsible for preparing these score repositories. The scores stored in the Energies object (whatever was put there the last time the pose was scored) can be printed. Additional scores and data can be printed via the Job object (below).
The third layer is output data. This “accessory data” is extra stuff you want dumped at the end of the PDB like in Rosetta++. Anything you load into the Job will get attached to the final pose in some fashion (in the silent file, PDB, scorefile, etc). If you are printing intermediate poses, they will also have this data attached; you can change that by making a temporary Job object for the intermediate pose (the JobDistributor will ignore it; this is a local object), filling it, and handing it out to the JobOutputter (see example in jd2test.cc). The functions for adding extra data are Job->add_string(), Job->add_string_string_pair(), and Job->add_string_real_pair().
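A toy version of the accessory-data idea is sketched below. The three add_* method names come from the text above; the class JobDataSketch, the argument types, the storage, and the render() function are all assumptions and should not be read as the real Job API.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Job object's accessory-data storage.
class JobDataSketch {
public:
    void add_string(std::string const & line) {
        strings_.push_back(line);
    }
    void add_string_string_pair(std::string const & key, std::string const & value) {
        string_pairs_.emplace_back(key, value);
    }
    void add_string_real_pair(std::string const & key, double value) {
        real_pairs_.emplace_back(key, value);
    }

    // Render everything as trailing lines, the way extra data might follow
    // the coordinates in a PDB file (the format here is made up).
    std::string render() const {
        std::string out;
        for (auto const & s : strings_) out += s + "\n";
        for (auto const & p : string_pairs_) out += p.first + " " + p.second + "\n";
        for (auto const & p : real_pairs_) out += p.first + " " + std::to_string(p.second) + "\n";
        return out;
    }

private:
    std::vector<std::string> strings_;
    std::vector<std::pair<std::string, std::string>> string_pairs_;
    std::vector<std::pair<std::string, double>> real_pairs_;
};
```

For an intermediate pose, the same pattern applies to a second, locally constructed object: fill it with only the data that snapshot should carry, and hand that object to the outputter instead of the main one.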
b) Movers can automatically filter their results, and tell the job distributor “well, this run was a waste, don't increment the job count and try again”. Activate filtering through the Mover class's set_last_move_status function; this must be set in the outermost Mover that the job distributor is running directly.
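The retry behaviour can be sketched as a small loop. Only the name set_last_move_status comes from the text; StatusSketch, MoverSketch, get_last_move_status as used here, and run_job_with_retries are hypothetical stand-ins, and the real status values and retry policy may differ.

```cpp
#include <functional>
#include <utility>

// Hypothetical status values; the real enumerators may be named differently.
enum class StatusSketch { Success, FailRetry, FailDoNotRetry };

// Hypothetical stand-in for the outermost Mover the distributor runs.
class MoverSketch {
public:
    explicit MoverSketch(std::function<StatusSketch(int)> protocol)
        : protocol_(std::move(protocol)) {}

    // Run the protocol and record whether the result passed its own filter.
    void apply(int attempt) { set_last_move_status(protocol_(attempt)); }

    void set_last_move_status(StatusSketch s) { status_ = s; }
    StatusSketch get_last_move_status() const { return status_; }

private:
    std::function<StatusSketch(int)> protocol_;
    StatusSketch status_ = StatusSketch::Success;
};

// Returns the attempt number that succeeded, or 0 if the job was abandoned.
// A failed attempt does not count as a completed job; it is simply rerun.
int run_job_with_retries(MoverSketch & mover, int max_attempts) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        mover.apply(attempt);
        switch (mover.get_last_move_status()) {
            case StatusSketch::Success:        return attempt;
            case StatusSketch::FailRetry:      break;  // leave switch, loop again
            case StatusSketch::FailDoNotRetry: return 0;
        }
    }
    return 0;
}
```

The max_attempts cap is an addition of this sketch, there to keep a protocol that never passes its filter from looping forever.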
The Mover interface allows for one and only one input per trajectory: a starting Pose. This is a limitation of the Mover interface. Altering other data per-trajectory requires using non-base-class Mover functions (and is thus something the JD2 can't do.)
The workaround is to subclass the JobInputter. One of the major purposes for its OO-modularity is to make it easy to subclass. Subclass JobInputter to my_special_protocol_JobInputter, and have it organize your extra data.
Where does the extra data go? One place would be the Pose itself, in the DataCache. Another choice would be to also subclass the Job object and put it there. Either way you're pointer casting to extract the data.
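The "subclass the Job and pointer-cast" option might look roughly like this. JobBase, ConstraintJob, make_job and constraint_file_for are hypothetical stand-ins, not Rosetta classes, and the constraint-file field is only an example of per-trajectory extra data.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the base Job class.
struct JobBase {
    explicit JobBase(std::string tag) : input_tag(std::move(tag)) {}
    virtual ~JobBase() = default;  // polymorphic, so dynamic casts work
    std::string input_tag;
};

// Hypothetical subclass carrying the extra per-trajectory data.
struct ConstraintJob : JobBase {
    ConstraintJob(std::string tag, std::string cst)
        : JobBase(std::move(tag)), constraint_file(std::move(cst)) {}
    std::string constraint_file;
};

// Inputter side: a specialized inputter would create the richer job type
// but hand it around through the base-class pointer.
std::shared_ptr<JobBase> make_job(std::string const & tag, std::string const & cst) {
    return std::make_shared<ConstraintJob>(tag, cst);
}

// Protocol side: cast back down to recover the extra data.
// Returns an empty string when the job is not a ConstraintJob.
std::string constraint_file_for(std::shared_ptr<JobBase> const & job) {
    auto special = std::dynamic_pointer_cast<ConstraintJob>(job);
    return special ? special->constraint_file : std::string();
}
```

Checking the cast result, as constraint_file_for does, keeps the protocol usable when it is run under a plain inputter that produces only base-class jobs.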