![]() |
Rosetta Protocols
2014.35
|
JobDistributor for the iterative ArchiveManager/Archive Framework. More...
#include <MPIArchiveJobDistributor.hh>
Classes | |
struct | CompletionMessage |
CompletionMessage(s) are send to the ArchiveManager whenever more than nr_notify decoys have been finished / or when the full batch is finished. More... | |
Public Member Functions | |
virtual void | go (protocols::moves::MoverOP mover) |
overloaded to also start the ArchiveManager process More... | |
void | set_archive (archive::ArchiveBaseOP) |
bool | is_archive_rank () const |
![]() | |
virtual | ~MPIFileBufJobDistributor () |
dtor WARNING WARNING! SINGLETONS' DESTRUCTORS ARE NEVER CALLED IN MINI! DO NOT TRY TO PUT THINGS IN THIS FUNCTION! here's a nice link explaining why: http://www.research.ibm.com/designpatterns/pubs/ph-jun96.txt More... | |
core::Size | increment_client_rank () |
core::Size | min_client_rank () const |
return rank of first worker process (there might be more dedicated processes, e.g., ArchiveManager...) More... | |
virtual core::Size | get_new_job_id () |
dummy for master/slave version More... | |
virtual void | mark_current_job_id_for_repetition () |
dummy for master/slave version More... | |
virtual void | remove_bad_inputs_from_job_list () |
dummy for master/slave version More... | |
virtual void | job_succeeded (core::pose::Pose &pose, core::Real runtime, std::string const &tag) |
dummy for master/slave version More... | |
virtual void | job_failed (core::pose::Pose &pose, bool will_retry) |
This function is called when we give up on the job; it has been virtualized so BOINC and MPI can delay/protect output base implementation is just a call to the job outputter. More... | |
![]() | |
virtual | ~JobDistributor () |
void | go (protocols::moves::MoverOP mover, JobOutputterOP jo) |
invokes go, after setting JobOutputter More... | |
JobOP | current_job () const |
Movers may ask their controlling job distributor for information about the current job. They may also load information into this job for later output. More... | |
std::string | current_output_name () const |
Movers may ask their controlling job distributor for the output name as defined by the Job and JobOutputter. More... | |
JobOutputterOP | job_outputter () const |
Movers (or derived classes) may ask for the JobOutputter. More... | |
void | set_job_outputter (const JobOutputterOP &new_job_outputter) |
Movers (or derived classes) may ask for the JobOutputter. More... | |
JobInputterOP | job_inputter () const |
JobInputter access. More... | |
virtual void | mpi_finalize (bool finalize) |
should the go() function call MPI_finalize()? It probably should, this is true by default. More... | |
JobInputterInputSource::Enum | job_inputter_input_source () const |
The input source for the current JobInputter. More... | |
virtual void | restart () |
core::Size | total_nr_jobs () const |
core::Size | current_job_id () const |
integer access - which job are we on? More... | |
std::string | get_current_batch () const |
what is the current batch ? — name refers to the flag-file used for this batch More... | |
virtual void | add_batch (std::string const &, core::Size id=0) |
add a new batch ( name will be interpreted as flag_file ) More... | |
core::Size | current_batch_id () const |
what is the current batch number ? — refers to position in batches_ More... | |
Protected Types | |
typedef MPIFileBufJobDistributor | Parent |
Protected Member Functions | |
MPIArchiveJobDistributor () | |
ctor is protected; singleton pattern More... | |
virtual void | handle_interrupt () |
This function got called when job is not yet finished and got termitated abnormaly (ctrl-c, kill etc). when implimenting it in subclasses make sure to delete all in-progress-data that your job spawn. More... | |
virtual void | batch_underflow () |
triggered in slave if new batch_ID comes in. More... | |
virtual bool | process_message (core::Size msg_tag, core::Size slave_rank, core::Size slave_job_id, core::Size slave_batch_id, core::Real run_time) |
act on a message, return true if message was understood More... | |
virtual void | mark_job_as_completed (core::Size job_id, core::Size batch_id, core::Real run_time) |
overloaded to allow statistics and sending of CompletionMessages More... | |
virtual void | mark_job_as_bad (core::Size job_id, core::Size batch_id) |
overloaded to allow statistics and sending of CompletionMessages More... | |
virtual void | load_new_batch () |
overloaded to start new entries in nr_new_completed_, nr_completed_, nstruct_ and nr_bad_ ... More... | |
core::Size | archive_rank () const |
rank of ArchiveManger process More... | |
![]() | |
MPIFileBufJobDistributor () | |
ctor is protected; singleton pattern More... | |
MPIFileBufJobDistributor (core::Size master_rank, core::Size file_buf_rank, core::Size min_client_rank, bool start_empty=false) | |
protected ctor for child-classes More... | |
virtual bool | next_batch () |
switch current_batch_id_ to next batch More... | |
void | master_go (protocols::moves::MoverOP mover) |
Handles the receiving of job requests and the sending of job ids to and from slaves. More... | |
core::Size | master_get_new_job_id () |
Always returns zero, simply increments next_job_to_assign_ to the next job that should be run based on what has been completeted and the overwrite flags. More... | |
core::Size | slave_get_new_job_id () |
requests, receives, and returns a new job id from the master node or returns the current job id if the repeat_job_ flag is set to true More... | |
void | master_mark_current_job_id_for_repetition () |
This should never be called as this is handled internally by the slave nodes, it utility_exits. More... | |
void | slave_mark_current_job_id_for_repetition () |
Sets the repeat_job_ flag to true. More... | |
void | master_remove_bad_inputs_from_job_list () |
Simply increments next_job_to_assign_ to the next job that should be run based on what has been completed and if the input job tag of the job marked as having bad input. More... | |
void | slave_remove_bad_inputs_from_job_list () |
Sends a message to the head node that contains the id of a job that had bad input. More... | |
void | master_job_succeeded (core::pose::Pose &pose, std::string const &tag) |
This should never be called as this is handled internally by the slave nodes, it utility_exits. More... | |
void | slave_job_succeeded (core::pose::Pose &pose, std::string const &tag) |
Sends a message to the head node upon successful job completion to avoid output interleaving. More... | |
void | slave_to_master (core::Size tag) |
send a message to master More... | |
void | send_job_to_slave (core::Size slave_rank) |
called by master to send and by slave to receive job More... | |
core::Size | rank () const |
return rank of this process More... | |
core::Size | master_rank () const |
return rank of master process ( where JobDistributor is running ) More... | |
core::Size | file_buf_rank () const |
return rank of file-buffer process ( where output data (via ozstream )is handled ) More... | |
core::Size | number_of_processors () |
how many processes — this includes dedicated processes More... | |
core::Size | n_rank () |
how many processes — this includes dedicated processes More... | |
core::Size | n_worker () |
how many workers — important to keep track during spin-down process More... | |
void | set_n_worker (core::Size setting) |
how many workers — important to keep track during spin-down process More... | |
void | eat_signal (core::Size signal, int source) |
receive a certain signal and ignore it.... this is needed, for instance, when MPIArchiveJobDistributor triggers an ADD_BATCH signal by sending QUEUE_EMPTY to the ArchiveManager... More... | |
![]() | |
JobDistributor () | |
Singleton instantiation pattern; Derived classes will call default ctor, but their ctors, too must be protected (and the JDFactory must be their friend.) More... | |
JobDistributor (bool empty) | |
MPIArchiveJobDistributor starts with an empty job-list... More... | |
void | go_main (protocols::moves::MoverOP mover) |
Non-virtual get-job, run it, & output loop. This function is pretty generic and your subclass may be able to use it. It is NOT virtual - this implementation can be shared by (at least) the simple FileSystemJobDistributor, the MPIWorkPoolJobDistributor, and the MPIWorkPartitionJobDistributor. Do not feel that you need to use it as-is in your class - but DO plan on implementing all its functionality! More... | |
Jobs const & | get_jobs () const |
Read access to private data for derived classes. More... | |
void | mark_job_as_completed (core::Size job_id, core::Real run_time) |
Jobs is the container of Job objects need non-const to mark Jobs as completed on Master in MPI-JobDistributor. More... | |
void | mark_job_as_bad (core::Size job_id) |
ParserOP | parser () const |
Parser access. More... | |
void | begin_critical_section () |
void | end_critical_section () |
bool | obtain_new_job (bool re_consider_current_job=false) |
this function updates the current_job_id_ and current_job_ fields. The boolean return states whether or not a new job was obtained (if false, quit distributing!) More... | |
virtual void | job_succeeded_additional_output (core::pose::Pose &pose, std::string const &tag) |
This function is called upon a successful job completion if there are additional poses generated by the mover base implementation is just a call to the job outputter. More... | |
virtual void | current_job_finished () |
Derived classes are allowed to clean up any temporary files or data relating to the current job after the current job has completed. Called inside go_main loop. Default implementation is a no-op. More... | |
virtual void | note_all_jobs_finished () |
Derived classes are allowed to perform some kind of action when the job distributor runs out of jobs to execute. Called inside go_main. Default implementation is a no-op. More... | |
void | clear_current_job_output () |
void | set_batch_id (core::Size setting) |
set current_batch_id — eg for slave nodes in MPI framework More... | |
core::Size | nr_batches () const |
how many batches are in our list ... this can change dynamically More... | |
std::string const & | batch (core::Size batch_id) |
give name of batch with given id More... | |
Private Member Functions | |
void | _notify_archive () |
stuff needed for non-blocking communication More... | |
bool | receive_batch (core::Size source_rank) |
receive a new Batch from ArchiveManager More... | |
void | sync_batches (core::Size slave_rank) |
sync batch queue with slave node More... | |
void | master_to_archive (core::Size tag) |
send message to ArchiveManager More... | |
void | notify_archive (CompletionMessage const &) |
add a notifcation (CompletionMessage) to the msg queue ... More... | |
void | notify_archive (core::Size batch_id) |
work out if a notifcation should be send (using above method) More... | |
Private Attributes | |
utility::vector1< core::Size > | nr_jobs_ |
utility::vector1< core::Size > | nr_completed_ |
utility::vector1< core::Size > | nr_new_completed_ |
utility::vector1< core::Size > | nr_bad_ |
utility::vector1< core::Size > | nstruct_ |
core::Size | nr_notify_ |
core::Size | archive_rank_ |
std::deque< CompletionMessage > | pending_notifications_ |
unsent notifications More... | |
ArchiveBaseOP | theArchive_ |
Friends | |
class | protocols::jd2::JobDistributorFactory |
Additional Inherited Members | |
![]() | |
static JobDistributor * | get_instance () |
static function to get the instance of ( pointer to) this singleton class More... | |
![]() | |
static void | setup_system_signal_handler (void(*prev_fn)(int)=jd2_signal_handler) |
Setting up callback function that will be call when our process is about to terminate. More... | |
static void | remove_system_signal_handler () |
Set signal handler back to default state. More... | |
static void | jd2_signal_handler (int Signal) |
Default callback function for signal handling. More... | |
JobDistributor for the iterative ArchiveManager/Archive Framework.
Tags used to tag messeges sent by MPI functions used to decide whether a slave is requesting a new job id or flagging as job as being a bad input
This job distributor is meant for running iterative jobs with the ArchiveManager/Archive Framework. could vary greatly. In this configuration the three first nodes are dedicated processes (JobDistributor, FileBuffer, and ArchiveManger ) and the remaining CPUs form slave or worker nodes. This JD will not work at all without MPI and the implementations of all but the interface functions have been put inside of ifdef directives. Generally each function has a master and slave version, and the interface functions call one or the other depending on processor rank.
|
protected |
|
protected |
ctor is protected; singleton pattern
constructor. Notice it calls the parent class! It also builds some internal variables for determining which processor it is in MPI land.
References archive, completion_notify_frequency, nr_notify_, and option.
|
private |
stuff needed for non-blocking communication
the private implementation of notify_archive. send JOB_COMPLETION message to Archive if a message is in the message queue.
References archive_rank(), protocols::jd2::archive::MPI_ARCHIVE_TAG, basic::MPI_NOTIFY_ARCHIVE, pending_notifications_, PROF_START, PROF_STOP, basic::show_time(), and protocols::jd2::tr.
Referenced by batch_underflow(), and process_message().
|
inlineprotected |
rank of ArchiveManger process
References archive_rank_.
Referenced by _notify_archive(), batch_underflow(), go(), is_archive_rank(), master_to_archive(), process_message(), and set_archive().
|
protectedvirtual |
triggered in slave if new batch_ID comes in.
called if JD is at and of BatchQueue... for a worker node that might mean he needs to sync batches with Master for a master node it means he sends QUEUE-EMPTY to ArchiveManager
Reimplemented from protocols::jd2::JobDistributor.
References _notify_archive(), protocols::jd2::archive::ADD_BATCH, archive_rank(), protocols::jd2::archive::BATCH_SYNC, protocols::jd2::JobDistributor::current_batch_id(), protocols::jd2::MPIFileBufJobDistributor::eat_signal(), protocols::jd2::MPIFileBufJobDistributor::master_rank(), master_to_archive(), basic::MPI_JD2_WAITS_FOR_ARCHIVE, PROF_START, PROF_STOP, protocols::jd2::archive::QUEUE_EMPTY, protocols::jd2::MPIFileBufJobDistributor::rank(), receive_batch(), basic::show_time(), protocols::jd2::MPIFileBufJobDistributor::slave_to_master(), sync_batches(), and protocols::jd2::tr.
|
virtual |
overloaded to also start the ArchiveManager process
dummy for master/slave version – start the appropriate process depending on rank()
Reimplemented from protocols::jd2::MPIFileBufJobDistributor.
References archive, archive_rank(), utility::io::ozstream::enable_MPI_reroute(), protocols::jd2::MPIFileBufJobDistributor::file_buf_rank(), protocols::jd2::archive::ArchiveManager::go(), protocols::jd2::JobDistributor::go_main(), protocols::jd2::MPIFileBufJobDistributor::master_go(), protocols::jd2::MPIFileBufJobDistributor::master_rank(), mem_tr, protocols::jd2::MPIFileBufJobDistributor::min_client_rank(), protocols::jd2::MPIFileBufJobDistributor::rank(), protocols::jd2::MpiFileBuffer::run(), runtime_assert, protocols::jd2::MpiFileBuffer::stop(), theArchive_, and protocols::jd2::tr.
|
inlineprotectedvirtual |
This function got called when job is not yet finished and got termitated abnormaly (ctrl-c, kill etc). when implimenting it in subclasses make sure to delete all in-progress-data that your job spawn.
Reimplemented from protocols::jd2::MPIFileBufJobDistributor.
|
inline |
References archive_rank(), and protocols::jd2::MPIFileBufJobDistributor::rank().
Referenced by protocols::abinitio::Broker_main().
|
protectedvirtual |
overloaded to start new entries in nr_new_completed_, nr_completed_, nstruct_ and nr_bad_ ...
load new batch from BatchQueue .. overloaded to setup the statistics for CompletionMessages
Reimplemented from protocols::jd2::JobDistributor.
References protocols::jd2::JobDistributor::current_batch_id(), protocols::jd2::JobDistributor::get_jobs(), protocols::jd2::JobDistributor::load_new_batch(), protocols::jd2::MPIFileBufJobDistributor::master_rank(), mem_tr, nr_bad_, nr_completed_, nr_jobs_, nr_new_completed_, out::nstruct, nstruct_, option, protocols::jd2::MPIFileBufJobDistributor::rank(), runtime_assert, and size().
|
protectedvirtual |
overloaded to allow statistics and sending of CompletionMessages
Reimplemented from protocols::jd2::MPIFileBufJobDistributor.
References protocols::jd2::MPIFileBufJobDistributor::mark_job_as_bad(), protocols::jd2::MPIFileBufJobDistributor::master_rank(), notify_archive(), nr_bad_, nr_jobs_, nstruct_, protocols::jd2::MPIFileBufJobDistributor::rank(), and runtime_assert.
|
protectedvirtual |
overloaded to allow statistics and sending of CompletionMessages
overloaded to update our job-statistics ( needed for CompletionMessages )
Reimplemented from protocols::jd2::MPIFileBufJobDistributor.
References protocols::jd2::MPIFileBufJobDistributor::mark_job_as_completed(), protocols::jd2::MPIFileBufJobDistributor::master_rank(), notify_archive(), nr_jobs_, nr_new_completed_, protocols::jd2::MPIFileBufJobDistributor::rank(), runtime_assert, and protocols::jd2::tr.
|
private |
send message to ArchiveManager
send message to ArchiveManager .. eg. QueueEmpty always send current_batch_id with the message ... used to determine if QueueEmpty is outdated
References archive_rank(), protocols::jd2::JobDistributor::current_batch_id(), protocols::jd2::MPIFileBufJobDistributor::master_rank(), protocols::jd2::archive::MPI_ARCHIVE_TAG, MPI_ONLY, protocols::jd2::MPIFileBufJobDistributor::rank(), runtime_assert, and tag.
Referenced by batch_underflow().
|
private |
add a notifcation (CompletionMessage) to the msg queue ...
queue up a CompletionMessage
References protocols::jd2::archive::MPIArchiveJobDistributor::CompletionMessage::batch_id, protocols::jd2::archive::MPIArchiveJobDistributor::CompletionMessage::msg_tag, pending_notifications_, and protocols::jd2::tr.
Referenced by mark_job_as_bad(), mark_job_as_completed(), and notify_archive().
|
private |
work out if a notifcation should be send (using above method)
work out if CompletionMessage should be send... looks at completed/bad decoys send "final" message if all jobs done... sends "update" message if nr_new_completed_ > nr_notify
References protocols::jd2::JobDistributor::current_job_id(), protocols::jd2::JobDistributor::get_jobs(), notify_archive(), nr_bad_, protocols::jd2::JobDistributor::nr_batches(), nr_completed_, nr_jobs_, nr_new_completed_, nr_notify_, protocols::jd2::MPIFileBufJobDistributor::number_of_processors(), size(), and protocols::jd2::tr.
|
protectedvirtual |
act on a message, return true if message was understood
process messages... BATCH_SYNC, ADD_BATCH, or delegate to Parent class also send pending CompletionMessages out...
Reimplemented from protocols::jd2::MPIFileBufJobDistributor.
References _notify_archive(), protocols::jd2::archive::ADD_BATCH, archive_rank(), protocols::jd2::archive::BATCH_SYNC, protocols::jd2::archive::CANCEL_BATCH, protocols::jd2::JobDistributor::get_current_batch(), protocols::jd2::MPIFileBufJobDistributor::master_rank(), protocols::jd2::MPIFileBufJobDistributor::process_message(), protocols::jd2::MPIFileBufJobDistributor::rank(), receive_batch(), runtime_assert, sync_batches(), and protocols::jd2::tr.
|
private |
receive a new Batch from ArchiveManager
receive a new batch from ArchiveManager – interpret batch_nr == 0 as STOP
References protocols::jd2::JobDistributor::add_batch(), protocols::jd2::MPI_JOB_DIST_TAG(), MPI_ONLY, basic::prof_show(), size(), and protocols::jd2::tr.
Referenced by batch_underflow(), process_message(), and sync_batches().
void protocols::jd2::archive::MPIArchiveJobDistributor::set_archive | ( | archive::ArchiveBaseOP | archive | ) |
References archive_rank(), protocols::jd2::MPIFileBufJobDistributor::rank(), and theArchive_.
Referenced by protocols::abinitio::Broker_main().
|
private |
sync batch queue with slave node
sync batches with worker nodes.. this is called if they get a job for a batch they don't know yet... this method will send ALL batches they don't have yet.
References protocols::jd2::archive::ADD_BATCH, basic::ARCHIVE_SYNC_BATCHES, protocols::jd2::JobDistributor::batch(), protocols::jd2::MPIFileBufJobDistributor::master_rank(), protocols::jd2::MPI_JOB_DIST_TAG(), MPI_ONLY, protocols::jd2::JobDistributor::nr_batches(), PROF_START, PROF_STOP, protocols::jd2::MPIFileBufJobDistributor::rank(), receive_batch(), runtime_assert, size(), and protocols::jd2::tr.
Referenced by batch_underflow(), and process_message().
|
friend |
|
private |
Referenced by archive_rank().
|
private |
Referenced by load_new_batch(), mark_job_as_bad(), and notify_archive().
|
private |
Referenced by load_new_batch(), and notify_archive().
|
private |
Referenced by load_new_batch(), mark_job_as_bad(), mark_job_as_completed(), and notify_archive().
|
private |
Referenced by load_new_batch(), mark_job_as_completed(), and notify_archive().
|
private |
Referenced by MPIArchiveJobDistributor(), and notify_archive().
|
private |
Referenced by load_new_batch(), and mark_job_as_bad().
|
private |
unsent notifications
Referenced by _notify_archive(), and notify_archive().
|
private |
Referenced by go(), and set_archive().