Rosetta Protocols 2014.35
#include <MPIWorkPoolJobDistributor.hh>
Public Member Functions | |
virtual | ~MPIWorkPoolJobDistributor () |
Destructor. WARNING: singletons' destructors are never called in Mini, so do not put anything in this function. This link explains why: http://www.research.ibm.com/designpatterns/pubs/ph-jun96.txt More... | |
virtual void | go (protocols::moves::MoverOP mover) |
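Interface function: dispatches to master_go() or slave_go() depending on processor rank. More... | |
virtual core::Size | get_new_job_id () |
Interface function: dispatches to master_get_new_job_id() or slave_get_new_job_id() depending on processor rank. More... | |
virtual void | mark_current_job_id_for_repetition () |
Interface function: dispatches to master_mark_current_job_id_for_repetition() or slave_mark_current_job_id_for_repetition() depending on processor rank. More... | |
virtual void | remove_bad_inputs_from_job_list () |
Interface function: dispatches to master_remove_bad_inputs_from_job_list() or slave_remove_bad_inputs_from_job_list() depending on processor rank. More... | |
virtual void | job_succeeded (core::pose::Pose &pose, core::Real run_time, std::string const &tag) |
Interface function: dispatches to master_job_succeeded() or slave_job_succeeded() depending on processor rank. More... | |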
virtual void | mpi_finalize (bool finalize) |
Sets whether the go() function should call MPI_Finalize(). It probably should; this is true by default. More... | |
Public Member Functions inherited from protocols::jd2::JobDistributor | |
virtual | ~JobDistributor () |
void | go (protocols::moves::MoverOP mover, JobOutputterOP jo) |
invokes go, after setting JobOutputter More... | |
JobOP | current_job () const |
Movers may ask their controlling job distributor for information about the current job. They may also load information into this job for later output. More... | |
std::string | current_output_name () const |
Movers may ask their controlling job distributor for the output name as defined by the Job and JobOutputter. More... | |
JobOutputterOP | job_outputter () const |
Movers (or derived classes) may ask for the JobOutputter. More... | |
void | set_job_outputter (const JobOutputterOP &new_job_outputter) |
Movers (or derived classes) may set the JobOutputter. More... | |
JobInputterOP | job_inputter () const |
JobInputter access. More... | |
JobInputterInputSource::Enum | job_inputter_input_source () const |
The input source for the current JobInputter. More... | |
virtual void | restart () |
core::Size | total_nr_jobs () const |
core::Size | current_job_id () const |
integer access - which job are we on? More... | |
std::string | get_current_batch () const |
What is the current batch? The name refers to the flag file used for this batch. More... | |
virtual void | add_batch (std::string const &, core::Size id=0) |
Adds a new batch (the name will be interpreted as a flag file). More... | |
core::Size | current_batch_id () const |
What is the current batch number? This refers to the position in batches_. More... | |
Protected Member Functions | |
MPIWorkPoolJobDistributor () | |
ctor is protected; singleton pattern More... | |
virtual void | handle_interrupt () |
This function is called when a job is not yet finished and is terminated abnormally (ctrl-c, kill, etc.). When implementing it in subclasses, make sure to delete all in-progress data that your job spawned. More... | |
virtual void | master_go (protocols::moves::MoverOP mover) |
Handles the receiving of job requests and the sending of job ids to and from slaves. More... | |
virtual void | slave_go (protocols::moves::MoverOP mover) |
Proceeds to the parent class go_main() as usual. More... | |
virtual core::Size | master_get_new_job_id () |
Always returns zero; simply advances next_job_to_assign_ to the next job that should be run, based on what has been completed and the overwrite flags. More... | |
virtual core::Size | slave_get_new_job_id () |
requests, receives, and returns a new job id from the master node or returns the current job id if the repeat_job_ flag is set to true More... | |
virtual void | master_mark_current_job_id_for_repetition () |
This should never be called, as this is handled internally by the slave nodes; it exits via utility_exit_with_message. More... | |
virtual void | slave_mark_current_job_id_for_repetition () |
Sets the repeat_job_ flag to true. More... | |
virtual void | master_remove_bad_inputs_from_job_list () |
Advances next_job_to_assign_ to the next job that should be run, based on what has been completed and on whether a job's input tag matches that of the job marked as having bad input. More... | |
virtual void | slave_remove_bad_inputs_from_job_list () |
Sends a message to the head node that contains the id of a job that had bad input. More... | |
virtual void | master_job_succeeded (core::pose::Pose &pose, std::string const &tag) |
This should never be called, as this is handled internally by the slave nodes; it exits via utility_exit_with_message. More... | |
virtual void | slave_job_succeeded (core::pose::Pose &pose, std::string const &tag) |
Sends a message to the head node upon successful job completion to avoid output interleaving. More... | |
Protected Member Functions inherited from protocols::jd2::JobDistributor | |
JobDistributor () | |
Singleton instantiation pattern. Derived classes will call the default ctor, but their ctors, too, must be protected (and the JDFactory must be their friend). More... | |
JobDistributor (bool empty) | |
MPIArchiveJobDistributor starts with an empty job-list... More... | |
void | go_main (protocols::moves::MoverOP mover) |
Non-virtual get-job, run it, & output loop. This function is pretty generic and your subclass may be able to use it. It is NOT virtual - this implementation can be shared by (at least) the simple FileSystemJobDistributor, the MPIWorkPoolJobDistributor, and the MPIWorkPartitionJobDistributor. Do not feel that you need to use it as-is in your class - but DO plan on implementing all its functionality! More... | |
Jobs const & | get_jobs () const |
Read access to private data for derived classes. More... | |
void | mark_job_as_completed (core::Size job_id, core::Real run_time) |
Jobs is the container of Job objects; non-const access is needed to mark Jobs as completed on the master in the MPI JobDistributor. More... | |
void | mark_job_as_bad (core::Size job_id) |
ParserOP | parser () const |
Parser access. More... | |
void | begin_critical_section () |
void | end_critical_section () |
bool | obtain_new_job (bool re_consider_current_job=false) |
this function updates the current_job_id_ and current_job_ fields. The boolean return states whether or not a new job was obtained (if false, quit distributing!) More... | |
virtual void | job_succeeded_additional_output (core::pose::Pose &pose, std::string const &tag) |
This function is called upon a successful job completion if there are additional poses generated by the mover. The base implementation is just a call to the job outputter. More... | |
virtual void | job_failed (core::pose::Pose &, bool) |
This function is called when we give up on the job. It has been virtualized so BOINC and MPI can delay or protect output. The base implementation is just a call to the job outputter. More... | |
virtual void | current_job_finished () |
Derived classes are allowed to clean up any temporary files or data relating to the current job after the current job has completed. Called inside go_main loop. Default implementation is a no-op. More... | |
virtual void | note_all_jobs_finished () |
Derived classes are allowed to perform some kind of action when the job distributor runs out of jobs to execute. Called inside go_main. Default implementation is a no-op. More... | |
void | clear_current_job_output () |
void | set_batch_id (core::Size setting) |
Sets current_batch_id, e.g. for slave nodes in the MPI framework. More... | |
virtual bool | next_batch () |
switch current_batch_id_ to next batch More... | |
virtual void | batch_underflow () |
if end of batches_ reached via next_batch or set_batch_id ... More... | |
virtual void | load_new_batch () |
called by next_batch() or set_batch_id() to switch-over and restart JobDistributor on new batch More... | |
core::Size | nr_batches () const |
how many batches are in our list ... this can change dynamically More... | |
std::string const & | batch (core::Size batch_id) |
give name of batch with given id More... | |
Protected Attributes | |
core::Size | npes_ |
total number of processing elements More... | |
core::Size | rank_ |
rank of the "local" instance More... | |
core::Size | current_job_id_ |
where slave jobs store current job id More... | |
core::Size | next_job_to_assign_ |
where master stores next job to assign (in a good state after get_new_job_id up until it's used) More... | |
core::Size | bad_job_id_ |
where master temporarily stores id of jobs with bad input More... | |
bool | repeat_job_ |
where slave stores whether it should repeat its current job id More... | |
bool | finalize_MPI_ |
should the go() function call MPI_finalize? There are very few cases where this should be false More... | |
Friends | |
class | JobDistributorFactory |
Additional Inherited Members | |
Static Member Functions inherited from protocols::jd2::JobDistributor | |
static JobDistributor * | get_instance () |
static function to get the instance of ( pointer to) this singleton class More... | |
static void | setup_system_signal_handler (void(*prev_fn)(int)=jd2_signal_handler) |
Sets up the callback function that will be called when our process is about to terminate. More... | |
static void | remove_system_signal_handler () |
Set signal handler back to default state. More... | |
static void | jd2_signal_handler (int Signal) |
Default callback function for signal handling. More... | |
This job distributor is meant for running jobs where the machine you are using has a large number of processors, the number of jobs is much greater than the number of processors, or the runtimes of the individual jobs could vary greatly. It dedicates the head node (whichever processor gets processor rank #0) to handling job requests from the slave nodes (all nonzero ranks). Unlike the MPIWorkPartitionJobDistributor, this JD will not work at all without MPI and the implementations of all but the interface functions have been put inside of ifdef directives. Generally each function has a master and slave version, and the interface functions call one or the other depending on processor rank.
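The rank-based dispatch described above can be illustrated with a minimal sketch. This is not the Rosetta implementation: `RankDispatchSketch` and its string return values are hypothetical, and no MPI calls are made. It only shows an interface function forwarding to a master or slave variant depending on whether the rank is zero.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the distributor; only the dispatch pattern is shown.
class RankDispatchSketch {
public:
    explicit RankDispatchSketch(std::size_t rank) : rank_(rank) {}

    // Interface function: rank 0 is the head node, every other rank is a slave.
    std::string get_new_job_id() {
        return rank_ == 0 ? master_get_new_job_id() : slave_get_new_job_id();
    }

private:
    // The master variant must only ever run on rank 0.
    std::string master_get_new_job_id() {
        assert(rank_ == 0);
        return "master";
    }

    // The slave variant must only ever run on a nonzero rank.
    std::string slave_get_new_job_id() {
        assert(rank_ != 0);
        return "slave";
    }

    std::size_t rank_;
};
```

In the real class the rank would presumably come from the MPI runtime during construction; here it is passed in directly so the sketch can run without MPI.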
|
protected |
ctor is protected; singleton pattern
constructor. Notice it calls the parent class! It also builds some internal variables for determining which processor it is in MPI land.
References npes_, rank_, and utility_exit_with_message.
|
virtual |
Destructor. WARNING: singletons' destructors are never called in Mini, so do not put anything in this function. This link explains why: http://www.research.ibm.com/designpatterns/pubs/ph-jun96.txt
|
virtual |
Interface function get_new_job_id(): dispatches to master_get_new_job_id() or slave_get_new_job_id() depending on processor rank.
Implements protocols::jd2::JobDistributor.
References master_get_new_job_id(), rank_, and slave_get_new_job_id().
|
virtual |
Interface function go(): dispatches to master_go() or slave_go() depending on processor rank.
Reimplemented from protocols::jd2::JobDistributor.
References finalize_MPI_, master_go(), rank_, and slave_go().
|
inlineprotectedvirtual |
This function is called when a job is not yet finished and is terminated abnormally (ctrl-c, kill, etc.). When implementing it in subclasses, make sure to delete all in-progress data that your job spawned.
Implements protocols::jd2::JobDistributor.
|
virtual |
Interface function job_succeeded(): dispatches to master_job_succeeded() or slave_job_succeeded() depending on processor rank.
Reimplemented from protocols::jd2::JobDistributor.
References master_job_succeeded(), rank_, and slave_job_succeeded().
|
virtual |
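Interface function mark_current_job_id_for_repetition(): dispatches to master_mark_current_job_id_for_repetition() or slave_mark_current_job_id_for_repetition() depending on processor rank.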
Implements protocols::jd2::JobDistributor.
References protocols::jd2::JobDistributor::clear_current_job_output(), master_mark_current_job_id_for_repetition(), rank_, and slave_mark_current_job_id_for_repetition().
|
protectedvirtual |
Always returns zero; simply advances next_job_to_assign_ to the next job that should be run, based on what has been completed and the overwrite flags.
References protocols::jd2::JobDistributor::get_jobs(), protocols::jd2::JobDistributor::job_outputter(), next_job_to_assign_, option, out::overwrite, and protocols::jd2::TR.
Referenced by get_new_job_id(), protocols::unfolded_state_energy_calculator::UnfoldedStateEnergyCalculatorMPIWorkPoolJobDistributor::master_go(), master_go(), and master_remove_bad_inputs_from_job_list().
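As a rough illustration of advancing to the next runnable job, the sketch below skips jobs that are already completed and, unless overwriting is enabled, jobs whose output already exists. The `JobSketch` struct, its two flags, and the free function `next_job_to_assign` are assumptions made for this example. The actual class consults the job list and the job outputter, and stores the result in next_job_to_assign_ rather than returning it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-job state; the real Job class carries far more information.
struct JobSketch {
    bool completed;
    bool output_exists;
};

// Returns the 1-based id of the next runnable job after `current`,
// or 0 when no runnable job is left.
std::size_t next_job_to_assign(const std::vector<JobSketch>& jobs,
                               std::size_t current,
                               bool overwrite) {
    assert(current <= jobs.size());
    for (std::size_t id = current + 1; id <= jobs.size(); ++id) {
        const JobSketch& job = jobs[id - 1];
        if (job.completed) continue;                       // already done
        if (job.output_exists && !overwrite) continue;     // would clobber output
        return id;
    }
    return 0;
}
```

Using 0 as the "no jobs left" value mirrors the way the spin down logic in master_go() is described: a slave that receives it has nothing further to run.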
|
protectedvirtual |
Handles the receiving of job requests and the sending of job ids to and from slaves.
This is the heart of the MPIWorkPoolJobDistributor. It consists of two while loops: the job distribution loop (JDL) and the node spin down loop (NSDL). The JDL has three functions. The first is to receive and process messages from the slave nodes requesting new job ids. The second is to receive and process messages from the slave nodes indicating a bad input. The third is to receive and process job_success messages from the slave nodes and block while the slave node is writing its output. This is to prevent interleaving of output in score files and silent files. The function of the NSDL is to keep the head node alive while there are still slave nodes processing. Without the NSDL, if a slave node finished its allocated job after the head node had finished handing out all of the jobs and exited (a very likely scenario), it would wait indefinitely for a response from the head node when requesting a new job id.
Reimplemented in protocols::unfolded_state_energy_calculator::UnfoldedStateEnergyCalculatorMPIWorkPoolJobDistributor.
References protocols::jd2::BAD_INPUT_TAG, bad_job_id_, protocols::jd2::JOB_SUCCESS_TAG, listener_tag_to_name(), master_get_new_job_id(), master_remove_bad_inputs_from_job_list(), MPI_ANY_SOURCE, protocols::jd2::NEW_JOB_ID_TAG, next_job_to_assign_, npes_, rank_, utility::receive_string_from_node(), protocols::jd2::REQUEST_MESSAGE_TAG, runtime_assert, utility::send_string_to_node(), protocols::jd2::TR, and utility_exit_with_message.
Referenced by go().
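The two-loop structure of master_go() can be sketched without MPI by replaying a scripted inbox of messages. Everything below is a hypothetical simplification: the `Tag` enum, `Message`, `MasterLog`, and `master_go_sketch` are invented for illustration, job ids are simply 1..n_jobs, and the blocking handshake around output writing is reduced to a counter.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical message tags; the real constants live in protocols::jd2.
enum class Tag { NewJobId, BadInput, JobSuccess };

struct Message {
    std::size_t source;   // rank of the sending slave
    Tag tag;
    std::size_t payload;  // job id for BadInput, unused otherwise
};

struct MasterLog {
    std::vector<std::pair<std::size_t, std::size_t>> replies;  // (slave rank, job id sent)
    std::vector<std::size_t> bad_jobs;
    std::size_t output_grants = 0;
};

// Replays a scripted inbox; a real master would block on an MPI receive instead.
MasterLog master_go_sketch(std::deque<Message> inbox, std::size_t n_jobs, std::size_t npes) {
    assert(npes >= 2);  // one master plus at least one slave
    MasterLog log;
    std::size_t next_job = n_jobs > 0 ? 1 : 0;  // 0 means "no jobs left"
    auto advance = [&] { next_job = (next_job < n_jobs) ? next_job + 1 : 0; };

    // Job distribution loop (JDL): runs while there is a job left to hand out.
    while (next_job != 0 && !inbox.empty()) {
        Message m = inbox.front();
        inbox.pop_front();
        switch (m.tag) {
        case Tag::NewJobId:   log.replies.emplace_back(m.source, next_job); advance(); break;
        case Tag::BadInput:   log.bad_jobs.push_back(m.payload); break;
        case Tag::JobSuccess: ++log.output_grants; break;  // real code blocks during output
        }
    }

    // Node spin down loop (NSDL): answer each slave's final request with job id 0.
    std::size_t slaves_left = npes - 1;
    while (slaves_left > 0 && !inbox.empty()) {
        Message m = inbox.front();
        inbox.pop_front();
        switch (m.tag) {
        case Tag::NewJobId:   log.replies.emplace_back(m.source, 0); --slaves_left; break;
        case Tag::BadInput:   break;  // nothing left to assign, so nothing to skip
        case Tag::JobSuccess: ++log.output_grants; break;
        }
    }
    return log;
}
```

With two slaves and two jobs, each slave first receives a real job id and later a 0 telling it to stop, which is the situation the NSDL exists to handle.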
|
protectedvirtual |
This should never be called, as this is handled internally by the slave nodes; it exits via utility_exit_with_message.
References rank_, runtime_assert, protocols::jd2::TR, and utility_exit_with_message.
Referenced by job_succeeded().
|
protectedvirtual |
This should never be called, as this is handled internally by the slave nodes; it exits via utility_exit_with_message.
References rank_, runtime_assert, protocols::jd2::TR, and utility_exit_with_message.
Referenced by mark_current_job_id_for_repetition().
|
protectedvirtual |
Advances next_job_to_assign_ to the next job that should be run, based on what has been completed and on whether a job's input tag matches that of the job marked as having bad input.
References bad_job_id_, protocols::jd2::JobDistributor::get_jobs(), protocols::jd2::JobDistributor::job_outputter(), master_get_new_job_id(), next_job_to_assign_, rank_, runtime_assert, and protocols::jd2::TR.
Referenced by protocols::unfolded_state_energy_calculator::UnfoldedStateEnergyCalculatorMPIWorkPoolJobDistributor::master_go(), master_go(), and remove_bad_inputs_from_job_list().
|
virtual |
Sets whether the go() function should call MPI_Finalize(). It probably should; this is true by default.
Reimplemented from protocols::jd2::JobDistributor.
References finalize_MPI_.
|
virtual |
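Interface function remove_bad_inputs_from_job_list(): dispatches to master_remove_bad_inputs_from_job_list() or slave_remove_bad_inputs_from_job_list() depending on processor rank.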
Reimplemented from protocols::jd2::JobDistributor.
References master_remove_bad_inputs_from_job_list(), rank_, and slave_remove_bad_inputs_from_job_list().
|
protectedvirtual |
requests, receives, and returns a new job id from the master node or returns the current job id if the repeat_job_ flag is set to true
References current_job_id_, protocols::jd2::NEW_JOB_ID_TAG, rank_, repeat_job_, runtime_assert, and protocols::jd2::TR.
Referenced by get_new_job_id().
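The repeat-or-request behaviour described for the slave side might look roughly like the following. `SlaveSketch` is hypothetical, the `request_from_master` callback stands in for the MPI send and receive pair, and clearing the flag after it is honoured is an assumption rather than something this page states.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical slave-side state; no MPI is involved in this sketch.
class SlaveSketch {
public:
    explicit SlaveSketch(std::function<std::size_t()> request_from_master)
        : request_from_master_(std::move(request_from_master)) {
        assert(request_from_master_);  // a callable must be supplied
    }

    // Mirrors slave_mark_current_job_id_for_repetition(): just sets the flag.
    void mark_current_job_id_for_repetition() { repeat_job_ = true; }

    // Returns the current job id again if a repeat was requested,
    // otherwise asks the master for a fresh id.
    std::size_t get_new_job_id() {
        if (repeat_job_) {
            repeat_job_ = false;  // assumption: the flag is cleared once honoured
            return current_job_id_;
        }
        current_job_id_ = request_from_master_();
        return current_job_id_;
    }

private:
    std::function<std::size_t()> request_from_master_;
    std::size_t current_job_id_ = 0;
    bool repeat_job_ = false;
};
```

A repeated job therefore costs no round trip to the head node, since the id is served from the slave's own current_job_id_.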
|
protectedvirtual |
Proceeds to the parent class go_main() as usual.
References protocols::jd2::JobDistributor::go_main(), rank_, and runtime_assert.
Referenced by go().
|
protectedvirtual |
Sends a message to the head node upon successful job completion to avoid output interleaving.
References protocols::jd2::JobDistributor::current_job(), current_job_id_, protocols::jd2::JobDistributor::job_outputter(), protocols::jd2::JOB_SUCCESS_TAG, MPI_ONLY, option, rank_, runtime_assert, tag, and protocols::jd2::TR.
Referenced by job_succeeded().
|
protectedvirtual |
Sets the repeat_job_ flag to true.
References current_job_id_, rank_, repeat_job_, runtime_assert, and protocols::jd2::TR.
Referenced by mark_current_job_id_for_repetition().
|
protectedvirtual |
Sends a message to the head node that contains the id of a job that had bad input.
References protocols::jd2::BAD_INPUT_TAG, current_job_id_, rank_, and runtime_assert.
Referenced by remove_bad_inputs_from_job_list().
|
friend |
|
protected |
where master temporarily stores id of jobs with bad input
Referenced by protocols::unfolded_state_energy_calculator::UnfoldedStateEnergyCalculatorMPIWorkPoolJobDistributor::master_go(), master_go(), and master_remove_bad_inputs_from_job_list().
|
protected |
where slave jobs store current job id
Referenced by protocols::unfolded_state_energy_calculator::UnfoldedStateEnergyCalculatorMPIWorkPoolJobDistributor::slave_add_unfolded_energy_data(), slave_get_new_job_id(), slave_job_succeeded(), slave_mark_current_job_id_for_repetition(), slave_remove_bad_inputs_from_job_list(), and protocols::unfolded_state_energy_calculator::UnfoldedStateEnergyCalculatorMPIWorkPoolJobDistributor::slave_set_energy_terms().
|
protected |
should the go() function call MPI_finalize? There are very few cases where this should be false
Referenced by go(), and mpi_finalize().
|
protected |
where master stores next job to assign (in a good state after get_new_job_id up until it's used)
Referenced by master_get_new_job_id(), protocols::unfolded_state_energy_calculator::UnfoldedStateEnergyCalculatorMPIWorkPoolJobDistributor::master_go(), master_go(), and master_remove_bad_inputs_from_job_list().
|
protected |
total number of processing elements
Referenced by protocols::unfolded_state_energy_calculator::UnfoldedStateEnergyCalculatorMPIWorkPoolJobDistributor::master_go(), master_go(), and MPIWorkPoolJobDistributor().
|
protected |
rank of the "local" instance
Referenced by protocols::unfolded_state_energy_calculator::UnfoldedStateEnergyCalculatorMPIWorkPoolJobDistributor::add_unfolded_energy_data(), get_new_job_id(), go(), job_succeeded(), mark_current_job_id_for_repetition(), protocols::unfolded_state_energy_calculator::UnfoldedStateEnergyCalculatorMPIWorkPoolJobDistributor::master_go(), master_go(), master_job_succeeded(), master_mark_current_job_id_for_repetition(), master_remove_bad_inputs_from_job_list(), MPIWorkPoolJobDistributor(), remove_bad_inputs_from_job_list(), protocols::unfolded_state_energy_calculator::UnfoldedStateEnergyCalculatorMPIWorkPoolJobDistributor::set_energy_terms(), protocols::unfolded_state_energy_calculator::UnfoldedStateEnergyCalculatorMPIWorkPoolJobDistributor::slave_add_unfolded_energy_data(), slave_get_new_job_id(), slave_go(), slave_job_succeeded(), slave_mark_current_job_id_for_repetition(), slave_remove_bad_inputs_from_job_list(), and protocols::unfolded_state_energy_calculator::UnfoldedStateEnergyCalculatorMPIWorkPoolJobDistributor::slave_set_energy_terms().
|
protected |
where slave stores whether it should repeat its current job id
Referenced by slave_get_new_job_id(), and slave_mark_current_job_id_for_repetition().