Rosetta  2020.37
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros Pages
Public Member Functions | Protected Member Functions | Protected Attributes | Friends | List of all members
protocols::jd2::MPIWorkPoolJobDistributor Class Reference

#include <MPIWorkPoolJobDistributor.hh>

Inheritance diagram for protocols::jd2::MPIWorkPoolJobDistributor:
Inheritance graph
[legend]

Public Member Functions

 ~MPIWorkPoolJobDistributor () override
 dtor WARNING WARNING! SINGLETONS' DESTRUCTORS ARE NEVER CALLED IN MINI! DO NOT TRY TO PUT THINGS IN THIS FUNCTION! here's a nice link explaining why: http://www.research.ibm.com/designpatterns/pubs/ph-jun96.txt More...
 
void go (protocols::moves::MoverOP mover) override
 dummy for master/slave version More...
 
core::Size get_new_job_id () override
 dummy for master/slave version More...
 
void mark_current_job_id_for_repetition () override
 dummy for master/slave version More...
 
void remove_bad_inputs_from_job_list () override
 dummy for master/slave version More...
 
void job_succeeded (core::pose::Pose &pose, core::Real run_time, std::string const &tag) override
 dummy for master/slave version More...
 
void job_failed (core::pose::Pose &pose, bool will_retry) override
 Called if job fails. More...
 
void mpi_finalize (bool finalize) override
 should the go() function call MPI_finalize()? It probably should, this is true by default. More...
 
- Public Member Functions inherited from protocols::jd2::JobDistributor
virtual ~JobDistributor ()
 
void go (protocols::moves::MoverOP mover, JobOutputterOP jo)
 invokes go, after setting JobOutputter More...
 
void go (protocols::moves::MoverOP mover, JobInputterOP ji)
 invokes go, after setting JobInputter More...
 
void go (protocols::moves::MoverOP mover, JobInputterOP ji, JobOutputterOP jo)
 invokes go, after setting JobInputter and JobOutputter More...
 
virtual JobOP current_job () const
 Movers may ask their controlling job distributor for information about the current job. They may also write information to this job for later output, though this use is now discouraged as the addition of the MultiplePoseMover now means that a single job may include several separate trajectories. More...
 
virtual std::string current_output_name () const
 Movers may ask their controlling job distributor for the output name as defined by the Job and JobOutputter. More...
 
JobOutputterOP job_outputter () const
 Movers (or derived classes) may ask for the JobOutputter. More...
 
void set_job_outputter (const JobOutputterOP &new_job_outputter)
 Movers (or derived classes) may ask for the JobOutputter. More...
 
JobInputterOP job_inputter () const
 JobInputter access. More...
 
void set_job_inputter (JobInputterOP new_job_inputter)
 Set the JobInputter and reset the Job list – this is not something you want to do after go() has been called, but before it has returned. More...
 
JobInputterInputSource::Enum job_inputter_input_source () const
 The input source for the current JobInputter. More...
 
virtual void restart ()
 
core::Size total_nr_jobs () const
 
core::Size current_job_id () const
 integer access - which job are we on? More...
 
std::string get_current_batch () const
 what is the current batch ? — name refers to the flag-file used for this batch More...
 
virtual void add_batch (std::string const &, core::Size id=0)
 add a new batch ( name will be interpreted as flag_file ) More...
 
core::Size current_batch_id () const
 what is the current batch number ? — refers to position in batches_ More...
 

Protected Member Functions

 MPIWorkPoolJobDistributor ()
 ctor is protected; singleton pattern More...
 
void handle_interrupt () override
 This function got called when job is not yet finished and got termitated abnormaly (ctrl-c, kill etc). when implimenting it in subclasses make sure to delete all in-progress-data that your job spawns. More...
 
virtual void master_go (protocols::moves::MoverOP mover)
 Handles the receiving of job requests and the sending of job ids to and from slaves. More...
 
virtual void slave_go (protocols::moves::MoverOP mover)
 Proceeds to the parent class go_main() as usual. More...
 
virtual core::Size master_get_new_job_id ()
 Always returns zero, simply increments next_job_to_assign_ to the next job that should be run based on what has been completeted and the overwrite flags. More...
 
virtual core::Size slave_get_new_job_id ()
 requests, receives, and returns a new job id from the master node or returns the current job id if the repeat_job_ flag is set to true More...
 
virtual void master_mark_current_job_id_for_repetition ()
 This should never be called as this is handled internally by the slave nodes, it utility_exits. More...
 
virtual void slave_mark_current_job_id_for_repetition ()
 Sets the repeat_job_ flag to true. More...
 
virtual void master_remove_bad_inputs_from_job_list ()
 Simply increments next_job_to_assign_ to the next job that should be run based on what has been completed and if the input job tag of the job marked as having bad input. More...
 
virtual void slave_remove_bad_inputs_from_job_list ()
 Sends a message to the head node that contains the id of a job that had bad input. More...
 
virtual void master_job_succeeded (core::pose::Pose &pose, std::string const &tag)
 This should never be called as this is handled internally by the slave nodes, it utility_exits. More...
 
virtual void slave_job_succeeded (core::pose::Pose &pose, std::string const &tag)
 Sends a message to the head node upon successful job completion to avoid output interleaving. More...
 
virtual void master_mark_job_as_completed (core::Size const job_index)
 Mark the job as completed/deletable in the jobs list on the master process. More...
 
virtual void master_mark_job_as_failed (core::Size const job_index)
 Mark the job as deletable in the jobs list on the master process. More...
 
virtual void set_sequential_distribution (bool const val)
 Set whether the JobDistributor sends jobs to each slave in sequence (1, 2, 3, etc.) More...
 
virtual bool sequential_distribution () const
 Get whether the JobDistributor sends jobs to each slave in sequence (1, 2, 3, etc.)? More...
 
virtual void set_starter_for_sequential_distribution (bool const val)
 Set whether this is the process that should start requesting jobs (be the first for sequential distribution). More...
 
virtual bool starter_for_sequential_distribution () const
 Is this the process that should start requesting jobs (be the first for sequential distribution)? More...
 
virtual void wait_for_go_signal () const
 Wait for a signal from the n-1 process saying that I can proceed. More...
 
virtual void send_go_signal ()
 Send a signal to the n+1 process saying that it can proceed. More...
 
- Protected Member Functions inherited from protocols::jd2::JobDistributor
 JobDistributor ()
 Singleton instantiation pattern; Derived classes will call default ctor, but their ctors, too must be protected (and the JDFactory must be their friend.) More...
 
 JobDistributor (bool empty)
 MPIArchiveJobDistributor starts with an empty job-list... More...
 
void go_main (protocols::moves::MoverOP mover)
 Non-virtual get-job, run it, & output loop. This function is pretty generic and your subclass may be able to use it. It is NOT virtual - this implementation can be shared by (at least) the simple FileSystemJobDistributor, the MPIWorkPoolJobDistributor, and the MPIWorkPartitionJobDistributor. Do not feel that you need to use it as-is in your class - but DO plan on implementing all its functionality! More...
 
JobsContainer const & get_jobs () const
 Read access to private data for derived classes. More...
 
JobsContainerget_jobs_nonconst ()
 Jobs is the container of Job objects. More...
 
void mark_job_as_completed (core::Size job_id, core::Real run_time)
 Jobs is the container of Job objects need non-const to mark Jobs as completed on Master in MPI-JobDistributor. More...
 
void mark_job_as_bad (core::Size job_id)
 
RosettaScriptsParserOP parser () const
 Parser access. More...
 
void begin_critical_section ()
 
void end_critical_section ()
 
void set_current_job_by_index (core::Size curr_job_index)
 For derived classes that wish to invoke JobDistributor functions which use the current_job_ and current_job_id_ member variables. Note that until those functions complete, it would be a bad idea for another thread to change current_job_. More...
 
bool obtain_new_job (bool re_consider_current_job=false)
 this function updates the current_job_id_ and current_job_ fields. The boolean return states whether or not a new job was obtained (if false, quit distributing!) More...
 
virtual void job_succeeded_additional_output (core::pose::Pose &pose, std::string const &tag)
 This function is called upon a successful job completion if there are additional poses generated by the mover base implementation is just a call to the job outputter. More...
 
virtual void current_job_finished ()
 Derived classes are allowed to clean up any temporary files or data relating to the current job after the current job has completed. Called inside go_main loop. Default implementation is a no-op. More...
 
virtual void note_all_jobs_finished ()
 Derived classes are allowed to perform some kind of action when the job distributor runs out of jobs to execute. Called inside go_main. Default implementation is a no-op. More...
 
void clear_current_job_output ()
 
void check_for_parser_in_go_main ()
 Send a message to the screen indicating that the parser is in use and that the mover that's been input to go_main will not be used, but instead will be replaced by the Mover created by the parser. More...
 
bool using_parser () const
 Is the parser in use? More...
 
bool run_one_job (protocols::moves::MoverOP &mover, time_t allstarttime, std::string &last_inner_job_tag, std::string &last_output_tag, core::Size &last_batch_id, core::Size &retries_this_job, bool first_job)
 
void setup_pymol_observer (core::pose::Pose &pose)
 After the construction of the pose for this job, check the command line to determine if the pymol observer should be attached to it. More...
 
void write_output_from_job (core::pose::Pose &pose, protocols::moves::MoverOP mover_copy, protocols::moves::MoverStatus status, core::Size jobtime, core::Size &retries_this_job)
 After a job has finished running, figure out from the MoverStatus whether the pose should be written to disk (or wherever) along with any other poses that the mover might have generated along the way. More...
 
void increment_failed_jobs ()
 Increment the number of failed jobs. More...
 
core::Size get_job_time_estimate () const
 Get an estimate of the time to run an additional job. If it can't be estimated, return a time of zero. More...
 
void write_citations_and_clear_citation_tracking () const
 Write out information about all modules that have been used that should be cited, then clear the list of citations from the CitationManager. More...
 
void set_batch_id (core::Size setting)
 set current_batch_id — eg for slave nodes in MPI framework More...
 
virtual bool next_batch ()
 switch current_batch_id_ to next batch More...
 
virtual void batch_underflow ()
 if end of batches_ reached via next_batch or set_batch_id ... More...
 
virtual void load_new_batch ()
 called by next_batch() or set_batch_id() to switch-over and restart JobDistributor on new batch More...
 
core::Size nr_batches () const
 how many batches are in our list ... this can change dynamically More...
 
std::string const & batch (core::Size batch_id)
 give name of batch with given id More...
 

Protected Attributes

core::Size npes_
 total number of processing elements More...
 
core::Size rank_
 rank of the "local" instance More...
 
core::Size current_job_id_
 where slave jobs store current job id More...
 
core::Size next_job_to_assign_
 where master stores next job to assign (in a good state after get_new_job_id up until it's used) More...
 
core::Size bad_job_id_
 where master temporarily stores id of jobs with bad input More...
 
bool repeat_job_
 where slave stores whether it should repeat its current job id More...
 
bool finalize_MPI_
 should the go() function call MPI_finalize? There are very few cases where this should be false More...
 
bool sequential_distribution_
 Should the JobDistributor send jobs to each slave in sequence (1, 2, 3, etc.)? Default false – slaves request jobs as they become available. More...
 
bool starter_for_sequential_distribution_
 Is this the process that should start requesting jobs (be the first for sequential distribution)? More...
 

Friends

class JobDistributorFactory
 

Additional Inherited Members

- Static Public Member Functions inherited from protocols::jd2::JobDistributor
static bool has_been_instantiated ()
 Has the job distributor been instantiated? More...
 
static JobDistributorget_instance ()
 static function to get the instance of ( pointer to) this singleton class More...
 
- Static Protected Member Functions inherited from protocols::jd2::JobDistributor
static void setup_system_signal_handler (void(*prev_fn)(int)=jd2_signal_handler)
 Setting up callback function that will be call when our process is about to terminate. This will allow us to exit propely (clean up in_progress_files/tmp files if any). More...
 
static void remove_system_signal_handler ()
 Set signal handler back to default state. More...
 
static void jd2_signal_handler (int Signal)
 Default callback function for signal handling. More...
 

Detailed Description

This job distributor is meant for running jobs where the machine you are using has a large number of processors, the number of jobs is much greater than the number of processors, or the runtimes of the individual jobs could vary greatly. It dedicates the head node (whichever processor gets processor rank #0) to handling job requests from the slave nodes (all nonzero ranks). Unlike the MPIWorkPartitionJobDistributor, this JD will not work at all without MPI and the implementations of all but the interface functions have been put inside of ifdef directives. Generally each function has a master and slave version, and the interface functions call one or the other depending on processor rank.

Constructor & Destructor Documentation

protocols::jd2::MPIWorkPoolJobDistributor::MPIWorkPoolJobDistributor ( )
protected

ctor is protected; singleton pattern

constructor. Notice it calls the parent class! It also builds some internal variables for determining which processor it is in MPI land.

References protocols::jd2::JobDistributor::get_jobs_nonconst(), npes_, rank_, and protocols::jd2::JobsContainer::set_force_job_purging().

protocols::jd2::MPIWorkPoolJobDistributor::~MPIWorkPoolJobDistributor ( )
overridedefault

dtor WARNING WARNING! SINGLETONS' DESTRUCTORS ARE NEVER CALLED IN MINI! DO NOT TRY TO PUT THINGS IN THIS FUNCTION! here's a nice link explaining why: http://www.research.ibm.com/designpatterns/pubs/ph-jun96.txt

WARNING WARNING! SINGLETONS' DESTRUCTORS ARE NEVER CALLED IN MINI! DO NOT TRY TO PUT THINGS IN THIS FUNCTION! here's a nice link explaining why: http://www.research.ibm.com/designpatterns/pubs/ph-jun96.txt

Member Function Documentation

core::Size protocols::jd2::MPIWorkPoolJobDistributor::get_new_job_id ( )
overridevirtual

dummy for master/slave version

Implements protocols::jd2::JobDistributor.

References master_get_new_job_id(), rank_, and slave_get_new_job_id().

void protocols::jd2::MPIWorkPoolJobDistributor::go ( protocols::moves::MoverOP  mover)
overridevirtual

dummy for master/slave version

Reimplemented from protocols::jd2::JobDistributor.

References finalize_MPI_, master_go(), rank_, set_sequential_distribution(), and slave_go().

void protocols::jd2::MPIWorkPoolJobDistributor::handle_interrupt ( )
inlineoverrideprotectedvirtual

This function got called when job is not yet finished and got termitated abnormaly (ctrl-c, kill etc). when implimenting it in subclasses make sure to delete all in-progress-data that your job spawns.

Implements protocols::jd2::JobDistributor.

void protocols::jd2::MPIWorkPoolJobDistributor::job_failed ( core::pose::Pose pose,
bool  will_retry 
)
overridevirtual

Called if job fails.

Called if the job failed.

Reimplemented from protocols::jd2::JobDistributor.

References current_job_id_, protocols::jd2::JobDistributor::increment_failed_jobs(), rank_, and protocols::jd2::TR().

void protocols::jd2::MPIWorkPoolJobDistributor::job_succeeded ( core::pose::Pose pose,
core::Real  run_time,
std::string const &  tag 
)
overridevirtual

dummy for master/slave version

Reimplemented from protocols::jd2::JobDistributor.

References master_job_succeeded(), rank_, and slave_job_succeeded().

void protocols::jd2::MPIWorkPoolJobDistributor::mark_current_job_id_for_repetition ( )
overridevirtual
core::Size protocols::jd2::MPIWorkPoolJobDistributor::master_get_new_job_id ( )
protectedvirtual
void protocols::jd2::MPIWorkPoolJobDistributor::master_go ( protocols::moves::MoverOP  mover)
protectedvirtual

Handles the receiving of job requests and the sending of job ids to and from slaves.

This is the heart of the MPIWorkPoolJobDistributor. It consists of two while loops: the job distribution loop (JDL) and the node spin down loop (NSDL). The JDL has three functions. The first is to receive and process messages from the slave nodes requesting new job ids. The second is to receive and process messages from the slave nodes indicating a bad input. The third is to receive and process job_success messages from the slave nodes and block while the slave node is writing its output. This is prevent interleaving of output in score files and silent files. The function of the NSDL is to keep the head node alive while there are still slave nodes processing. Without the NSDL if a slave node finished its allocated job after the head node had finished handing out all of the jobs and exiting (a very likely scenario), it would wait indefinitely for a response from the head node when requesting a new job id.

Reimplemented in protocols::unfolded_state_energy_calculator::UnfoldedStateEnergyCalculatorMPIWorkPoolJobDistributor.

References bad_job_id_, master_get_new_job_id(), master_mark_job_as_completed(), master_mark_job_as_failed(), master_remove_bad_inputs_from_job_list(), MPI_ANY_SOURCE, next_job_to_assign_, npes_, rank_, and protocols::jd2::TR().

Referenced by go().

void protocols::jd2::MPIWorkPoolJobDistributor::master_job_succeeded ( core::pose::Pose pose,
std::string const &  tag 
)
protectedvirtual

This should never be called as this is handled internally by the slave nodes, it utility_exits.

References rank_, and protocols::jd2::TR().

Referenced by job_succeeded().

void protocols::jd2::MPIWorkPoolJobDistributor::master_mark_current_job_id_for_repetition ( )
protectedvirtual

This should never be called as this is handled internally by the slave nodes, it utility_exits.

References rank_, and protocols::jd2::TR().

Referenced by mark_current_job_id_for_repetition().

void protocols::jd2::MPIWorkPoolJobDistributor::master_mark_job_as_completed ( core::Size const  job_index)
protectedvirtual

Mark the job as completed/deletable in the jobs list on the master process.

References protocols::jd2::JobDistributor::get_jobs_nonconst(), and protocols::jd2::TR().

Referenced by master_go().

void protocols::jd2::MPIWorkPoolJobDistributor::master_mark_job_as_failed ( core::Size const  job_index)
protectedvirtual

Mark the job as deletable in the jobs list on the master process.

References protocols::jd2::JobDistributor::get_jobs_nonconst(), and protocols::jd2::TR().

Referenced by master_go().

void protocols::jd2::MPIWorkPoolJobDistributor::master_remove_bad_inputs_from_job_list ( )
protectedvirtual
void protocols::jd2::MPIWorkPoolJobDistributor::mpi_finalize ( bool  finalize)
overridevirtual

should the go() function call MPI_finalize()? It probably should, this is true by default.

Reimplemented from protocols::jd2::JobDistributor.

References finalize_MPI_.

void protocols::jd2::MPIWorkPoolJobDistributor::remove_bad_inputs_from_job_list ( )
overridevirtual
void protocols::jd2::MPIWorkPoolJobDistributor::send_go_signal ( )
protectedvirtual

Send a signal to the n+1 process saying that it can proceed.

This also sets starter_for_sequential_distribution_ to false, since we no longer want this process to refrain from waiting.

References npes_, rank_, and set_starter_for_sequential_distribution().

Referenced by slave_get_new_job_id().

virtual bool protocols::jd2::MPIWorkPoolJobDistributor::sequential_distribution ( ) const
inlineprotectedvirtual

Get whether the JobDistributor sends jobs to each slave in sequence (1, 2, 3, etc.)?

References sequential_distribution_.

Referenced by slave_get_new_job_id(), and slave_go().

virtual void protocols::jd2::MPIWorkPoolJobDistributor::set_sequential_distribution ( bool const  val)
inlineprotectedvirtual

Set whether the JobDistributor sends jobs to each slave in sequence (1, 2, 3, etc.)

References sequential_distribution_, and protocols::hybridization::val.

Referenced by go().

virtual void protocols::jd2::MPIWorkPoolJobDistributor::set_starter_for_sequential_distribution ( bool const  val)
inlineprotectedvirtual

Set whether this is the process that should start requesting jobs (be the first for sequential distribution).

References starter_for_sequential_distribution_, and protocols::hybridization::val.

Referenced by send_go_signal(), and slave_go().

core::Size protocols::jd2::MPIWorkPoolJobDistributor::slave_get_new_job_id ( )
protectedvirtual
void protocols::jd2::MPIWorkPoolJobDistributor::slave_go ( protocols::moves::MoverOP  mover)
protectedvirtual
void protocols::jd2::MPIWorkPoolJobDistributor::slave_job_succeeded ( core::pose::Pose pose,
std::string const &  tag 
)
protectedvirtual

Sends a message to the head node upon successful job completion to avoid output interleaving.

References protocols::jd2::JobDistributor::current_job(), current_job_id_, protocols::jd2::JobDistributor::job_outputter(), rank_, and protocols::jd2::TR().

Referenced by job_succeeded().

void protocols::jd2::MPIWorkPoolJobDistributor::slave_mark_current_job_id_for_repetition ( )
protectedvirtual

Sets the repeat_job_ flag to true.

References current_job_id_, rank_, repeat_job_, and protocols::jd2::TR().

Referenced by mark_current_job_id_for_repetition().

void protocols::jd2::MPIWorkPoolJobDistributor::slave_remove_bad_inputs_from_job_list ( )
protectedvirtual

Sends a message to the head node that contains the id of a job that had bad input.

References current_job_id_, and rank_.

Referenced by remove_bad_inputs_from_job_list().

virtual bool protocols::jd2::MPIWorkPoolJobDistributor::starter_for_sequential_distribution ( ) const
inlineprotectedvirtual

Is this the process that should start requesting jobs (be the first for sequential distribution)?

References starter_for_sequential_distribution_.

Referenced by slave_get_new_job_id().

void protocols::jd2::MPIWorkPoolJobDistributor::wait_for_go_signal ( ) const
protectedvirtual

Wait for a signal from the n-1 process saying that I can proceed.

References npes_, and rank_.

Referenced by slave_get_new_job_id().

Friends And Related Function Documentation

friend class JobDistributorFactory
friend

Member Data Documentation

core::Size protocols::jd2::MPIWorkPoolJobDistributor::bad_job_id_
protected
core::Size protocols::jd2::MPIWorkPoolJobDistributor::current_job_id_
protected
bool protocols::jd2::MPIWorkPoolJobDistributor::finalize_MPI_
protected

should the go() function call MPI_finalize? There are very few cases where this should be false

Referenced by go(), and mpi_finalize().

core::Size protocols::jd2::MPIWorkPoolJobDistributor::next_job_to_assign_
protected

where master stores next job to assign (in a good state after get_new_job_id up until it's used)

Referenced by master_get_new_job_id(), protocols::unfolded_state_energy_calculator::UnfoldedStateEnergyCalculatorMPIWorkPoolJobDistributor::master_go(), master_go(), and master_remove_bad_inputs_from_job_list().

core::Size protocols::jd2::MPIWorkPoolJobDistributor::npes_
protected
core::Size protocols::jd2::MPIWorkPoolJobDistributor::rank_
protected
bool protocols::jd2::MPIWorkPoolJobDistributor::repeat_job_
protected

where slave stores whether it should repeat its current job id

Referenced by slave_get_new_job_id(), and slave_mark_current_job_id_for_repetition().

bool protocols::jd2::MPIWorkPoolJobDistributor::sequential_distribution_
protected

Should the JobDistributor send jobs to each slave in sequence (1, 2, 3, etc.)? Default false – slaves request jobs as they become available.

Referenced by sequential_distribution(), and set_sequential_distribution().

bool protocols::jd2::MPIWorkPoolJobDistributor::starter_for_sequential_distribution_
protected

Is this the process that should start requesting jobs (be the first for sequential distribution)?

Default false

Referenced by set_starter_for_sequential_distribution(), and starter_for_sequential_distribution().


The documentation for this class was generated from the following files: