Rosetta 3.4
protocols::jd2::MPIFileBufJobDistributor Class Reference

#include <MPIFileBufJobDistributor.hh>

Public Member Functions

virtual ~MPIFileBufJobDistributor ()
 Destructor. WARNING: singleton destructors are never called in Mini, so do not put anything in this function. For an explanation, see http://www.research.ibm.com/designpatterns/pubs/ph-jun96.txt
core::Size increment_client_rank ()
core::Size min_client_rank () const
 return rank of first worker process (there might be more dedicated processes, e.g., ArchiveManager...)
virtual void go (protocols::moves::MoverOP mover)
 dummy for master/slave version
virtual core::Size get_new_job_id ()
 dummy for master/slave version
virtual void mark_current_job_id_for_repetition ()
 dummy for master/slave version
virtual void remove_bad_inputs_from_job_list ()
 dummy for master/slave version
virtual void job_succeeded (core::pose::Pose &pose, core::Real runtime)
 dummy for master/slave version
virtual void job_failed (core::pose::Pose &pose, bool will_retry)
 This function is called when we give up on the job; it has been virtualized so BOINC and MPI can delay/protect output. The base implementation is just a call to the job outputter.

Protected Member Functions

 MPIFileBufJobDistributor ()
 ctor is protected; singleton pattern
 MPIFileBufJobDistributor (core::Size master_rank, core::Size file_buf_rank, core::Size min_client_rank, bool start_empty=false)
 protected ctor for child-classes
virtual void handle_interrupt ()
 This function is called when a job has not yet finished and is terminated abnormally (ctrl-c, kill, etc.). When implementing it in subclasses, make sure to delete all in-progress data that your job spawned.
virtual bool process_message (core::Size msg_tag, core::Size slave_rank, core::Size slave_job_id, core::Size slave_batch_id, core::Real runtime)
virtual bool next_batch ()
 switch current_batch_id_ to next batch
void master_go (protocols::moves::MoverOP mover)
 Handles the receiving of job requests and the sending of job ids to and from slaves.
core::Size master_get_new_job_id ()
 Always returns zero; simply increments next_job_to_assign_ to the next job that should be run, based on what has been completed and the overwrite flags.
core::Size slave_get_new_job_id ()
 requests, receives, and returns a new job id from the master node or returns the current job id if the repeat_job_ flag is set to true
void master_mark_current_job_id_for_repetition ()
 This should never be called, as this is handled internally by the slave nodes; it utility_exits.
void slave_mark_current_job_id_for_repetition ()
 Sets the repeat_job_ flag to true.
void master_remove_bad_inputs_from_job_list ()
 Simply increments next_job_to_assign_ to the next job that should be run, based on what has been completed and on whether the job's input tag has been marked as having bad input.
void slave_remove_bad_inputs_from_job_list ()
 Sends a message to the head node that contains the id of a job that had bad input.
void master_job_succeeded (core::pose::Pose &pose)
 This should never be called, as this is handled internally by the slave nodes; it utility_exits.
void slave_job_succeeded (core::pose::Pose &pose)
 Sends a message to the head node upon successful job completion to avoid output interleaving.
void slave_to_master (core::Size tag)
 send a message to master
void send_job_to_slave (core::Size slave_rank)
 called by master to send and by slave to receive job
core::Size rank () const
 return rank of this process
core::Size master_rank () const
 return rank of master process (where JobDistributor is running)
core::Size file_buf_rank () const
 return rank of file-buffer process (where output data (via ozstream) is handled)
core::Size number_of_processors ()
 how many processes --- this includes dedicated processes
core::Size n_rank ()
 how many processes --- this includes dedicated processes
core::Size n_worker ()
 how many workers --- important to keep track during spin-down process
void set_n_worker (core::Size setting)
 how many workers --- important to keep track during spin-down process
virtual void mark_job_as_completed (core::Size job_id, core::Size batch_id, core::Real runtime)
 marks job as completed in joblist
virtual void mark_job_as_bad (core::Size job_id, core::Size batch_id)
 marks job as bad in joblist
void eat_signal (core::Size signal, int source)
 receive a certain signal and ignore it.... this is needed, for instance, when MPIArchiveJobDistributor triggers an ADD_BATCH signal by sending QUEUE_EMPTY to the ArchiveManager...

Friends

class JobDistributorFactory

Detailed Description

This JobDistributor is intended for machines with a large number of processors. Two dedicated processes handle job distribution and file I/O; all other processes (higher rank) are used for computation. The file_buf_rank_ process runs the MpiFileBuffer, which is at the receiving end of all ozstream output that is rerouted via MPI from the slave nodes. This means all slaves write to the same file without file-system congestion or interlacing in the file, because I/O is handled by a single dedicated process. The other dedicated process (master_rank) runs the actual JobDistributor and is only used to distribute jobs to slaves and to receive their notifications of successful or failed execution. If you have only a small number of processors, you can run, say, 10 MPI processes on 8 processors to get optimal CPU usage.
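The rank layout described above can be sketched as follows. This is only an illustration, not the Rosetta implementation: the type `RankLayout`, the functions `role_for_rank` and `worker_count`, and the concrete rank values used with them are assumptions made for the example, and `Size` stands in for core::Size.

```cpp
#include <cassert>
#include <cstddef>

using Size = std::size_t;  // hypothetical stand-in for core::Size

// Assumed layout: the dedicated processes come first and every rank at or
// above min_client_rank is a worker.
struct RankLayout {
    Size master_rank;      // process that runs the JobDistributor
    Size file_buf_rank;    // process that runs the MpiFileBuffer
    Size min_client_rank;  // rank of the first worker process
};

enum class Role { Master, FileBuffer, Worker, OtherDedicated };

// Classify a rank according to the layout.
Role role_for_rank(const RankLayout& layout, Size rank) {
    if (rank == layout.master_rank) return Role::Master;
    if (rank == layout.file_buf_rank) return Role::FileBuffer;
    if (rank >= layout.min_client_rank) return Role::Worker;
    return Role::OtherDedicated;  // e.g. an ArchiveManager in a subclass
}

// Number of worker processes among n_rank processes in total.
Size worker_count(const RankLayout& layout, Size n_rank) {
    return n_rank > layout.min_client_rank ? n_rank - layout.min_client_rank
                                           : 0;
}
```

With an assumed layout of master rank 0, file-buffer rank 1, and first worker rank 2, launching 10 MPI processes gives `worker_count` of 8, which matches the "10 MPI processes on 8 processors" suggestion: the two dedicated processes do little computation, so the 8 workers can each occupy a processor.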


Constructor & Destructor Documentation

protocols::jd2::MPIFileBufJobDistributor::MPIFileBufJobDistributor ( ) [protected]

ctor is protected; singleton pattern

Constructor. Note that it calls the parent class constructor. It also sets up internal variables that determine which process this is within the MPI run.

protocols::jd2::MPIFileBufJobDistributor::MPIFileBufJobDistributor ( core::Size  master_rank,
core::Size  file_buf_rank,
core::Size  min_client_rank,
bool  start_empty = false 
) [protected]

protected ctor for child-classes

Constructor. Note that it calls the parent class constructor. It also sets up internal variables that determine which process this is within the MPI run.

protocols::jd2::MPIFileBufJobDistributor::~MPIFileBufJobDistributor ( ) [virtual]

Destructor.

WARNING: singleton destructors are never called in Mini, so do not put anything in this function. For an explanation, see http://www.research.ibm.com/designpatterns/pubs/ph-jun96.txt
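A minimal sketch of why cleanup code in a singleton destructor may never run. It assumes the common arrangement in which the single instance is created on the heap and never deleted; whether Mini uses exactly this mechanism is not stated here. The class `DemoDistributor` and the counter `g_destructor_calls` are hypothetical names for the example.

```cpp
#include <cassert>

// Counts destructor invocations so the effect can be observed.
int g_destructor_calls = 0;

class DemoDistributor {
public:
    // Lazily create the single instance on the heap. Nothing ever deletes it.
    static DemoDistributor* get_instance() {
        if (instance_ == nullptr) instance_ = new DemoDistributor();
        return instance_;
    }

    // Cleanup placed here is not executed, because no one calls delete.
    virtual ~DemoDistributor() { ++g_destructor_calls; }

protected:
    DemoDistributor() {}  // protected ctor: only the class or a friend builds it

private:
    static DemoDistributor* instance_;
};

DemoDistributor* DemoDistributor::instance_ = nullptr;
```

Every call to `get_instance()` returns the same pointer, and `g_destructor_calls` stays at zero for the life of the program, so any resource release has to happen somewhere other than the destructor.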


Member Function Documentation

void protocols::jd2::MPIFileBufJobDistributor::eat_signal ( core::Size  signal,
int  source 
) [protected]

receive a certain signal and ignore it.... this is needed, for instance, when MPIArchiveJobDistributor triggers an ADD_BATCH signal by sending QUEUE_EMPTY to the ArchiveManager...

receive message of certain type -- and ignore it ... sometimes needed in communication protocol

References protocols::jd2::MPI_JOB_DIST_TAG(), process_message(), and protocols::jd2::tr().

Referenced by protocols::jd2::archive::MPIArchiveJobDistributor::batch_underflow().

core::Size protocols::jd2::MPIFileBufJobDistributor::file_buf_rank ( ) const [inline, protected]

return rank of file-buffer process (where output data (via ozstream) is handled)

Referenced by protocols::jd2::archive::MPIArchiveJobDistributor::go().

core::Size protocols::jd2::MPIFileBufJobDistributor::get_new_job_id ( ) [virtual]

dummy for master/slave version

Implements protocols::jd2::JobDistributor.

Reimplemented in protocols::jd2::MPIMultiCommJobDistributor.

References master_get_new_job_id(), and slave_get_new_job_id().
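The "dummy for master/slave version" pattern can be pictured as a public method that forwards to a role-specific helper. The sketch below assumes the choice is made by comparing the process rank with the master rank, which this page does not confirm; `DemoDispatcher`, its counters, and the returned job id 7 are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

using Size = std::size_t;  // hypothetical stand-in for core::Size

class DemoDispatcher {
public:
    DemoDispatcher(Size rank, Size master_rank)
        : rank_(rank), master_rank_(master_rank) {}

    // Public entry point: forwards to the version that matches this process.
    Size get_new_job_id() {
        return rank_ == master_rank_ ? master_get_new_job_id()
                                     : slave_get_new_job_id();
    }

    Size master_calls = 0;  // how often the master version ran
    Size slave_calls = 0;   // how often the slave version ran

private:
    Size master_get_new_job_id() { ++master_calls; return 0; }
    Size slave_get_new_job_id() { ++slave_calls; return 7; }  // arbitrary id

    Size rank_;
    Size master_rank_;
};
```

A process constructed with equal rank and master rank takes the master path and gets 0 back; any other rank takes the slave path.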

void protocols::jd2::MPIFileBufJobDistributor::go ( protocols::moves::MoverOP  mover) [virtual]

virtual void protocols::jd2::MPIFileBufJobDistributor::handle_interrupt ( ) [inline, protected, virtual]

This function is called when a job has not yet finished and is terminated abnormally (ctrl-c, kill, etc.). When implementing it in subclasses, make sure to delete all in-progress data that your job spawned.

Implements protocols::jd2::JobDistributor.

Reimplemented in protocols::jd2::archive::MPIArchiveJobDistributor, and protocols::jd2::MPIMultiCommJobDistributor.

core::Size protocols::jd2::MPIFileBufJobDistributor::increment_client_rank ( ) [inline]

void protocols::jd2::MPIFileBufJobDistributor::job_failed ( core::pose::Pose & ,
bool   
) [virtual]

This function is called when we give up on the job; it has been virtualized so BOINC and MPI can delay/protect output. The base implementation is just a call to the job outputter.

no-op implementation in the base class

Reimplemented from protocols::jd2::JobDistributor.

Reimplemented in protocols::jd2::MPIMultiCommJobDistributor.

References protocols::jd2::JOB_FAILED_NO_RETRY, and slave_to_master().

void protocols::jd2::MPIFileBufJobDistributor::job_succeeded ( core::pose::Pose & pose,
core::Real  runtime 
) [virtual]

dummy for master/slave version

Reimplemented from protocols::jd2::JobDistributor.

Reimplemented in protocols::jd2::MPIMultiCommJobDistributor.

References master_job_succeeded(), and slave_job_succeeded().

void protocols::jd2::MPIFileBufJobDistributor::mark_current_job_id_for_repetition ( ) [virtual]

void protocols::jd2::MPIFileBufJobDistributor::mark_job_as_bad ( core::Size  job_id,
core::Size  batch_id 
) [protected, virtual]

void protocols::jd2::MPIFileBufJobDistributor::mark_job_as_completed ( core::Size  job_id,
core::Size  batch_id,
core::Real  runtime 
) [protected, virtual]

core::Size protocols::jd2::MPIFileBufJobDistributor::master_get_new_job_id ( ) [protected]

Always returns zero; simply increments next_job_to_assign_ to the next job that should be run, based on what has been completed and the overwrite flags.

work out what next job is

References protocols::jd2::JobDistributor::current_batch_id(), protocols::jd2::JobDistributor::current_job_id(), protocols::jd2::JobDistributor::get_jobs(), protocols::jd2::JobDistributor::job_outputter(), mark_job_as_completed(), and protocols::jd2::tr().

Referenced by get_new_job_id().
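As a rough sketch of the behaviour described for master_get_new_job_id(), the code below advances a cursor past jobs that are already completed unless an overwrite flag is set, and always returns zero. The types `DemoJob` and `DemoMaster`, the single boolean `overwrite`, the helper `has_job()`, and the 1-based job ids are assumptions for illustration; the real function consults the job list and job outputter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Size = std::size_t;  // hypothetical stand-in for core::Size

struct DemoJob {
    bool completed;  // true if the job already has output
};

struct DemoMaster {
    std::vector<DemoJob> jobs;    // jobs[i] has job id i + 1
    bool overwrite = false;       // assumed flag: rerun completed jobs too
    Size next_job_to_assign = 0;  // 0 before the first call

    // Move next_job_to_assign to the next runnable job. Always returns zero;
    // the caller reads next_job_to_assign afterwards.
    Size master_get_new_job_id() {
        if (next_job_to_assign <= jobs.size()) ++next_job_to_assign;
        while (next_job_to_assign <= jobs.size() && !overwrite &&
               jobs[next_job_to_assign - 1].completed) {
            ++next_job_to_assign;
        }
        return 0;
    }

    // True while next_job_to_assign refers to an existing job.
    bool has_job() const {
        return next_job_to_assign >= 1 && next_job_to_assign <= jobs.size();
    }
};
```

With four jobs of which the second and third are completed, successive calls leave next_job_to_assign at 1 and then 4; a third call moves past the end of the list, so `has_job()` becomes false.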

void protocols::jd2::MPIFileBufJobDistributor::master_go ( protocols::moves::MoverOP  mover) [protected]

Handles the receiving of job requests and the sending of job ids to and from slaves.

the main message loop --- master cycles through until all slave nodes have been spun down

References protocols::jd2::JobDistributor::current_job_id(), MPI_ANY_SOURCE, protocols::jd2::MPI_JOB_DIST_TAG(), n_worker(), protocols::jd2::JobDistributor::obtain_new_job(), process_message(), and protocols::jd2::tr().

Referenced by go(), and protocols::jd2::archive::MPIArchiveJobDistributor::go().

void protocols::jd2::MPIFileBufJobDistributor::master_job_succeeded ( core::pose::Pose & pose) [protected]

This should never be called, as this is handled internally by the slave nodes; it utility_exits.

References protocols::jd2::tr().

Referenced by job_succeeded().

void protocols::jd2::MPIFileBufJobDistributor::master_mark_current_job_id_for_repetition ( ) [protected]

This should never be called, as this is handled internally by the slave nodes; it utility_exits.

References protocols::jd2::tr().

Referenced by mark_current_job_id_for_repetition().

core::Size protocols::jd2::MPIFileBufJobDistributor::master_rank ( ) const [inline, protected]

void protocols::jd2::MPIFileBufJobDistributor::master_remove_bad_inputs_from_job_list ( ) [protected]

Simply increments next_job_to_assign_ to the next job that should be run, based on what has been completed and on whether the job's input tag has been marked as having bad input.

References protocols::jd2::JobDistributor::get_jobs(), protocols::jd2::JobDistributor::job_outputter(), mark_job_as_bad(), protocols::jd2::JobDistributor::obtain_new_job(), and protocols::jd2::tr().

Referenced by remove_bad_inputs_from_job_list().
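One way to read the description above is sketched here: once a job reports bad input, every unfinished job with the same input tag is marked bad, and the cursor moves on to the next job that is neither completed nor bad. This is an interpretation, not the actual code; `DemoJob`, `skip_bad_inputs`, and the file names used with them are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Size = std::size_t;  // hypothetical stand-in for core::Size

struct DemoJob {
    std::string input_tag;  // identifies the input the job was built from
    bool completed;
    bool bad;
};

// Mark all unfinished jobs that share the bad job's input tag, then return
// the 1-based id of the next job that may still run, or 0 if none is left.
Size skip_bad_inputs(std::vector<DemoJob>& jobs, Size bad_job_id,
                     Size next_job_to_assign) {
    assert(bad_job_id >= 1 && bad_job_id <= jobs.size());
    assert(next_job_to_assign >= 1);
    const std::string bad_tag = jobs[bad_job_id - 1].input_tag;
    for (DemoJob& job : jobs) {
        if (job.input_tag == bad_tag && !job.completed) job.bad = true;
    }
    Size id = next_job_to_assign;
    while (id <= jobs.size() &&
           (jobs[id - 1].completed || jobs[id - 1].bad)) {
        ++id;
    }
    return id <= jobs.size() ? id : 0;
}
```

For jobs built from a.pdb, b.pdb, b.pdb, and c.pdb, a bad-input report for job 2 also marks job 3, so a cursor at 3 advances to job 4.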

core::Size protocols::jd2::MPIFileBufJobDistributor::min_client_rank ( ) const [inline]

return rank of first worker process (there might be more dedicated processes, e.g., ArchiveManager...)

Referenced by protocols::jd2::archive::MPIArchiveJobDistributor::go().

core::Size protocols::jd2::MPIFileBufJobDistributor::n_rank ( ) [inline, protected]

how many processes --- this includes dedicated processes

core::Size protocols::jd2::MPIFileBufJobDistributor::n_worker ( ) [inline, protected]

how many workers --- important to keep track during spin-down process

Referenced by master_go().

bool protocols::jd2::MPIFileBufJobDistributor::next_batch ( ) [protected, virtual]

switch current_batch_id_ to next batch

Reimplemented from protocols::jd2::JobDistributor.

core::Size protocols::jd2::MPIFileBufJobDistributor::number_of_processors ( ) [inline, protected]

how many processes --- this includes dedicated processes

bool protocols::jd2::MPIFileBufJobDistributor::process_message ( core::Size  msg_tag,
core::Size  slave_rank,
core::Size  slave_job_id,
core::Size  slave_batch_id,
core::Real  runtime 
) [protected, virtual]

core::Size protocols::jd2::MPIFileBufJobDistributor::rank ( ) const [inline, protected]

void protocols::jd2::MPIFileBufJobDistributor::remove_bad_inputs_from_job_list ( ) [virtual]

dummy for master/slave version

Reimplemented from protocols::jd2::JobDistributor.

References master_remove_bad_inputs_from_job_list(), and slave_remove_bad_inputs_from_job_list().

Referenced by mark_job_as_bad().

void protocols::jd2::MPIFileBufJobDistributor::send_job_to_slave ( core::Size  slave_rank) [protected]

called by master to send and by slave to receive job

This is the heart of the MPIFileBufJobDistributor. It consists of two while loops: the job distribution loop (JDL) and the node spin-down loop (NSDL). The JDL has three functions. The first is to receive and process messages from the slave nodes requesting new job ids. The second is to receive and process messages from the slave nodes indicating a bad input. The third is to receive and process job_success messages from the slave nodes and to block while the slave node is writing its output. This prevents interleaving of output in score files and silent files. The function of the NSDL is to keep the head node alive while there are still slave nodes processing. Without the NSDL, if a slave node finished its allocated job after the head node had finished handing out all of the jobs and exited (a very likely scenario), it would wait indefinitely for a response from the head node when requesting a new job id.

References protocols::jd2::JobDistributor::current_batch_id(), protocols::jd2::JobDistributor::current_job_id(), protocols::jd2::MPI_JOB_DIST_TAG(), and protocols::jd2::tr().

Referenced by process_message(), and slave_get_new_job_id().
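The two-loop structure described above can be imitated without MPI by replacing message passing with an in-memory queue. The sketch below is a simulation under stated assumptions, not the Rosetta code: `DemoMaster`, `Message`, and the `inbox` queue are hypothetical, the tag values are illustrative, a reply of job id 0 is assumed to mean "spin down", each simulated worker reports success as soon as it is given a job, and the blocking during output is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

using Size = std::size_t;  // hypothetical stand-in for core::Size

enum Tag { NEW_JOB_ID, BAD_INPUT, JOB_SUCCESS };  // values are illustrative

struct Message { Tag tag; Size slave_rank; Size job_id; };

struct DemoMaster {
    std::deque<Message> inbox;     // replaces MPI receives
    std::vector<Size> completed;   // job ids reported as successful
    std::vector<Size> bad_inputs;  // job ids reported with bad input
    Size n_jobs;
    Size next_job = 1;
    Size n_worker;                 // workers that have not spun down yet

    DemoMaster(Size jobs, Size workers) : n_jobs(jobs), n_worker(workers) {
        // Every worker starts by asking for a job.
        for (Size w = 0; w < workers; ++w)
            inbox.push_back(Message{NEW_JOB_ID, w, 0});
    }

    // Reply to a worker. Id 0 spins it down; otherwise the simulated worker
    // runs the job at once, reports success, and asks for another job.
    void send_job_to_slave(Size slave_rank, Size job_id) {
        if (job_id == 0) { --n_worker; return; }
        inbox.push_back(Message{JOB_SUCCESS, slave_rank, job_id});
        inbox.push_back(Message{NEW_JOB_ID, slave_rank, 0});
    }

    void process_message(const Message& msg) {
        switch (msg.tag) {
        case NEW_JOB_ID:
            send_job_to_slave(msg.slave_rank,
                              next_job <= n_jobs ? next_job++ : Size(0));
            break;
        case BAD_INPUT:   bad_inputs.push_back(msg.job_id); break;
        case JOB_SUCCESS: completed.push_back(msg.job_id);  break;
        }
    }

    void pump() {
        Message msg = inbox.front();  // copy before the queue is modified
        inbox.pop_front();
        process_message(msg);
    }

    void master_go() {
        // Job distribution loop: runs while unassigned jobs remain.
        while (next_job <= n_jobs && !inbox.empty()) pump();
        // Node spin-down loop: keep answering until every worker is done.
        while (n_worker > 0 && !inbox.empty()) pump();
    }
};
```

Running `DemoMaster(5, 2).master_go()` records five completed jobs and leaves `n_worker` at zero. Dropping the second loop would leave requests sitting unanswered in `inbox`, which is the hang the NSDL is there to prevent.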

void protocols::jd2::MPIFileBufJobDistributor::set_n_worker ( core::Size  setting) [inline, protected]

how many workers --- important to keep track during spin-down process

core::Size protocols::jd2::MPIFileBufJobDistributor::slave_get_new_job_id ( ) [protected]

requests, receives, and returns a new job id from the master node or returns the current job id if the repeat_job_ flag is set to true

References protocols::jd2::JobDistributor::get_current_batch(), protocols::jd2::NEW_JOB_ID, send_job_to_slave(), protocols::jd2::JobDistributor::set_batch_id(), slave_to_master(), and protocols::jd2::tr().

Referenced by get_new_job_id().
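The repeat behaviour can be sketched as below. The MPI round trip to the master is replaced by a callable, and the flag is cleared once it has been honoured; both are assumptions for the example, as are the names `DemoSlave` and `request_from_master`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

using Size = std::size_t;  // hypothetical stand-in for core::Size

struct DemoSlave {
    Size current_job_id = 0;
    bool repeat_job = false;
    // Stand-in for asking the master for a job id over MPI.
    std::function<Size()> request_from_master;

    void slave_mark_current_job_id_for_repetition() { repeat_job = true; }

    // Return the current id again if a repeat was requested; otherwise fetch
    // a fresh id from the master.
    Size slave_get_new_job_id() {
        if (repeat_job) {
            repeat_job = false;  // assumption: one repeat per request
            return current_job_id;
        }
        current_job_id = request_from_master();
        return current_job_id;
    }
};
```

If the callable hands out 1, 2, 3, ... in turn, a slave that marks job 1 for repetition receives 1 again on its next call and 2 on the call after that.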

void protocols::jd2::MPIFileBufJobDistributor::slave_job_succeeded ( core::pose::Pose & pose) [protected]

Sends a message to the head node upon successful job completion to avoid output interleaving.

References protocols::jd2::JobDistributor::current_job(), protocols::jd2::JobDistributor::job_outputter(), protocols::jd2::JOB_SUCCESS, and slave_to_master().

Referenced by job_succeeded().

void protocols::jd2::MPIFileBufJobDistributor::slave_mark_current_job_id_for_repetition ( ) [protected]

Sets the repeat_job_ flag to true.

References protocols::jd2::JobDistributor::current_job_id(), and protocols::jd2::tr().

Referenced by mark_current_job_id_for_repetition().

void protocols::jd2::MPIFileBufJobDistributor::slave_remove_bad_inputs_from_job_list ( ) [protected]

Sends a message to the head node that contains the id of a job that had bad input.

References protocols::jd2::BAD_INPUT, and slave_to_master().

Referenced by remove_bad_inputs_from_job_list().

void protocols::jd2::MPIFileBufJobDistributor::slave_to_master ( core::Size  tag) [protected]

Friends And Related Function Documentation

friend class JobDistributorFactory [friend]

The documentation for this class was generated from the following files: