Rosetta 3.5
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros Pages
Classes | Public Member Functions | Protected Types | Protected Member Functions | Private Member Functions | Private Attributes | Friends | List of all members
protocols::jd2::archive::MPIArchiveJobDistributor Class Reference

JobDistributor for the iterative ArchiveManager/Archive Framework. More...

#include <MPIArchiveJobDistributor.hh>

Inheritance diagram for protocols::jd2::archive::MPIArchiveJobDistributor:
Inheritance graph
[legend]
Collaboration diagram for protocols::jd2::archive::MPIArchiveJobDistributor:
Collaboration graph
[legend]

Classes

struct  CompletionMessage
 CompletionMessage(s) are send to the ArchiveManager whenever more than nr_notify decoys have been finished / or when the full batch is finished. More...
 

Public Member Functions

virtual void go (protocols::moves::MoverOP mover)
 overloaded to also start the ArchiveManager process More...
 
void set_archive (archive::ArchiveBaseOP)
 
bool is_archive_rank () const
 
- Public Member Functions inherited from protocols::jd2::MPIFileBufJobDistributor
virtual ~MPIFileBufJobDistributor ()
 dtor WARNING WARNING! SINGLETONS' DESTRUCTORS ARE NEVER CALLED IN MINI! DO NOT TRY TO PUT THINGS IN THIS FUNCTION! here's a nice link explaining why: http://www.research.ibm.com/designpatterns/pubs/ph-jun96.txt More...
 
core::Size increment_client_rank ()
 
core::Size min_client_rank () const
 return rank of first worker process (there might be more dedicated processes, e.g., ArchiveManager...) More...
 
virtual core::Size get_new_job_id ()
 dummy for master/slave version More...
 
virtual void mark_current_job_id_for_repetition ()
 dummy for master/slave version More...
 
virtual void remove_bad_inputs_from_job_list ()
 dummy for master/slave version More...
 
virtual void job_succeeded (core::pose::Pose &pose, core::Real runtime)
 dummy for master/slave version More...
 
virtual void job_failed (core::pose::Pose &pose, bool will_retry)
 This function is called when we give up on the job; it has been virtualized so BOINC and MPI can delay/protect output base implementation is just a call to the job outputter. More...
 
- Public Member Functions inherited from protocols::jd2::JobDistributor
virtual ~JobDistributor ()
 
void go (protocols::moves::MoverOP mover, JobOutputterOP jo)
 invokes go, after setting JobOutputter More...
 
JobOP current_job () const
 Movers may ask their controlling job distributor for information about the current job. They may also load information into this job for later output. More...
 
std::string current_output_name () const
 Movers may ask their controlling job distributor for the output name as defined by the Job and JobOutputter. More...
 
JobOutputterOP job_outputter () const
 Movers (or derived classes) may ask for the JobOutputter. More...
 
void set_job_outputter (const JobOutputterOP &new_job_outputter)
 Movers (or derived classes) may ask for the JobOutputter. More...
 
JobInputterOP job_inputter () const
 JobInputter access. More...
 
virtual void mpi_finalize (bool finalize)
 should the go() function call MPI_finalize()? It probably should, this is true by default. More...
 
JobInputterInputSource::Enum job_inputter_input_source () const
 The input source for the current JobInputter. More...
 
virtual void restart ()
 
core::Size total_nr_jobs () const
 
core::Size current_job_id () const
 integer access - which job are we on? More...
 
std::string get_current_batch () const
 what is the current batch ? — name refers to the flag-file used for this batch More...
 
virtual void add_batch (std::string const &, core::Size id=0)
 add a new batch ( name will be interpreted as flag_file ) More...
 
core::Size current_batch_id () const
 what is the current batch number ? — refers to position in batches_ More...
 

Protected Types

typedef MPIFileBufJobDistributor Parent
 

Protected Member Functions

 MPIArchiveJobDistributor ()
 ctor is protected; singleton pattern More...
 
virtual void handle_interrupt ()
 This function got called when job is not yet finished and got termitated abnormaly (ctrl-c, kill etc). when implimenting it in subclasses make sure to delete all in-progress-data that your job spawn. More...
 
virtual void batch_underflow ()
 triggered in slave if new batch_ID comes in. More...
 
virtual bool process_message (core::Size msg_tag, core::Size slave_rank, core::Size slave_job_id, core::Size slave_batch_id, core::Real run_time)
 act on a message, return true if message was understood More...
 
virtual void mark_job_as_completed (core::Size job_id, core::Size batch_id, core::Real run_time)
 overloaded to allow statistics and sending of CompletionMessages More...
 
virtual void mark_job_as_bad (core::Size job_id, core::Size batch_id)
 overloaded to allow statistics and sending of CompletionMessages More...
 
virtual void load_new_batch ()
 overloaded to start new entries in nr_new_completed_, nr_completed_, nstruct_ and nr_bad_ ... More...
 
core::Size archive_rank () const
 rank of ArchiveManger process More...
 
- Protected Member Functions inherited from protocols::jd2::MPIFileBufJobDistributor
 MPIFileBufJobDistributor ()
 ctor is protected; singleton pattern More...
 
 MPIFileBufJobDistributor (core::Size master_rank, core::Size file_buf_rank, core::Size min_client_rank, bool start_empty=false)
 protected ctor for child-classes More...
 
virtual bool next_batch ()
 switch current_batch_id_ to next batch More...
 
void master_go (protocols::moves::MoverOP mover)
 Handles the receiving of job requests and the sending of job ids to and from slaves. More...
 
core::Size master_get_new_job_id ()
 Always returns zero, simply increments next_job_to_assign_ to the next job that should be run based on what has been completeted and the overwrite flags. More...
 
core::Size slave_get_new_job_id ()
 requests, receives, and returns a new job id from the master node or returns the current job id if the repeat_job_ flag is set to true More...
 
void master_mark_current_job_id_for_repetition ()
 This should never be called as this is handled internally by the slave nodes, it utility_exits. More...
 
void slave_mark_current_job_id_for_repetition ()
 Sets the repeat_job_ flag to true. More...
 
void master_remove_bad_inputs_from_job_list ()
 Simply increments next_job_to_assign_ to the next job that should be run based on what has been completed and if the input job tag of the job marked as having bad input. More...
 
void slave_remove_bad_inputs_from_job_list ()
 Sends a message to the head node that contains the id of a job that had bad input. More...
 
void master_job_succeeded (core::pose::Pose &pose)
 This should never be called as this is handled internally by the slave nodes, it utility_exits. More...
 
void slave_job_succeeded (core::pose::Pose &pose)
 Sends a message to the head node upon successful job completion to avoid output interleaving. More...
 
void slave_to_master (core::Size tag)
 send a message to master More...
 
void send_job_to_slave (core::Size slave_rank)
 called by master to send and by slave to receive job More...
 
core::Size rank () const
 return rank of this process More...
 
core::Size master_rank () const
 return rank of master process ( where JobDistributor is running ) More...
 
core::Size file_buf_rank () const
 return rank of file-buffer process ( where output data (via ozstream )is handled ) More...
 
core::Size number_of_processors ()
 how many processes — this includes dedicated processes More...
 
core::Size n_rank ()
 how many processes — this includes dedicated processes More...
 
core::Size n_worker ()
 how many workers — important to keep track during spin-down process More...
 
void set_n_worker (core::Size setting)
 how many workers — important to keep track during spin-down process More...
 
void eat_signal (core::Size signal, int source)
 receive a certain signal and ignore it.... this is needed, for instance, when MPIArchiveJobDistributor triggers an ADD_BATCH signal by sending QUEUE_EMPTY to the ArchiveManager... More...
 
- Protected Member Functions inherited from protocols::jd2::JobDistributor
 JobDistributor ()
 Singleton instantiation pattern; Derived classes will call default ctor, but their ctors, too must be protected (and the JDFactory must be their friend.) More...
 
 JobDistributor (bool empty)
 MPIArchiveJobDistributor starts with an empty job-list... More...
 
void go_main (protocols::moves::MoverOP mover)
 Non-virtual get-job, run it, & output loop. This function is pretty generic and your subclass may be able to use it. It is NOT virtual - this implementation can be shared by (at least) the simple FileSystemJobDistributor, the MPIWorkPoolJobDistributor, and the MPIWorkPartitionJobDistributor. Do not feel that you need to use it as-is in your class - but DO plan on implementing all its functionality! More...
 
Jobs const & get_jobs () const
 Read access to private data for derived classes. More...
 
void mark_job_as_completed (core::Size job_id, core::Real run_time)
 Jobs is the container of Job objects need non-const to mark Jobs as completed on Master in MPI-JobDistributor. More...
 
void mark_job_as_bad (core::Size job_id)
 
ParserOP parser () const
 Parser access. More...
 
void begin_critical_section ()
 
void end_critical_section ()
 
bool obtain_new_job (bool re_consider_current_job=false)
 this function updates the current_job_id_ and current_job_ fields. The boolean return states whether or not a new job was obtained (if false, quit distributing!) More...
 
virtual void current_job_finished ()
 Derived classes are allowed to clean up any temporary files or data relating to the current job after the current job has completed. Called inside go_main loop. Default implementation is a no-op. More...
 
virtual void note_all_jobs_finished ()
 Derived classes are allowed to perform some kind of action when the job distributor runs out of jobs to execute. Called inside go_main. Default implementation is a no-op. More...
 
void clear_current_job_output ()
 
void set_batch_id (core::Size setting)
 set current_batch_id — eg for slave nodes in MPI framework More...
 
core::Size nr_batches () const
 how many batches are in our list ... this can change dynamically More...
 
std::string const & batch (core::Size batch_id)
 give name of batch with given id More...
 

Private Member Functions

void _notify_archive ()
 stuff needed for non-blocking communication More...
 
bool receive_batch (core::Size source_rank)
 receive a new Batch from ArchiveManager More...
 
void sync_batches (core::Size slave_rank)
 sync batch queue with slave node More...
 
void master_to_archive (core::Size tag)
 send message to ArchiveManager More...
 
void notify_archive (CompletionMessage const &)
 add a notifcation (CompletionMessage) to the msg queue ... More...
 
void notify_archive (core::Size batch_id)
 work out if a notifcation should be send (using above method) More...
 

Private Attributes

utility::vector1< core::Sizenr_jobs_
 
utility::vector1< core::Sizenr_completed_
 
utility::vector1< core::Sizenr_new_completed_
 
utility::vector1< core::Sizenr_bad_
 
utility::vector1< core::Sizenstruct_
 
core::Size nr_notify_
 
core::Size archive_rank_
 
std::deque< CompletionMessagepending_notifications_
 unsent notifications More...
 
ArchiveBaseOP theArchive_
 

Friends

class protocols::jd2::JobDistributorFactory
 

Additional Inherited Members

- Static Public Member Functions inherited from protocols::jd2::JobDistributor
static JobDistributorget_instance ()
 
- Static Protected Member Functions inherited from protocols::jd2::JobDistributor
static void setup_system_signal_handler (void(*prev_fn)(int)=jd2_signal_handler)
 Setting up callback function that will be call when our process is about to terminate. More...
 
static void remove_system_signal_handler ()
 Set signal handler back to default state. More...
 
static void jd2_signal_handler (int Signal)
 Default callback function for signal handling. More...
 

Detailed Description

JobDistributor for the iterative ArchiveManager/Archive Framework.

Tags used to tag messeges sent by MPI functions used to decide whether a slave is requesting a new job id or flagging as job as being a bad input

This job distributor is meant for running iterative jobs with the ArchiveManager/Archive Framework. could vary greatly. In this configuration the three first nodes are dedicated processes (JobDistributor, FileBuffer, and ArchiveManger ) and the remaining CPUs form slave or worker nodes. This JD will not work at all without MPI and the implementations of all but the interface functions have been put inside of ifdef directives. Generally each function has a master and slave version, and the interface functions call one or the other depending on processor rank.

Member Typedef Documentation

Constructor & Destructor Documentation

protocols::jd2::archive::MPIArchiveJobDistributor::MPIArchiveJobDistributor ( )
protected

ctor is protected; singleton pattern

constructor. Notice it calls the parent class! It also builds some internal variables for determining which processor it is in MPI land.

References nr_notify_.

Member Function Documentation

void protocols::jd2::archive::MPIArchiveJobDistributor::_notify_archive ( )
private

stuff needed for non-blocking communication

the private implementation of notify_archive. send JOB_COMPLETION message to Archive if a message is in the message queue.

References archive_rank(), protocols::jd2::archive::MPIArchiveJobDistributor::CompletionMessage::batch_id, protocols::jd2::archive::MPI_ARCHIVE_TAG, pending_notifications_, and protocols::jd2::tr().

Referenced by batch_underflow(), and process_message().

core::Size protocols::jd2::archive::MPIArchiveJobDistributor::archive_rank ( ) const
inlineprotected
void protocols::jd2::archive::MPIArchiveJobDistributor::batch_underflow ( )
protectedvirtual
void protocols::jd2::archive::MPIArchiveJobDistributor::go ( protocols::moves::MoverOP  mover)
virtual
virtual void protocols::jd2::archive::MPIArchiveJobDistributor::handle_interrupt ( )
inlineprotectedvirtual

This function got called when job is not yet finished and got termitated abnormaly (ctrl-c, kill etc). when implimenting it in subclasses make sure to delete all in-progress-data that your job spawn.

Reimplemented from protocols::jd2::MPIFileBufJobDistributor.

bool protocols::jd2::archive::MPIArchiveJobDistributor::is_archive_rank ( ) const
inline
void protocols::jd2::archive::MPIArchiveJobDistributor::load_new_batch ( )
protectedvirtual

overloaded to start new entries in nr_new_completed_, nr_completed_, nstruct_ and nr_bad_ ...

load new batch from BatchQueue .. overloaded to setup the statistics for CompletionMessages

Reimplemented from protocols::jd2::JobDistributor.

References protocols::jd2::JobDistributor::current_batch_id(), protocols::jd2::JobDistributor::get_jobs(), protocols::jd2::JobDistributor::load_new_batch(), protocols::jd2::MPIFileBufJobDistributor::master_rank(), nr_bad_, nr_completed_, nr_jobs_, nr_new_completed_, nstruct_, protocols::jd2::MPIFileBufJobDistributor::rank(), and core::io::serialization::size().

void protocols::jd2::archive::MPIArchiveJobDistributor::mark_job_as_bad ( core::Size  job_id,
core::Size  batch_id 
)
protectedvirtual
void protocols::jd2::archive::MPIArchiveJobDistributor::mark_job_as_completed ( core::Size  job_id,
core::Size  batch_id,
core::Real  run_time 
)
protectedvirtual

overloaded to allow statistics and sending of CompletionMessages

overloaded to update our job-statistics ( needed for CompletionMessages )

Reimplemented from protocols::jd2::MPIFileBufJobDistributor.

References protocols::jd2::MPIFileBufJobDistributor::mark_job_as_completed(), protocols::jd2::MPIFileBufJobDistributor::master_rank(), notify_archive(), nr_jobs_, nr_new_completed_, protocols::jd2::MPIFileBufJobDistributor::rank(), and protocols::jd2::tr().

void protocols::jd2::archive::MPIArchiveJobDistributor::master_to_archive ( core::Size  tag)
private

send message to ArchiveManager

send message to ArchiveManager .. eg. QueueEmpty always send current_batch_id with the message ... used to determine if QueueEmpty is outdated

References archive_rank(), protocols::jd2::JobDistributor::current_batch_id(), protocols::jd2::MPIFileBufJobDistributor::master_rank(), protocols::jd2::archive::MPI_ARCHIVE_TAG, and protocols::jd2::MPIFileBufJobDistributor::rank().

Referenced by batch_underflow().

void protocols::jd2::archive::MPIArchiveJobDistributor::notify_archive ( CompletionMessage const &  msg)
private
void protocols::jd2::archive::MPIArchiveJobDistributor::notify_archive ( core::Size  batch_id)
private

work out if a notifcation should be send (using above method)

work out if CompletionMessage should be send... looks at completed/bad decoys send "final" message if all jobs done... sends "update" message if nr_new_completed_ > nr_notify

References protocols::jd2::JobDistributor::current_job_id(), protocols::jd2::JobDistributor::get_jobs(), notify_archive(), nr_bad_, protocols::jd2::JobDistributor::nr_batches(), nr_completed_, nr_jobs_, nr_new_completed_, nr_notify_, protocols::jd2::MPIFileBufJobDistributor::number_of_processors(), core::io::serialization::size(), and protocols::jd2::tr().

bool protocols::jd2::archive::MPIArchiveJobDistributor::process_message ( core::Size  msg_tag,
core::Size  slave_rank,
core::Size  slave_job_id,
core::Size  slave_batch_id,
core::Real  run_time 
)
protectedvirtual
bool protocols::jd2::archive::MPIArchiveJobDistributor::receive_batch ( core::Size  source_rank)
private
void protocols::jd2::archive::MPIArchiveJobDistributor::set_archive ( archive::ArchiveBaseOP  archive)
void protocols::jd2::archive::MPIArchiveJobDistributor::sync_batches ( core::Size  slave_rank)
private

sync batch queue with slave node

sync batches with worker nodes.. this is called if they get a job for a batch they don't know yet... this method will send ALL batches they don't have yet.

References protocols::jd2::archive::ADD_BATCH, protocols::jd2::JobDistributor::batch(), protocols::jd2::MPIFileBufJobDistributor::master_rank(), protocols::jd2::MPI_JOB_DIST_TAG(), protocols::jd2::JobDistributor::nr_batches(), protocols::jd2::MPIFileBufJobDistributor::rank(), receive_batch(), core::io::serialization::size(), and protocols::jd2::tr().

Referenced by batch_underflow(), and process_message().

Friends And Related Function Documentation

Member Data Documentation

core::Size protocols::jd2::archive::MPIArchiveJobDistributor::archive_rank_
private

Referenced by archive_rank().

utility::vector1< core::Size > protocols::jd2::archive::MPIArchiveJobDistributor::nr_bad_
private
utility::vector1< core::Size > protocols::jd2::archive::MPIArchiveJobDistributor::nr_completed_
private

Referenced by load_new_batch(), and notify_archive().

utility::vector1< core::Size > protocols::jd2::archive::MPIArchiveJobDistributor::nr_jobs_
private
utility::vector1< core::Size > protocols::jd2::archive::MPIArchiveJobDistributor::nr_new_completed_
private
core::Size protocols::jd2::archive::MPIArchiveJobDistributor::nr_notify_
private
utility::vector1< core::Size > protocols::jd2::archive::MPIArchiveJobDistributor::nstruct_
private

Referenced by load_new_batch(), and mark_job_as_bad().

std::deque< CompletionMessage > protocols::jd2::archive::MPIArchiveJobDistributor::pending_notifications_
private

unsent notifications

Referenced by _notify_archive(), and notify_archive().

ArchiveBaseOP protocols::jd2::archive::MPIArchiveJobDistributor::theArchive_
private

Referenced by go(), and set_archive().


The documentation for this class was generated from the following files: