
JD2 MPI

#1

Hello,

Guys, I'm working with ab initio, homology modeling, and docking. I noticed that when I run abinitio and homology modeling under MPI, Rosetta automatically chooses how the job is distributed. The problem is that abinitio runs fine, but homology modeling on the server is as slow as running it on my simple lab PC. By analyzing the output files I realized that homology modeling uses this protocol (protocols.jd2.MPIFileBufJobDistributor), while abinitio uses protocols.jobdist.JobDistributors, which seems efficient to me.


My question is: how do I choose the best protocol? From what I understand of the Rosetta documentation, the best job distributor for homology modeling would be -jd2::mpi_work_partition_job_distributor, and for abinitio it would be the MPIFileBufJobDistributor.

Can someone help me?

Wed, 2015-10-28 14:45
jrcf

You *can't* use the abinitio job distributor for homology modeling. (In some respects, abinitio is its own little world.)

There's a brief description of the JD2 MPI job distributors at https://www.rosettacommons.org/docs/latest/development_documentation/tutorials/jd2#mpi-job-distributors. The big caveats are how many nodes you're using for "overhead" (IO/control) versus how many nodes total you have:

If you're only doing MPI with a few nodes, then you'll probably want to use -jd2:mpi_work_partition_job_distributor, which won't reserve nodes for control or IO. The caveat is that there's no dynamic control of jobs, so one processor might be twiddling its thumbs while another still has a list of jobs to get through.
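
For a small run, the invocation might look something like this sketch (the executable name, process count, and input flags are just placeholders for whatever protocol you're actually running; the job distributor flag is the only part that matters here):

    # Sketch: small MPI run using the static work-partition distributor.
    # Jobs are split evenly up front across all 8 ranks; no rank is reserved for control or IO.
    mpirun -np 8 rosetta_scripts.mpi.linuxgccrelease @my_flags \
        -nstruct 100 \
        -jd2:mpi_work_partition_job_distributor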

The default is the MPIWorkPoolJobDistributor, which reserves one node for control and uses N-1 nodes for processing. The "master" node hands out jobs as processors become available, so you don't waste time with a slow node trying to finish, but you do lose the processing power of the master node.
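
For example, if you just launch an MPI build with no job distributor flag at all (executable name and flags below are placeholders), you get the workpool behavior:

    # Sketch: default workpool behavior, no extra job distributor flag needed.
    # Rank 0 acts as the master handing out jobs; the other 15 ranks do the processing.
    mpirun -np 16 AbinitioRelax.mpi.linuxgccrelease @my_flags -nstruct 1000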

If you're running a larger number of nodes, filesystem contention becomes an issue. (The run slows down because the filesystem can't handle that many processes writing to the directory.) For this case, you'd want to use silent file output and the -jd2::mpi_filebuf_job_distributor option. This reserves one node for command, one node for IO, and N-2 nodes for processing.
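
A sketch of what that might look like (again, the executable and inputs are placeholders; -out:file:silent is the usual way to request silent-file output):

    # Sketch: large run with dedicated control and IO ranks, writing to a single silent file.
    mpirun -np 64 rosetta_scripts.mpi.linuxgccrelease @my_flags \
        -nstruct 5000 \
        -out:file:silent decoys.silent \
        -jd2::mpi_filebuf_job_distributor
    # One rank handles control, one buffers the silent-file IO,
    # and the remaining 62 ranks run the actual jobs.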

If your cluster has diagnostic utilities, you may want to look at CPU usage during the runs, to see how the processor load varies over the run. This might help you plan how to reallocate the processing power to optimize. (e.g. if you're wasting time at the end of the run, you might want to use workpool or filebuf to get the dynamic control node. If one or two nodes being idle is a significant drain, then you might want to use work_partition. If your CPUs randomly stall in the middle of the runs, it might be filesystem contention and filebuf might help. If your CPU is maxed throughout the run, it could be that your cluster nodes are just significantly slower than your desktop node.)

Sat, 2015-10-31 08:57
rmoretti