
Trouble running MPI docking protocol, please help!


 Hi all,

I'm quite new to Rosetta (and computational approaches in general, I've only been using linux and bash-based interfaces for about 6 months) and I've spent a few weeks trying to understand the docking process, which I think I understand fairly well. I've run into a problem that I hope you can help me with. Apologies for any naivety/confusion/incorrect terms that I use, I'm still very much an amateur so I might not explain myself in the best way. 

What I want to do

I've organised my docking flags file and relaxed my input structure, and I'm now at the stage where I want to do a global docking production run of ~100,000 models or more over 100 CPU cores. For this I'm using my university's High Performance Computing hub, which uses SLURM as the resource manager and supports MPI; the IT team have installed and compiled Rosetta with MPI. I'm attempting to run my simple global docking job over a number of CPUs and nodes. I've read the Rosetta MPI documentation, and as far as I understand it, it should be as straightforward as executing the MPI build of the docking program and giving SLURM the relevant CPU/node resource requests.

The problem I'm having

The problem is that after submitting I can see the resources have been allocated, but I don't think the CPUs are actually being utilised. I did six trial runs, each producing 10 models on one node (which has 28 CPU cores), assigning 1, 2, 4, 6, 8 and 10 tasks to the node respectively, with one CPU core per task. I expected the processing time to fall roughly linearly with the number of tasks: a node running 10 tasks under MPI should take about a tenth of the time of one task, i.e. roughly 541/10 ~ 54 s rather than the 293 s I measured. In fact I see almost no improvement beyond two tasks, so I suspect there is a problem with my script. I'm fairly sure the SLURM submission itself is fine, because sacct shows (for example, in the 10-task run) 10 CPUs allocated to the job, which makes me suspect the Rosetta side isn't working as intended. If anyone could have a look and suggest what I might be doing wrong, I'd be forever grateful! The timing data and script information are below.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Processing time relative to tasks allocated per job

Tasks per node    Process time (s)
      1                541
      2                334
      4                387
      6                336
      8                358
     10                293
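
Working the expectation through explicitly: speedup is T(1)/T(n) and parallel efficiency is T(1)/(n * T(n)). A quick awk pass over the numbers above (assuming, as in the table, that the times are wall-clock seconds):

awk 'NR==1 {t1=$2} {printf "%2d tasks: speedup %.2f, efficiency %3.0f%%\n", $1, t1/$2, 100*t1/($2*$1)}' <<'EOF'
1 541
2 334
4 387
6 336
8 358
10 293
EOF

With perfect scaling the efficiency would stay near 100%; here it has already fallen to about 18% at 10 tasks.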


SLURM batch script (saved as test_script.sh)

#!/bin/bash

#SBATCH --job-name=test_job
#SBATCH --partition=test
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=<either 1, 2, 4, 6, 8 or 10> 
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00
#SBATCH --mem-per-cpu=1000M

module load apps/rosetta/2018.33

# Point Intel MPI at SLURM's PMI library so that srun can launch the MPI ranks
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
srun docking_protocol.mpi.linuxgccrelease @test_flag


Docking flags file (saved as test_flag)

# Input: the relaxed starting structure
-in:file:s F1_didomain_global.pdb

# Number of models to generate (10 for these test runs)
-nstruct 10

# Global docking: chain A vs chain B, with both partners' orientations randomised
-partners A_B
-dock_pert 3 24
-spin
-randomize1
-randomize2

# Extra rotamer sampling (chi1, and chi2 for aromatics)
-ex1
-ex2aro

# Append _test to output file names
-out:suffix _test

# Report interface scores for the docked models
-score:docking_interface_score 1


Files in my working directory

[n00baccount@topsecretHPC tester]$ ls -lct
total 512
-rw-r--r-- 1 n00baccount bioc    292 Aug 17 10:22 test_script.sh
-rw-r--r-- 1 n00baccount bioc    175 Aug 17 09:34 test_flag
-rw-r--r-- 1 n00baccount bioc 359148 Aug 16 22:05 F1_didomain_global.pdb


Example of submission

[n00baccount@topsecretHPC tester]$ sbatch test_script.sh 
Submitted batch job 3961469


sacct output showing resources allocated for two different jobs (1 task per node vs 10 tasks per node, with 1 CPU per task)

[n00baccount@topsecretHPC tester]$ sacct -u n00baccount
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
3961468        test_job       test    default          1    RUNNING      0:0 
3961468.0    docking_p+               default          1    RUNNING      0:0 
3961469        test_job       test    default         10    RUNNING      0:0 
3961469.0    docking_p+               default         10    RUNNING      0:0 


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Any help would be very much appreciated.

Thanks very much for reading!

Rob

Tue, 2020-08-18 06:40
Rob_Barringer

Hi Rob,

This is really a question of how to use MPI correctly with Rosetta, and the answer depends on how Rosetta was compiled on your cluster. I can share what my SLURM scripts look like. The one below runs 59 independent docking processes (not 60: with Rosetta's MPI job distributor, one rank acts as a master that hands jobs out to the workers).

Also, whenever running any Rosetta protocol under MPI, be sure to include the following two flags. The first stops a single failed trajectory from killing the whole MPI run; the second writes each rank's tracer output to its own file (suffixed with the rank number) instead of interleaving everything on stdout. Note that the logs/ directory must exist before the job starts:

-jd2:failed_job_exception false
-mpi_tracer_to_file logs/docking_tracer.out


Hope this helps,

Shourya

#!/bin/sh

#SBATCH --job-name=docking_job
#SBATCH --partition=mpi
#SBATCH --time=60:0:0
#SBATCH --ntasks=60
#SBATCH --mem=200GB
#SBATCH --output logs/docking.%j.out
#SBATCH --error logs/docking.%j.err

# load the compiler and MPI stack Rosetta was built with
module load gcc/6.2.0
module load openmpi/3.1.0

# path to the MPI build of the docking executable
ROSETTABIN=$HOME/Rosetta/main/source/bin
ROSETTAEXE=docking_protocol
COMPILER=mpi.linuxgccrelease
EXE=$ROSETTABIN/$ROSETTAEXE.$COMPILER

# run with date and time stamps; -np must match --ntasks above
echo Starting MPI job running $EXE
date
ulimit -l unlimited   # lift the locked-memory limit, which some MPI fabrics need
time mpirun -np 60 $EXE @docking_flags
date
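
If you want to confirm that the ranks are actually doing work while a job is running, SLURM's sstat can report per-step CPU usage. A quick sketch (the job ID is a placeholder, and the exact fields available depend on your site's accounting configuration):

# <jobid> is a placeholder for the ID that sbatch prints. NTasks should match
# the number of ranks requested, and AveCPU should keep climbing while the
# workers are busy.
sstat -j <jobid>.0 --format=JobID,NTasks,AveCPU,MaxRSS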


Sun, 2020-09-20 16:06
ssrb

Hi Shourya,

Thanks a lot for commenting! I fear this was more a learning curve in basic script-writing than anything else.

After asking around my lab, it turns out I didn't need the MPI build at all! I was under the impression I had to use it, but when submitting through SLURM as a job array, the standard (non-MPI) build of the docking protocol is fine: each array element runs its own independent copy and the scheduler spreads them across the nodes, so long as you give each element an appropriate prefix, suffix and silent-file output so the runs don't overwrite one another.

For posterity, in case anybody comes across this post and has the same issue, this is the script I used:

#!/bin/bash

#SBATCH --job-name=N00b_j0b
#SBATCH --partition=serial
#SBATCH --nodes=1                 # one node per array element
#SBATCH --ntasks-per-node=1       # each element is a single serial Rosetta process
#SBATCH --cpus-per-task=1
#SBATCH --time=72:00:00
#SBATCH --mem-per-cpu=4500M
#SBATCH --array=1-48              # 48 independent copies of the run

module load apps/rosetta/2018.33

# The job ID and array task ID make each element's output names unique,
# so the 48 runs don't overwrite each other's models or silent files.
srun docking_protocol.linuxgccrelease @Flag_48_arrays \
    -out:suffix $SLURM_ARRAY_TASK_ID \
    -out:prefix $SLURM_JOBID \
    -out:file:silent 48_array_docks_$SLURM_ARRAY_TASK_ID
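
One follow-up note: the array leaves you with 48 separate silent files. To pool them for extraction and scoring, Rosetta's combine_silent application can merge them; a sketch, assuming the naming scheme from the script above (the combined filename is just an example):

# Merge the per-array-element silent files into a single silent file.
# The leading * in the glob allows for the $SLURM_JOBID prefix on the names.
combine_silent.linuxgccrelease \
    -in:file:silent *48_array_docks_* \
    -out:file:silent all_48_array_docks.silent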


Thanks again for your help; I'll be sure to refer back to this comment if I ever need MPI help!

Rob

Fri, 2020-09-25 05:45
Rob_Barringer