Problems running pmut in parallel with openmpi


Hi there,

I am running into problems with pmut using openmpi. We successfully compiled Rosetta3.4 with mpi support, but when we run pmut with the openmpi command 'mpirun -np 8', the job does not distribute 'the work of creating and scoring all mutants evenly across all available CPUs' (as it says in the manual). Instead it calculates every mutant 8 times: the log file lists every entry 8 times (see below). Am I doing something obviously wrong? Or did the compilation with openmpi fail?

Thanks a lot for your help in this matter.

Rene

###########

Here's the command script I use:

mpirun -np 8 pmut_scan_parallel.linuxgccrelease \
-database /opt/rosetta3.4/rosetta_database \
-s XXX.pdb \
-ex1 \
-ex2 \
-extrachi_cutoff 1 \
-use_input_sc \
-ignore_unrecognized_res \
-no_his_his_pairE \
-multi_cool_annealer 10 \
-mute basic core

The output log file looks like this:

protocols.pmut_scan.PointMutScanDriver: go(): master node
protocols.pmut_scan.PointMutScanDriver: go(): master node
protocols.pmut_scan.PointMutScanDriver: go(): master node
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number single mutants possible: 9044
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number double mutants possible: 0
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number double mutants excluded for distance: 0
protocols.pmut_scan.PointMutScanDriver: go(): master node
protocols.pmut_scan.PointMutScanDriver: go(): master node
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number single mutants possible: 9044
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number double mutants possible: 0
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number double mutants excluded for distance: 0
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number single mutants possible: 9044
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number double mutants possible: 0
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number double mutants excluded for distance: 0
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number single mutants possible: 9044
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number double mutants possible: 0
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number double mutants excluded for distance: 0
protocols.pmut_scan.PointMutScanDriver: go(): master node
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number single mutants possible: 9044
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number double mutants possible: 0
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number double mutants excluded for distance: 0
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number single mutants possible: 9044
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number double mutants possible: 0
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number double mutants excluded for distance: 0
protocols.pmut_scan.PointMutScanDriver: go(): master node
protocols.pmut_scan.PointMutScanDriver: go(): master node
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number single mutants possible: 9044
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number double mutants possible: 0
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number double mutants excluded for distance: 0
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number single mutants possible: 9044
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number double mutants possible: 0
protocols.pmut_scan.PointMutScanDriver: fill_mutations_list(): number double mutants excluded for distance: 0
protocols.pmut_scan.PointMutScanDriver: mutation mutation_PDB_numbering average ddG average total energy
protocols.pmut_scan.PointMutScanDriver: mutation mutation_PDB_numbering average ddG average total energy
protocols.pmut_scan.PointMutScanDriver: mutation mutation_PDB_numbering average ddG average total energy
protocols.pmut_scan.PointMutScanDriver: mutation mutation_PDB_numbering average ddG average total energy
protocols.pmut_scan.PointMutScanDriver: mutation mutation_PDB_numbering average ddG average total energy
protocols.pmut_scan.PointMutScanDriver: mutation mutation_PDB_numbering average ddG average total energy
protocols.pmut_scan.PointMutScanDriver: mutation mutation_PDB_numbering average ddG average total energy
protocols.pmut_scan.PointMutScanDriver: mutation mutation_PDB_numbering average ddG average total energy
protocols.pmut_scan.PointMutScanDriver: A-L15I A-L15I -2.935 -521.94
protocols.pmut_scan.PointMutScanDriver: A-L15I A-L15I -2.935 -521.94
protocols.pmut_scan.PointMutScanDriver: A-L15I A-L15I -2.935 -521.94
protocols.pmut_scan.PointMutScanDriver: A-L15I A-L15I -2.935 -521.94
protocols.pmut_scan.PointMutScanDriver: A-L15I A-L15I -2.935 -521.94
protocols.pmut_scan.PointMutScanDriver: A-L15I A-L15I -2.935 -521.94
protocols.pmut_scan.PointMutScanDriver: A-L15I A-L15I -2.935 -521.94
protocols.pmut_scan.PointMutScanDriver: A-L15I A-L15I -2.935 -521.94
protocols.pmut_scan.PointMutScanDriver: A-L15T A-L15T -1.388 -520.39
protocols.pmut_scan.PointMutScanDriver: A-L15T A-L15T -1.388 -520.39
protocols.pmut_scan.PointMutScanDriver: A-L15V A-L15V -2.474 -521.47
protocols.pmut_scan.PointMutScanDriver: A-L15T A-L15T -1.388 -520.39
protocols.pmut_scan.PointMutScanDriver: A-L15T A-L15T -1.388 -520.39
protocols.pmut_scan.PointMutScanDriver: A-L15V A-L15V -2.474 -521.47
protocols.pmut_scan.PointMutScanDriver: A-L15T A-L15T -1.388 -520.39
protocols.pmut_scan.PointMutScanDriver: A-L15V A-L15V -2.474 -521.47
protocols.pmut_scan.PointMutScanDriver: A-L15T A-L15T -1.388 -520.39
protocols.pmut_scan.PointMutScanDriver: A-L15T A-L15T -1.388 -520.39
protocols.pmut_scan.PointMutScanDriver: A-L15V A-L15V -2.474 -521.47
protocols.pmut_scan.PointMutScanDriver: A-L15T A-L15T -1.388 -520.39
protocols.pmut_scan.PointMutScanDriver: A-L15V A-L15V -2.474 -521.47
protocols.pmut_scan.PointMutScanDriver: A-L15V A-L15V -2.474 -521.47
protocols.pmut_scan.PointMutScanDriver: A-L15V A-L15V -2.474 -521.47
protocols.pmut_scan.PointMutScanDriver: A-L15V A-L15V -2.474 -521.47
protocols.pmut_scan.PointMutScanDriver: A-T16V A-T16V -1.207 -519.48
protocols.pmut_scan.PointMutScanDriver: A-T16V A-T16V -1.207 -519.48
protocols.pmut_scan.PointMutScanDriver: A-T16V A-T16V -1.207 -519.48
protocols.pmut_scan.PointMutScanDriver: A-T16V A-T16V -1.207 -519.48
protocols.pmut_scan.PointMutScanDriver: A-T16V A-T16V -1.207 -519.48
protocols.pmut_scan.PointMutScanDriver: A-T16V A-T16V -1.207 -519.48

Fri, 2012-09-21 06:25
RMJ

Run pmut_scan_parallel.linuxgccrelease without mpirun and with no options. If it fails with an mpi-related error (mine is

"PMGR_COLLECTIVE ERROR: unitialized MPI task: Missing required environment variable: MPIRUN_RANK"

but yours may be different), then you are using the MPI executable correctly and the problem is elsewhere. If it instead fails with

"basic.options.util: Use either -s or -l to designate one or more start_files
ERROR:: Exit from: src/basic/options/util.cc line: 109"

then you are not using the MPI executable.
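
As a rough sketch of that check, the output of the no-argument run can be piped through a small filter. The function name classify_no_arg_run is made up for illustration; the only thing it relies on is the "Use either -s or -l" text quoted above. Anything else is reported as "maybe-mpi", because the exact MPI error differs between installations.

```shell
# Hypothetical helper: classify the output of a no-argument run (read from stdin).
# It only looks for the option-parser error quoted in this post.
classify_no_arg_run() {
    if grep -q "Use either -s or -l"; then
        # The option parser ran, so MPI start-up did not fail first.
        echo "non-mpi"
    else
        # Not conclusive: look for an MPI-related error in the output yourself.
        echo "maybe-mpi"
    fi
}
```

It would be used as: pmut_scan_parallel.linuxgccrelease 2>&1 | classify_no_arg_run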

I notice that MPI is not in your executable name: pmut_scan_parallel.linuxgccrelease. If you compiled with MPI, there should be a pmut_scan_parallel.mpi.linuxgccrelease in bin/. If you compiled without MPI, there should be a pmut_scan_parallel.default.linuxgccrelease, and both should be present if you compiled both ways. The pmut_scan_parallel.linuxgccrelease with no insertion (neither mpi nor default) points to whatever was compiled most recently. So if you compiled the MPI build, then the non-MPI build, and then ran the unspecified symlink in bin/, you'll get this behavior: mpirun launches 8 independent non-MPI copies, and each one acts as the master node and scans every mutant.
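
A quick way to see where the unspecified name points is to resolve the symlink. This is only a sketch: which_build is a hypothetical helper, it assumes GNU readlink (for -f), and it assumes the resolved file name carries the mpi or default insertion described above. If your build tree names things differently it prints "unknown", and you would need to inspect the resolved path by hand.

```shell
# Hypothetical helper: report which build an executable name resolves to.
# Assumes GNU readlink and that the resolved file name contains
# ".mpi." or ".default." (an assumption, not checked against every build).
which_build() {
    target=$(readlink -f "$1") || return 1
    case "$(basename "$target")" in
        *.mpi.*)     echo "mpi" ;;
        *.default.*) echo "default" ;;
        *)           echo "unknown" ;;
    esac
}
```

For example, which_build bin/pmut_scan_parallel.linuxgccrelease (the bin/ path is assumed to be relative to your Rosetta source directory). The safer option is to skip the symlink and name the MPI build explicitly: mpirun -np 8 pmut_scan_parallel.mpi.linuxgccrelease followed by the same options as before.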

Fri, 2012-09-21 06:38
smlewis

Thanks for your help. It works now. I was using the wrong executable :P

Fri, 2012-09-21 08:43
RMJ