
Problem with mpirun/mpiexec


I am attempting to relax a glycosylated protein into a cryoEM map and get the following error: 

"mpirun noticed that process rank 2 with PID 8288 on node imsb0632 exited on signal 9 (Killed)."

I get this error with both mpirun and mpiexec.

Can anyone explain more about what is causing this issue?

Fri, 2020-07-10 08:37
ahansel

"Signal 9 (Killed)" normally means that something external to Rosetta stopped the run.
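If you want to check what a signal number in a message like this means, Python's standard `signal` module can look it up. This is just a small sketch; `describe_signal` is a helper name made up for illustration, and it assumes a POSIX system such as Linux. Signal 9 is SIGKILL, which a process cannot catch or ignore, and that is why Rosetta gets no chance to print an error of its own.

```python
import signal


def describe_signal(signum: int) -> str:
    """Return the symbolic name for a signal number on this platform.

    Hypothetical helper for illustration; not part of Rosetta or MPI.
    """
    try:
        return signal.Signals(signum).name
    except ValueError:
        # The number does not correspond to a signal on this platform.
        return f"unknown signal {signum}"


if __name__ == "__main__":
    # The number reported by mpirun in the error message above.
    print(describe_signal(9))
```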

If you didn't manually kill the run yourself, the most common cause of such a message is the system running out of memory and the OS killing the process to free some up. Keep in mind that with MPI, each process launched needs its own address space, so the memory available to each Rosetta process shrinks as you launch more processes. Typical Rosetta runs need somewhere between 1 and 2 GB of RAM per process, but certain protocols (large cryoEM maps, for example) may need more. You can monitor the free memory during the run to see whether you're filling it up completely. If you are, you can try launching fewer processes (even if that leaves some processors idle), or look into things like tweaking your run parameters or downscaling the cryoEM map.
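As a rough way to put numbers on the memory budget described above, here is a minimal sketch that reads the available memory on a Linux node and estimates how many MPI processes would fit for a given per-process footprint. The function names are hypothetical, it assumes a Linux `/proc/meminfo`, and the per-process figures are only guesses: 2 GB is the upper end of the typical range mentioned above, while 4 and 8 GB are illustrative values for a heavier cryoEM run, not measured numbers.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: value in kiB}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def max_processes(available_kib: int, gib_per_process: float) -> int:
    """How many processes of the given size fit into the available memory."""
    per_process_kib = gib_per_process * 1024 * 1024
    return int(available_kib // per_process_kib)


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")  # Linux only; assumption of this sketch
    if meminfo.exists():
        mem = parse_meminfo(meminfo.read_text())
        # MemAvailable is absent on very old kernels; fall back to MemFree.
        available = mem.get("MemAvailable", mem.get("MemFree", 0))
        for estimate_gib in (2.0, 4.0, 8.0):  # illustrative guesses only
            count = max_processes(available, estimate_gib)
            print(f"{estimate_gib} GiB per process: at most {count} processes")
```

Running this again while the job is active (or just watching `free -h` or `top`) shows whether available memory is dropping toward zero before the process is killed.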

The other possibility is that you're running on a cluster or some other job queuing system (like SLURM), and the queuing system is enforcing resource limits (memory, runtime, number of processors, etc.). If this is the case, the logs for the queuing system (rather than the Rosetta log) should have information about why the jobs were killed.
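If the queuing system happens to be SLURM with job accounting enabled (an assumption; other schedulers have their own tools), the `sacct` command reports each job's final state. The sketch below builds a `sacct` call and interprets the `State` column. The helper names are made up for illustration, and the state strings checked are the standard SLURM ones (`OUT_OF_MEMORY`, `TIMEOUT`, `CANCELLED`).

```python
import subprocess

# Standard sacct field names; the selection here is just a suggestion.
SACCT_FIELDS = ["JobID", "State", "ExitCode", "MaxRSS",
                "ReqMem", "Elapsed", "Timelimit"]


def sacct_command(job_id: str) -> list:
    """Build a sacct call that prints one '|'-separated row per job step."""
    return ["sacct", "-j", job_id,
            "--format=" + ",".join(SACCT_FIELDS),
            "--parsable2", "--noheader"]


def parse_sacct(output: str) -> list:
    """Turn --parsable2 output into a list of {field: value} dicts."""
    rows = []
    for line in output.splitlines():
        if line.strip():
            rows.append(dict(zip(SACCT_FIELDS, line.split("|"))))
    return rows


def likely_cause(state: str) -> str:
    """Map a SLURM job state to a plain-language guess at the cause."""
    if state.startswith("OUT_OF_MEMORY"):
        return "memory limit exceeded"
    if state.startswith("TIMEOUT"):
        return "runtime limit exceeded"
    if state.startswith("CANCELLED"):
        return "cancelled by a user or an administrator"
    return "no scheduler limit indicated by the state alone"


def report(job_id: str) -> None:
    """Query SLURM accounting for one job and print a line per step."""
    result = subprocess.run(sacct_command(job_id), capture_output=True,
                            text=True, check=True)
    for row in parse_sacct(result.stdout):
        print(row["JobID"], row["State"], "->", likely_cause(row["State"]))

# Usage on a SLURM cluster: report("<your job id>")
```

Comparing `MaxRSS` with `ReqMem`, and `Elapsed` with `Timelimit`, in the same output usually makes it clear which limit was hit.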

Fri, 2020-07-10 09:59
rmoretti