Running protein-protein docking in parallel with MPI

I ran docking_protocol in parallel with MPI (command: mpiexec -np 10 $ROSETTA_BIN/docking_protocol.mpi.linuxclangrelease @flag_docking).

However, the following error occurred.

[ERROR]
-----------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

    Process name:    [[45390,1],8]
    Exit code:       255
-----------------------------------------------------------------------------

However, the score file and the output PDBs were still produced, so the error does not seem to affect the docking procedure or its output.

 

Thu, 2023-02-02 06:23
Zehui Zhou

That's a general error from the MPI system saying that one of the MPI processes exited unexpectedly. Without further information from the Rosetta logs (or a Rosetta crash file), it's hard to say what might have been the underlying cause.

If you got all the outputs you're expecting, it might not actually matter -- it might just have been an intermittent glitch in coordination between the processes while they were shutting down. If I recall correctly, there's some semaphoring that needs to happen to tell the MPI system that everything is done and it's okay to exit, and if that doesn't happen properly, you might get that sort of error.

I would not be concerned about the scientific accuracy of your results because of that message. If you got the number of outputs you expected, you should be okay.
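
One way to confirm that the expected number of outputs was written is to count the decoy PDB files after the run. The following is only a minimal sketch: the function name check_decoys and the directory layout are assumptions for illustration, not part of Rosetta, and it assumes the decoys are written as .pdb files directly inside one output directory that does not also contain the input structures.

```shell
#!/bin/sh
# Count decoy PDB files in a directory and compare with the expected number.
# check_decoys is a hypothetical helper, not a Rosetta tool.
# Usage: check_decoys OUTPUT_DIR EXPECTED_COUNT
check_decoys() {
    dir=$1
    expected=$2
    # Count regular files ending in .pdb directly inside the directory.
    found=$(find "$dir" -maxdepth 1 -type f -name '*.pdb' | wc -l | tr -d ' ')
    if [ "$found" -eq "$expected" ]; then
        echo "OK: found $found of $expected decoys"
        return 0
    fi
    echo "MISSING: found $found of $expected decoys"
    return 1
}
```

For example, `check_decoys ./output 100` (with ./output standing in for the real output directory) returns a non-zero status if fewer or more than 100 PDB files are present, which makes an interrupted run easy to spot in a job script.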

Thu, 2023-02-02 07:49
rmoretti

Thanks for your suggestions!

I noticed that Rosetta has some documentation about MPI job distributors (https://www.rosettacommons.org/docs/latest/development_documentation/tutorials/jd2#mpi-job-distributors), which mentions several MPI modes. Your previous answer indicated that MPIWorkPoolJobDistributor is the default one (https://www.rosettacommons.org/node/9557). Can I manually choose which mode Rosetta runs with, for example MPIFileBufJobDistributor instead of the default? Would that solve my error?

Thu, 2023-02-02 20:23
Zehui Zhou

I finally figured it out!

As the number of decoys increased, I could no longer obtain the desired number of PDBs: the error occurred partway through the run and interrupted the remaining docking jobs. I think the error is related to output, because I noticed a warning along the lines of "failing to output XXX.pdb, retry.......".

Inspired by your previous answer (https://www.rosettacommons.org/node/9557), I now use -out:file:silent to write the docking results and have switched to MPIFileBufJobDistributor (-mpi_file_buf_job_distributor true). The error no longer occurs, and the silent output is easier to transfer and process.
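
For reference, the resulting command line might look roughly like the sketch below. It only prints the command rather than running it. The silent file name docking_results.silent and the helper build_docking_cmd are assumptions for illustration, and the job distributor flag is spelled exactly as in this post, so it is worth checking against the option list of your own Rosetta build (the flags could equally go into the flag_docking file).

```shell
#!/bin/sh
# Print (do not run) an mpiexec command line for the setup described above.
# NP and SILENT_OUT are illustrative values; adjust them for a real run.
NP=10
SILENT_OUT=docking_results.silent   # hypothetical output file name

# build_docking_cmd is a hypothetical helper that assembles the command string.
build_docking_cmd() {
    # \$ROSETTA_BIN is kept literal so the printed command can be pasted
    # into a shell where that variable is set.
    printf '%s\n' "mpiexec -np $NP \$ROSETTA_BIN/docking_protocol.mpi.linuxclangrelease @flag_docking -out:file:silent $SILENT_OUT -mpi_file_buf_job_distributor true"
}

build_docking_cmd
```

Printing the command first makes it easy to review the flags before submitting the job to a cluster queue.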

Fri, 2023-02-03 05:51
Zehui Zhou