Our lab is relatively new to docking and has never used scons before. We are trying to run docking simulations on an HPC using Snugdock, but are having trouble speeding up the process. The way we are currently running the simulations, it takes ~30 minutes to complete 1 model. I compiled the programs using the command './scons.py -j 10 mode=release bin extras=cxx11thread', but it seems it is still unable to run on 10 CPUs. The flags we used were '-nstruct 10 -partners LH_A -spin -multithreading:total_threads 10'. I was wondering if someone could clarify whether there is a step I may have missed. Thank you.
Most Rosetta protocols are not multithreading-enabled. Generally speaking, if a protocol doesn't explicitly mention that it can use threads, it probably won't benefit from adding the threading options.
Instead, to use multiple CPUs we normally take advantage of the fact that Rosetta protocols are "trivially parallelizable" - that is, the normal use case calls for multiple output structures and each output structure is generated independently of the others.
So instead of having one process generate all 10,000 models you need for an accurate analysis (e.g. passing `-nstruct 10000`), you instead launch multiple processes, each with a smaller number of output structures (e.g. 10 separate runs each with `-nstruct 1000`, or 1,000 processes each with `-nstruct 10`). If you use the default settings (i.e. if you do not pass -constant_seed), structures from different runs are scientifically indistinguishable from different structures from the same run (for most protocols). You can do multiple separate runs and then simply combine the results for your post-analysis phase.
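As a concrete sketch of that splitting, something like the following launches 10 independent runs, each in its own directory (the executable name and flags file here are placeholders; substitute your actual Snugdock binary and options):

```shell
#!/bin/bash
# Split one 10,000-model job into 10 independent runs of 1,000 models each.
# Each run gets its own directory so the output files can't collide.
# "snugdock.cxx11thread.linuxgccrelease" and "flags" are placeholders --
# use your actual binary name and options file.
for i in $(seq 1 10); do
    mkdir -p run_$i
    (
        cd run_$i &&
        snugdock.cxx11thread.linuxgccrelease @../flags -nstruct 1000 \
            > log.txt 2>&1
    ) &
done
wait   # block until all 10 background runs finish
```

Afterwards you can gather the score files / silent files from run_1 through run_10 for analysis as if they had come from a single run.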
The trick is to keep track of the different runs and make sure they don't overwrite each other. The simplest way is to run each process in its own directory. There are also the -out:file:prefix and -out:file:suffix options: if you give each run a different setting for those, each output file will be tagged with the run it came from. If you're using a queueing system, there may be explicit support for doing things like this. (For example, SLURM has an --array option, which launches multiple nearly-identical jobs differing only in an environment variable, which you can use to set the prefix/suffix option.)
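With SLURM specifically, a job array script along these lines would do it; SLURM sets $SLURM_ARRAY_TASK_ID to a different value for each array element, which you can feed into -out:file:suffix (the binary name and resource lines here are illustrative, not prescriptive):

```shell
#!/bin/bash
#SBATCH --job-name=snugdock
#SBATCH --array=1-10          # 10 array elements, one run each
#SBATCH --ntasks=1            # each element is a single-CPU job
#SBATCH --time=12:00:00

# Tag each run's outputs with its array index so runs don't overwrite
# each other. Binary name and flags file are placeholders.
snugdock.cxx11thread.linuxgccrelease @flags -nstruct 1000 \
    -out:file:suffix _run${SLURM_ARRAY_TASK_ID}
```

You would submit this once with `sbatch`, and the scheduler handles launching all 10 elements as resources become available.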
The other potential option for multiprocess runs is MPI. Some protocols make more extensive use of it, but for most protocols the MPI communication is just used to coordinate jobs. This way you can do a single launch (albeit with multiple CPUs/processes) via your MPI launcher, and the processes will use MPI to coordinate and keep from stepping on each other's toes. -- But aside from the output file management, there's often no benefit to MPI over manual job splitting, so the extra effort to get MPI working might not be worth it.
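For completeness, if you do go the MPI route, the shape of it is a rebuild with the mpi extra followed by a single launch (again, the binary name below is illustrative):

```shell
# Rebuild with MPI support first:
#   ./scons.py -j 10 mode=release bin extras=mpi
# Then one launch; the 10 processes coordinate job distribution over MPI,
# together producing the full 10,000 structures.
mpirun -np 10 snugdock.mpi.linuxgccrelease @flags -nstruct 10000
```

Note that here you pass the *total* -nstruct once, rather than dividing it up yourself; the MPI job distributor hands out individual structures to the worker processes.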