Rosetta RNA denovo huge memory usage (probable memory leak)


Hi,

I am having trouble with the RNA denovo application from the latest build (3.8), both on a Linux computer (latest Linux Mint) and on a compute cluster (I can't tell you the system architecture, sorry). For a small RNA (14 base pairs and a loop), the memory usage of the RNA denovo application increases constantly: each structure uses slightly more memory than the preceding one. If the first one starts at, say, 350 MB, by structure 10 I am at around 1 GB, and it goes on until the process crashes with a bad_alloc error when the system runs out of memory (my Linux system has 16 GB), or until the server kills the process because it tries to use more memory than allocated (tested with 1, 2 and 4 GB).

That really looks like a memory leak. The problem seems worse when no secondary structure is defined (memory sometimes reaches 2 GB by the third structure), in case that helps you find the problem.

A very simple workaround (which further convinces me that this is indeed a memory leak) is to use bash to kill the process every 2 structures and start again. It generates thousands of silent files, but it works fine and always keeps the memory under 700 MB, at least for small sequences.
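One way to sketch this workaround without killing anything by hand is to launch many short runs, each producing only a couple of structures, so that every run starts in a fresh process and its memory is returned to the system on exit. The helper names below (`batch_commands`, `run_batches`) and the output file naming are hypothetical; the flags are only the ones that appear in this thread, and running with `-nstruct 2` per process is assumed to be equivalent to killing the process after 2 structures.

```python
import subprocess
from typing import List


def batch_commands(binary: str, fasta: str, total: int,
                   per_run: int = 2, seed_start: int = 0) -> List[List[str]]:
    """Build one rna_denovo command line per small batch of structures.

    Each batch gets its own seed offset and its own silent file, so the
    runs do not repeat trajectories or overwrite each other's output.
    """
    commands = []
    done = 0
    batch = 0
    while done < total:
        n = min(per_run, total - done)
        commands.append([
            binary,
            "-fasta", fasta,
            "-nstruct", str(n),
            "-seed_offset", str(seed_start + batch),
            # Hypothetical naming scheme for the per-batch silent files.
            "-out:file:silent", f"silent_batch{batch}.out",
            "-minimize_rna", "true",
        ])
        done += n
        batch += 1
    return commands


def run_batches(commands: List[List[str]]) -> None:
    """Run the batches one after another, each in its own process."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_batches(batch_commands("rna_denovo.static.linuxgccrelease", "StartingSeq.fasta", total=200))` would then produce 100 silent files of 2 structures each, which is the same trade-off as the bash loop: many small files in exchange for bounded memory.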

Just thought you might want to know about this (and in case you have a cleaner workaround than mine!)

Thanks

Clement

Mon, 2017-10-30 16:41
cdegut

This is really interesting. We do run memory leak checks on our code, but we only do so with nstruct = 1 (for speed). So if memory is being leaked between one pose and another, that's a problem.

This does at baseline make plausible sense, though: each Pose -- for the low resolution scoring terms -- stores nres * nres * 20 additional doubles (for base pairing and stacking). This could be made somewhat more efficient, but that's a larger development question.
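As a rough worked example of that estimate, the per-Pose cost can be computed directly. This is only a back-of-the-envelope sketch: it assumes 8-byte doubles, takes nres = 35 from the `-obligate_pair 1 35` option in the command posted later in this thread, and uses a hypothetical helper name (`lores_cache_bytes`). It does not account for anything else a Pose stores.

```python
def lores_cache_bytes(nres: int, poses: int = 1,
                      doubles_per_pair: int = 20,
                      bytes_per_double: int = 8) -> int:
    """Estimate the memory taken by nres * nres * 20 doubles per Pose."""
    return nres * nres * doubles_per_pair * bytes_per_double * poses


# Assumed sequence length (from -obligate_pair 1 35), one Pose.
per_pose = lores_cache_bytes(35)
print(f"per Pose: {per_pose} bytes ({per_pose / 1e6:.3f} MB)")

# The same estimate for a hypothetical 200 Poses kept in memory together.
many = lores_cache_bytes(35, poses=200)
print(f"200 Poses: {many} bytes ({many / 1e6:.1f} MB)")
```

Comparing numbers like these with the growth actually observed per structure is one way to judge how much of the increase this term could plausibly account for.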

I believe I understand what the issue here is. Our job distributor (unlike the one for other Rosetta applications) holds on to many Poses at once, then every so often writes them all out to disk. This is valuable because denovo jobs are often quite fast, especially for small systems, and many clusters allow you to run, say, 250 jobs at once. Writing to disk 250 times per 10 seconds would get you a warning from a sysadmin, so the job distributor figures out a reasonable number of structures to cache at once... with this memory tradeoff. I think once we finish working with a Pose and are getting ready to write it to file, we can probably manually get rid of its cached base pairing energies.

To check that this is in fact the issue, if you take a Pose of comparable size and run the rna_minimize application with -score:weights rna/denovo/rna_lores.wts active, with the same nstruct, do you see the same issue? (The precise same memory values may not apply, but the trend should.)
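To compare the trend between two runs, it can help to log the resident memory of the running process at regular intervals rather than watching it by eye. The sketch below assumes a Linux system, where `/proc/<pid>/status` contains a `VmRSS` line reported in kB; the function names are hypothetical and this is not part of Rosetta.

```python
import re
from typing import List, Optional


def parse_vmrss_kib(status_text: str) -> Optional[int]:
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid: int) -> Optional[int]:
    """Read the current resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vmrss_kib(handle.read())


def steadily_increasing(samples: List[int]) -> bool:
    """Return True if every sample is larger than the one before it."""
    return len(samples) >= 2 and all(
        later > earlier for earlier, later in zip(samples, samples[1:])
    )
```

Sampling `read_rss_kib(pid)` once per finished structure for both applications and passing the lists to `steadily_increasing` gives a simple yes/no answer about the trend, independent of the absolute memory values.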

Mon, 2017-10-30 21:31
everyday847

OK, thanks, I'll check that tomorrow.

Tue, 2017-10-31 10:29
cdegut

Hi,

I did not manage to reproduce the problem with the rna_minimize application, though I may have misunderstood. I ran:

rna_minimize.static.linuxgccrelease -in:file:silent all_silent.out -score:weights $ROSETTA/main/database/scoring/weights/farna/rna_lores.wts

where all_silent.out is a silent file containing around 7,000 structures. This did not lead to any increase in memory usage from the first to the last structure. But did you want me to run the minimisation again and again on the same structure? Is there an option to do that?

Clément

Thu, 2017-11-02 06:54
cdegut

No, that should have roughly the same effect. Oddly, when I tried to reproduce the same thing (using Massif, a memory profiling tool provided by Valgrind) I didn't see this effect either.

Thu, 2017-11-02 21:14
everyday847

Hmm, that's odd. Here is the exact script that leads to the crash on the server and on my local computer (the seed offset is generated from the task number). Maybe there is a problem with my options?

/scratch/ab1440/software/rosetta/main/source/bin/rna_denovo.static.linuxgccrelease \
-database /scratch/ab1440/software/rosetta/main/database \
-fasta StartingSeq.fasta \
-obligate_pair 1 35 \
-nstruct 200 \
-bps_moves \
-seed_offset $seed \
-out:file:silent silent_${SGE_TASK_ID}-$y.out \
-minimize_rna true
Fri, 2017-11-03 05:50
cdegut