Reducing PyRosetta memory usage


I’m really happy with my PyRosetta prototype script, so it’s time for a production run!
But there is a problem. My nodes have 4GB of memory and 8 cores. But PyRosetta needs ~1.5GB of memory to run.

How can I still make use of all 8 cores?
Can I somehow reduce the memory usage, since I don’t need features like orbitals, etc?
I tried to set `low_memory_mode` to True in __init__.py
config = {"low_memory_mode": True, "protocols": False, "core": True, "basic": True, "numeric": False, "utility": True, 'monolith': False},
but this had no noticeable effect.

Or can I make use of threads? I know Python is limited by the global interpreter lock (GIL). But most of the CPU time is spent in C++ code, and at least in theory it’s possible to release the GIL before calling a C++ function, for example scorefx(pose). But I’m not sure if PyRosetta does that.

Any help is very welcome. Until then I’ll chug along at half speed:)

PS:
I already tried using
r.init('-linmem_ig 20')
But that did not change anything.

My peptide is only 33 amino acids, and I'm (only) using the mm_* score functions.

Fri, 2014-10-03 01:51
ajasja

And I also threw out most of the residues in database\chemical\residue_type_sets\fa_standard\residue_types.txt. Can I somehow prevent the loading of the rest of the database? (For example orbital information, fragments, etc.)

Mon, 2014-10-06 02:36
ajasja

Rosetta isn't really set up for threads on the C++ level, so even aside from the GIL, that's not going to be an option.

Removing residues helps, but removing patches from database\chemical\residue_type_sets\fa_standard\patches.txt is a real savings, especially if you start removing things which can be applied combinatorially. For just protein modeling, you can replace it with the patches.txt.slim version in the same directory. I'd start with that as a base, and only add back the patches which you think you'll need.
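
A minimal sketch of that swap, assuming you pass in the path to your own fa_standard directory. The helper name `use_slim_patches` and the backup name `patches.txt.full` are made up for illustration; only patches.txt and patches.txt.slim come from the database itself.

```python
import shutil
from pathlib import Path


def use_slim_patches(fa_standard_dir):
    """Replace patches.txt with patches.txt.slim, keeping a backup of the full list.

    Hypothetical helper: fa_standard_dir is the path to
    database/chemical/residue_type_sets/fa_standard in your installation.
    """
    directory = Path(fa_standard_dir)
    full = directory / "patches.txt"
    slim = directory / "patches.txt.slim"
    backup = directory / "patches.txt.full"  # backup name is an assumption

    if not slim.exists():
        raise FileNotFoundError(slim)

    # Keep the original so patches can be added back later;
    # never overwrite an existing backup.
    if full.exists() and not backup.exists():
        shutil.copy2(full, backup)

    shutil.copy2(slim, full)
    return backup
```

Any patches you need later can be copied back line by line from the backup into patches.txt.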

Most of the information in the database is loaded on an as-needed basis, so if you aren't using it, it won't be taking up memory.

Past that, the other recommendation is to make sure you aren't keeping around references to objects you aren't using, and to load objects only on an as-needed basis. For example, Poses can take up a bunch of memory, so instead of loading all of your poses at the beginning and storing them in a list, load them one-by-one if you can, and delete/overwrite the pose object once you're done with it. The same goes for movers and other objects - often they're small, but occasionally there are ones which will store references to Poses or other large objects, keeping large amounts of memory in use, even if you don't need them.
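
The one-by-one pattern could look roughly like this. It is a sketch, not PyRosetta-specific code: `process_one_at_a_time`, `load_pose` and `score_pose` are hypothetical names, and you would pass in whatever loader and score function your script already uses.

```python
def process_one_at_a_time(paths, load_pose, score_pose):
    """Yield (path, score) pairs while holding at most one pose at a time.

    load_pose and score_pose are caller-supplied callables (assumptions for
    this sketch), e.g. your PDB loader and your score function.
    """
    for path in paths:
        pose = load_pose(path)
        score = score_pose(pose)
        # Drop the only reference before moving on, so the pose can be
        # freed before the next one is loaded.
        del pose
        yield path, score


# Usage with stand-in callables; only the small result is kept per structure.
results = list(process_one_at_a_time(["a.pdb", "b.pdb"], str.upper, len))
```

The point is that only the scores end up in the list, never the poses themselves.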

The other recommendation is to avoid using the "monolith" build of PyRosetta if you can. That loads all of Rosetta into memory, even if you're not using those portions of the code. The "namespace" build only loads those portions of the code that you need when you need it, saving some memory if you're not using all of Rosetta. (There might not be a Windows namespace build, though.)

Mon, 2014-10-06 15:22
rmoretti

I remember Sergey saying that "monolith" will be the mainstream build in the future (https://www.rosettacommons.org/node/3720).

Fri, 2015-04-10 07:41
cossio

Re enabling low-memory mode: this should be done by modifying config.json in the same directory. When you adjust the default values directly in __init__.py, they get overwritten by the very next line, where the script loads the JSON config file, which has low_memory_mode set to false. So please try again by modifying the JSON file.
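
If you'd rather script the change than edit by hand, something like the following should work. It is only a sketch: it assumes config.json is a flat JSON object that uses the same "low_memory_mode" key as the dict in __init__.py, and the helper name is made up.

```python
import json


def enable_low_memory_mode(config_path):
    """Set low_memory_mode to true in a PyRosetta-style config.json.

    Hypothetical helper; config_path is the config.json that sits next to
    rosetta/__init__.py in your installation.
    """
    with open(config_path) as handle:
        config = json.load(handle)

    # Key name taken from the config dict in __init__.py (assumed to match).
    config["low_memory_mode"] = True

    with open(config_path, "w") as handle:
        json.dump(config, handle, indent=4)
    return config
```

All other keys in the file are written back unchanged.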

Also: try to limit your imports and remove everything that you do not need. Each import in namespace mode consumes some memory.

Also: to make sure that low-memory mode is working, try running: python -c "import rosetta; rosetta.init(); print rosetta.config". You can also check the 'top' output after a bare import to see how much memory is allocated.

Hope this helps,

Fri, 2015-04-10 11:25
Sergey