Monte Carlo usage?


Hi all, I'm relatively new to this suite (in fact, I've come here from Rosetta 3.4...which I was not able to run successfully...). I installed PyRosetta because it had a native Windows installer, and here I am.

Anyway, I'm trying to figure out how to pass a sequence to the engine and have it spit out a probabilistically plausible folding map in viewable pdb form. The sequence I'm running has no homology to anything in the existing databases, so I'm trying to computationally model this fragment from scratch. I did some digging and apparently there are Monte Carlo objects hidden for use, but I don't know how to use them, nor how to implement my sequence the way I want to. For input, I'd be providing a pdb with whatever structure PyMOL created as I manually formed the sequence (naturally added kinks, etc). Generally, it'd be straight and wholly unmodeled. Could anyone help?

Thanks,

Alex

Tue, 2012-08-28 13:24
atruong

What do you mean by folding map?

It sounds like you want to run vanilla ab initio. If you're running on Windows...well...it's not going to work. You need at least 1,000 and really about 100,000 trajectories to get useful results from ab initio - you need a supercomputer, not a desktop. (I've never heard of a Windows supercomputer.) Were you trying to install 3.4 on Windows?

I don't know about job distribution, but you should be able to access the ab initio application directly in PyRosetta - look for a class AbrelaxApplication?

Tue, 2012-08-28 13:30
smlewis

Hmm...well, I installed 3.4 onto an older Red Hat Linux build, but it's on a remote cluster, and I'm not proficient enough at the Linux command line to even begin to figure out how to actually execute stuff on it. I picked PyRosetta simply because it seemed more accessible, though I knew the hardware risks that came with it...and apparently, that's come back to bite me in the rear. Anyway, basically, yeah...I'm trying to run vanilla ab initio...with all the crap that comes with it. My sequence is 55 aa long...I suppose with the number of iterations needed per residue, that'd still require a lot more processing power than I have on my Windows machine. Perhaps you could advise me on my next step? If necessary we could move this thread to the Rosetta 3.4 forum if you think it'd be more likely to work there. I have considerably more processing power on that Linux cluster, but again, it's only via remote access, and I might as well have the skill of an infant when it comes to that...not that I'm much better with Python. My programming background is mainly in Java, so the horizontal transition is ordinarily simple, but this is...on a different level for me.

Tue, 2012-08-28 13:43
atruong

Rosetta is all command line anyway, so it being on a remote cluster makes no difference.

You do need basic Linux CLI skills (cd, cp, rm, et cetera) to do anything on the command line, of course. There are tons of general tutorials on this on the internet. UNC runs classes quarterly or so to teach users the Linux command line; maybe your university does too.

Most academic cluster supercomputers have a specific system for job submission, and at least *some* user support structure. At the very least there must be a scheduler like LSF or Condor or something to manage user jobs. Those are not terribly hard to learn (there's maybe half a dozen commands to learn). I can't give you more precise advice without knowing which management scheme your cluster uses. Again, your university might have a class on it.

Generally Rosetta users compile it locally (there's no installation to speak of), so you shouldn't have too much hassle there with getting it installed (plus it's a common question so most of the problems have been ironed out already).

There are several abinitio demos in the 3.4 demos folder. The vanilla one isn't really a demo, it's just a pile of inputs and command line flags, but the other demos are more clear and the documentation explains the flags.

I would guess abinitio is about 30 sec/structure, so you can budget that out if you want to try to run it on your desktop (maybe 3.5 days for 10,000?)
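To make that budgeting concrete, here is a rough sketch of the arithmetic. The 30 sec/structure figure is only my guess from above, and the function name is made up for illustration:

```python
def estimate_days(n_structures, seconds_per_structure=30.0):
    """Days of single-core runtime needed to generate n_structures decoys."""
    total_seconds = n_structures * seconds_per_structure
    return total_seconds / 86400.0  # 86400 seconds in a day


# 10,000 structures at the guessed 30 s each
print(round(estimate_days(10000), 2))
```

Swap in your own measured time per structure once you have run a handful on your machine.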

I strongly suspect the abinitio machinery is not terribly PyRosetta friendly (it's not anybody-friendly), so I think you'd have at least half as much work trying to get it to run from Python as just learning to use a Linux CLI.
___

After writing all that I had a better idea:

I checked the PyRosetta website (duh) and found that the Gray lab has reimplemented abinitio directly in python: http://graylab.jhu.edu/pyrosetta/downloads/scripts/D060_Folding.py

I don't know how it compares scientifically (probably very well for vanilla ab initio), and it looks like it ought to run out of the box for you if you just want to run it immediately. No idea on runtime - I still think a desktop isn't powerful enough. This uses PyJobDistributor which might work on your cluster.

(Note that the script does NOT touch the existing Rosetta abinitio machinery - it just reimplements the control flow for vanilla abinitio and ignores all the complex stuff for inputting experimental data to guide modeling.)

Tue, 2012-08-28 14:10
smlewis

I had actually stumbled across that exact file while rummaging through the Scripts list, but I was confused by "fragment insertion". Then again, I'm new at this (not just Linux/Rosetta, but computational modeling in general...this would be my first foray into the actual craziness, and not straightforward webapps like SWISS-MODEL, etc.), so perhaps it's a related concept after all. Thanks for the clarification though. I suppose I could try it on my desktop, and if it takes too long, try it (or an analogue...which I assume is also buried in the folders somewhere?) on the Linux cluster, I suppose. Is there a built-in overflow warning? Or is it just going to busy-loop? How will I know if my computer is unable to perform the calculation?

Edit: In the comments, it states that 2 fragment files must be provided...is that not indicative of homology modeling? I only have one file to use (since it has no homology with anything else). Is there something else going on here that I'm not understanding? Sorry if I'm seeming a bit...inept; I definitely feel it.

Tue, 2012-08-28 14:24
atruong

"Is there a built-in overflow warning? Or is it just going to busy-loop? How will I know if my computer is unable to perform the calculation?"

The spec that Rosetta needs is lots of processors, because Monte Carlo is embarrassingly parallel, and you can just run trajectories totally independently. It's not too hungry for RAM or disk space. I've never seen it hang because the computer "isn't good enough" - it's not like running a too-beefy desktop app which locks up the machine. You'll know your computer isn't good enough when it doesn't produce results fast enough for your purposes. It will max out precisely one processor or core. If it's producing one structure a minute, then you need to decide if you can wait 10,000 minutes for a good sample size. One ab-initio result is nearly meaningless.
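Since the trajectories are independent, splitting the work is just division. A small sketch of how the wait shrinks with more cores (the function names are hypothetical, and this ignores scheduler overhead):

```python
def split_trajectories(total, n_workers):
    """Divide `total` independent trajectories as evenly as possible among workers."""
    base, extra = divmod(total, n_workers)
    # the first `extra` workers each take one leftover trajectory
    return [base + 1 if i < extra else base for i in range(n_workers)]


def wall_clock_minutes(total, n_workers, minutes_per_structure=1.0):
    """Wall-clock time is set by the worker with the largest batch."""
    return max(split_trajectories(total, n_workers)) * minutes_per_structure


print(split_trajectories(10000, 3))
print(wall_clock_minutes(10000, 1), wall_clock_minutes(10000, 100))
```

Each worker is one core running its own batch; nothing needs to be shared between them until you collect the output structures.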

Tue, 2012-08-28 14:29
smlewis

Hmm...okay, I guess I'll just see what it spits out and judge from there.

What about that second file in the Gray lab script? Am I just misunderstanding the instructions or something?

Tue, 2012-08-28 14:41
atruong

I haven't read the whole thing, can you be more specific?

Tue, 2012-08-28 14:43
smlewis

At the very top in the block comment:

"This script performs fragment insertion for an input protein sequence. The
sequence may be explicit, in a FASTA file, or in a PDB file. Two fragment files,
preferably one longer than the other, must also be provided. Fragment insertion
is accompanied by a Monte Carlo assessment allowing the conformation to escape
local minima. Output structures from this protocol are intended to proceed into
a refinement step (such as that in refinement.py) to produce reasonable
estimates of the protein conformation."

But directly below that, in the instructions section, it only requires one pdb. Am I misunderstanding the meaning of "fragment file"?

Tue, 2012-08-28 14:47
atruong

http://www.rosettacommons.org/manuals/archive/rosetta3.4_user_guide/d4/d...

The Robetta webserver is the easiest way to make fragments.

I think the Rohl 2003/2004 Methods in Enzymology paper explains what fragments are...it's a classic part of the ab initio algorithm, lots of papers cover it.
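If it helps to see the "Monte Carlo assessment" idea from that header comment in miniature, here is a toy sketch of the Metropolis accept/reject step. Everything in it is an assumption for illustration: the 1-D score function stands in for a real energy, the random step stands in for a fragment insertion, and none of it uses PyRosetta's actual objects.

```python
import math
import random


def metropolis_accept(old_score, new_score, kT, rng):
    """Always accept downhill moves; accept uphill moves with probability exp(-dE/kT)."""
    delta = new_score - old_score
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / kT)


def toy_score(x):
    # hypothetical 1-D "energy" with two minima, standing in for a real score function
    return (x * x - 1.0) ** 2 + 0.3 * x


def run_trajectory(n_cycles=200, kT=1.0, seed=0):
    """Run one independent trajectory and return the best (x, score) seen."""
    rng = random.Random(seed)
    x = 2.0
    best_x, best_score = x, toy_score(x)
    for _ in range(n_cycles):
        trial = x + rng.uniform(-0.5, 0.5)  # stand-in for a fragment insertion
        if metropolis_accept(toy_score(x), toy_score(trial), kT, rng):
            x = trial
            if toy_score(x) < best_score:
                best_x, best_score = x, toy_score(x)
    return best_x, best_score


print(run_trajectory(seed=1))
```

The occasional accepted uphill move is what lets a trajectory climb out of a local minimum; a higher kT accepts more of them.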

Tue, 2012-08-28 15:42
smlewis

Oh, okay. So I have my fragment files and PSIPRED file (all from ROBETTA). Stupid question: where do I put these files to load them? The root rosetta directory? Or in the same directory as the /../../rosetta_source/src/apps/public folder that contains the AbinitioRelax.cc executable?

Also, I think I'm going to revisit Rosetta 3.4 for this project, so feel free to move this thread if you'd like.

Thu, 2012-08-30 13:00
atruong

You can choose to put the files (you only need the fragment files, I think) wherever you want; you tell Rosetta where they are by giving the paths to the files as command line options.

Functionally, the directory where you are going to run the code and collect outputs is the right place to put the inputs. Inside the code itself is NOT a good place. Just make a nice empty directory somewhere (not within rosetta), put your inputs in it, and run from there.

Thu, 2012-08-30 13:24
smlewis

How do I specify the paths for Rosetta to access? I know how to change directories and the like; I'm not sure I know how to operate within one while grabbing information from another though.

Thu, 2012-08-30 13:31
atruong

You have to offer Rosetta a giant list of command line flags to get it to actually do anything. It doesn't look for *anything* by default, it just takes all inputs from command line. Take a look at the abinitio demos and you can see some of the options/flags files.

You'll have stuff like (http://www.rosettacommons.org/manuals/archive/rosetta3.4_user_guide/d0/d...):

-in:file:fasta 1elwA.fasta
-in:file:frag3 aa1elwA03_05.200_v1_3
-in:file:frag9 aa1elwA09_05.200_v1_3
-database /PATH/TO/THE/rosetta_database
etc, etc

Generally we keep all these options in a file and offer it like:
AbinitioRelax.linuxgccrelease @options_file

but you can instead list them all out:
AbinitioRelax.linuxgccrelease -in:file:fasta 1elwA.fasta -in:file:frag3 aa1elwA03_05.200_v1_3 -in:file:frag9 aa1elwA09_05.200_v1_3 -database /PATH/TO/THE/rosetta_database ........
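
If you would rather generate the options file than type it, a small Python sketch could look like the following. The flags and file names are the ones from the example above; the helper names are made up, and the command is only built here, not executed:

```python
import os
import tempfile


def write_options_file(path, flags):
    """Write one '-flag value' pair per line, as in the flag list above."""
    with open(path, "w") as handle:
        for flag, value in flags:
            handle.write("%s %s\n" % (flag, value))


def build_command(executable, options_path):
    """Build the argument list that passes the options file with the @ prefix."""
    return [executable, "@" + options_path]


flags = [
    ("-in:file:fasta", "1elwA.fasta"),
    ("-in:file:frag3", "aa1elwA03_05.200_v1_3"),
    ("-in:file:frag9", "aa1elwA09_05.200_v1_3"),
    ("-database", "/PATH/TO/THE/rosetta_database"),
]

# a scratch directory stands in for your real working directory
options_path = os.path.join(tempfile.mkdtemp(), "options_file")
write_options_file(options_path, flags)
command = build_command("AbinitioRelax.linuxgccrelease", options_path)
print(" ".join(command))
```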

Thu, 2012-08-30 13:35
smlewis

So to specify my input file directory, I slap the entire path in in front, like

-in:file:fasta /../../blah.fasta

or something like that? Based on the flags you've specified (and that I was reading on the link you provided that I'd actually stumbled upon just a little while earlier), it only specifies the file name. The flags look different from the database call, which was how I was expecting it to look. Is that how I'd do it?

Thu, 2012-08-30 13:43
atruong

Rosetta isn't interpreting paths itself. Anything the file system will accept (anything that works for ls or cat or whatever) will work, so relative paths (../path/to/something/else), paths to subdirectories (path/to/subdirectory with an implicit leading ./) and absolute paths (/absolute/path/to/a/thing) all work.

Avoid shell-interpreted special characters, particularly ~ for home. A common mistake is trying to pass "-database ~/rosetta/rosetta_database". This works on the command line, but not in an options file, because the shell expands the ~ in the former but never has a chance to in the latter.
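
If you build your options file from a script, one way around the ~ problem is to expand the path yourself before writing it out. A minimal sketch using Python's standard library (the helper name is hypothetical):

```python
import os


def resolve_flag_path(path):
    """Expand a leading ~ and return an absolute path that is safe in an options file."""
    return os.path.abspath(os.path.expanduser(path))


print(resolve_flag_path("~/rosetta/rosetta_database"))
```

The printed path is what you would put after -database in the options file.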

Thu, 2012-08-30 14:31
smlewis

Apologies for going off the grid for a month; I just started classes up, and took a brief break from this project.

I've finally gotten around to trying D060_Folding.py in PyRosetta, but it appears that I'm raising an exception. My full output is below:

---

In [12]: run D060_Folding.py
PYROSETTA_DATABASE environment variable was set to: C:\Program Files (x86)\PyRosetta\rosetta_database... using it...
PyRosetta 2.011 [r48543:49680M] retrieved from: https://svn.rosettacommons.org/source/trunk/rosetta/rosetta_source
(C) Copyright Rosetta Commons Member Institutions.
Created in JHU by Sergey Lyskov and PyRosetta Team.

core.init: Mini-Rosetta version 48543:49680M from https://svn.rosettacommons.org/source/trunk/rosetta/rosetta_source
core.init: command: app -database C:\Program Files (x86)\PyRosetta\rosetta_database -ex1 -ex2aro -constant_seed
core.init: Constant seed mode, seed=1111111 seed_offset=0 real_seed=1111111
core.init.random: RandomGenerator:init: Normal mode, seed=1111111 RG_type=mt19937
core.chemical.ResidueTypeSet: Finished initializing fa_standard residue type set. Created 6225 residue types
core.import_pose.import_pose: PDB File:test/data/test_in.pdb not found!

ERROR: Cannot open PDB file "test/data/test_in.pdb"
ERROR:: Exit from: core/import_pose/import_pose.cc line: 184
---------------------------------------------------------------------------
PyRosettaException Traceback (most recent call last)
C:\Python27\lib\site-packages\ipython-0.13-py2.7.egg\IPython\utils\py3compat.pyc in execfile(fname, glob, loc)
169 else:
170 filename = fname
--> 171 exec compile(scripttext, filename, 'exec') in glob, loc
172 else:
173 def execfile(fname, *where):

C:\Users\Alex Truong\Dropbox\Alex Truong - iPLA2\Rosetta Input Files\D060_Folding.py in ()
435 pdb_filename = options.pdb_filename
436 if pdb_filename: # the default behavior or if you use a PDB file
--> 437 pose_from_pdb(pose, pdb_filename)
438 sequence = pose.sequence()
439 # Fasta file option

C:\Program Files (x86)\PyRosetta\rosetta\__init__.py in exit_callback(self)
1129 class PythonPyExitCallback(utility.PyExitCallback):
1130 def exit_callback(self):
-> 1131 raise PyRosettaException()
1132
1133 def __init__(self): utility.PyExitCallback.__init__(self)

PyRosettaException: PyRosettaException

In [13]:

---

What confused me is the section with the test_in.pdb part, and I'm not sure how that applies to my job. I tried looking through the code in the script, but I didn't see any indication. I ran the script while in the working directory containing the fasta file and the 2 fragment files. Could you possibly spot where I went wrong? The example given at the bottom of the script seems to imply that I need to name the files a certain way, but there's no indication (based on the output above) that I even made it that far to raise the error.

Fri, 2012-09-28 12:51
atruong

From what I can tell, "Cannot open PDB file "test/data/test_in.pdb"" is your main error. Do you have a file named test/data/test_in.pdb? Is this the file you want to be opening? (If not, you'll need to replace "test/data/test_in.pdb" wherever it appears in your script with the path and name of the PDB file that you do want to be using.)

If "test/data/test_in.pdb" *is* the file that you want to use, and you're sure that it does exist in that location, it might be that the current directory for PyRosetta is different from the base directory of the path "test/data/test_in.pdb". (e.g. if PyRosetta's base directory is /home/me/pyrosetta/project1/test with the file being "/home/me/pyrosetta/project1/test/data/test_in.pdb", when PyRosetta says "Cannot open test/data/test_in.pdb" it actually means "Cannot open /home/me/pyrosetta/project1/test/test/data/test_in.pdb" - note the double test.) Try using the full path to the file (e.g. "/home/me/pyrosetta/project1/test/data/test_in.pdb") instead.
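
To see how the "double test" comes about, here is a small sketch of how a relative path gets resolved against the current directory. It uses the example directories above and only manipulates path strings, so nothing needs to exist on disk:

```python
import posixpath


def resolve(current_dir, path):
    """Return the file that `path` actually refers to when opened from current_dir."""
    # join() leaves absolute paths untouched and prefixes relative ones
    return posixpath.normpath(posixpath.join(current_dir, path))


relative = "test/data/test_in.pdb"
print(resolve("/home/me/pyrosetta/project1/test", relative))
print(resolve("/home/me/pyrosetta/project1", relative))
print(resolve("/anywhere/else", "/home/me/pyrosetta/project1/test/data/test_in.pdb"))
```

Only the second and third calls land on the real file, which is why either running from the right directory or giving the full path fixes it.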

Fri, 2012-09-28 18:40
rmoretti

If you are running this as a test, the tests in PyRosetta need to be run from the main PyRosetta directory. So, do 'python test/D060_Folding.py' instead of cding into the test dir.

Sat, 2012-09-29 06:19
jadolfbr

The thing is, I'm not trying to run a test. When I was browsing the code, there was a section that said "All of the default variables and parameters used above are specific to the example with "test_in.pdb"". Now, since my query file is not named "test_in.pdb", do I just go through the script and rename all instances to my file name? Because that file call doesn't seem to be in the actual meat of the script...I'm also not sure whether that test is just a precursor to the actual job run, and the test file will be accessed each time I run the script, regardless of whether it is for testing or for my own file.

It also appears that I'm raising a couple of exceptions...one related to the test file, and one related to mine; the query is in fasta form, but the output took the pdb route as shown.

Mon, 2012-10-01 12:07
atruong

There's a line of code in the script that, when executed, simulates a real example; I've tried pasting the entire line and editing it to my needs:

python D060_folding.py --fasta_filename pin.fasta --long_frag_filename pin.frag9 --long_frag_length 9 --short_frag_filename pin.frag3 --short_frag_length 3 --jobs 100 --job_output pin_folding_output --kT 1.0 --long_inserts 1 --short_inserts 3 --cycles 200 --disulfides 1

However, when I run the script from the working directory, it says I have a syntax error; what is the error? At first, there is a caret under the disulfides flag, but when I delete the entire thing, the caret stays out in the middle of nowhere.

Mon, 2012-10-01 12:18
atruong

So, I found the line in the code that should be changed. By default, it gets the sequence to fold from a pdb file, defined as 'test/data/demo/test_in.pdb'. If a fasta file is given, the sequence variable is overwritten. Not a good way to do it! This is a bug, and I'll throw it on the bugtracker and make the edits. For now, though, just run the script using 'python test/D060_Folding.py [arguments]' from the PyRosetta directory and you shouldn't get an error.
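
Roughly, the fix amounts to deciding where the sequence comes from before opening any file. This is only a sketch of that logic, not the actual patch: the function name, the precedence order, and the default path (taken from the error message earlier in the thread) are all assumptions.

```python
def choose_sequence_source(sequence=None, fasta_filename=None, pdb_filename=None,
                           default_pdb="test/data/test_in.pdb"):
    """Pick exactly one input source so the default PDB is only touched as a last resort."""
    if sequence:
        return ("sequence", sequence)
    if fasta_filename:
        return ("fasta", fasta_filename)
    # assumed fallback: the demo PDB is used only when nothing else was given
    return ("pdb", pdb_filename or default_pdb)


print(choose_sequence_source(fasta_filename="pin.fasta"))
print(choose_sequence_source())
```

With this ordering, passing a fasta file never triggers a read of the demo PDB, so the working directory stops mattering for that case.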

My advice if you want to actually produce something useful (from someone who started in PyRosetta trying to fold proteins and not knowing any Python - again coming from minimal Java exposure): install Ubuntu on your Windows box. There's an EXE that will set up the dual boot. Use Linux. Spend a week getting basic Linux commands down. Spend the next few days learning more Rosetta. Transitioning into using a Linux cluster with Rosetta is a different story, but once you do, it's worth it to get the results. Your sysadmin should be able to help you. And, if you like to program, spend a few days learning Python. It's fairly simple, but you can do some pretty awesome things with it!

Mon, 2012-10-01 18:27
jadolfbr

So should I put the input files in the PyRosetta directory as well? From the output given above, it seemed like it was automatically switching directories depending on where I executed the script from. I had just put the script in the same directory as my inputs, and ran it like that. Should I just move everything there? Because I don't want to run Folding.py as a "test", I want to run it with my own files.

Fri, 2012-10-05 12:31
atruong

You can, but as long as you give the flag the full path to your files, it will run.

Mon, 2012-10-08 07:37
jadolfbr

I tried running the script with the full flags (as typed above): when specifying the file names, I used the full path for each file. However, I still get a syntax error with the caret being in a completely unhelpful place. Is there a different way to execute the script?

EDIT: I fixed the syntax error (it was something really stupid with the execution...I had to use "run" instead of "python" since I was already operating in the IPython shell). However, even with the full paths specified, I'm raising the same exceptions and errors with test_in.pdb. Should I go in and change the instances anyway just in case? It appears the flags are not related to the problem I'm having.
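
For anyone else hitting the same thing: inside the IPython shell, a line starting with "python" is parsed as Python code rather than as a shell command, which is where the syntax error comes from. A quick sketch that reproduces it with the built-in compile() (the shortened command line is just for illustration):

```python
def is_valid_python(source):
    """Return True if `source` parses as Python code, False on a SyntaxError."""
    try:
        compile(source, "<input>", "exec")
        return True
    except SyntaxError:
        return False


shell_command = "python D060_folding.py --fasta_filename pin.fasta --jobs 100"
print(is_valid_python(shell_command))  # a shell command is not Python
print(is_valid_python("jobs = 100"))   # an ordinary statement is
```

The same command is fine at an ordinary terminal prompt, and "run" works in IPython because it is an IPython magic rather than Python syntax.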

Fri, 2012-10-12 12:27
atruong

I don't quite understand why this is still happening. Can you post the exact commands you are using now, as well as the output of 'pwd'? I can also send you the fixed file that is in the trunk if we can't fix it now.

Sun, 2012-10-14 10:01
jadolfbr

I just cd'd the location of the script, then ran it with the command you can see above. That's it. I'm still getting the same error as pasted earlier.

I could try the new file...if it doesn't work, it's probably something wrong with the input files or something as referenced in the script itself.

Mon, 2012-10-15 09:14
atruong

Like I said in a previous post, you DON'T want to cd into the 'test' directory where the script is located. Run the script from the PyRosetta root directory:

Run the script using 'python test/D060_Folding.py [arguments]' from the PyRosetta directory and you shouldn't get an error.

Mon, 2012-10-15 09:47
jadolfbr

Okay, it works (I had to drop the disulfides flag because apparently that's not a real parameter...). However, I have...100 pdb files. Is there an optimization step I can take using these output files, similar to AbinitioRelax for Rosetta (which I'm also currently trying)? I know in general, I just pick the output file with the lowest centroid score as given by the script, but is there a more comprehensive way to test all the pdb files?

Mon, 2012-10-15 12:43
atruong

In my experience, centroid or not, the program Calibur is great for clustering the resultant decoys (which is the Rosetta name for all your output pdb files). It is available here: http://sourceforge.net/projects/calibur/

Clustering will attempt to group the decoys into close structural groups. The program will find a 'center' decoy, and many times you can use that, but part of it is observing the structure and going by biochemical intuition. Check out the post-processing section of the Abinitio documentation; should help you get started.
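
To give a feel for what a 'center' decoy means, here is a toy neighbor-count sketch. It is not Calibur's actual algorithm, and the numbers are invented: each decoy is reduced to a single made-up coordinate so that the absolute difference can stand in for a real pairwise RMSD.

```python
def cluster_center(names, distance, threshold):
    """Return the decoy with the most neighbors within `threshold`, plus those neighbors."""
    best_name, best_members = None, []
    for name in names:
        members = [other for other in names if distance(name, other) <= threshold]
        if len(members) > len(best_members):
            best_name, best_members = name, members
    return best_name, best_members


# invented 1-D positions; a real run would compare 3-D structures by RMSD
positions = {"decoy_1": 0.0, "decoy_2": 0.4, "decoy_3": 0.7,
             "decoy_4": 5.0, "decoy_5": 5.2}


def toy_distance(a, b):
    return abs(positions[a] - positions[b])


center, members = cluster_center(sorted(positions), toy_distance, threshold=0.5)
print(center, members)
```

The largest cluster's center is a candidate, but as said above, you still want to look at the structure itself before trusting it.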

Mon, 2012-10-15 22:04
jadolfbr