Syntax for relax.linuxgccrelease

Dear friends,
I am trying to use relax.linuxgccrelease to remove any clashes in my PDB file.

Can I ask
1) As I understand from "3) Preparing Your Structure.pdf" in the "introduction" tutorial from Meiler Lab, I should use
relax.linuxgccrelease -database <database> -s <structure> -ex1 -ex2 -relax:sequence -nstruct 100

However, I got this error:
ERROR: Unused "free" argument specified: -database

So I deleted the "-database" option and ran the following, which seems to work fine.
relax.linuxgccrelease -s <structure> -ex1 -ex2 -relax:sequence -nstruct 100

So "-database" is not necessary?

2) I am trying to understand the syntax by using
./relax.linuxgccrelease -help

However, I still could not get useful information from the help file. I think "-nstruct" is the number of models to be constructed. But what do "-ex1", "-ex2" and "-relax:sequence" mean?

More importantly, why do the flags "-ex1", "-ex2" and "-relax:sequence" not appear in the help file? How can I actually learn to use flags/options from the help files?

3) Is there an online reference for the syntax of all the executables, so that I do not necessarily need to type "-help" for help?

Thank you very much.

Yours sincerely
Cheng

Category:

Structure prediction

Post Situation:

Unsolved

Wed, 2014-09-24 04:34

lanselibai

1) Rosetta needs to know where the database is, but it has several ways of figuring out where it is.

The first place it looks is the -database option, either on the command line or in an options file. If you give the path to the database directory there, Rosetta will use that one and not look any further. If -database is not set, Rosetta looks for the $ROSETTA3_DB environment variable. If this is set, Rosetta uses the path specified there. If neither -database nor $ROSETTA3_DB is set, Rosetta tries to find the database directory based on the path of the executable. This assumes you have things laid out in the standard fashion (e.g. main/source/bin/ and main/database). If that doesn't work, Rosetta will throw an error.
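
To make the lookup order described above concrete, here is a small Python sketch. It is only an illustration and not Rosetta's actual code: the function name, its parameters and the example path are made up, and the relative-path guess simply assumes the main/source/bin and main/database layout mentioned above.

```python
import os


def resolve_database(cli_value, env, executable, is_dir=os.path.isdir):
    """Pick a database directory in the order described above (hypothetical helper)."""
    # 1. An explicit -database value wins and stops the search.
    if cli_value:
        return cli_value
    # 2. Otherwise use the ROSETTA3_DB environment variable, if it is set.
    if env.get("ROSETTA3_DB"):
        return env["ROSETTA3_DB"]
    # 3. Otherwise guess from the executable location:
    #    <root>/main/source/bin/<executable>  ->  <root>/main/database
    main_dir = os.path.dirname(os.path.dirname(os.path.dirname(executable)))
    guess = os.path.join(main_dir, "database")
    if is_dir(guess):
        return guess
    # 4. Nothing worked, so give up with an error.
    raise FileNotFoundError("could not locate the Rosetta database")


# Example with a made-up install path; is_dir is stubbed so nothing touches the disk.
exe = "/opt/rosetta/main/source/bin/relax.linuxgccrelease"
print(resolve_database(None, {}, exe, is_dir=lambda path: True))
```

The `is_dir` parameter exists only so the sketch can be tried without a real Rosetta installation.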

By the way, the "Unused "free" argument" error normally means that you provided an option that takes a value (e.g. "-database") without specifying the value, e.g. by running "relax.linuxgccrelease -database -s input.pdb -ex1 -ex2 -relax:sequence -nstruct 100".

2) -ex1 and -ex2 are flags to control rotamer expansion when packing. Specifically, they tell Rosetta to diversify the chi1 and chi2 sidechain angles by +/- the standard deviation of the Dunbrack rotamer bin. This slightly-off-rotamer sampling helps to pack things better when the exactly-on-rotamer conformation has a slight clash. It actually turns out that with FastRelax they're not all that necessary: the ramping-repulsive-plus-minimization protocol relieves those sorts of clashes without the need for extra sampling.

-relax:sequence is a flag to control how relax behaves: "Do a preset, small cycle number FastRelax" (see https://www.rosettacommons.org/docs/latest/full-options-list.html#-relax.... ), though in reality, in recent versions of Rosetta, it doesn't actually do anything.

3) The "-help" flag doesn't always give a comprehensive listing of the options that can affect a protocol. It only lists the options that the application developer thought to call out explicitly. They might have forgotten one, or might consider some to be "standard" and not worth mentioning (e.g. -ex1 and -ex2 apply to many protocols). I would refer you to the online documentation at https://www.rosettacommons.org/docs/latest/ . Various "standard" options are documented at https://www.rosettacommons.org/docs/latest/Rosetta-Basics.html and the (mostly) full option list is available from https://www.rosettacommons.org/docs/latest/full-options-list.html . And, of course, always check the documentation for the application you're using (e.g. relax at https://www.rosettacommons.org/docs/latest/relax.html), which normally describes the major options.

I'll also point out the "-out:show_accessed_options" flag. After the run is complete, it will print out all of the options that Rosetta consulted during the course of the run. (These would be all the options which could conceivably affect how the run progresses.) Note that which options are accessed depends on the course of the protocol, so adding one option or changing an input file might cause Rosetta to do a slightly different protocol, exposing a different set of options.

Wed, 2014-09-24 09:40

rmoretti

(Reply to #3)

Hi R Moretti,
Thank you very much for your detailed help. I think I understand all of your explanations. As I tested, I think it works fine now. I will run it on a cluster, as it seems to take many hours to construct even one model on my Ubuntu machine. The documentation websites you provided are also very useful. I did not realise their importance until I encountered this problem.

Can I ask

1) What is the directory for the output PDB file? It seems that a file named "S_0001.pdb" was created, together with score.sc, in the current directory.
2) Can I make it write the output file to the same directory as the input file? (It seems that for other executables, the output file goes to the same directory as the input file.)
3) It seems that more specialised relax protocols are described at

https://www.rosettacommons.org/docs/latest/prepare-pdb-for-rosetta-with-...
https://www.rosettacommons.org/docs/latest/preparing-structures.html

Can I simply use the following:
~/Cheng/rosetta_2014.30.57114_bundle/main/source/bin/relax.linuxgccrelease -database ~/Cheng/rosetta_2014.30.57114_bundle/main/database -s /mnt/hgfs/Mutagenesis_Rosetta/clean.pdb -ex1 -ex2 -relax:sequence -out:show_accessed_options -nstruct 100

or do you recommend that I add more options?
(Sorry, those two links seem quite technical. I would just like a "standard" protocol.)

Thank you very much.

Yours sincerely
Cheng

Thu, 2014-09-25 04:35

lanselibai

(Reply to #4)

1) By default Rosetta typically puts files in the directory from which you launched it. There are some options to control that, though. For PDB output with most protocols, you can specify the output path with either -out:path:pdb or -out:path:all.

2) Yes, in most cases that will work, as long as you're sure that the output filenames and the input filenames don't match up. For typical runs with PDB output, that's not a concern, as Rosetta will add a "_####" to the end of the PDB name. If you're doing multiple runs, though, you should be careful about it, because the different files from the different runs might have the same name, and Rosetta will think it's already made a file if it exists from a previous run.

3) The differences in the protocols are due to how extensively you will allow the structure to move. The protocols you linked use always-on heavy-atom coordinate constraints to keep the protein from moving too far from the input coordinates. The command you posted in the first question doesn't have any constraints, so it allows much more protein movement. (In certain instances it will actually cause the protein to fall apart.) Which one to use depends on how close you want the structure to stay to the input structure. If you want it really close, use the more restrictive coordinate constraint protocol. If you want a low energy structure, and you don't care if it moves away from the input coordinates, use the less restrictive protocols.

There are protocols in between the two, too. For example, removing the "-relax:coord_constrain_sidechains" flag will constrain just the backbone, letting the sidechains move. Removing "-relax:ramp_constraints false" will allow the constraints to fall off later in the protocol, which lets you reach lower energies while still limiting some of the "falling apart" that can happen with a completely free relax.
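
As a rough summary of the variants discussed above, here is a small Python sketch (not part of Rosetta) that maps each level of restraint to the flags mentioned in this thread. The function name and the level names are made up for illustration; only the flags themselves come from the discussion above.

```python
def relax_constraint_flags(level):
    """Return the constraint-related relax flags for a level (hypothetical helper)."""
    flags = {
        # Always-on coordinate constraints on all heavy atoms: stays closest to the input.
        "all_heavy_atoms": [
            "-constrain_relax_to_start_coords",
            "-relax:ramp_constraints", "false",
            "-relax:coord_constrain_sidechains",
        ],
        # Always-on constraints on the backbone only: sidechains are free to move.
        "backbone_only": [
            "-constrain_relax_to_start_coords",
            "-relax:ramp_constraints", "false",
        ],
        # Constraints that are allowed to fall off later in the protocol.
        "ramped": [
            "-constrain_relax_to_start_coords",
            "-relax:coord_constrain_sidechains",
        ],
        # Completely free relax: no coordinate constraints at all.
        "unconstrained": [],
    }
    return flags[level]


print(" ".join(relax_constraint_flags("backbone_only")))
```

The returned list would be appended to the rest of the relax command line; which level is appropriate depends on how close the output should stay to the input coordinates.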

I'd say that as much as anything is "standard", it's the always-on heavy atom coordinate constrained relax, if for no other reason than there is a paper published about it (http://www.ncbi.nlm.nih.gov/pubmed/23565140).

Thu, 2014-09-25 17:10

rmoretti

(Reply to #5)

Hi R Moretti,
Many thanks for your help.

1) I got that, which confirms my assumption about the output file directory.

2) Now I am aware of the specification for the output file. Thanks for the reminder.

3) Now I have an overview of the constraint options.

Thank you very much for pointing me to your paper. As I understand it, I should use harmonic sd=0.5 to refine the structure. So, based on page 4 of the paper, I should add the "Harmonic runs added flags" to the "Parameter scans" flags, which gives:

(I will combine the command line into one single line)

~/Cheng/rosetta_2014.30.57114_bundle/main/source/bin/relax.linuxgccrelease -database ~/Cheng/rosetta_2014.30.57114_bundle/main/database -s /mnt/hgfs/Mutagenesis_Rosetta/clean.pdb -ex1 -ex2 -relax:sequence -out:show_accessed_options -nstruct 100

-no_optH false -flip_HNQ -use_input_sc -correct -no_his_his_pairE -linmem_ig 10 -nblist_autoupdate true

-constrain_relax_to_start_coords -relax:ramp_constraints false -relax:coord_constrain_sidechains -relax:coord_cst_stdev 0.5

The above command runs, at least. Can I ask whether I have interpreted your paper correctly?

Thank you very much.

Yours sincerely
Cheng

Fri, 2014-09-26 05:40

lanselibai

(Reply to #6)

That's more-or-less it. Just a few notes:

* -relax:sequence doesn't do anything, so you can omit it.
* -ex1/-ex2 aren't necessary with the constrained relax protocol. They don't really hurt things, except for slowing things down a bit, though.
* If you're using talaris2013 (which you would be with a weekly release and without -restore_pre_talaris_2013_behavior), you don't want to include "-correct". Everything that was in "-correct" that was worth keeping got rolled into talaris2013. (The paper was written before the talaris2013 switch-over.) I'd definitely omit this flag, though, to make sure things don't get in an inconsistent state.
* Likewise, with talaris2013 the flag -no_his_his_pairE doesn't do anything.
* -relax:coord_cst_stdev 0.5 is the default, so you can possibly omit this, but it certainly doesn't hurt.

Fri, 2014-09-26 13:46

rmoretti

(Reply to #7)

Hi R Moretti,
Thank you very much. I really appreciate you pointing out the option issues one by one. The following are the modified flags, as you suggested:

-no_optH false -flip_HNQ -use_input_sc -linmem_ig 10 -nblist_autoupdate true

-constrain_relax_to_start_coords -relax:ramp_constraints false -relax:coord_constrain_sidechains

Yours sincerely
Cheng

Fri, 2014-09-26 14:04

lanselibai

(Reply to #8)

Looks fine.

Fri, 2014-09-26 15:52

rmoretti

(Reply to #9)

Hi R Moretti,
Regarding relax.linuxgccrelease, can I ask:

1) How many output structures are sufficient to generate reasonably good representatives? It takes one hour to generate a single output, so I have to submit multiple serial jobs on our cluster. I plan to use "-nstruct 100". Do you think that is sufficient?

2) Do I need "-run:constant_seed" and "-run:jran (PseudoRandomNumber)" for individual job submissions? I tested a few jobs and they all followed different trajectories. So it seems that "-run:jran (PseudoRandomNumber)" is not needed. Is that correct?

Thank you very much.

Yours sincerely
Cheng

Sun, 2014-11-16 02:29

lanselibai

(Reply to #10)

#10

1) It depends on the size of your protein, how heavily constrained it is, and what you're trying to achieve with relax. If you're using the always-on, all-heavy-atom coordinate constraint procedure just to knock the rough edges off an input PDB, you can probably get away with a single output structure. If you're relaxing a protein with a large number of residues, and want to use the output as a highly minimized, low-energy structure, you're going to need more.

For most purposes 100 structures from relax are probably going to be enough. Probably more than enough. One way to gauge is to do a score versus rmsd plot to the lowest energy structure. If you're getting a number of structures which are close to the lowest energy structure in both energy and rmsd, you've probably converged in your sampling, and further output structures are just going to resample the structural space you've already examined.
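
The convergence check described above can be approximated numerically as well as by eye. The Python sketch below is only an illustration: the function name, the default thresholds (2 score units and 1 Angstrom) and the example numbers are assumptions chosen for the example, not recommended values. It reports what fraction of the models lie close to the lowest-energy model in both score and rmsd.

```python
def fraction_near_best(models, max_score_gap=2.0, max_rmsd=1.0):
    """models: list of (score, rmsd_to_lowest_energy_model) pairs."""
    best_score = min(score for score, _ in models)
    # Keep models that are close to the best one in BOTH score and rmsd.
    near = [
        (score, rmsd)
        for score, rmsd in models
        if score - best_score <= max_score_gap and rmsd <= max_rmsd
    ]
    return len(near) / len(models)


# Made-up example values, not real Rosetta output.
example = [(-100.0, 0.0), (-99.5, 0.3), (-90.0, 0.4), (-99.0, 3.0)]
print(fraction_near_best(example))  # 0.5
```

A large fraction suggests that further output structures would mostly resample the same region; a small fraction suggests that more sampling may still turn up something new.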

2) Not really. If you don't give -run:constant_seed, Rosetta will pick a random seed from your operating system's entropy pool. This should be different for each run. You only need to specify a seed if you want to control which seed is used, for testing or repeatability purposes.
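
If you do want repeatable runs across several cluster jobs, one possible approach is to give each job its own fixed seed. The Python sketch below only assembles the two flags mentioned in the question; the helper name and the base seed of 1000 are arbitrary choices made for this example.

```python
def seed_flags(job_index, base_seed=1000):
    """Return flags that pin the random seed for one job (hypothetical helper)."""
    # A distinct -run:jran value per job keeps each job repeatable while still
    # different from the others; reusing one seed for every job would simply
    # repeat the same trajectory.
    return ["-run:constant_seed", "-run:jran", str(base_seed + job_index)]


for job in range(3):
    print(" ".join(seed_flags(job)))
```

Without these flags each job gets a different seed anyway, so this is only worth doing when you need to be able to reproduce a specific run.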

Tue, 2014-11-18 09:22

rmoretti

(Reply to #11)

#11

Hi R Moretti,

1) Thank you. The structure I have is 442 residues long and was derived from homology modelling. The purpose of relax is to find a structure that can represent the morphology in the wet lab. As we do not have wet lab data for it (e.g. X-ray, NMR), I think I would like it to be of low energy and similar to the original homology model input.

I am still using the previous command line based on your paper:

~/Cheng/rosetta_2014.30.57114_bundle/main/source/bin/relax.linuxgccrelease -database ~/Cheng/rosetta_2014.30.57114_bundle/main/database -s /home/lanselibai/Cheng/relax/input/C226S_raw.pdb -out:show_accessed_options -nstruct 100 -no_optH false -flip_HNQ -use_input_sc -linmem_ig 10 -nblist_autoupdate true -constrain_relax_to_start_coords -relax:ramp_constraints false -relax:coord_constrain_sidechains &> /home/lanselibai/Cheng/relax/output/record.log &

After getting 100 decoys, I calculated the RMSD relative to the lowest-scoring decoy (plot attached). As you can see, most of the decoys are within a very small energy range. The shape of the scatter is also not ideal. I mentioned a similar issue at https://www.rosettacommons.org/node/3813 . And you said (#12) it was "probably safe in taking the lowest energy structure". So can I also use the lowest-scoring decoy and ignore the shape of the scatter in this case?

2) Thank you for confirming this. I think it would always be good to add "-run:constant_seed" to make the runs repeatable.

Yours sincerely
Cheng

File attachments:

rms.png

Wed, 2014-11-19 02:57

lanselibai

(Reply to #12)

#12

It's perfectly OK that the RMSD vs energy scatter is wonky here. You are starting from a homology model, which isn't perfect to begin with, and your RMSDs are relative to that. It shouldn't be a funnel; most likely it will be the opposite, with high RMSDs and low energies. Also, since this is a homology model to begin with, you could also try not constraining to the starting coordinates. I might even try the -dualspace option here as well (with the option -nonideal to optimize relax): http://www.ncbi.nlm.nih.gov/pubmed/24265211 .

It really depends on the homology and your template. If your sequence had very high homology to your starting template(s) then coordinate constraints are probably good, otherwise, I would leave them out. Also, don't use coord_constrain_sidechains for your homology model. Crystal structures, yes, that can be good maybe - but definitely not with a homology model. Those sidechains were already packed using some protocol and rotamer library - so allowing them to pack in Rosetta is OK.

You are also missing the flags -ex1 and -ex2 from your command line. For relax (and pretty much every time you pack), it is probably very important to have both of them. At the very least, add the -ex1 and -ex2aro options to your relax run.

Wed, 2014-11-19 09:49

jadolfbr

(Reply to #13)

#13

Hi jadolfbr,
Many thanks for your help. Can I ask some follow-up questions based on your individual points?

1) You said a homology model is not a perfect starting point. So is an X-ray crystal structure the best starting point, or something else?

2) My own protein has 70% similarity to the heavy chain and 90% similarity to the light chain of the template. So in this case, should I include "-constrain_relax_to_start_coords"?

3) I had assumed the rotamer options were applied by default if I did not specify them. As you suggested, I will add "-ex1 -ex2" as well.

The following is the modified command line. Do you think it is okay? (I will combine it into one line.)

~/Cheng/rosetta_2014.30.57114_bundle/main/source/bin/relax.linuxgccrelease -database ~/Cheng/rosetta_2014.30.57114_bundle/main/database -s /home/lanselibai/Cheng/relax/input/C226S_raw.pdb -out:show_accessed_options -nstruct 100 -no_optH false -flip_HNQ -use_input_sc -linmem_ig 10 -nblist_autoupdate true -relax:ramp_constraints false -relax:coord_constrain_sidechains

-dualspace -nonideal # newly added
-ex1 -ex2 # rotamers
-constrain_relax_to_start_coords # keep or remove?

&> /home/lanselibai/Cheng/relax/output/record.log &

Thank you very much.

Yours sincerely
Cheng

Thu, 2014-11-20 02:50

lanselibai

(Reply to #14)

#14

OK, let's see. What is your use for the antibody? I would probably do a full relax with and without constraints (just output 1 structure), see how much each moves, and choose whichever one you think would be better for your purposes. It's really not something I could say either way. The antibody framework is highly conserved, so you're probably good using constraints. It also depends on what you will be using the model for.

Yes, if you have an actual structure that would be best. Xray, neutron scattering, etc. Are you loop modeling the CDRs? How did you make the homology model? Is it from the RosettaAntibody server?

Remove coord_constrain_sidechains. You don't want that for your homology model.

Read the dualspace paper and see if it will help here. Since it is an antibody, it probably will help, but not by much. You'll have to decide after reading the paper, but generally I think it's great.

Two other things, maybe you can help me here. Why are you using -no_optH and -flip_HNQ? Also, I don't usually turn off the ramp of constraints when using starting coordinate constraints. I'm not sure which is recommended here for what purposes, so probably either is OK.

Thu, 2014-11-20 08:28

jadolfbr

(Reply to #15)

#15

Hi jadolfbr,
Thank you for your help.

1) I will do a relax with and without constraints. Can I ask whether by "constraints" you mean "-relax:coord_constrain_sidechains" or something else?

We use it to study its stability. As we have found so far, in terms of sequence, the constant region is highly conserved compared to the variable region. So the framework is probably similarly conserved.

2) I made it based on 4KMT.pdb, downloaded from the PDB website. The major work is the comparative modelling followed by relax. So do you recommend that we try ROSIE?

3) I will remove coord_constrain_sidechains

4) I will read the paper to make the decision.

5) Using "-no_optH -flip_HNQ" and "-relax:ramp_constraints false" is mainly based on Dr R Moretti's paper (see previous posts & http://www.ncbi.nlm.nih.gov/pubmed/23565140). I think it is something to do with the "Harmonic runs added flags".

Yours sincerely
Cheng

Thu, 2014-11-20 11:14

lanselibai

(Reply to #16)

#16

The all atom constrained relax protocol of Nivon et al (http://www.ncbi.nlm.nih.gov/pubmed/23565140) is intended for pre-relaxing of structures prior to use in Rosetta. That is, if you have crystal structures or even structures from MD which you want to bring into the Rosetta scorefunction, but don't want to deviate too far from their starting coordinates. It's not intended for post-relaxing of models after a Rosetta protocol run. So you wouldn't use it to relax models which come out of a Rosetta homology run. There you'd want to use an unconstrained relax to allow Rosetta to find a conformation with a low energy structure. The input coordinates (the output coordinates of the homology modeling run) aren't intrinsically meaningful, so it doesn't make sense to use them as constraints. If you're relaxing output of the homology modeling protocol, remove the "-coord_constrain_sidechains" and even the "-relax:ramp_constraints false" and "-constrain_relax_to_start_coords".

The "-no_optH false" and "-flip_HNQ" are present in the all-atom constrained relax protocol to allow sampling of those degrees of freedom which are typically poorly resolved in X-ray crystal structures. Again, this is for pre-optimization purposes with the constraints. If you do an unconstrained relax, you'll be sampling those degrees of freedom during the relax procedure, so their presence becomes unnecessary. (But they won't hurt if you leave them in.)

Regarding your score-vs-rmsd plot, it's important to look at the scale of the axes. You have little to no movement in rmsd space (< 0.012 Ang Calpha rmsd) and also a very restricted sampling in score space ( all are within 2 REU of each other). This is what I was talking about when I said "converged in sampling". All of your models are structurally and energetically similar, and would be effectively equivalent in downstream applications. (Note that the reason for this convergence is that you've heavily constrained the relax with " -constrain_relax_to_start_coords -relax:ramp_constraints false -coord_constrain_sidechains". Removing them will allow much more flexibility and will expand the range of your relax sampling.)

Regarding -ex1 -ex2, -ex2aro, etc. The -ex1 and -ex2 flags specify to expand the rotamer set by adding rotamers with +/- 1 standard deviation in the first or second chi dihedral (respectively). The -ex1aro and -ex2aro do the same, but only for aromatic residues. (So having both -ex2 and -ex2aro doesn't do anything over just -ex2, as aromatic residues are included in -ex2.)
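
To illustrate what this expansion means, here is a toy Python sketch. It is not how Rosetta builds its rotamer sets internally: the function name and the example angles and standard deviations are invented, aromatic-only variants such as -ex2aro are not modelled, and angle wrap-around at +/-180 degrees is ignored. It only shows how adding +/- 1 standard deviation on chi1 and/or chi2 multiplies the number of sidechain conformations considered for one base rotamer.

```python
from itertools import product


def expand_rotamer(chi_means, chi_sds, ex1=False, ex2=False):
    """Return the chi-angle tuples sampled for one base rotamer (toy model)."""
    expand = [ex1, ex2]
    options = []
    for index, (mean, sd) in enumerate(zip(chi_means, chi_sds)):
        if index < 2 and expand[index]:
            # Extra samples at one standard deviation either side of the mean.
            options.append([mean - sd, mean, mean + sd])
        else:
            # Chi angles without extra sampling keep only the on-rotamer value.
            options.append([mean])
    return list(product(*options))


# Invented chi1/chi2 means and standard deviations, in degrees.
base_means, base_sds = [-60.0, 180.0], [10.0, 12.0]
print(len(expand_rotamer(base_means, base_sds)))                      # 1
print(len(expand_rotamer(base_means, base_sds, ex1=True)))            # 3
print(len(expand_rotamer(base_means, base_sds, ex1=True, ex2=True)))  # 9
```

The growth from 1 to 9 conformations per base rotamer in this toy model gives a feel for why extra rotamer sampling costs packing time.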

It actually turns out that adding extra rotamer sub-sampling to relax doesn't improve energies all that much. The multiple cycles of repacking/minimization with the repulsive ramping allow Rosetta to sample the slightly off rotamer sidechain conformations effectively. Adding those flags just slows down relax slightly for marginal benefit. (They are important for other protocols, which use just a few rounds of packing, or which do design, or which don't have ramped repulsives.)

Thu, 2014-11-20 11:17

rmoretti

(Reply to #17)

#17

Hi R Moretti,
Thank you very much for your help. Now I have a clearer picture of the option types. Yes, I really do not need those constraints.

I will redo the relax for my homology model with only the "-relax:quick" flag. Can I ask whether "-relax:quick" is the default? I see that "-relax:fast" is the default, so I am wondering whether these two are the same thing. But anyway, I will include "-relax:quick".

Yours sincerely
Cheng

Fri, 2014-11-21 10:25

lanselibai

(Reply to #18)

#18

The way Rosetta is currently written, "-relax:quick" should function identically to "-relax:fast". (At one time they might have been different, but they haven't been for quite a while now.)

"-relax:fast" would be the preferred way to write it, though. (Though it is effectively the default.)

Mon, 2014-11-24 15:45

rmoretti

(Reply to #19)

#19

Hi R Moretti,
Thank you. I got that.

Yours sincerely
Cheng

Tue, 2014-12-02 09:46

lanselibai
