Hi, I have some questions about protein design using Rosetta, specifically about how much of sequence space it covers.
Say I have a protein of 100 aa and allow design at every position with all 20 amino acid types, optimizing the score to improve the stability of the protein.
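For scale, here is a back-of-the-envelope sketch of how large that full sequence space is. It only assumes the setup described above (100 designable positions, 20 residue types each); the variable names are illustrative and nothing here comes from Rosetta itself.

```python
import math

# Assumed setup from the question: every position is designable
# and every position may take any of the 20 amino acid types.
n_positions = 100
n_residue_types = 20

# Exact count of possible sequences (Python integers are arbitrary precision).
total_sequences = n_residue_types ** n_positions

# Same quantity expressed as a power of ten, for readability.
order_of_magnitude = n_positions * math.log10(n_residue_types)

print(f"roughly 10^{order_of_magnitude:.0f} possible sequences")
```

Any realistic number of design runs samples only a tiny fraction of a space this size, which is the background for the questions below.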
I want to know:
1) If Monte Carlo is not used for iterative design, can I say that the sequences in the output are ALL the sequences Rosetta has tested?
2) If Monte Carlo is used, is there any way to retrieve the designs (.pdb and score) that Rosetta discarded during MC minimization?
3) Is there a way to prevent Rosetta from searching already-searched sequence space (i.e. stop Rosetta from generating redundant sequences)?
4) How exactly does Rosetta generate the initial design sequence for minimization and scoring? Does it start with random mutations of the sequence?
I know that people have been comparing the low-energy sequences from Rosetta design with the sequence space observed in natural evolution.
However, in my experience using backrub to generate designs, I get a lot of redundant sequences (i.e. only 1000 of 50k designs are non-redundant). I wonder how that comparison can be valid if Rosetta's search of sequence space is not exhaustive (maybe I'm wrong). That's why I want to know exactly how the design is started and how Rosetta processes the designs afterward.
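A minimal sketch of how this kind of redundancy can be counted, assuming the designed sequences have already been extracted from the output into a list of one-letter strings. The function name `summarize_redundancy` and the toy sequences are hypothetical, not Rosetta output.

```python
from collections import Counter

def summarize_redundancy(sequences):
    """Return (total, unique, unique_fraction) for a list of sequence strings."""
    counts = Counter(sequences)      # maps each distinct sequence to its occurrence count
    n_total = len(sequences)
    n_unique = len(counts)
    unique_fraction = n_unique / n_total if n_total else 0.0
    return n_total, n_unique, unique_fraction

# Toy data standing in for sequences pulled from design output.
designs = ["ACDE", "ACDE", "ACDF", "ACDE", "GCDE"]

n_total, n_unique, unique_fraction = summarize_redundancy(designs)
most_repeated, repeat_count = Counter(designs).most_common(1)[0]

print(f"{n_unique} unique out of {n_total} designs ({unique_fraction:.0%})")
print(f"most repeated sequence: {most_repeated} (seen {repeat_count} times)")
```

With my numbers above, the same calculation gives a unique fraction of 1000 / 50000 = 2%.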