Hello, I am using the following protocol, modified from the raf-rac interface design demo, to design a protein-protein interface. I ran 2 designs in parallel with nstruct = 1,000 and ran SequenceProfile.py to analyze them, and found that the results were the same for each. I am therefore wondering exactly how the sequence space is "randomly" searched to identify mutations, and how likely these mutations are to give a more favorable interface in vitro. Thanks in advance
<Ddg name="ddg_binding" threshold="0" scorefxn="REF2015" repeats="3" repack="1"/>
DDG filter computes binding score for the complex
(threshold=0 only allows complexes with negative binding score)
(repeats=3 calculates binding score three times and returns average)
(repack=1 repacks the complex in both bound and unbound states when calculating the binding score)
<Sasa name="sasa" threshold="800"/>
SASA filter computes interface solvent-accessible surface area
(threshold=800 only allows complexes with interface area greater than 800 Å^2,
as per Janin et al., Quarterly Reviews of Biophysics, 2008)
<RestrictResiduesToRepacking name="nodesign" residues=""/>
<ProteinInterfaceDesign name="design" repack_chain1="1" repack_chain2="1" design_chain1="0" design_chain2="1" interface_distance_cutoff="8"/>
Reads command line options
<FavorNativeResidue name="Fav" bonus="1"/>
<Docking name="dock1" fullatom="1" local_refine="1"/>
Runs local refinement stage of full atom docking
<PackRotamersMover name="packrot" scorefxn="REF2015" task_operations="design,cmdline"/>
Runs protein interface design
<MinMover name="min" scorefxn="REF2015" chi="1" bb="1" type="dfpmin_armijo_nonmonotone" tolerance="0.01"/>
Runs full atom side-chain and backbone minimization
Runs movers and filters in this order
The PackRotamersMover uses Metropolis Monte Carlo Simulated Annealing to optimize sidechain conformations/identities.
Roughly speaking, the packer starts with a (randomized) structure and evaluates its energy. It then picks a random position, and picks a random sidechain conformation from among the set of all conformations for all allowed amino acids at that position. It then evaluates what the energy would be with that substitution. If it's better, it replaces the old sidechain at that position with the new one, and repeats the process. If the energy is worse, it uses the Metropolis criterion to evaluate how much worse it is. If it's not too much worse, it can randomly accept that conformation anyway, which helps keep the algorithm from getting stuck; but if it's a lot worse, it rejects the substitution and repeats the process with the original structure. It repeats this for many, many cycles to optimize the conformation.
I've omitted a lot of the details, like how many cycles it runs (which is based on how many positions/rotamers it has), the fact that the "temperature" for the Metropolis criterion is ramped down during the protocol to make it more likely to reject bad substitutions at the end of the run, and the various tweaks that make things more efficient.
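The loop described above can be sketched in plain Python. This is a toy illustration only, not Rosetta's actual packer: the positions, rotamer sets, energy function, cycle count, and geometric temperature schedule here are all invented for the example.

```python
import math
import random

def pack_rotamers(positions, rotamers, energy, n_cycles=5000,
                  t_start=5.0, t_end=0.05, seed=None):
    """Toy Metropolis Monte Carlo simulated annealing over discrete
    rotamer choices. rotamers[p] lists the allowed rotamer/identity
    choices at position p; energy(state) scores a full assignment."""
    rng = random.Random(seed)
    # Start from a random assignment: one rotamer per position.
    state = {p: rng.choice(rotamers[p]) for p in positions}
    e = energy(state)
    for cycle in range(n_cycles):
        # Geometrically ramp the temperature down so that late-stage
        # uphill moves are almost always rejected.
        t = t_start * (t_end / t_start) ** (cycle / max(1, n_cycles - 1))
        # Propose a random rotamer at a random position.
        p = rng.choice(positions)
        trial = dict(state)
        trial[p] = rng.choice(rotamers[p])
        e_trial = energy(trial)
        de = e_trial - e
        # Metropolis criterion: always accept downhill moves; accept
        # uphill moves with probability exp(-dE / t).
        if de <= 0 or rng.random() < math.exp(-de / t):
            state, e = trial, e_trial
    return state, e
```

On a trivially separable toy energy this converges to the global optimum nearly every run; the real design problem's landscape is far rougher, which is where the run-to-run stochasticity comes from.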
Regarding the randomization, the packer isn't intrinsically trying to get a diverse ensemble of designs. On each packer run (each output) it's doing the exact same optimization technique, and if it were perfect in doing so, it would come up with the same result every time. The reason the packer does give stochastic results is that the optimization is a hard problem, and the MMCSA method being used to solve it efficiently isn't guaranteed to give perfectly optimal results (though it does frequently get close).
This can change based on the problem, though. If you have a particularly restrictive sampling problem (e.g. a small set of positions with a small set of rotamers, particularly if they're closely packed), the best option can be easy to find, and you'll end up with just a single result despite running the design thousands of times. If you do want diversity in your design results, you'll have to alter your protocol a bit. If you only have a few position/mutation combinations, you can try doing saturation scans, where you force in each identity and rank the results by the energy you get back.
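A saturation scan can be sketched as follows. This is a schematic, not a Rosetta protocol: `score` here stands in for whatever design-and-score run you would actually perform for each forced identity.

```python
# Force each amino acid at each designable position in turn, score the
# resulting sequence, and rank everything from best (lowest) energy up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def saturation_scan(sequence, designable_positions, score):
    results = []
    for pos in designable_positions:
        for aa in AMINO_ACIDS:
            # Force identity `aa` at position `pos`.
            mutant = sequence[:pos] + aa + sequence[pos + 1:]
            results.append((score(mutant), pos, aa))
    results.sort()  # lowest (best) score first
    return results
```

Because every position/identity combination is evaluated explicitly, the ranking is exhaustive and reproducible, with no stochastic search involved.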
Another approach that works is to take advantage of the fact that the minimum is sometimes highly sensitive to the backbone conformation. If you pre-generate an ensemble of different backbones (for example by using the relax or backrub applications), then you can do a small set of design runs on each of the different backbones. While for any one starting backbone you might get consistent design results, when you look over a set of different but very similar backbone conformations, you should see a greater variety of protein sequences.
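As a toy illustration of why this helps, here is a deterministic "designer" (exhaustive minimization of an invented energy) run over a set of perturbed backbones: any one backbone gives exactly one answer, but the answers differ across the ensemble. Everything here (the identity alphabet, the "cavity size" energy, the perturbations) is made up for the example and has no Rosetta counterpart.

```python
import itertools

CHOICES = "AVLIF"  # a toy alphabet of hydrophobic identities

def toy_energy(seq, backbone):
    # Invented energy: each identity has a preferred "cavity size";
    # backbone is a tuple of per-position cavity sizes.
    pref = {"A": 1.0, "V": 2.0, "L": 3.0, "I": 3.2, "F": 4.0}
    return sum((pref[aa] - cavity) ** 2 for aa, cavity in zip(seq, backbone))

def best_sequence(backbone):
    # Exhaustive search: deterministic, so one backbone -> one answer.
    return min(("".join(s) for s in itertools.product(CHOICES, repeat=len(backbone))),
               key=lambda seq: toy_energy(seq, backbone))

# A small "ensemble" of increasingly perturbed backbones.
perturbed = [(1.1 + d, 2.9 - d, 3.9 + d) for d in (0.0, 0.15, 0.3, 0.6)]
designs = {best_sequence(b) for b in perturbed}
```

Re-running `best_sequence` on the same backbone always returns the same sequence, but the set `designs` collected over the perturbed ensemble contains more than one sequence: the sequence diversity comes from the backbone diversity, not from the optimizer.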