I've been fairly successful at generating large sets of decoys for several of my projects (de novo, loop modeling, etc.). My runs typically produce 30,000 to 50,000 decoy structures, which I understand to be a decent number.
What is the best method for determining the "most accurate" structure from these decoys? Do I cluster first and take the lowest-energy structure (c.0.0.pdb)? Or do I look at the top-scoring structure from each cluster (c.0.0.pdb, c.1.0.pdb, etc.)? Or do I score the entire set and take the lowest-scoring structure as the most accurate, without clustering?
Any advice would be great. I've tried a few of these options, but it gets confusing when several decoys have similar scores but very different structures.
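To make the three options concrete, here is a minimal Python sketch of what I mean. The decoy names, cluster IDs, and scores are made-up values for illustration only (not real output), and I'm assuming that a lower score means lower energy and that each decoy has already been assigned to a cluster.

```python
from collections import Counter

# Hypothetical (file name, cluster id, score) records; values are invented.
# Assumption: lower score = lower energy = better.
decoys = [
    ("c.0.0.pdb", 0, -201.3),
    ("c.0.1.pdb", 0, -198.7),
    ("c.0.2.pdb", 0, -195.0),
    ("c.1.0.pdb", 1, -203.1),
    ("c.1.1.pdb", 1, -190.2),
    ("c.2.0.pdb", 2, -185.4),
]

def lowest_in_largest_cluster(decoys):
    """Option 1: cluster first, then take the lowest score in the largest cluster."""
    largest = Counter(d[1] for d in decoys).most_common(1)[0][0]
    return min((d for d in decoys if d[1] == largest), key=lambda d: d[2])

def lowest_per_cluster(decoys):
    """Option 2: take the lowest-scoring decoy from each cluster."""
    best = {}
    for d in decoys:
        cid = d[1]
        if cid not in best or d[2] < best[cid][2]:
            best[cid] = d
    return [best[cid] for cid in sorted(best)]

def lowest_overall(decoys):
    """Option 3: ignore clusters and take the single lowest score."""
    return min(decoys, key=lambda d: d[2])

print(lowest_in_largest_cluster(decoys))
print(lowest_per_cluster(decoys))
print(lowest_overall(decoys))
```

With these invented numbers, option 1 and option 3 pick different decoys from different clusters even though their scores differ by less than two units, which is exactly the situation I find confusing.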