PMID: 27151862 DOI: 10.1126/science.aad8865 http://science.sciencemag.org/content/352/6286/680

HBNet consists of three steps, described in detail below: first, an exhaustive but efficient search identifies the hydrogen bond networks possible within a given design space (which consists of all allowed sidechain rotamers of all amino acid types being considered for a particular backbone conformation); second, networks are scored and ranked based on the Rosetta energy function, satisfaction (all buried polar atoms participating in hydrogen bonds), and user-defined options; third, the best networks, or combinations of the best networks, are iteratively placed onto the design scaffold and held in relative position with constraints that serve as ‘seeds’ for any subsequent Rosetta method to design around the network and optimize rotamers for the remaining positions in the scaffold.

Step 1. Exhaustive search to identify all possible hydrogen bond networks in the given design space (Fig. 1A-B).

HBNet makes use of Rosetta’s Interaction Graph (IG) data structure, initially populating it with only the sidechain hydrogen bond and Lennard-Jones (steric repulsive) energy terms. The nodes of the graph are the residue positions of all designable or packable residues, and the edges represent putative interactions between those residues, pointing to sparse matrices that store the two-body energies between all pairs of interacting rotamers (of all amino acid types being considered) at those two positions. Only using the hydrogen bond and repulsive energies allows for instant look-up of all rotamer pairs with favorable (low energy) hydrogen bond geometry and no steric clashing. Standard combinatorial protein design algorithms use Monte Carlo or similar randomized methods to search this rotamer interaction space; instead, HBNet samples the entire space through recursive depth-first search of the interaction graph, enumerating all compatible, non-clashing connectivities of hydrogen bonded sidechain rotamers.
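The lookup described above can be pictured with a small stand-in data structure. This is only a sketch and not Rosetta's Interaction Graph API: the class name `ToyInteractionGraph`, its methods, and the rotamer labels are hypothetical, and the energies are invented. Each edge between two residue positions holds a sparse table of rotamer-pair energies, so favorable pairs can be read off directly.

```python
from collections import defaultdict


class ToyInteractionGraph:
    """Hypothetical stand-in for a residue-level interaction graph."""

    def __init__(self):
        # edges[(pos_lo, pos_hi)] -> sparse table {(rot_lo, rot_hi): energy}
        self.edges = defaultdict(dict)

    @staticmethod
    def _ordered(pos_a, rot_a, pos_b, rot_b):
        # Store every pair with the lower residue position first.
        if pos_a > pos_b:
            return pos_b, rot_b, pos_a, rot_a
        return pos_a, rot_a, pos_b, rot_b

    def set_energy(self, pos_a, rot_a, pos_b, rot_b, energy):
        pos_a, rot_a, pos_b, rot_b = self._ordered(pos_a, rot_a, pos_b, rot_b)
        if energy != 0.0:  # sparse: non-interacting pairs are not stored
            self.edges[(pos_a, pos_b)][(rot_a, rot_b)] = energy

    def energy(self, pos_a, rot_a, pos_b, rot_b):
        pos_a, rot_a, pos_b, rot_b = self._ordered(pos_a, rot_a, pos_b, rot_b)
        return self.edges.get((pos_a, pos_b), {}).get((rot_a, rot_b), 0.0)

    def hbond_pairs(self, pos_a, pos_b, threshold):
        """Rotamer pairs on one edge with energy below the threshold,
        returned as (rotamer at lower position, rotamer at higher position)."""
        key = (min(pos_a, pos_b), max(pos_a, pos_b))
        table = self.edges.get(key, {})
        return sorted(pair for pair, e in table.items() if e < threshold)


ig = ToyInteractionGraph()
ig.set_energy(10, "SER_r1", 42, "ASN_r3", -1.2)  # favorable hydrogen bond
ig.set_energy(42, "ASN_r7", 10, "SER_r1", 3.5)   # steric clash
ig.set_energy(10, "SER_r2", 42, "ASN_r3", -0.3)  # weak hydrogen bond
print(ig.hbond_pairs(10, 42, threshold=-0.5))
```

Because only hydrogen bond and repulsive terms are stored, the sign of an entry is enough to tell a hydrogen bond from a clash, which is what makes the direct lookup possible.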

Because we are searching the entire graph, there is no substantial advantage to depth-first versus breadth-first search. And because we have to traverse not only the nodes of the graph but also the matrices pointed to by each edge (multiple rotamers per node, and multiple pairs of rotamers per edge), standard graph traversal algorithms cannot be implemented in a conventional manner. Put another way, it is not just connected nodes (residue positions) that constitute our networks, but hydrogen bonds between atoms of particular rotamers at each node. This latter criterion requires additional steps and behavior: each time a new hydrogen bonding rotamer is considered, it has to be checked to ensure it does not clash with any existing rotamers in that network. If it is accepted, a recursive call is made on this rotamer. These recursive calls continue until a stop condition is reached: either no additional hydrogen bonding interactions can be found, or the network connects back to one of the original starting residues. Some polar amino acids, such as Asn and Gln, can make three or more hydrogen bonds, serving as branch points in hydrogen bond networks; depth-first search misses these branching amino acids, and to account for this, a look-back function identifies networks that share one or more identical rotamers and, after checking for clashes or conflicting residues, merges them together into complete networks. Redundant networks are eliminated.
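The recursion and the look-back merge can be sketched on a toy example. This is a simplified illustration under stated assumptions, not the HBNet source: rotamers are `(position, label)` tuples, `energies` is a hypothetical dictionary keyed by unordered rotamer pairs, negative values stand for hydrogen bonds and positive values for clashes, and all function names are invented for this sketch.

```python
def pair_energy(energies, a, b):
    return energies.get(frozenset((a, b)), 0.0)


def hbond_partners(energies, node, threshold):
    """Rotamers whose pair energy with `node` is at or below the threshold."""
    for pair, e in energies.items():
        if node in pair and e <= threshold:
            (other,) = pair - {node}
            yield other


def compatible(energies, network, cand):
    """Reject a second rotamer at an occupied position, or any clash."""
    for member in network:
        if member[0] == cand[0] or pair_energy(energies, member, cand) > 0.0:
            return False
    return True


def extend(energies, network, node, threshold, found):
    """Depth-first extension; records the network when it cannot grow."""
    grew = False
    for cand in hbond_partners(energies, node, threshold):
        if cand in network or not compatible(energies, network, cand):
            continue
        grew = True
        extend(energies, network | {cand}, cand, threshold, found)
    if not grew:
        found.add(frozenset(network))


def consistent(energies, network):
    members = list(network)
    for i, a in enumerate(members):
        for b in members[i + 1:]:
            if a[0] == b[0] or pair_energy(energies, a, b) > 0.0:
                return False
    return True


def merge_branches(energies, networks):
    """Look-back step: merge networks sharing a rotamer, drop redundant ones."""
    merged = set(networks)
    changed = True
    while changed:
        changed = False
        for a in list(merged):
            for b in list(merged):
                union = a | b
                if a != b and a & b and union not in merged \
                        and consistent(energies, union):
                    merged.add(union)
                    changed = True
    return {n for n in merged if not any(n < m for m in merged)}


def find_networks(energies, start, threshold=-0.5):
    found = set()
    extend(energies, frozenset({start}), start, threshold, found)
    return merge_branches(energies, found)


# Asn at position 2 is a branch point bonded to positions 1, 3 and 4.
S1, N2, T3, Y4 = (1, "SER"), (2, "ASN"), (3, "THR"), (4, "TYR")
energies = {
    frozenset((S1, N2)): -1.0,
    frozenset((N2, T3)): -0.9,
    frozenset((N2, Y4)): -0.8,
}
print(find_networks(energies, S1))
```

In this toy case the depth-first pass alone yields two linear paths through the Asn, and the merge step joins them into a single four-residue network; if the Thr and Tyr rotamers clashed, the two paths would be kept as separate networks.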

For this work, a special instance of HBNet, “HBNetStapleInterface”, was written, in which graph traversals are initiated at residue positions at the intermolecular interface. This offers two advantages: first, starting the traversal at only the interface positions reduces the search space, speeding up runtime, and second, it ensures only networks at the interface are found, which was the goal of the approach in this study; requiring that at least 2 residues in each network come from different polypeptide chains ensures that the network spans the intermolecular interface. For each starting residue, HBNetStapleInterface iterates through each edge; at each edge, networks are initiated for rotamer pairs with interaction energies less than a threshold value. Because the interaction energy only consists of hydrogen bonding and repulsive contributions, a positive energy indicates clashing, and a negative energy indicates hydrogen bonding. Setting a threshold allows for both selection of hydrogen bonds with favorable (low energy) geometry and faster computational runtime: because of the multiple recursive steps, runtime is exponentially dependent upon the number of hydrogen bonding rotamer pairs (which increases as the threshold is made less stringent). The total number of hydrogen bonding rotamer pairs differs vastly between input structures and cannot be calculated ahead of time; through extensive empirical testing, we found that threshold values ranging from -0.65 to -0.85 resulted in favorable hydrogen bonds and runtimes on the order of ~0.2 to 10 minutes for complete design runs that included downstream design of numerous network possibilities for a given input structure.
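The seeding and interface-spanning checks can be illustrated as follows. This is a sketch, not the HBNetStapleInterface code: the function names, the `(position, label)` rotamer tuples, the `chain_of` mapping, and the energies are hypothetical, and the default threshold of -0.75 is simply a value inside the range quoted above. Treating a pair as a seed when either residue is at the interface is also an assumption of this sketch.

```python
def seed_pairs(pair_energies, interface_positions, threshold=-0.75):
    """Rotamer pairs that may start a network: energy below the threshold
    and at least one of the two residues located at the interface."""
    seeds = []
    for (rot_a, rot_b), energy in pair_energies.items():
        at_interface = (rot_a[0] in interface_positions
                        or rot_b[0] in interface_positions)
        if energy < threshold and at_interface:
            seeds.append((rot_a, rot_b))
    return seeds


def spans_interface(network, chain_of):
    """True if the network's residues come from at least two chains."""
    return len({chain_of[pos] for pos, _ in network}) >= 2


pair_energies = {
    ((5, "SER_a"), (60, "ASN_b")): -0.9,   # strong H-bond across the interface
    ((5, "SER_a"), (61, "GLN_c")): -0.5,   # too weak for this threshold
    ((20, "THR_d"), (21, "TYR_e")): -1.1,  # strong, but away from the interface
    ((5, "SER_f"), (60, "ASN_b")): 2.0,    # clash
}
chain_of = {5: "A", 20: "A", 21: "A", 60: "B", 61: "B"}

seeds = seed_pairs(pair_energies, interface_positions={5, 60})
print(seeds)
print([spans_interface(pair, chain_of) for pair in seeds])
```

Making the threshold less negative admits more seed pairs, which is where the steep growth in runtime comes from.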

Step 2. Score and rank all of the H-bond networks.

Once all possible networks are identified, they are scored and ranked to identify the “best” networks. For each network, buried polar atoms are identified by solvent-accessible surface area (SASA); networks with buried heavy atom donors or acceptors not making hydrogen bonds (unsatisfied) are eliminated. The remaining networks are then ranked by the fewest unsatisfied polar hydrogens. The networks are then scored against each other in the context of a background reference structure: all designable or packable positions in the scaffold are mutated to poly-alanine, the network rotamers are placed onto the scaffold, and the network is scored with the full Rosetta energy function (talaris2013).
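The filter-then-rank logic can be sketched as below. The `Network` record, its field names, and the numbers are hypothetical, and using the energy score as a tie-breaker after the unsatisfied polar hydrogen count is an assumption of this sketch; the text above does not specify how the two criteria are combined.

```python
from dataclasses import dataclass


@dataclass
class Network:
    name: str
    unsat_heavy: int  # buried heavy-atom donors/acceptors without an H-bond
    unsat_hpol: int   # unsatisfied polar hydrogens
    score: float      # energy on the poly-alanine reference (lower is better)


def rank_networks(networks):
    # Eliminate networks with any unsatisfied buried heavy-atom donor/acceptor.
    kept = [n for n in networks if n.unsat_heavy == 0]
    # Fewest unsatisfied polar hydrogens first; energy breaks ties (assumption).
    return sorted(kept, key=lambda n: (n.unsat_hpol, n.score))


candidates = [
    Network("net_a", unsat_heavy=0, unsat_hpol=1, score=-12.0),
    Network("net_b", unsat_heavy=1, unsat_hpol=0, score=-20.0),
    Network("net_c", unsat_heavy=0, unsat_hpol=0, score=-8.5),
    Network("net_d", unsat_heavy=0, unsat_hpol=0, score=-10.0),
]
print([n.name for n in rank_networks(candidates)])
```

Note that `net_b` is dropped despite having the lowest energy, because an unsatisfied buried heavy atom is treated as disqualifying rather than as a penalty.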

Consideration of sidechain-backbone hydrogen bonds. During Step 1, sidechain-backbone hydrogen bonds are not explicitly considered because the backbone is fixed (the number of sidechain-backbone hydrogen bonds for any given rotamer is constant). During Step 2, sidechain-backbone hydrogen bonds are scored when the networks are placed onto the reference structure, and are therefore included in the evaluation of satisfaction (how many of the buried polar atoms participate in hydrogen bonds). Thus, even though they are not searched for explicitly, HBNet captures networks with sidechain-backbone hydrogen bonds. Networks with additional hydrogen bonds to backbone polar atoms will generally score better than similar networks without hydrogen bonds to the backbone, because their connectivity and satisfaction are improved.

Step 3. For each of the best-scoring H-bond networks, perform design.

The best networks as ranked by Step 2 are iteratively placed onto the input scaffold and passed back to the RosettaScripts protocol for user-defined design of the remaining residue positions. Atom-pair constraints are automatically turned on for each pair of atoms making a hydrogen bond in the network; these constraints are tracked throughout the remainder of the design run to ensure the network residues are fixed in relative position during the downstream design. HBNet also outputs a Rosetta constraint (.cst) file that can be used to specify the same constraints in subsequent Rosetta design runs. It should be noted that these atom-pair “constraints” in Rosetta nomenclature are really “restraints” in that the rotamers are allowed to move, and an energy penalty is applied if the constraint is broken (i.e. if the hydrogen bond is broken). This approach, as opposed to simply fixing the coordinates of the network atoms, allows small movements of the network rotamers, allowing for a larger number of solutions for packing additional rotamers around the network. A trend that emerged in the work we present here is that tight packing around the networks, as well as satisfaction of all buried heavy-atom donors and acceptors, is paramount to design success; it is more important to have hydrogen bonds satisfying all polar atoms in the network with mediocre hydrogen bond geometry than it is to have ideal hydrogen bond geometry but poor packing around them and/or unsatisfied donors/acceptors.
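The difference between a restraint and a fixed coordinate can be seen in a small numerical sketch. The line layout follows the general AtomPair entry of the Rosetta constraint file format, and the penalty uses the HARMONIC form ((x - x0) / sd)^2. Which function type and parameters HBNet actually writes to its .cst file is not stated here, so the HARMONIC choice, the 2.8 Å target, the 0.2 Å width, the atom names, and the residue numbers are all illustrative assumptions.

```python
def harmonic_penalty(distance, x0, sd):
    """Harmonic restraint energy: zero at x0, growing as the pair separates."""
    return ((distance - x0) / sd) ** 2


def atom_pair_cst_line(atom1, res1, atom2, res2, x0, sd):
    """Format one AtomPair entry with a HARMONIC function (illustrative)."""
    return f"AtomPair {atom1} {res1} {atom2} {res2} HARMONIC {x0:.2f} {sd:.2f}"


# Hypothetical hydrogen bond between a Ser hydroxyl oxygen and an Asn
# sidechain oxygen; residue numbers are invented for the example.
print(atom_pair_cst_line("OG", 12, "OD1", 45, x0=2.8, sd=0.2))

for d in (2.8, 2.9, 3.0, 3.4):
    print(f"{d:.1f} A -> penalty {harmonic_penalty(d, 2.8, 0.2):.2f}")
```

Small displacements cost little, so network rotamers can shift slightly to accommodate packing, while a separation large enough to break the hydrogen bond incurs a steep penalty.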

Combinations of multiple networks at the same interface can also be considered and specified by the user. Unlike typical Rosetta design, in which one input structure yields one output structure (the lowest energy solution found by sequence design and combinatorial sidechain optimization), this approach allows for hundreds of design possibilities to be output for each input structure.