De novo Protein Design

Hello All,


I am trying to learn how to perform de novo protein design.


I looked for tutorials and demos but did not find any. Can someone point me to a location or material where I can read about how to perform this protocol? Just something basic, so I can understand where to start.

Sat, 2017-07-15 10:39

De novo protein design (that is, designing a sequence from scratch which will adopt a given fold) is probably one of the more complicated protocols in Rosetta. That's one of the reasons why there aren't any tutorials for it: it's still a bit more art than science, and it's rather dependent on the type of protein you're interested in creating. There really isn't a single recommended protocol which can be applied.

The best you're likely to get is to read the papers about de novo protein design, particularly the Koga paper. Other papers of interest include the Marcos paper and the Huang review. There are additional papers around if you're interested in repeat proteins, or things like helical bundles.

As you're interested in the methodology, be sure to take a close look at the Supplemental Materials for the papers, as there is sometimes/often an extended protocol listed there, which gives more details about how the protocol was carried out.

Mon, 2017-07-17 09:20

Well, that explains why I can't find protocols. I thought it was one of the standard Rosetta protocols (like abinitio).


I read the Koga paper and tried to use their supplementary material to repeat (and hence understand) their work, but it did not work. I tried to generate 1 decoy, but it seems to enter an infinite loop (even after trying the protocol with an older Rosetta build and leaving it overnight).


From my research during the weekend, I understand that there are 2 steps to the process: backbone building, then side chain design. I understand side chain design and even have my own scripts that do it just fine, but I do not understand how to do backbone building. I got the understanding that I need a blueprint file, but then........?


From the Koga paper, I see that they have built 2 blueprint files (1 for each half of the protein). Why not just 1 file?

Also, they have a remodel resfile. I did not understand how the two are connected, and how to make the choices within the resfile (some positions are AUTO, others are NOTAA, others are PIKAA).


I understand the Koga paper is talking about the loop lengths and the corresponding secondary structure chirality, but I am just trying to see how to start: just the skeleton of the protocol, even if it is not ideal, so I can refine it later. My online research did not turn up a protocol I can follow. The Marcos paper lost me when they were talking about changing the score function weights, and they did not show how to do the backbone design (at least I did not understand how they did it). Huang's paper does not have supplementary data. The others I have not read yet.

Mon, 2017-07-17 09:40

I don't know for certain, but my guess is that attempting to construct the entire protein in one go had just too many degrees of freedom to sample properly. (The random fragment insertions spread across the whole protein mean that it's difficult to find local interactions which stabilize the partially folded protein - you'd just get a tangle.) So to make it a bit more tractable, they split the problem in half, building the first half of the protein in one step, and then building the second half of the protein in the second step.

The resfile is used during design to control those particular locations. Again, I'm not certain, but I would say that those positions are known to be important for the particular fold, or need to be particular identities for the downstream experimental processing. Moreover, it looks like most of the specifications are actually `NOTAA FILVWY`, which is saying not to place large hydrophobic residues at those positions. These are likely to be surface-exposed locations, and forcing them polar (or rather, not large hydrophobics) means that there will be fewer issues with aggregation of the proteins. (Rosetta design doesn't explicitly consider aggregated states, so you often have to add in that information in other ways.) The `PIKAA DERK`, restricting things to charged residues, is an even stronger version of this, practically requiring that the residue be solvent exposed.
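As a rough illustration of how per-position restrictions like these end up in a resfile, here is a minimal Python sketch that writes one out. The position numbers, the chain ID `A`, the `ALLAA` default and the helper name `build_resfile` are all made up for illustration; they are not taken from the Koga supplement.

```python
def build_resfile(position_rules, chain="A", default="ALLAA"):
    """Return resfile text: a default command, 'start', then one line per position.

    position_rules maps a residue number to a resfile command string.
    All values used below are hypothetical examples.
    """
    lines = [default, "start"]
    for resnum in sorted(position_rules):
        lines.append(f"{resnum} {chain} {position_rules[resnum]}")
    return "\n".join(lines) + "\n"


# Hypothetical surface positions: keep large hydrophobics out of 3 and 12,
# and restrict 7 to charged residues only.
rules = {3: "NOTAA FILVWY", 7: "PIKAA DERK", 12: "NOTAA FILVWY"}
resfile_text = build_resfile(rules)
print(resfile_text)
```

Positions not listed fall back to the default command on the first line, so only the spots you actually want to constrain need an entry.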

Another thing to consider is that this protocol is the result of extensive iteration between design and then checking how the designs behaved with forward folding (ab initio runs). Over that process, they got a sense of what the likely trouble spots are for Rosetta design (with respect to Rosetta forward folding) and adjusted the design process to address those particular problem areas.  (As mentioned, the process is still a bit of an art.)

For the Marcos paper, the issue is that the (then current) default scorefunction didn't prioritize the particular features they were interested in, so they started (through trial and error) to tweak the scorefunction to see if they could figure out how to get what they wanted.


That's a general feature of these sorts of protocols -- the end result is actually the result of a rather long process of trial and error, where test protocols are tried, the output examined, and the protocols iteratively tweaked until the results line up with what the designers think things "should" look like. (Or at least until the results are no longer obviously bad.) Once you transition to a new system you're going to have to repeat the process to some extent. You can start with a general protocol, but you're likely going to have to iterate several times with slightly different protocols, tweaking in order to fit the protocol to your particular system. (Again, art.)

My recommendation as a first step is to take a particular example (the Koga paper is a good one - though you might want to try several and see if some other one "clicks" for you better) and then just run through the protocol as-is, attempting to replicate exactly what the published paper did. You'll want to make sure you can repeat that and understand what's going on there before you start tweaking things for your particular system. If you have particular questions, or are running into issues getting things to run, please feel free to ask here.


P.S. The Huang paper is a review paper - it doesn't present a method itself (hence no Sup Mat), but summarizes some of the developments in the field of de novo protein design.



Tue, 2017-07-18 16:42

Hi rmoretti,

Thank you for the extensive explanation. At least I know that what I am trying to do is fundamentally still difficult and without a standardised protocol. Trial and error is extremely stressful.

I have read the papers and some others. I realise that de novo design of beta sheets is much more difficult than of helices, since the hydrogen bonds are between distant locations within the topology rather than local as in helices, which is why (from my literature search so far) most people are taking on the challenge of de novo designing topologies with sheets.

I am concentrating my efforts on de novo designing helices only for now, since they should be easier: first to get a foundation on the concept, and also to get a script going to generate some topologies and run some downstream tests.

I have written a small PyRosetta script that uses the BluePrint Builder mover. It seems that I get nice backbone (helix-only) topologies. I get different topologies by generating an all-helix blueprint file, then inserting loops at random positions.
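A minimal sketch of that blueprint-randomizing step, in plain Python with no PyRosetta calls. The function names, the default loop and helix lengths, and especially the four-column `0 V H R` line template are assumptions made for illustration; check the line format against a blueprint that is already known to work with your build.

```python
import random


def random_helix_loop_string(length, n_loops, loop_len=3, min_helix=6,
                             seed=None, max_tries=10000):
    """Start from an all-helix string and overwrite n_loops random stretches with 'L'.

    Every helix segment (including both termini) keeps at least min_helix residues.
    """
    rng = random.Random(seed)
    # Allowed loop start indices that leave a full helix on either end.
    candidates = range(min_helix, length - min_helix - loop_len + 1)
    for _ in range(max_tries):
        starts = sorted(rng.sample(candidates, n_loops))
        # Reject placements where two loops overlap or squeeze a helix too short.
        if all(b - a >= loop_len + min_helix for a, b in zip(starts, starts[1:])):
            ss = ["H"] * length
            for s in starts:
                ss[s:s + loop_len] = ["L"] * loop_len
            return "".join(ss)
    raise ValueError("could not place the requested loops; relax the constraints")


def blueprint_lines(ss_string, line_template="0 V {ss} R"):
    """One blueprint line per residue; the template is an assumed format."""
    return [line_template.format(ss=ss) for ss in ss_string]


ss = random_helix_loop_string(40, n_loops=2, seed=1)
blueprint = "\n".join(blueprint_lines(ss)) + "\n"
print(ss)
```

Fixing the seed makes a given topology reproducible, which helps when a particular backbone needs to be regenerated for later tests.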

I then sequence design my backbone by SASA layer (core, then boundary, then surface) with packstat as a filter, and I get what seem like good compact structures (sometimes I get large voids within the proteins, but these structures are discarded).

My problem is that the abinitio folding gives me very bad results: some decoys are around 12 Å RMSD and score lower than the relaxed designed structure!! (What could be causing this? My de novo design? Or my sequence design?)
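One way to make that symptom concrete is to list the decoys that score below the design while sitting far away from it. The sketch below is plain Python; the scores, RMSD values, the 4.0 Å cutoff and the function name `off_target_minima` are made-up illustrations, not output from a real run.

```python
def off_target_minima(decoys, design_score, rmsd_cutoff=4.0):
    """Return (score, rmsd) pairs that score below the design yet lie far from it.

    decoys is an iterable of (score, rmsd_to_design) pairs.
    The default cutoff is an arbitrary illustrative choice.
    """
    return sorted((s, r) for s, r in decoys
                  if s < design_score and r > rmsd_cutoff)


# Made-up (score, RMSD in Angstrom) pairs for illustration only.
decoys = [(-210.5, 1.8), (-225.0, 12.1), (-198.3, 3.2), (-231.4, 11.7)]
bad = off_target_minima(decoys, design_score=-220.0)
print(bad)
```

A non-empty list means the folding run found lower-scoring conformations that look nothing like the design, which is the opposite of a funnel.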

I tested my sequence design script with natural proteins from RCSB, and I get an "OK" funnel curve. So I think I coded it correctly.


Is there a paper you know of that discusses only helix de novo design? I want to know how it is scripted and what movers they use. Even if it is in RosettaScripts, I can still attempt to translate it to PyRosetta.


Thank you for your patience.


P.S.: I went through the Marcos paper and its supplementary documents, but they did not supply their input files (4x blueprint files, I think also 4x .pdb files and a .cst constraint file). I am thinking of emailing the corresponding author (I think it is David himself) to request them so I can replicate their work.

Sun, 2017-08-06 01:01

If you're interested in helical bundles, there are a number of papers from the Baker lab which discuss parametric design of alpha-helical bundles. If you're interested in all-alpha proteins more generally (e.g. things like the globin fold), then the previous papers are the ones you may want to consult.

But your general sense is correct - if you're interested in the nitty-gritty tips and tricks on getting things to work for another system, you'll often have to talk to the paper's authors to get the "inside scoop" on how best to run the protocol. (And if there are any improvements in the works which you might want to take advantage of.)


Regarding emailing the corresponding author, feel free to do so, but you should be aware that David Baker gets tons of emails. It's likely he'll just forward your email to a grad student/postdoc to be answered. You may want to spare him the hassle and address the relevant person directly. (See for contact information.) Even if the first author isn't a lab member anymore, often the second/third/etc. authors are working on related projects and could help you (or at least direct you to the lab member you should ask).

Mon, 2017-08-14 09:06