


Computational Structure Prediction and Design of Biomolecular Structures

Scientists in this geographically distributed post-baccalaureate program have the opportunity to participate in research using and developing the Rosetta Commons software. The Rosetta Commons software suite includes algorithms for computational modeling and design of proteins and other biomolecules. It has enabled notable scientific advances in computational biology, including de novo protein design, enzyme design, ligand docking, and structure prediction of biological macromolecules and macromolecular complexes.

This one-year post-baccalaureate program is aimed at preparing underrepresented minority and/or disadvantaged students to succeed in PhD programs.




  • One week of Rosetta Code School (June 7 to June 11), where you will learn the inner details of the Rosetta Python code and community coding environment, so you are fully prepared to conduct research using the software.
  • Assignment to a Rosetta lab where you will be mentored by a graduate student and faculty member who will guide and foster your research.
  • Participation in the Summer Rosetta Conference in the gorgeous Cascade Mountains of Washington State (August 10 through August 13) and the Winter Rosetta Conference (February 2022, location TBD), where you will connect with Rosetta developers from around the world.
  • Salary, health benefits, and funding for conference travel are included.
  • Integration into the host institution’s NIH PREP program.



PREP Programs Provide:

  • Research experience: Scholars conduct hypothesis-driven research in their Mentor’s lab, with day-to-day guidance by an experienced PhD student or postdoc. Scholars participate fully in weekly lab meetings, attend weekly research seminars in their department, and attend a vibrant PhD program retreat and a national conference of their choice.
  • Community: Scholars come together each month for two-hour ‘Journal Club’ events to present and discuss their research with Peer-Mentors (PhD students, postdocs) and faculty. These meetings include professional development mini-lessons on topics like the NSF-GRFP, graduate school applications, research posters, and more.
  • Project (‘mini-thesis’) meetings: Scholars gain confidence by organizing, preparing for, and convening three one-hour ‘mini-thesis’ meetings with two subject-expert faculty, plus their research mentor and the PREP Director. Scholars benefit both scientifically and professionally by building strong working relationships with multiple faculty members at Johns Hopkins who are experts in their field of interest.
  • Professional training and custom mentoring: Scholars participate in workshops designed to improve their scientific writing skills and their understanding of ethics in science, and can choose from many other workshops, including communication and improvisation. Each Scholar charts an individual development plan with the PREP Director, with custom mentoring both formal (monthly one-hour meetings) and informal (as needed).
  • Preparation for the GRE or MCAT exam, graduate school applications, and interviews.
  • Annual salary plus health, retirement, tuition and other benefits.



  • Individuals from racial and ethnic groups that have been shown by government studies to be underrepresented in health-related sciences on a national basis.
  • Individuals with disabilities, who are defined as those with a physical or mental impairment that substantially limits one or more major life activities, as described in the Americans with Disabilities Act of 1990, as amended.
  • Individuals from disadvantaged backgrounds.
  • U.S. citizens, permanent residents, and U.S. nationals are eligible.
  • Undergraduate major in computer science, engineering, mathematics, chemistry, biology, and/or biophysics.
  • While not required, we seek candidates with some combination of experiences in scientific or academic research, C++/Python/*nix/databases, software engineering, object-oriented programming, and/or collaborative development.




  • Resume
  • Unofficial transcript
  • Personal statement that summarizes why you are an appropriate candidate (up to 2000 characters) including:
    • Why this program interests you
    • Brief summary of research and computing experience
    • Research career goals
  • Two recommendation letters; completed recommendations can be sent to
  • Your top three labs and projects of interest, selected from the list below.
  • Deadline for receipt of applications is February 1, 2021.
  • Deadline for receipt of recommendation letters is February 5, 2021.
  • Program contact: Camille Mathis:



Baker Lab @ University of Washington in Seattle, WA
“Improving protein design using deep learning for pandemic preparedness”
Last year we used protein design to develop promising anti-coronavirus therapeutic and diagnostic candidates, but it took a number of months because we had to iterate between computational design and experiment. This project will seek to make this process faster and more robust by incorporating deep learning to reduce the amount of experimental iteration required.

Gray Lab @ Johns Hopkins University in Baltimore, MD
“Antibody engineering by deep learning”
Antibodies are an excellent model system for loop structure prediction and design, a difficult problem in the field. High-resolution models of the loop structure are necessary for successful docking to antigens or for design for improved affinities, yet traditional loop prediction methods have struggled with antibody loops because of their extreme variability. In this project, the student will apply deep learning methods, including transfer learning and attention gating, to leverage data from a large set of protein structures and focus predictions on the key loop. The PREP trainee will learn antibody engineering, homology modeling and docking, and machine learning.

Horowitz Lab @ University of Denver in Denver, CO
"Chaperone Nucleic Acids"
It has long been known that nucleic acids carry the genetic information necessary for life. Nucleic acids also play vital structural, catalytic, and regulatory roles in the cell. Very recently, we discovered that nucleic acids perform an additional unsuspected but crucial task—preventing protein aggregation as molecular chaperones. Molecular chaperones are critical for maintaining the health of the proteome (termed proteostasis), which is of prime importance to human health. Defects in proteostasis are linked to many crippling diseases, including Alzheimer’s disease, Parkinson’s disease, Huntington’s disease, and ALS. The work in the Horowitz lab is focused on understanding how nucleic acids act as chaperones, and discovering which nucleic acids are important for these functions in the cell, and which can be developed for treating disease, with research spanning biochemistry, genetics, molecular biology, and biophysics.

Lindert Lab @ Ohio State University in Columbus, OH
"Structure Modeling using Mass Spec Data"
Knowledge of protein structure is paramount to understanding biological function and to developing new therapeutics. Mass spectrometry experiments have recently been developed that provide some structural information, but not enough to unambiguously assign atomic positions. These methods offer sparse experimental data, which can also be noisy and inaccurate in some instances. We are developing integrative modeling techniques that combine computational modeling with mass spec data to enable prediction of protein complex structure from the experimental data.

Rocklin Lab @ Northwestern University in Chicago, IL
"Applying high-throughput experimental data to guide computational protein design"
Today, most computational protein design tools like Rosetta use the features of natural protein structures (which amino acids like to be near each other, what types of structures are very common, etc.) to guide the design of new proteins. However, for many applications, we want to design proteins with properties far beyond what already exists in nature. To achieve this, we need new sources of data, not just natural protein structures, that can guide design into new territory. Our lab develops new experimental methods to measure properties like folding stability, binding affinity, and dynamics for tens to hundreds of thousands of designed or natural proteins at the same time. We then use these new large datasets to guide protein design. We have a range of different projects focused on basic science, therapeutic development, and tools for synthetic biology. Each person's project is described on our website. We will work with an intern or post-bac to find which project in our lab is best for their interests.

Das Lab @ Stanford University in Stanford, CA
"Modeling and designing RNA at high resolution"
This project involves using and improving Rosetta autoDRRAFTER to model RNA coordinates into cryo-EM maps of a range of RNA and RNA-protein machines, including virus segments, mRNA vaccines, and designed RNA nanostructures.

Khare Lab @ Rutgers University in New Brunswick, NJ
"Designing stimulus-responsive enzymes for targeted chemotherapy"
Traditional chemotherapy has limited efficacy because chemotherapeutics are toxic to all dividing cells, which limits the dose that can be safely administered. One approach to increase selectivity, called directed enzyme prodrug therapy (DEPT), involves prodrugs that are site-specifically activated by exogenously delivered enzymes. The prodrug activation reaction is intended to be orthogonal to the human enzymatic repertoire to minimize side effects. DEPT’s therapeutic benefit in the clinic stands to improve with a new generation of prodrug-activating enzymes that we are developing using computational design approaches to be “smart”: they can sense and respond to the tumor microenvironment or an external cue (e.g., tissue-penetrant light) in a controllable manner to maximize their site selectivity, and can avoid triggering a strong immune reaction. These developments will enable more potent and safer chemotherapy regimens, as well as a general design methodology to build novel therapeutic switches and biological circuits for a broad range of applications.

Kortemme Lab @ University of California, San Francisco in San Francisco, CA
"Computational design of proteins with tunable and controllable geometries"
Proteins use intricate three-dimensional geometries to carry out diverse functions. We are interested in designing proteins with geometries and functions that do not exist in nature. We build on robotics-based methods and our recent LUCS approach (Pan et al., 2020), and are interested in combining these strategies with recent advances in deep learning methods for protein design. We are also developing approaches that use de novo proteins with the same topology but different geometries to engineer new conformational switches.

Huang Lab @ Stanford University in Stanford, CA
"Machine learning methods for structural design"
To capture accurate structural dynamics in an AI embedding space, data from the Protein Data Bank (PDB) are sufficient only for a small number of highly represented protein folds. A strategy to augment the training data by running molecular dynamics (MD) simulations can be explored for a variety of stable protein folds. The MD-generated models can further provide conformational statistics and samples for structural embedding. Generative AI models can learn from the augmented data and provide a path to fast and highly accurate scaffold generation and optimization for design tasks. We recently developed a structural embedding scheme with a variational autoencoder, which allows for the creation of novel (interpolated) structures. When combined with established structure validation metrics, we can produce proteins in the wet lab for characterization through enzymatic assays or biophysical measurements such as circular dichroism, NMR, and X-ray crystallography.

Fischer Lab @ Dana-Farber Cancer Institute & Harvard Medical School in Boston, MA
"Targeting protein degradation for cell therapies"
The cell uses the ubiquitin–proteasome system to degrade proteins that are no longer needed. We seek to repurpose this cellular pathway to specifically degrade proteins of therapeutic interest using compounds. The project will involve modeling protein complexes with these compounds and computational protein design aimed at regulating protein levels in engineered cell therapies.