Rosetta Commons Research Experience for Undergraduates
A Cyberlinked Program in Computational Biomolecular Structure & Design
Interns in this geographically-distributed REU program have the opportunity to participate in research using the Rosetta Commons software. The Rosetta Commons software suite includes algorithms for computational modeling and analysis of protein structures. It has enabled notable scientific advances in computational biology, including de novo protein design, enzyme design, ligand docking, and structure prediction of biological macromolecules and macromolecular complexes.
Due to the COVID-19 global pandemic, the summer 2020 program was administered in a virtual format. While we hope it will not be necessary, we can continue the virtual format for summer 2021 if needed.
Summer 2021 decisions will be released by March 1.
- One week of Rosetta Code School (June 7 through June 11) where you will learn the inner details of the RosettaPython code and community coding environment, so you are fully prepared for the summer!
- 8 weeks of hands-on research in a molecular modeling and design laboratory, developing new algorithms and discovering new science.
- The summer will finish with a trip to the Rosetta Conference in the gorgeous Cascade Mountains of Washington State, where you will present your research in a poster and connect with Rosetta developers from around the world. The conference will be held from August 10 through August 13.
- This program is supported by NSF (Award 1950697). Interns will receive housing, paid travel expenses, and a $6,000 stipend.
Include the following in the application:
- Personal statement (up to 500 words): why this internship interests you, a brief summary of your research and computing experience, and why you are an appropriate candidate for the internship
- Two references (complete the reference forms, in the application, with contact information)
- Select top five labs and projects of interest from the list below.
- Deadline for receipt of applications is February 1, 2021.
- Deadline for receipt of recommendation letters is February 3, 2021.
- Program contact: Camille Mathis: firstname.lastname@example.org.
- U.S. citizens, permanent residents, U.S. nationals, AND international students are eligible
- College Sophomores or Juniors preferred
- Major in computer science, engineering, mathematics, chemistry, biology, and/or biophysics
- Available for at least 10 weeks during the summer of 2021
- Interest in graduate school
- While not required, we seek candidates with some combination of experiences in scientific or academic research, C++/Python/*nix/databases, software engineering, object-oriented programming, and/or collaborative development (git)
- Students graduating before the start of the program are not eligible for the REU and are encouraged to apply to our post-bac program.
Available projects and locations:
Baker Lab @ University of Washington in Seattle, WA
“Designing new protein switches for synthetic biology applications”
Controlling when and where proteins function is highly desirable for studying and treating human diseases, developing sensitive diagnostics, improving agriculture and food production, reimagining clean energy, and more. Molecular “switches” provide a powerful means to control proteins by turning protein function on and off at will. Using Rosetta, protein switches have been recently designed to control protein localization and degradation, create sensitive biosensors for COVID-19 detection, and target T cells to specific tumor cell types. We are currently working to improve and diversify the functions and properties of designer protein switches. Rosetta interns will learn how to use computational protein design to develop new and improved protein switches for their choice of synthetic biology applications.
Cheng Group @ Merck & Co. in San Francisco, CA
In antibody drug discovery, we often seek to improve antigen binding while reducing antibody self-interaction, and modeling is useful in prioritizing engineering efforts. We have recently generated sizable datasets around antigen binding and developability attributes, and are leveraging these to investigate and develop predictive models using structure-based and machine learning approaches. The student will develop workflows to assess the use of Rosetta-generated antibody conformational ensembles in modeling endpoints related to antigen binding and self-interaction, and will work with Merck & Co. scientists to assess the advantages of derived predictions alone and in combination with state-of-the-art predictive approaches.
Cooper Lab @ Northeastern University in Boston, MA
“Crowdsourcing protein folding and design”
We are exploring how citizen science and crowdsourcing through video games can help biochemists with their work. To do this, we have developed the game Foldit, a multiplayer online game that allows players without previous experience in biochemistry to work on protein folding and design problems. This project will focus on development of game-related aspects to understand and improve the player experience. Potential projects include virtual reality and dynamic difficulty adjustment.
Correia Lab @ École Polytechnique Fédérale de Lausanne in Lausanne, Switzerland
"Deciphering protein surface fingerprints for functional prediction and design"
Predicting interactions between proteins and other biomolecules solely based on structure remains a challenge in biology. A high-level representation of protein structure, the molecular surface, displays patterns of chemical and geometric features that fingerprint a protein’s modes of interactions with other biomolecules. We hypothesize that proteins participating in similar interactions may share common fingerprints, independent of their evolutionary history. Fingerprints may be difficult to grasp by visual analysis but could be learned from large-scale datasets. We present MaSIF (molecular surface interaction fingerprinting), a conceptual framework based on a geometric deep learning method to capture fingerprints that are important for specific biomolecular interactions. We showcase MaSIF with three prediction challenges: protein pocket-ligand prediction, protein–protein interaction site prediction and ultrafast scanning of protein surfaces for prediction of protein–protein complexes. We anticipate that our conceptual framework will lead to improvements in our understanding of protein function and design.
Das Lab @ Stanford University in Stanford, CA
"Modeling and designing RNA at high resolution"
This project involves using and improving Rosetta autoDRRAFTER to model RNA coordinates into cryo-EM maps of a range of RNA and RNA-protein machines, including virus segments, mRNA vaccines, and designed RNA nanostructures.
Furman Lab @ Hebrew University in Jerusalem, Israel
"How do post-translational modifications change the communication of a protein with its partners?"
Interactions mediated by short linear motifs, or peptides, play important roles in biological regulation. In this project you will use and further adapt FlexPepDock and tail modeling protocols to investigate how post-translational modification affects the organization, interaction and function of selected proteins involved in signal transduction. This analysis will be the basis of targeted mutations, which will subsequently be tested in the lab after the summer.
Gray Lab @ Johns Hopkins University in Baltimore, MD
“Antibody engineering by deep learning”
Antibodies are critically important for our immune systems and as drugs with high target specificity. Our lab has developed new deep learning-based approaches to predict the structure of antibody loops, and we are now poised to use these networks to design new antibody molecules. For example, we would like to be able to design antibodies quickly for emerging pandemics like coronavirus, even as structural information on a new pandemic may be limited. Good designs will require high-resolution models of the antigen and the antibody loops and an effective design algorithm. In this project, you will learn about deep learning methods (including generative models), antibody structure and therapeutics, and scientific model validation.
Gront Lab @ University of Warsaw in Warsaw, Poland
"Rosetta on the Web: Homology Modelling and other applications"
The use of Rosetta software can be intimidating for inexperienced users due to the size of the package and its level of complexity. A possible way to reach a wider audience is to provide an interactive web-based user interface. In this project you will turn one of the Rosetta protocols into a fully working Python web application. You will then apply it to analyse and visualise your research results! This will be a learning experience in Rosetta modelling and will also help you improve your Python skills at any level.
Horowitz Lab @ University of Denver in Denver, CO
“Developing Educational Biochemistry Puzzles in Foldit”
Foldit is a biochemistry video game in which hundreds or thousands of players attempt to solve biochemistry puzzles that are difficult for computational algorithms. Additionally, Foldit is frequently used as an educational tool for science due to its unique interactive character. This project is to develop educational tools for Foldit.
Hosseinzadeh Lab @ University of Oregon in Eugene, OR
"Computational design of modular biosensors for rapid diagnosis"
As COVID-19 continues to change everyday life as we know it, we are reminded, once again, of the importance of early diagnostics that are rapid, robust, and, more importantly, easily accessible to everyone across the globe. Despite the availability of PCR-based tests for COVID-19 detection from the early days, the cost of this method and the requirement for specialized equipment and a technician to perform it have adversely affected our ability to control the pandemic worldwide. This inaccessibility has disproportionately affected marginalized groups in the United States, particularly Native Americans, and Black and Latinx communities, and also impacted citizens globally in less developed countries. We believe designer proteins and peptides give us the opportunity to address this challenge by providing a stable sensing platform. Rosetta interns will learn how to computationally design, and test in the lab, proteins and peptides that can bind to a target of interest.
Huang Lab @ Stanford University in Stanford, CA
"Machine learning methods for structural design"
In order to capture accurate structural dynamics in AI embedded space, data from the Protein Data Bank (PDB) is only sufficient for a small number of highly represented protein folds. A strategy to augment the training data by running molecular dynamics (MD) simulations can be explored for a variety of stable protein folds. The MD-generated models can further provide conformational statistics and samples for structural embedding. Generative AI models can learn from augmented data and provide a path to fast and highly accurate scaffold generation and optimization for design tasks. We recently developed a structural embedding scheme with a variational autoencoder, which allows for the creation of novel (interpolated) structures. When combined with established structure validation metrics, we can produce proteins in the wet lab for characterization through enzymatic assays or biophysical measurements such as circular dichroism, NMR, and X-ray crystallography.
Jha Lab @ Los Alamos National Laboratory in Los Alamos, NM
"Protein design for small molecule sensing"
We are interested in chimeric proteins that can bind to small molecules and produce a signal in real time, with applications in diagnostics and synthetic metabolic pathway evaluation.
Karanicolas Lab @ Fox Chase Cancer Center in Philadelphia, PA
“Designing small-molecules that break down cancer-driving proteins”
PROTACs are designed compounds that bind to E3 ligases (the "garbage collectors" of the cell), and re-direct these E3 ligases to degrade new cellular targets. In this project, you will use Rosetta to design new PROTACs meant to degrade proteins responsible for tumor formation, maintenance, and metastasis. We expect that successful PROTACs will lead to degradation of these key cancer-drivers, and thus will represent potential starting points for drug discovery.
“Design of novel transposases for scarless DNA-insertion using Rosetta”
Transposases are macromolecular machines capable of both DNA-cleavage and joining activity, and therefore have the potential to function as sophisticated genome-engineering tools. In fact, DNA-transposases are already in clinical gene therapy trials to genetically modify T-cells. However, transposases typically bring along their DNA signatures, which makes scarless insertion impossible. Using Rosetta and an in vivo selection system, we will design scarless transposases capable of delivering precise DNA-sequences.
Khare Lab @ Rutgers University in New Brunswick, NJ
"Designing stimulus-responsive enzymes for targeted chemotherapy"
Traditional chemotherapy has limited efficacy because chemotherapeutics are toxic to all dividing cells, which limits the dose that can be safely administered. One approach to increase selectivity, called directed enzyme prodrug therapy (DEPT), involves prodrugs, which are site-specifically activated by exogenously delivered enzymes. The prodrug activation reaction is intended to be orthogonal to the human enzymatic repertoire to minimize side-effects. DEPT’s therapeutic benefit in the clinic stands to improve by using a new generation of prodrug-activating enzymes that we are developing using computational design approaches to be “smart”: they can sense and respond to the tumor microenvironment or an external cue (e.g. tissue-penetrant light) in a controllable manner to maximize their site selectivity, and can avoid triggering a strong immune reaction. These developments will enable potent and safer chemotherapy regimens as well as a general design methodology to build novel therapeutic switches and biological circuits for a broad range of applications.
Kortemme Lab @ University of California, San Francisco, in San Francisco, CA
“Computational design of proteins with tunable and controllable geometries”
Proteins use intricate three-dimensional geometries to carry out diverse functions. We are interested in designing proteins with geometries and functions that do not exist in nature. We build on robotics-based methods and our recent LUCS approach (Pan et al., 2020), and are interested in combining these strategies with recent advances in deep learning methods for protein design. We are also developing approaches to utilize de novo proteins with the same topology but different geometries for engineering new conformational switches.
Kuhlman Lab @ University of North Carolina, Chapel Hill in Chapel Hill, NC
"Design of Dengue Virus Subunit Vaccines"
Dengue virus (DENV) vaccine development has been challenging because of the presence of 4 serotypes (DENV1-4) and the potential for vaccine-enhanced severe disease. The leading live attenuated tetravalent DENV vaccines have been plagued by poorly balanced replication of each vaccine component, leading to variable efficacy and vaccine-primed severe dengue disease in some children. The goal of our research is to develop novel recombinant protein and virus-like particle (VLP) vaccines that overcome barriers faced by live attenuated tetravalent vaccines. We have discovered that the DENV E protein produced as a secreted protein is a poor subunit vaccine because it is a monomer that does not display major quaternary epitopes targeted by human neutralizing and protective Abs. We are now using Rosetta to identify mutations that stabilize the E protein so that it will elicit more potent neutralizing antibodies while lowering the elicitation of disease-enhancing antibodies. Rosetta interns will learn how to perform these simulations and will develop new computational protocols for eliminating unwanted epitopes from the surface of the viral protein.
Kulp Lab @ The Wistar Institute in Philadelphia, PA
"Design of immune modulating proteins targeting infectious diseases and cancers"
The induction of antibodies by vaccination to provide protective immunity is extremely well established. Functional antibodies, such as neutralizing antibodies (nAbs), are a correlate of protection in vaccines. Our experience with viruses such as Influenza, HIV and MERS tells us that there are common functional epitopes that are sites of vulnerability. Viruses have methods to hide such epitopes from the humoral immune system. However, we have developed advanced structural engineering methods to expose and to enhance immune recognition of the Achilles’ heels of viruses. We have systematically modeled and studied the SARS-CoV-2 Spike protein structure to help identify potential targets. We would now like to engineer self-assembling nanoparticle vaccines decorated with SARS-CoV-2 epitopes. We collaborated to push one of the first SARS-CoV-2 vaccines into the clinic, 10 weeks after we obtained the sequence. There are numerous exciting projects around evaluating clinical samples from our human vaccine studies as well.
Lindert Lab @ Ohio State University in Columbus, OH
"Structure Modeling using Mass Spec Data"
Knowledge of protein structure is paramount to the understanding of biological function and for developing new therapeutics. Mass spectrometry experiments that provide some structural information, but not enough to unambiguously assign atomic positions, have been developed recently. These methods offer sparse experimental data, which can also be noisy and inaccurate in some instances. We are developing integrative modeling techniques that combine computational modeling with mass spec data to enable prediction of protein complex structure from the experimental data.
Meiler Lab @ Vanderbilt University in Nashville, TN
"Personalized structural biology - what impact do insertion and deletion mutations have on protein structure and folding?"
Huge reductions in the cost of DNA sequencing have enabled clinicians to obtain entire genome sequences of patients. This should allow for more personalized diagnosis and treatment for patients. However, due to the vast amount of variation in human genomes, it is often difficult to go from a patient’s genome to actionable clinical decisions. The Personalized Structural Biology Program at Vanderbilt investigates observed individual changes in protein sequences for their impact on protein folding and stability. This allows us to understand disease-causing mechanisms and identify new drug targets. Commonly, computational predictions focus on single amino acid substitutions, which can be evaluated for their change in protein folding and protein stability using established methods in Rosetta. However, other types of variation such as deletions and insertions may occur but are more difficult to evaluate, due to the lack of available crystal structures and functional data. In this project, students will leverage solution-state NMR structures and differential scanning calorimetry data of deletion and insertion mutants collected in our lab to develop and test computational protocols for predicting these structural changes. Students will gain experience in Rosetta and machine learning methods.
Merck Protein Engineering Lab in Rahway, NJ
“Leveraging High-throughput automation, rational design, and machine learning to engineer novel enzymes”
Enzymes catalyze a diverse set of chemical transformations with significant rate enhancements and with excellent chemo-, stereo-, and regiospecificity. These features, combined with the fact that enzymes operate in aqueous solution and are typically more environmentally friendly than synthetic catalysts, have led to the broad adoption of enzymes in the chemical industries. While enzymes are amazing catalysts, they have evolved to solve the challenges faced by Mother Nature and not the challenges we face today. We use computational protein design and evolution-based methods to engineer and invent new protein functions. This project will leverage our high-throughput automation capabilities with structure-based design and machine learning to engineer enzymes with novel properties. Students will gain experience in computational protein design, machine learning, and wet-lab methods for engineering proteins.
Mills Lab @ Arizona State University in Tempe, AZ
"Computational design of proteins containing functional non-canonical amino acids"
Despite the amazing functions proteins achieve with only 20 standard building blocks, the ability to add new chemistries to the genetic codes of standard organisms could allow for new functions. Over the last two decades, more than 150 "non-canonical amino acids" (NCAAs) have been added to the genomes of organisms from E. coli to mice. In the Mills lab, we use Rosetta to design proteins that take advantage of the novel chemical functionalities contained in some of these NCAAs. Current efforts are focused on the development of rapid diagnostics (e.g. for COVID-19) and new metalloproteins. Interns in our group will have the opportunity to learn how to both design and experimentally characterize new proteins containing NCAAs.
Rocklin Lab @ Northwestern University in Chicago, IL
"Applying high-throughput experimental data to guide computational protein design"
Today, most computational protein design tools like Rosetta use the features of natural protein structures (which amino acids like to be near each other, what types of structures are very common, etc.) to guide the design of new proteins. However, for many applications, we want to design proteins with properties far beyond what already exists in nature. To achieve this, we need new sources of data - not just natural protein structures - that can guide design into new territory. Our lab develops new experimental methods to measure properties like folding stability, binding affinity, and dynamics for tens to hundreds of thousands of designed or natural proteins at the same time. We then use these new large datasets to guide protein design. We have a range of different projects focused on basic science, therapeutic development, and tools for synthetic biology. Each person's project is described on our website (www.rocklinlab.org). We will work with an intern or post-bac to find which project in our lab is best for their interests.
Siegel Lab @ University of California, Davis in Davis, CA
"Computational enzyme design and modeling"
Engineered enzymes play a central role in almost every aspect of life, from the food we eat to the production of medicines. High accuracy modeling of enzyme active sites is critical for functional engineering efforts. The project will explore various approaches to combine external information (e.g. enzyme mechanism, QM modeling, machine learning derived from bioinformatics, etc.) with the current cutting edge molecular modeling tools in Rosetta. Evaluations for improved enzyme active site modeling accuracy will be conducted on a previously established and benchmarked functionally curated structural data set of enzyme active sites.
Whitehead Lab @ University of Colorado, Boulder in Boulder, CO
"Protein design for plant biotechnology"
Imagine the ability to turn plant phenotypes like flowering, growth, water usage, and pigmentation on and off with environmentally safe chemicals. My group is working on a platform technology to allow such unprecedented control over plant life. The student working on this project will use Rosetta in conjunction with wet lab biochemistry to design and test new sense and response modules. These modules will be validated in yeast and ported to grasses and tomato plants for testing after the summer.
Companies may partner with us and sponsor an intern.
Intern Research Posters: