Towards Breakthroughs in Protein Structure Calculation and Design

PI Name: 
David Baker
PI Email: 
dabaker@u.washington.edu
Institution: 
University of Washington
Allocation Program: 
INCITE
Allocation Hours at ALCF: 
140 Million
Year: 
2013
Research Domain: 
Chemistry

Calculation of protein structure and design of novel proteins are two of the most important challenges in structural biology. Addressing these challenges will help researchers cure diseases and design proteins that can efficiently catalyze medically and industrially useful reactions. This project builds on earlier successes and increases the scope of research. In most cases, we will move from benchmarking to proof-of-concept applications.

A recent breakthrough in conformational sampling using highly parallel computations on the Blue Gene platform will allow us to compute highly accurate structures for proteins as large as 20 kDa while using very limited experimental data. The use of INCITE resources will facilitate advances in many challenging problems in computational structural biology, including the ab initio prediction of proteins larger than 15 kDa, the calculation of structures of proteins larger than 20 kDa using sparse nuclear magnetic resonance data, the determination of membrane protein structures, and the design of a novel enzyme system to fix carbon dioxide (CO2) to produce biofuels.

We will apply several complementary methods to these problems. For example, ab initio structure prediction will give biologists accurate models from which to form hypotheses about biological function, and will supply phasing information for proteins whose X-ray diffraction data have been collected but for which phase information is not available. In addition, we will continue to develop and apply the Rosetta method to obtain accurate, fully automated predictions for proteins up to 20 kDa.

The major challenge in genomics is not obtaining large amounts of sequence data, but interpreting them. The broader impacts of this work bear on pressing issues of the 21st century: deciphering the structures and functions of the vast number of protein sequences generated by current high-throughput sequencing projects, and reducing the level of CO2 in the atmosphere through enzymes designed to fix CO2 into industrially useful products.

Catalyst: