OpenFold-Powered Machine Learning of Protein-Protein Interactions and Complexes

PI Mohammed AlQuraishi, Columbia University
Co-PI Zhao Zhang, Texas Advanced Computing Center

Visualization of a protein-protein complex. Image: Mohammed AlQuraishi, Columbia University

Project Summary

This project will use artificial intelligence to build tools that predict interactions between any two proteins and will make these tools widely available to the biology community.

Project Description

Protein-protein interactions (PPIs) underpin most biological processes. Despite the major role they play in disease, most PPIs in humans are not well understood. Biophysically, PPIs can be classified as idiosyncratic (driven by binding surfaces unique to individual proteins) or canonical (driven by surfaces reused by members of homologous protein families to bind peptides on partner proteins). Idiosyncratic PPIs are often high-affinity and form stable complexes, while canonical PPIs are often transient and low-affinity, and vary minutely across domains to drive signaling logic. Although both classes of PPI are studied by high-throughput experimental methods, the cost, complexity, and insensitivity of these methods, together with the enormity of PPI space, have resulted in less than 20% coverage of the human interactome and sparse coverage of most other species.

To advance our understanding of PPIs, this INCITE project will use artificial intelligence (AI) to develop tools that predict interactions between any two proteins and make these tools widely available to the biology community. The research team will use DOE supercomputers to build computational methods for identifying novel idiosyncratic and canonical PPIs by combining multiple tiers of direct and indirect binding data with supervised and unsupervised machine learning models that account for varying degrees of experimental evidence. To conduct this research, the team developed OpenFold, a trainable implementation of AlphaFold2 (an AI tool used for predicting protein structures). 

The researchers will tackle PPI prediction by building three types of models: (1) a supervised model for predicting idiosyncratic PPIs; (2) a supervised model for predicting canonical peptide-mediated PPIs; and (3) an unsupervised model for predicting canonical peptide-mediated PPIs. The team has produced preliminary results for all three models that support their validity. Their idiosyncratic PPI model aims to help identify novel protein complexes and human/human-pathogen PPIs for drug targeting. Similarly, their canonical PPI models are designed to help unravel signaling networks and their dysregulation in disease by modeling the effects of mutations on PPIs. The proposed models thus have the potential to be as transformative to protein interactomes as AlphaFold2 has been to protein structure.