Using Machine Learning to Design Highly Stable, Biologically Active Proteins

Gina El Nesr (WSE & KSAS)
Mentor: Doug Barrick (Biophysics, KSAS)

Researchers have sought methods to design proteins that are highly stable and retain their biological activities. The recent dramatic increase in genome sequencing data provides scientists with sufficient data for sequence-based protein design. One design method that has succeeded in stabilizing proteins uses consensus sequences. Although consensus proteins have been found to be more stable and biologically active, the approach implicitly assumes that residue frequencies at different positions are independent. In protein structures, however, residues are coupled to one another in a large interconnected network of interactions. The goal of this project is to employ a robust method to design proteins that incorporates these residue interactions. By using Restricted Boltzmann Machines to learn residue sequence-structure encodings, we can potentially discover protein sequences with improved stability, solubility, and shelf life. Developing such a methodology has applications in the pharmaceutical, biotechnology, and chemical industries.
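
As a rough illustration of the independence assumption behind consensus design, the sketch below picks the most frequent residue in each column of an aligned set of sequences. The function name `consensus_sequence` and the four-sequence toy alignment are hypothetical and chosen only for illustration; this is not the project's actual pipeline, and real alignments would also need gap handling and sequence weighting.

```python
from collections import Counter


def consensus_sequence(alignment):
    """Return the most frequent residue at each column of an alignment.

    Each column is treated independently, which is exactly the
    assumption that ignores coupling between positions.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")

    consensus = []
    for column in zip(*alignment):
        # Count residues in this column only; other columns are not consulted.
        residue, _count = Counter(column).most_common(1)[0]
        consensus.append(residue)
    return "".join(consensus)


# Hypothetical toy alignment (made-up data): two aligned positions.
toy_alignment = ["AK", "AE", "GR", "SR"]

result = consensus_sequence(toy_alignment)
print(result)
print(result in toy_alignment)
```

In this toy case the column-wise consensus pairs two residues that never occur together in any input sequence, which is the kind of pairwise information a model of residue interactions is meant to capture.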
