Multi-motif scaffolding for de novo pathogen antigen mimetics by deep generative learning

Proteins are macromolecular machines central to virtually every fundamental biological function. A protein's three-dimensional structure, determined by its amino acid sequence, is directly linked to its function. Research into structure-function relationships has enabled the design of novel proteins that adopt defined three-dimensional structures. Aided further by the development of computational tools, protein designers can now construct proteins embedded with desired functional motifs.

Motif grafting is a method of presenting functional motifs on non-native scaffold structures. To date, however, most motifs that can be grafted are those for which a suitable host protein with high local structural similarity exists in databases of characterised structures. Yet critical motifs in biomedicine, particularly viral epitopes, often consist of structurally intricate elements that require novel topologies for accurate presentation. Moreover, despite significant progress in the design of functional de novo proteins, protein design strategies have largely adopted a "one scaffold for one application" philosophy, owing to the complexity of grafting structurally intricate motifs onto de novo proteins. For designed proteins to mimic natural proteins, which can adopt multiple functions and engage multiple binding partners, more than one function must be embedded into a single de novo protein, a challenge that remains at the forefront of protein design.

One major application of functional protein engineering is the design of immunogens embedded with known sites of pathogen vulnerability. Current approaches rely on cocktail vaccine formulations, yet an alternative approach using a single multi-epitope-presenting immunogen may be beneficial for the modularity and tunability of the elicited immune response. My thesis work combines computational design with machine learning methods to incorporate multiple epitopes into one immunogen. It first showcases the design of chimeric hemagglutinin (HA) immunogens presenting epitopes from two genetically diverse strains. We leverage the structural conservation of HA to present complex epitopes of one strain on the HA of another, with the aim of eliciting polyclonal protection.

Going beyond natural protein space for immunogen design, my thesis further explores the presentation of multiple distinct epitopes on de novo scaffolds. Using deep learning models, we are able to embed, with high accuracy, up to three motifs into de novo scaffolds with unique topologies that have not been shown to exist in natural repertoires. The multi-motif immunogens significantly increase the epitope surface area compared to most single-epitope de novo immunogens and improve immune responses compared to single-epitope designs. Building on this concept, we additionally adapt the approach to disordered motifs. Not only were we able to scaffold more than one repeat of a disordered epitope in its antibody-bound conformation, but we also showcase the ability of deep learning to present multiple grafted motifs in their native relative orientations.