A Geometric Transformer for Structural Biology: Development and Applications of the Protein Structure Transformer

Lucien Fabrice Krapp
2023
EPFL thesis
Abstract

Proteins, the central building blocks of life, play pivotal roles in nearly every biological function. To do so, these macromolecules interact with their surrounding environment in complex ways, leading to diverse functional behaviors. Predicting these interactions, especially those involving protein-protein interfaces and other molecular interactions, has long been a major challenge in structural biology. With the recent surge in advanced computational methods, significant breakthroughs are now within reach.

We developed the Protein Structure Transformer (PeSTo), a deep learning method built on a novel operation, the geometric transformer. PeSTo requires only the atomic coordinates and element names of the structure as input. This general approach allows the model to be applied to many different tasks without any computationally expensive data processing. The method accurately predicts protein-protein binding interfaces, outperforming state-of-the-art methods. We extended PeSTo to predict protein binding interfaces in general, detecting and distinguishing protein interfaces with nucleic acids, ligands, ions and lipids. The defining advantages of PeSTo are its low computational cost and robustness. Unlike many existing tools, PeSTo allows high-throughput processing of structural data, including molecular dynamics ensembles. This ability to process large amounts of data efficiently enabled us to predict binding interfaces for all AlphaFold-predicted structures. This ensemble of binding interfaces, which we call the "interfaceome", could help identify protein binding domains and accelerate research.

Beyond predicting protein interaction interfaces, PeSTo has been applied to another challenging problem in protein design: the prediction of protein sequences from backbone scaffolds. The newly trained model, called CARBonAra (Context-aware Amino acid Recovery from Backbone Atoms and heteroatoms), performs on par with state-of-the-art methods in in-silico sequence recovery rate. Unlike other methods, CARBonAra can predict amino acid sequences from a backbone scaffold in the presence of non-protein atoms, such as those of nucleic acids and ligands. Accounting for non-protein entities when designing protein sequences opens many possibilities, including the design of proteins that interact with specific molecules, such as nucleic acids, with potential applications in therapeutics and biotechnology.

In conclusion, the development of PeSTo represents a significant step forward in the application of deep learning to structural biology. It provides an efficient and accurate tool for predicting protein interactions, and it also opens a new frontier in protein design that takes non-protein entities into account. By leveraging the rapidly expanding body of protein structure data, PeSTo holds potential for a broad spectrum of applications in structural biology and materials science.
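
The in-silico sequence recovery rate mentioned in the abstract is commonly understood as the fraction of positions at which a designed sequence reproduces the native amino acid. The sketch below illustrates that general definition only. It assumes two already-aligned sequences of equal length; the function name `sequence_recovery` and the example sequences are hypothetical and are not taken from the thesis or from the CARBonAra code.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue equals the native one.

    Assumes both sequences are aligned position by position and use
    one-letter amino acid codes (an assumption of this sketch).
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    # Count identical residues position by position.
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


if __name__ == "__main__":
    # Hypothetical 8-residue example: 6 of 8 positions match.
    native = "MKTAYIAK"
    designed = "MKSAYLAK"
    print(f"recovery = {sequence_recovery(native, designed):.2f}")
```

In practice the rate is usually averaged over many scaffolds in a benchmark set; how the thesis aggregates it is not stated in the abstract.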

