In the domain of computational structural biology, predicting protein interactions based on molecular structure remains a pivotal challenge. This thesis delves into this challenge through a series of interconnected studies.

The first chapter introduces the concept of protein molecular surfaces, which are characterized by distinct patterns of chemical and geometric features, serving as fingerprints for their interaction modalities. We present MaSIF (Molecular Surface Interaction Fingerprinting), a novel geometric deep learning framework. This tool is adept at predicting protein pocket-ligand interactions, protein-protein interaction sites, and scanning protein surfaces for potential protein-protein complexes.

Building on the insights from the initial chapter, the second chapter addresses the limitations of mesh-based representations in protein structures. We propose a deep learning framework that computes and samples the molecular surface directly from the atomic point cloud. This method, which requires only raw 3D coordinates and chemical types of atoms as input, has demonstrated state-of-the-art performance in identifying interaction sites and predicting protein-protein interactions.

The third chapter, informed by the preceding work, presents DiffMaSIF, a cutting-edge score-based diffusion model tailored for rigid protein-protein docking. DiffMaSIF leverages a surface-based molecular representation, integrated into an equivariant network, to efficiently predict protein complexes. This approach surpasses contemporary ML methods and aligns with traditional docking tools, but with a significantly reduced number of generated decoys.

Collectively, the research in this thesis offers a series of methodologies that, while building on each other, individually contribute significant advancements to our understanding and prediction of protein interactions, paving the way for future work in protein function prediction and design.