A true understanding of most cellular functions is achievable only through the structural determination of their underlying macromolecular assemblies. However, their size, large number of individual components, and metastable states make their elucidation by canonical structural biology techniques a vain effort in many situations. As a result, integrative approaches have been in high demand in recent years. By combining all available data, both computational and experimental, integrative or hybrid modeling strategies have been able to tackle biomolecular structures that are intractable by any single method alone. Over the past decade, structural biology has undergone a significant revolution in the form of cryo-electron microscopy (cryo-EM). Despite having more than 12,000 models of biomolecular structures to its name, cryo-EM still struggles to resolve larger, flexible assemblies at atomic resolution. Instead, such assemblies now routinely fall in an intermediate resolution range, opening new avenues for the development of hybrid approaches primarily based on cryo-EM data. However, the high degree of flexibility of such assemblies requires the inclusion of dynamics in the modeling process. Unfortunately, this makes the already challenging task of defining a scoring function even more difficult. In general, current integrative strategies are thus unable to approach flexible assemblies in a reliable manner. To tackle these issues, this thesis presents two new methods made available to the community, along with their applications to a variety of systems. A new clustering analysis tool, CLoNe, is first introduced. As a significant upgrade to the recent Density Peaks algorithm, we show that CLoNe rivals or outperforms many state-of-the-art algorithms, even beyond structural biology.
Then, we show how CLoNe can extract a variety of information from structural ensembles in general, or reduce large ensembles to their key components, enabling their successful integration into hybrid modeling approaches. Second, the MaD software is presented. Taking inspiration from traditional computer vision concepts and methods, MaD bypasses the need for a traditional scoring function by generating local macromolecular feature descriptors. MaD takes full advantage of the ongoing cryo-EM resolution revolution by integrating local structural information from both cryo-EM data and existing atomic structures. Specifically, MaD is able to predict the quaternary structure of large assemblies regardless of symmetry and conformational variability. Finally, the MaD-CLoNe combination enabled the modeling of molecular chaperones with unprecedented ease. Chaperonins are of crucial importance in ensuring proper protein folding and proteostasis. Perturbations of these processes are implicated in neurodegenerative diseases and cancer. In the specific case of the group II chaperonin Mm-Cpn, the use of MaD-CLoNe along with molecular dynamics simulations uncovered new structural models and provided insight into the chaperonin's functional pathway.
Henning Paul-Julius Stahlberg, Dongchun Ni