Organocatalysis has evolved significantly over the last decades, becoming a pillar of synthetic chemistry, but traditional theoretical approaches based on quantum mechanical computations to investigate reaction mechanisms and provide rationalizations of catalyst performance have failed to keep pace with experiment. This thesis focuses on developing tailored yet transferable data-driven tools and concepts to accelerate organocatalyst discovery, going beyond state-of-the-art computational methods, by addressing three aspects: (1) reaction optimization using closed-loop workflows and strategies based on molecular building blocks for generating candidate species from fragments, (2) establishing cost-effective ways of evaluating how close a prospective catalyst is to achieving optimal performance (i.e., fitness functions), and (3) facilitating and improving the prediction of enantioselectivity and generality through accurate machine learning algorithms and efficient inverse design pipelines.

The first aspect examines the under-exploited modularity of organocatalysts to enable bottom-up database construction, accelerated activity-based screening, and inverse catalyst design. By defining structural components that encapsulate a catalyst's functionalities, we were able to curate a database of thousands of structures mined from the literature or generated combinatorially. These building blocks may be assembled on-the-fly to suggest prospective species with improved performance.

The second aspect focuses on harnessing the structure-activity relationship offered by molecular volcanoes as a way to establish a catalyst's fitness in closed-loop optimizations. To this end, we developed a genetic algorithm package, NaviCatGA, and showed that it is an efficient tool to streamline computer-aided catalyst discovery.
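As a rough illustration of the closed-loop idea described above, the following sketch evolves catalysts assembled from building blocks and scores them with a volcano-shaped fitness function. It is not NaviCatGA and does not use its API: the building-block names, their descriptor contributions, the additive descriptor model, and the piecewise-linear volcano are all hypothetical, chosen only so the example runs with the Python standard library.

```python
import random

# Hypothetical building blocks; the numbers are made-up contributions
# to a single descriptor (e.g. a binding energy), not real data.
BLOCKS = {
    "scaffold": {"s1": -0.9, "s2": -0.5, "s3": -0.1, "s4": 0.3},
    "linker": {"l1": -0.2, "l2": 0.0, "l3": 0.2, "l4": 0.4},
    "substituent": {"r1": -0.3, "r2": -0.1, "r3": 0.1, "r4": 0.6},
}
SLOTS = list(BLOCKS)


def descriptor(candidate):
    """Toy additive model: sum the contribution of each chosen block."""
    return sum(BLOCKS[slot][candidate[slot]] for slot in SLOTS)


def volcano_fitness(candidate, peak=-0.35, left_slope=1.0, right_slope=2.0):
    """Piecewise-linear volcano: 0 at the peak, negative on both sides."""
    x = descriptor(candidate)
    if x < peak:
        return -left_slope * (peak - x)
    return -right_slope * (x - peak)


def random_candidate(rng):
    return {slot: rng.choice(list(BLOCKS[slot])) for slot in SLOTS}


def crossover(a, b, rng):
    """Uniform crossover: each slot is inherited from either parent."""
    return {slot: rng.choice((a[slot], b[slot])) for slot in SLOTS}


def mutate(candidate, rng, rate=0.2):
    """Swap each block for a random one with probability `rate`."""
    return {
        slot: rng.choice(list(BLOCKS[slot])) if rng.random() < rate else choice
        for slot, choice in candidate.items()
    }


def tournament(population, rng, k=3):
    """Pick the fittest of k randomly sampled individuals."""
    return max(rng.sample(population, k), key=volcano_fitness)


def run_ga(pop_size=12, generations=15, seed=0):
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        elite = max(population, key=volcano_fitness)
        history.append(volcano_fitness(elite))
        children = [
            mutate(
                crossover(tournament(population, rng),
                          tournament(population, rng), rng),
                rng,
            )
            for _ in range(pop_size - 1)
        ]
        population = [elite] + children  # elitism keeps the best so far
    best = max(population, key=volcano_fitness)
    history.append(volcano_fitness(best))
    return best, history


if __name__ == "__main__":
    best, history = run_ga()
    print(best, history[-1])
```

Because the elite individual is always carried over, the best fitness can never decrease between generations. In a real workflow the toy descriptor would be replaced by a computed or predicted quantity placed on the volcano plot.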
Multi-objective problems, e.g., activity-selectivity tradeoffs, may also be solved with evolutionary experiments by considering, and scalarizing, more than one target simultaneously.

In the final section, we address current limitations of machine learning and generative models in predicting and optimizing challenging targets, specifically enantioselectivity and catalyst generality. We design reaction-inspired representations to improve the accuracy of physics-based models and show how evolutionary experiments may be planned to find catalysts displaying high performance across a broad substrate scope.

Overall, this thesis demonstrates how tailored data-driven tools and concepts that are able to address the unique properties and structures of organocatalysts streamline reaction optimization and the discovery of prospective new species.
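The scalarization of several targets mentioned in the abstract can be sketched as a weighted sum of normalized objectives. This is only one common scalarization scheme, and the abstract does not say which one the thesis uses. The catalyst names, scores, bounds, and weights below are hypothetical.

```python
def scalarize(objectives, weights, bounds):
    """Weighted sum of objectives after min-max normalization to [0, 1]."""
    total = 0.0
    for name, weight in weights.items():
        low, high = bounds[name]
        normalized = (objectives[name] - low) / (high - low)
        # Clip so that out-of-range values cannot dominate the sum.
        total += weight * min(1.0, max(0.0, normalized))
    return total / sum(weights.values())


# Hypothetical, dimensionless scores for three candidate catalysts.
CANDIDATES = {
    "cat_a": {"activity": 0.9, "selectivity": 0.2},
    "cat_b": {"activity": 0.6, "selectivity": 0.7},
    "cat_c": {"activity": 0.3, "selectivity": 0.95},
}
BOUNDS = {"activity": (0.0, 1.0), "selectivity": (0.0, 1.0)}


def best_candidate(weights):
    """Return the name of the candidate with the highest scalarized score."""
    return max(
        CANDIDATES,
        key=lambda name: scalarize(CANDIDATES[name], weights, BOUNDS),
    )


if __name__ == "__main__":
    print(best_candidate({"activity": 1.0, "selectivity": 1.0}))
    print(best_candidate({"activity": 3.0, "selectivity": 1.0}))
```

Changing the weights shifts which tradeoff the single fitness value rewards: equal weights favor the balanced candidate, while a heavier activity weight favors the most active one. A scalarized score like this can be used as the fitness function of an evolutionary search.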
Rubén Laplaza Solanas, Anne-Clémence Corminboeuf, Puck Elisabeth van Gerwen, Alexandre Alain Schöpfer, Simone Gallarati
Philippe Schwaller, Jeff Guo, Bojana Rankovic