In the beginning was the metabolism. The biochemical processes that make life possible transformed a soup of chemicals into the life on Earth we know today. Since then, living organisms have evolved, and life on Earth has become more complex. Organisms learned to use biochemistry to make not only the structure of their bodies more complex but also the structure of their metabolism. We are gradually learning from nature what chemistry can do and how to use its complex biochemical routes to our advantage. This is the work of metabolic engineers: to learn the rules of biochemistry and to design and implement biochemical routes that produce complex natural and non-natural compounds efficiently.
This thesis applies our knowledge of biochemistry to predicting biochemical routes toward complex small molecules. To achieve this, we used a mathematical description of enzymatic reaction mechanisms called biochemical reaction rules. A reaction rule encodes which part of a molecule corresponds to the reactive site of the enzyme and which bonds the enzyme breaks and forms. The rules we used are "generalized" because a single rule can recognize several different molecules as potential substrates of an enzyme. Using these rules, we generated ATLASx, a database of hypothetical biochemical reactions that stores over 5 million predicted reactions. We then extrapolated the biochemical reaction rules to the scope of all known organic compounds and identified enzymatic reactive sites in over 90 million compounds of the PubChem database. Next, we explored how to tailor databases of predicted biochemical reactions to a specific application. We designed an open-source tool specific to shikimate metabolism for efficient pathway design within the ARBRE framework.
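To make the idea of a generalized reaction rule more concrete, here is a minimal toy sketch. It is not the rule format used in this thesis: molecules are reduced to hypothetical heavy-atom graphs with implicit hydrogens, and the single hypothetical rule `apply_alcohol_oxidation` encodes an alcohol-dehydrogenase-like reactive site (a C-O single bond where the oxygen carries one hydrogen and the carbon at least one) together with the bond change the enzyme makes (that bond becomes a double bond). Because the rule matches a reactive site rather than a whole molecule, the same rule accepts several different substrates.

```python
from dataclasses import dataclass

# Typical valences used to infer implicit hydrogens (toy assumption: neutral atoms only).
VALENCE = {"C": 4, "N": 3, "O": 2}


@dataclass
class Molecule:
    """Hypothetical toy molecule: heavy atoms only, hydrogens left implicit."""
    name: str
    atoms: dict[int, str]             # atom index -> element symbol
    bonds: dict[frozenset[int], int]  # {i, j} -> bond order

    def implicit_h(self, i: int) -> int:
        """Number of hydrogens on atom i implied by its valence and bond orders."""
        used = sum(order for bond, order in self.bonds.items() if i in bond)
        return VALENCE[self.atoms[i]] - used


def apply_alcohol_oxidation(mol: Molecule) -> list[Molecule]:
    """Toy generalized rule R-CH-OH -> R-C=O.

    Reactive site: a C-O single bond whose O carries one H and whose C carries at least one H.
    Bond change: that C-O bond becomes a double bond (two implicit H are lost).
    Returns one product per matching site; an empty list means the rule does not apply.
    """
    products = []
    for bond, order in mol.bonds.items():
        if order != 1:
            continue
        a, b = tuple(bond)
        for c, o in ((a, b), (b, a)):  # try both orientations of the bond
            if (mol.atoms[c] == "C" and mol.atoms[o] == "O"
                    and mol.implicit_h(o) == 1 and mol.implicit_h(c) >= 1):
                new_bonds = dict(mol.bonds)
                new_bonds[bond] = 2  # form the C=O double bond
                products.append(Molecule(f"{mol.name} (oxidized)", dict(mol.atoms), new_bonds))
    return products


def make(name: str, elements: str, bonds: list[tuple[tuple[int, int], int]]) -> Molecule:
    """Build a toy molecule from single-letter element symbols and ((i, j), order) pairs."""
    return Molecule(name, dict(enumerate(elements)), {frozenset(ij): order for ij, order in bonds})


ethanol = make("ethanol", "CCO", [((0, 1), 1), ((1, 2), 1)])
propan_2_ol = make("propan-2-ol", "CCCO", [((0, 1), 1), ((1, 2), 1), ((1, 3), 1)])
tert_butanol = make("tert-butanol", "CCCCO", [((0, 1), 1), ((0, 2), 1), ((0, 3), 1), ((0, 4), 1)])
dimethyl_ether = make("dimethyl ether", "COC", [((0, 1), 1), ((1, 2), 1)])

for mol in (ethanol, propan_2_ol, tert_butanol, dimethyl_ether):
    print(mol.name, len(apply_alcohol_oxidation(mol)))
```

Ethanol and propan-2-ol each match once; tert-butanol (no hydrogen on the carbon bearing the hydroxyl) and dimethyl ether (no hydroxyl hydrogen) do not match at all. Real rule sets are usually expressed over full molecular graphs with a cheminformatics toolkit rather than hand-rolled like this.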
We found that well-annotated databases centered on a specific type of metabolism are advantageous for complex pathway design: they return more relevant results and avoid detours through undesired metabolism types. After designing the databases and tools for linear pathway search, we developed SubNetX, a method that integrates graph-theory algorithms with constraint-based optimization to design branched pathways. Using this method, we predicted pathways that are theoretically feasible in the host organism and have higher yields than the native pathway. Finally, we turned to annotating the hypothetical reaction steps of the predicted pathways with protein sequences using the BridgIT+ method. The tools and approaches developed here pave the way toward artificial biosynthesis of complex small molecules, either in heterologous hosts or as additions to the repertoire of reactions available for organic retrosynthesis. Our findings suggest that databases of predicted biochemical reactions will become instrumental for understanding metabolism and its network structure, and for its applications. Integrating cheminformatic tools for reaction prediction with bioinformatic tools for sequence annotation and with constraint-based optimization creates a promising computational environment to advance the efforts of synthetic biologists and metabolic engineers. We hope our results will promote a bio-based economy, make a broader range of drugs synthetically available at a larger scale, and accelerate the shift from petroleum-based chemical synthesis toward more sustainable solutions.
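The linear pathway search and the benefit of metabolism-specific databases described above can be illustrated with a breadth-first search over a reaction graph. This is a minimal sketch under loudly stated assumptions: the compounds (`P`, `X1`, `X2`, `Y1`, `T`), reaction ids and metabolism annotations are all hypothetical, each reaction is reduced to a single main substrate-to-product link, and "shortest" means fewest reaction steps. It is not the algorithm used in ARBRE or SubNetX; it only shows how filtering reactions by a metabolism annotation changes which route a search returns.

```python
from collections import deque

# Hypothetical toy network: (reaction_id, main substrate, main product, metabolism annotation).
REACTIONS = [
    ("r1", "P", "X1", "shikimate"),
    ("r2", "X1", "X2", "shikimate"),
    ("r3", "X2", "T", "shikimate"),
    ("r4", "P", "Y1", "other"),
    ("r5", "Y1", "T", "other"),
]


def shortest_linear_pathway(reactions, source, target, allowed=None):
    """Breadth-first search for the fewest-step linear route from source to target.

    Returns a list of reaction ids, or None if target is unreachable.
    If `allowed` is given, only reactions whose annotation is in it are used.
    """
    graph = {}
    for rid, substrate, product, tag in reactions:
        if allowed is None or tag in allowed:
            graph.setdefault(substrate, []).append((rid, product))

    queue = deque([source])
    previous = {source: None}  # compound -> (reaction id, predecessor compound)
    while queue:
        compound = queue.popleft()
        if compound == target:
            # Walk back through predecessors to rebuild the route.
            path = []
            while previous[compound] is not None:
                rid, compound = previous[compound]
                path.append(rid)
            return path[::-1]
        for rid, product in graph.get(compound, []):
            if product not in previous:
                previous[product] = (rid, compound)
                queue.append(product)
    return None


print(shortest_linear_pathway(REACTIONS, "P", "T"))                         # ['r4', 'r5']
print(shortest_linear_pathway(REACTIONS, "P", "T", allowed={"shikimate"}))  # ['r1', 'r2', 'r3']
```

Without the filter, the search takes the two-step route through reactions tagged `other`; restricting it to `shikimate` returns the three-step route that stays within that metabolism. Real networks also involve co-substrates and byproducts, which this toy ignores.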
Mika Tapani Göös, Siddhartha Jain
Vassily Hatzimanikatis, Anastasia Sveshnikova