Recent advances in synthetic biochemistry have produced a wealth of de novo hypothetical enzymatic reactions that are not matched to protein-encoding genes, which makes them “orphan”. Nearly half of known metabolic enzymes are also orphan, leaving important gaps in metabolic network maps. Proposing genes for the catalysis of orphan reactions is critical for applications ranging from biotechnology to medicine. In this work, a novel computational method, BridgIT, assigns potential enzyme sequences to orphan reactions and to nearly all theoretically possible biochemical transformations, providing the research community with candidate genes to catalyze these reactions. BridgIT introduces, for the first time, information about the enzyme binding pocket into reaction similarity comparisons. It assesses the similarity of two reactions, one orphan and one non-orphan, using the reactive sites of their substrates and their surrounding structures, along with the structures of the generated products. It then suggests the protein sequences and genes of the most similar non-orphan reactions as candidates for catalyzing the orphan ones. To evaluate BridgIT, we analyzed the orphan reactions from KEGG 2011 that became non-orphan in KEGG 2016. BridgIT correctly predicted the exact enzyme for 49% of these reactions and a highly related enzyme for 94%. BridgIT results reveal that knowledge of only three connecting bonds around the atoms of the reactive sites is sufficient to correctly identify protein sequences for 93% of the analyzed enzymatic reactions. Increasing this to six connecting bonds allowed the accurate identification of a reference protein sequence for nearly all known enzymatic reactions. The candidate enzymes proposed by BridgIT are either capable of catalyzing these reactions or can serve as good initial sequences for enzyme engineering.
The BridgIT online tool is freely available on the web (http://lcsb-databases.epfl.ch/) to academic users upon subscription.
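The matching step described in the abstract, comparing an orphan reaction against non-orphan reactions and ranking the latter by similarity, can be illustrated with a minimal sketch. This is not BridgIT's implementation: the abstract does not specify the fingerprint encoding or the similarity metric, so the sketch assumes that each reaction is reduced to a set of string descriptors of its reactive site and that similarity is a Tanimoto (Jaccard) coefficient. The names `tanimoto`, `rank_candidates`, the descriptor strings, and the reaction identifiers are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a reaction "fingerprint" is a set of descriptor strings for the
# reactive site and its neighbourhood. The real encoding is not given here.
Fingerprint = FrozenSet[str]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two descriptor sets, in [0, 1]."""
    if not a and not b:
        return 0.0  # two empty fingerprints carry no evidence of similarity
    return len(a & b) / len(a | b)


def rank_candidates(
    orphan: Fingerprint, reference: Dict[str, Fingerprint]
) -> List[Tuple[str, float]]:
    """Rank non-orphan reactions by similarity to the orphan reaction."""
    scores = [(rid, tanimoto(orphan, fp)) for rid, fp in reference.items()]
    # Highest score first; ties broken by reaction id for determinism.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy data, invented purely for illustration.
orphan = frozenset({"C-O", "C=O", "O-H", "C-C"})
reference = {
    "rxn_A": frozenset({"C-O", "C=O", "O-H"}),
    "rxn_B": frozenset({"C-N", "N-H"}),
    "rxn_C": frozenset({"C-O", "C=O", "O-H", "C-C", "C-N"}),
}

ranked = rank_candidates(orphan, reference)
for reaction_id, score in ranked:
    print(f"{reaction_id}: {score:.2f}")
```

In this toy setup, the genes annotated to the top-ranked non-orphan reaction would be the first candidates proposed for the orphan reaction. Widening the neighbourhood around the reactive site (three versus six connecting bonds in the abstract) would correspond to adding more descriptors to each set.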
Bruno Emanuel Ferreira De Sousa Correia, Casper Alexander Goverde
Alexandra Krina Van Hall-Beauvais