While advances in genome sequencing have greatly facilitated the inference of metabolic networks, unknown biochemistry in organisms still challenges the analysis and understanding of metabolism. Even in one of the best-studied organisms, Escherichia coli, around 10% of the open reading frames (ORFs) remain to be annotated. The experimental identification of metabolic capabilities in an organism will benefit from guidance by computational analysis. The study of genome-scale models is an attractive approach for identifying the metabolic capabilities required for growth and for the connectivity of all metabolites in the metabolic network. For this purpose, we developed a gap-filling approach that identifies known and novel alternatives to the reactions integrated in the latest genome-scale models of E. coli (iJO1366) and Saccharomyces cerevisiae (iTO977). The novel reactions were obtained from the recently developed repository of all possible biochemical reactions (ATLAS of Biochemistry). Our method uses a mixed-integer linear programming (MILP) formulation to identify alternative metabolic reactions that satisfy mass balance constraints. We then evaluated the thermodynamic feasibility of the novel reactions under intracellular conditions. We further used a cheminformatics tool to compare the sequence similarity of the alternative gap-filled enzymes with the ORFs of closely related organisms. Here, we present a comparative study of the unknown biochemistry of E. coli and S. cerevisiae, and we highlight the currently unknown metabolic functions that are the most attractive candidates for experimental characterization.
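As an illustration of the thermodynamic feasibility step, the following minimal sketch bounds the transformed Gibbs energy of a reaction, ΔrG' = ΔrG'° + RT Σ νᵢ ln cᵢ, over a box of metabolite concentrations. It is not the formulation used in this work: the function names, the temperature of 298.15 K, the concentration bounds, and the example reaction A → B with its ΔrG'° value are all hypothetical and chosen only for demonstration.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def gibbs_energy_range(dg0_prime, stoich, conc_bounds, temperature=298.15):
    """Return (min, max) of the reaction Gibbs energy in kJ/mol.

    dg0_prime   : standard transformed Gibbs energy (kJ/mol), assumed known
    stoich      : metabolite -> stoichiometric coefficient (substrates < 0)
    conc_bounds : metabolite -> (lower, upper) concentration in mol/L
    temperature : in kelvin (298.15 K is an assumption, not from the source)
    """
    rt = R_KJ * temperature
    dg_min = dg_max = dg0_prime
    for met, nu in stoich.items():
        c_lo, c_hi = conc_bounds[met]
        # The expression is linear in ln(c), so each term reaches its
        # extremes at one of the two concentration bounds.
        at_lo = nu * rt * math.log(c_lo)
        at_hi = nu * rt * math.log(c_hi)
        dg_min += min(at_lo, at_hi)
        dg_max += max(at_lo, at_hi)
    return dg_min, dg_max


def feasible_directions(dg_min, dg_max):
    """Return (forward_ok, reverse_ok) for the given Gibbs energy range.

    Forward operation needs some concentration profile with a negative
    Gibbs energy; reverse operation needs one with a positive value.
    """
    return dg_min < 0.0, dg_max > 0.0


if __name__ == "__main__":
    # Hypothetical reaction A -> B with illustrative concentration bounds.
    stoich = {"A": -1, "B": 1}
    bounds = {"A": (1e-6, 1e-2), "B": (1e-6, 1e-2)}
    lo, hi = gibbs_energy_range(5.0, stoich, bounds)
    print(f"Gibbs energy range: [{lo:.2f}, {hi:.2f}] kJ/mol")
    print("forward/reverse feasible:", feasible_directions(lo, hi))
```

A candidate reaction whose range excludes negative values cannot carry forward flux under the assumed bounds, so it could be discarded or restricted to the reverse direction before alternatives are ranked.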