Modern biology rapidly generates a wealth of data, which can only be analysed computationally. Despite the variety of available data types, most of them aim to answer one question: how does genotypic diversity translate into phenotypes? This question is central to this thesis, in which computational analysis integrating genomics and transcriptomics became the main method for unravelling the full picture. Two biological processes were examined with this question in mind: mitochondrial genomics, and circadian genomics and transcriptomics.

First, to uncover the relationship between mitochondrial genetics and phenotypes, we produced a high-quality catalogue of mitochondrial variants across 169 Drosophila Genetic Reference Panel (DGRP) lines. The high coverage and quality of the variant calls were achieved with a newly developed mitochondrial DNA (mtDNA) enrichment method. The reliability of variant detection was supported by a low rate of contamination from nuclear mitochondrial DNA fragments, which were detected de novo by a computational method developed in this thesis. Within the coding part of the mtDNA, 231 variants were detected, most of them SNPs. The mitochondrial haplotypes computed from these variants revealed population structure within the DGRP, contrary to the common perception that it has none. Testing haplotypes against publicly available phenotypes revealed a significant association with food intake in males, which was subsequently confirmed experimentally. At the molecular level, haplotypes were not strongly associated with the quantity or activity of mitochondrial protein complexes. However, activity varied more between genotypes than between haplotypes, indicating a possible buffering effect. Furthermore, numerous mito-nuclear genome incompatibilities were detected computationally, but they did not affect the tested phenotypes.

Second, to determine the diversity of the circadian transcriptome across tissues and the influence of genetic diversity on it, a large dataset of 778 transcriptomes was produced. The first part of the dataset was an RNA-seq time series across four tissues of the w- reference fly strain. This dataset revealed a high degree of tissue specificity in the circadian transcriptome, with the majority of genes cycling in only one tissue. Of the 14 genes cycling in all tissues, 7 had not previously been associated with circadian rhythms. Evaluating the activity of flies carrying knockdowns of these novel genes revealed their functional relevance. Next, a cross-tissue gene regulatory network pointed to a potential mechanism for tissue-specific circadian expression: cooperation between core circadian transcription factors (TFs), such as Clock, and other cycling and non-cycling TFs.

The second part of the dataset consisted of static tissue-specific transcriptomes of 141 DGRP lines. Tissue-specific physiological time was estimated computationally and varied widely across samples, suggesting underlying genetic components. In one sample, the physiological time differed by more than 10 h from the harvesting time and the molecular rhythm was abolished, yet rhythmic behaviour was preserved. The genetic component underlying this observation was a newly identified cry allele that disrupts the light-sensing pathway. Overall, this thesis demonstrates the utility of complex multi-omics approaches in modern biology and reveals new genetic determinants of food intake and circadian rhythm defects.
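The abstract does not say how genes were classified as "cycling". As a hedged illustration only, a common approach for RNA-seq time series is harmonic regression: fit a 24 h cosine to each gene's expression profile and read off its mean level, amplitude, peak phase and goodness of fit. The sketch below assumes numpy, sampling times given in hours and a fixed 24 h period. The function name `harmonic_fit` and the constant `PERIOD_H` are hypothetical, and this is not presented as the method used in the thesis.

```python
import numpy as np

PERIOD_H = 24.0  # assumed period of the rhythm, in hours (hypothetical constant)


def harmonic_fit(times_h, expression, period=PERIOD_H):
    """Fit y = m + a*cos(w*t) + b*sin(w*t) by ordinary least squares.

    Returns (mean level, peak-to-trough amplitude, peak phase in hours, R^2).
    """
    t = np.asarray(times_h, dtype=float)
    y = np.asarray(expression, dtype=float)
    w = 2 * np.pi / period  # angular frequency for the assumed period

    # Design matrix: intercept, cosine and sine terms
    X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    m, a, b = coef

    # Fraction of variance explained by the rhythmic model
    fitted = X @ coef
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0

    # a*cos(wt) + b*sin(wt) = A*cos(wt - phi), with A = hypot(a, b), phi = atan2(b, a)
    amplitude = 2 * np.hypot(a, b)  # peak-to-trough
    phase = (np.arctan2(b, a) / w) % period  # time of peak, in hours
    return m, amplitude, phase, r2


if __name__ == "__main__":
    # Hypothetical gene sampled every 4 h over two days, peaking at t = 6 h
    t = np.arange(0, 48, 4)
    y = 5 + 2 * np.cos(2 * np.pi * (t - 6) / 24)
    print(harmonic_fit(t, y))
```

In a real analysis, each gene's fit would typically be compared against a flat (intercept-only) model to obtain a p-value, and genes would be called cycling after multiple-testing correction and an amplitude cut-off. Repeating this per tissue and comparing the resulting gene sets is one way to quantify tissue specificity.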
Felix Naef, Cédric Gobet, Lorenzo Talamanca