The goal of eliminating tuberculosis (TB) by 2050 depends on the development of improved TB diagnostics, drugs and vaccines. Advances in these areas require a deep understanding of the disease and its causative agent, Mycobacterium tuberculosis (M. tb). Mycobacterial species that cause TB in humans and other mammalian hosts are grouped within the M. tb complex. The development of powerful technologies such as next-generation sequencing and microarrays has opened up new avenues for comparative and functional genomics of the M. tb complex. Because these technologies generate large and increasingly complex datasets, the bottleneck in biological investigation has shifted from data generation to analysis. The objectives of this thesis were to establish and employ strategies for the analysis, integration, and interpretation of high-throughput sequencing and microarray datasets using a range of bioinformatics and statistical tools.
In the area of comparative genomics, we assessed the genetic diversity in the M. tb complex using various methods, such as SNP (single nucleotide polymorphism) genotyping, automated Sanger sequencing and next-generation sequencing. In a study comparing the genomes of the virulent M. bovis and M. bovis BCG vaccine strains, we identified a set of SNPs that were common to all BCG strains and that could provide novel insights into the molecular basis of BCG attenuation. In another study, we surveyed the genetic variation in the highly immunodominant esx gene family among clinical isolates of M. tb and identified sequence polymorphisms in known T-cell epitopes on Esx proteins that could affect their immunogenicity. We exploited the power of next-generation sequencing to detect sequence variation among M. tb strains that could result in phenotypic differences. By comparing the genomes of drug-resistant mutants with the sensitive wild-type strain, we were able to identify the target of the anti-TB drug pyridomycin. Using a similar approach, we identified a mutation that makes M. tb strains incapable of producing PDIMs (phthiocerol dimycocerosates), which are cell-wall-associated lipids involved in M. tb virulence.
In the area of functional genomics, we mapped genome-wide binding sites for transcription factors using chromatin immunoprecipitation followed by hybridization to microarrays (ChIP-on-chip) or sequencing (ChIP-seq), and performed transcription profiling by means of high-throughput cDNA sequencing (RNA-seq). We carried out a comprehensive study to characterize the whole transcriptome of M. tb in exponential and stationary phases of growth, and to understand the genome-wide dynamics of two key components of the transcription machinery, namely RNA polymerase (RNAP) and NusA. By systematic integration of the ChIP-seq and RNA-seq data, we identified a set of transcription units (TUs) in the M. tb genome and mapped their putative promoters. Analysis of RNAP and NusA binding across the promoter and body of TUs, and of their correlation with transcription, uncovered new functional aspects of the transcriptional complex in M. tb. We also exploited the ChIP-on-chip and ChIP-seq technologies to define the regulon of the M. tb sigma factor F, and to gain a better understanding of the regulatory role of the nucleoid-associated protein EspR.
Altogether, this thesis has improved our knowledge of the evolution, physiology and virulence of the M. tb complex. In addition, we have established next-generation sequencing as a powerful tool for comparative and functional studies, with potential applications in the clinical setting.