The goal of eliminating tuberculosis (TB) by 2050 depends on the development of improved TB diagnostics, drugs and vaccines. Advances in these areas require a deep understanding of the disease and its causative agent, Mycobacterium tuberculosis (M. tb). Mycobacterial species that cause TB in humans and other mammalian hosts are grouped within the M. tb complex. The development of powerful technologies such as next-generation sequencing and microarrays has opened up new avenues for comparative and functional genomics of the M. tb complex. Because these technologies generate large and increasingly complex datasets, the bottleneck in biological investigation has shifted from data generation to analysis. The objectives of this thesis were to establish and employ strategies for the analysis, integration, and interpretation of high-throughput sequencing and microarray datasets using a range of bioinformatics and statistical tools. In the area of comparative genomics, we assessed the genetic diversity in the M. tb complex using various methods, such as single nucleotide polymorphism (SNP) genotyping, automated Sanger sequencing and next-generation sequencing. In a study comparing the genomes of virulent M. bovis and M. bovis BCG vaccine strains, we identified a set of SNPs that were common to all BCG strains and could provide novel insights into the molecular basis of BCG attenuation. In another study, we surveyed the genetic variation in the highly immunodominant esx gene family among clinical isolates of M. tb and identified sequence polymorphisms in known T-cell epitopes on Esx proteins that could affect their immunogenicity. We exploited the power of next-generation sequencing to detect sequence variation among M. tb strains that could result in phenotypic differences. By comparing the genomes of drug-resistant mutants with the sensitive wild-type strain, we were able to identify the target of the anti-TB drug pyridomycin. 
Using a similar approach, we identified a mutation that makes M. tb strains incapable of producing phthiocerol dimycocerosates (PDIMs), which are cell-wall-associated lipids involved in M. tb virulence. In the area of functional genomics, we mapped genome-wide binding sites for transcription factors using chromatin immunoprecipitation followed by hybridization to microarrays (ChIP-on-chip) or sequencing (ChIP-seq), and performed transcription profiling by means of high-throughput cDNA sequencing (RNA-seq). We carried out a comprehensive study to characterize the whole transcriptome of M. tb in exponential and stationary phases of growth, and to understand the genome-wide dynamics of two key components of the transcription machinery, namely RNA polymerase (RNAP) and NusA. By systematic integration of the ChIP-seq and RNA-seq data, we identified a set of transcription units (TUs) in the M. tb genome and mapped their putative promoters. Analysis of RNAP and NusA binding across the promoter and body of TUs and their correlation with transc