Recent advancements in DNA sequencing technologies, accompanied by the parallel development of sophisticated computational methods, offer an unprecedented opportunity to explore and study the genetic material of many organisms. However, despite the tremendous progress in the field of genomics, our understanding of the genetic information encoded by the DNA is far from complete. Whole-genome comparisons of vertebrate species have uncovered the existence of non-coding DNA sequences that exhibit exceptionally high levels of similarity over hundreds of base pairs across distant species, referred to as ultraconserved non-coding elements (UCNEs). The existence of such sequences represents one of the biggest mysteries in current biology. Many UCNEs exhibit even stronger conservation than protein-coding regions, but to date no molecular mechanism has been described that would require such a high degree of conservation over such long sequences. This thesis focuses on the computational analysis of UCNEs across vertebrate genomes, with the ultimate goal of providing insights into the functioning of these elements and unraveling the reasons for their extreme conservation. Using computational approaches, we studied the salient characteristics of UCNEs and explored their genomic environment and their organization across genomes. Consistent with previous studies, we observed that UCNEs are organized as large clusters around essential developmental genes, forming genomic regulatory blocks. We then sought to understand the reasons behind this strong clustering of UCNEs. The clustering could reflect functional cooperativity between UCNEs. Alternatively, if each UCNE acts independently, their high concentration near developmental genes could merely reflect the extreme regulatory complexity of these genes. In a special setting of genomic context analysis, we analyzed the fate of UCNEs in teleost fish genomes that were subjected to an additional round of whole-genome duplication.
We found that in most cases all UCNEs of a block were retained in one copy only, but together on the same chromosome. Conversely, the corresponding target genes were often retained in two copies, one of them completely devoid of UCNEs. Our results suggest that the UCNEs of a cluster function in a highly cooperative manner. We propose that a multitude of cooperative cis-interactions between UCNEs is the reason for their extreme sequence conservation. Our work presents a novel hypothesis about the reasons for UCNE conservation and their mode of action, which is yet to be confirmed experimentally. The results of our study are made accessible through a new web resource called UCNEbase (http://ccg.vital-it.ch/UCNEbase). UCNEbase provides information about the genomic organization of UCNEs and their association with developmental regulatory genes, and enables all data to be explored in a genome browser environment. As part of a side project, we developed a new sparsified version of algorithms for the computational prediction of RNA secondary structures and introduced a new program, sibRNAfold. We performed a thorough analysis of the time complexity of sparsified RNA folding algorithms covering a large parameter space and empirically demonstrated that sparse RNA folding has cubic rather than quadratic time complexity, contrary to previous claims.