The local physical properties of the DNA double helix, such as shape and flexibility, are today widely believed to be influenced by nucleic acid sequence in a non-trivial way. Furthermore, there is strong evidence that these properties play a role in many important processes, such as protein binding and nucleosome positioning. To address such biologically pertinent problems, we have developed mathematical and computational tools to identify or predict sequences and sites in genome-sized data based on their mechanical properties. For this, we rely on the cgDNA coarse-grain model, which provides a detailed sequence-dependent description of the statistical mechanics of DNA. As an application, we present a method inspired by information-theoretic techniques to scan the genome of S. cerevisiae in search of mechanically exceptional sequences (or outliers). This method reveals a systematic bias for A/T base pair content and AA/TT dimer content in mechanical outlier sequences. Moreover, it shows a drastically stronger preference for CG dimer content when CpG steps are methylated. Finally, a clustering analysis of exhaustive ensembles of DNA shape predictions of the cgDNA+ model reveals the importance of purine/pyrimidine content.
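To make the composition statistics mentioned in the abstract concrete, the following is a minimal Python sketch of how A/T base content and AA/TT and CG dimer-step content could be tallied over sliding windows of a sequence. It is only the bookkeeping side of such an analysis: it does not implement the cgDNA model or the outlier-detection method, and the function names `composition` and `windows`, the window width, and the stride are hypothetical choices made for illustration.

```python
def composition(seq: str) -> dict:
    """Return A/T base content and AA/TT and CG dimer-step content.

    Illustrative helper (not part of cgDNA): fractions are computed on the
    reading strand, counting overlapping dimer steps.
    """
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("need at least two bases to form a dimer step")
    # Overlapping dimer steps: a sequence of n bases has n - 1 steps.
    steps = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return {
        "AT": sum(base in "AT" for base in seq) / len(seq),
        # AA on one strand reads as TT on the complementary strand,
        # so the two are counted together.
        "AA/TT": sum(step in ("AA", "TT") for step in steps) / len(steps),
        "CG": sum(step == "CG" for step in steps) / len(steps),
    }


def windows(seq: str, width: int, stride: int = 1):
    """Yield (start, subsequence) for each full window of the given width."""
    for start in range(0, len(seq) - width + 1, stride):
        yield start, seq[start:start + width]


if __name__ == "__main__":
    # Toy sequence, not real genomic data.
    for start, sub in windows("AATTCGCGAATT", width=6, stride=3):
        print(start, sub, composition(sub))
```

In a genome-scale scan, the per-window values would be compared between the windows flagged as mechanical outliers and the background; how windows are flagged is left to the model and is not shown here.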
John Maddocks, Rahul Sharma, Alessandro Samuele Patelli