Computational study of transcription factor binding sites

Every living organism carries a complete set of instructions encoded as genes in its DNA. This set contains all the information the organism will ever need, from its development into a mature individual to environment-specific responses. Since not all of these instructions are needed at the same time, gene expression must be regulated.

Eukaryotic genomes are stored inside nuclei as chromatin. Chromatin is the association of DNA with dedicated storage proteins, the histones, and with the machinery needed to regulate and express genes, such as RNA polymerases (RNAPs).

In the nucleus, histones are assembled into octamers around which ~148 bp of DNA are wrapped. This structure is known as the nucleosome. The repetition of nucleosomes along the genome drastically compacts it, eventually allowing it to fit inside the nucleus. However, this comes at the cost of rendering the DNA sequence inaccessible to DNA readers such as RNAPs and transcription factors (TFs).

TFs are a class of proteins that have the remarkable property of recognizing and binding specific DNA sequences. More strikingly, each TF can recognize a multitude of different but similar DNA sequences, which gives TFs a wide range of sequence specificity. Ultimately, this allows the cell to recruit TFs to dedicated locations in the genome called regulatory elements (REs).
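To make the idea of one TF recognizing several similar sequences concrete, the sketch below scores candidate sites against a position weight matrix (PWM), a common way to model TF specificity. It is only an illustration and not a method taken from this work: the motif probabilities, the uniform background and the function names are all hypothetical.

```python
import math

# Hypothetical base probabilities for a 3 bp motif (illustration only).
# The last position tolerates both C and T, so "AGC" and "AGT" are
# different but similar sequences that score equally well.
PROBS = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.4, "G": 0.1, "T": 0.4},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def log_odds_score(site, probs=PROBS, background=BACKGROUND):
    """Sum of per-position log2(probability / background) for one site."""
    if len(site) != len(probs):
        raise ValueError("site length must match motif width")
    return sum(math.log2(col[base] / background)
               for col, base in zip(probs, site))


def best_site(sequence, probs=PROBS):
    """Return (position, score) of the best-scoring window in sequence."""
    width = len(probs)
    positions = range(len(sequence) - width + 1)
    best = max(positions,
               key=lambda i: log_odds_score(sequence[i:i + width], probs))
    return best, log_odds_score(sequence[best:best + width], probs)


position, score = best_site("CCAGTCC")
print(position, round(score, 2))
```

A positive score means the site looks more like the motif than like the background, so a score threshold would turn this scan into a simple binding site predictor.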

The action of TFs at REs is crucial to gene expression. Indeed, TFs are involved in many processes, such as the opening of the chromatin structure or the recruitment of RNAPs. However, while TFs can influence chromatin structure, the opposite is also true, as histones impede TF binding to DNA. Thus, gene regulation relies on a subtle and complex crosstalk between chromatin and TFs.

To better understand how TFs and chromatin interact to regulate gene expression, I led several projects investigating TF binding specificity and chromatin structure at REs in humans.

First, I used ENCODE next-generation sequencing (NGS) data to explore how TF binding influences the nearby nucleosome organization and the propensity of some TFs to bind together. The results suggest that regular nucleosome arrays are found near the binding sites of all TFs investigated. They also highlight two special cases. When CTCF binds with the cohesin complex, it seems to drive the nucleosome organization, a feature unique among the TFs investigated. Additionally, I present evidence that EBF1 is a pioneer factor, a member of a special class of TFs able to bind nucleosomes.

Secondly, I developed several clustering algorithms and software to partition genomic regions according to NGS data and/or their DNA sequences. These methods reveal important trends, for instance distinct nucleosome architectures. I illustrated the usefulness of these methods for the study of chromatin accessibility data and the identification of REs.
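As a rough illustration of what partitioning genomic regions by their NGS signal can look like, the sketch below runs a plain k-means on toy read-count profiles. It is not one of the algorithms developed in this work: k-means is swapped in as a generic stand-in, and the profiles, seeds and function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_profile(profiles):
    """Position-wise average of a non-empty list of profiles."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def kmeans(profiles, seeds, n_iter=20):
    """Plain k-means; seeds are the initial centroids (one per cluster)."""
    centroids = [list(seed) for seed in seeds]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each region joins its closest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(profile, centroids[k]))
            for profile in profiles
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(profiles, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = mean_profile(members)
    return labels, centroids


# Toy read counts in 5 bins around 4 region centers (made-up numbers):
# two regions with an alternating signal, two with a single central peak.
PROFILES = [
    [9, 1, 8, 1, 9],
    [8, 2, 9, 1, 8],
    [1, 1, 9, 1, 1],
    [2, 1, 8, 2, 1],
]

labels, centroids = kmeans(PROFILES, seeds=[PROFILES[0], PROFILES[2]])
print(labels)
print(centroids)
```

Each resulting centroid is an average profile for its cluster, which is the kind of summary that can expose distinct signal architectures across regions.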

Thirdly, I participated in the assessment of SMiLE-seq, a new microfluidic device that generates TF specificity data. The creation of TF specificity models and their comparison with other publicly available models demonstrated the value of SMiLE-seq for studying TF specificity.

Finally, I participated in the development of software that predicts TF binding sites. Careful benchmarking suggested that this software is, at the time of writing, the fastest available, while performing comparably to its competitors in other respects.
