A variety of DNA-binding proteins organize chromosomal DNA and regulate gene transcription, DNA replication, and recombination. In gene regulation in particular, one category of DNA-binding proteins, the transcription factors, can detect and bind to a specific set of DNA motifs. Structures have been solved for some protein-DNA complexes, but in other cases these structures are difficult to determine experimentally at the atomistic level. However, various in silico methods, alone or in combination with experimental techniques, are capable of predicting protein structures with high accuracy. DNA-binding proteins contain DNA-binding domains, but unfortunately few protein-DNA complexes have been characterized at atomistic resolution. Here we have developed artificial neural networks that predict the binding sites and the interface between proteins and DNA using a structure-based approach. Specifically, we used different interface descriptors and included a variety of protein-DNA complexes in order to predict the correct localization of the DNA on a protein. A large group of proteins that can recognize specific DNA sequences is the KRAB-ZFP family. DNA recognition by zinc finger proteins (ZFPs) plays an important role in gene regulation. However, the molecular determinants defining the recognition of specific DNA nucleotides by zinc finger (ZF) repeats have not yet been decoded. Here, we present a method that predicts which ZF repeats specifically bind to a DNA target sequence and what the most probable DNA target sequence is for these repeats. Our method is based on structural analysis of the binding network of resolved protein-DNA complexes and is validated on a benchmark set of solved ZFP-DNA complexes.
We also characterized the binding specificity of two KRAB-ZFPs (ZFP14 and ZNF145) by integrating our predictions with SMiLE-seq (Selective Microfluidics-based Ligand Enrichment followed by sequencing) data. Subsequently, we applied our recognition code to ChIP-exo data to gain insight into the binding patterns of poly-ZF domains. We determined the most probable binding motifs and separated the ZFPs into two groups: those with potentially canonical binding and those with potentially non-canonical binding. Altogether, this work provides insight into proteins that interact with DNA, through a better understanding of the various protein-DNA interfaces and the prediction of these complexes at the atomistic level.
Suliana Manley, Chen Zhang, Laurent Casini