Publication

A sequence-dependent coarse-grain model of B-DNA with explicit description of bases and phosphate groups parametrised from large scale Molecular Dynamics simulations

Abstract

We introduce a sequence-dependent coarse-grain model of double-stranded DNA with an explicit description of both the bases and the phosphate groups as interacting rigid bodies. The model parameters are trained on extensive, state-of-the-art, large-scale molecular dynamics (MD) simulations. The model paradigm relies on three main approximations: 1) nucleic acid bases and phosphate groups are rigid, 2) interactions are nearest-neighbour and can be modelled with a quadratic energy, and 3) model parameters have dimer sequence dependence. For an arbitrary sequence, the model predicts a sequence-dependent Gaussian equilibrium probability distribution. The parameter set comprises dimer-based elements from which, for any sequence of any length, one reconstructs both the mean configuration, called the ground state, and the precision (or stiffness) matrix; even though the parameters are dimer-based, the ground state can have strong non-local sequence dependence. This prediction step is efficient enough that probability density functions can readily be constructed for millions of fragments, each a few hundred base pairs long. Estimating a parameter set consists of minimising the sum of Kullback-Leibler divergences between the Gaussians predicted by the model and the analogous Gaussians estimated directly from MD simulations of a training library of sequences. The training library comprises a small number of short palindromic DNA sequences, designed with an ad hoc algorithm so that it contains multiple instances of every independent tetramer sub-sequence. We exploit palindromic symmetry to study the convergence of the statistics extracted from MD simulations of palindromes and to define palindromically symmetrised estimators of the first and second centred moments. Computing the parameter set is delicate and requires sophisticated numerics.
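The training objective described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each Gaussian is stored as a mean vector (ground state) and a symmetric positive-definite stiffness matrix, the names `gaussian_kl` and `training_objective` are hypothetical, and the argument order (MD statistics first, model prediction second) is an assumption, since the abstract does not say which direction of the divergence is used.

```python
import numpy as np


def gaussian_kl(mean_p, stiff_p, mean_q, stiff_q):
    """Kullback-Leibler divergence KL(p || q) between two Gaussians.

    Each Gaussian is given by its mean vector and its precision (stiffness)
    matrix K, i.e. the inverse of its covariance matrix.
    """
    n = mean_p.shape[0]
    diff = mean_q - mean_p
    # tr(K_q Sigma_p) = tr(K_p^{-1} K_q), computed with a solve instead of an inverse
    trace_term = np.trace(np.linalg.solve(stiff_p, stiff_q))
    # Mahalanobis distance between the means, measured with the stiffness of q
    mahalanobis = diff @ stiff_q @ diff
    # Log-determinants; for symmetric positive-definite matrices the sign is +1
    _, logdet_p = np.linalg.slogdet(stiff_p)
    _, logdet_q = np.linalg.slogdet(stiff_q)
    # ln det Sigma_q - ln det Sigma_p = ln det K_p - ln det K_q
    return 0.5 * (trace_term + mahalanobis - n + logdet_p - logdet_q)


def training_objective(md_gaussians, model_gaussians):
    """Sum of KL divergences over a training library (hypothetical helper).

    Both arguments are lists of (mean, stiffness) pairs, one per sequence,
    in the same order.
    """
    return sum(
        gaussian_kl(w_md, k_md, w_model, k_model)
        for (w_md, k_md), (w_model, k_model) in zip(md_gaussians, model_gaussians)
    )


if __name__ == "__main__":
    identity = np.eye(2)
    # Shifting the mean by a unit vector with identity stiffness gives 0.5
    print(gaussian_kl(np.zeros(2), identity, np.array([1.0, 0.0]), identity))
```

In a full fit, the model Gaussians would themselves be functions of the dimer parameters, and an optimiser would minimise `training_objective` over those parameters; that outer loop and the reconstruction of model Gaussians from dimer elements are not shown here.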
We present an efficient and reliable procedure for estimating a complete parameter set, which involves a generalisation of the classic Fisher information matrix and its relationship to the relative entropy, or Kullback-Leibler divergence. The model is a computationally efficient tool for studying the mechanical properties of double-stranded DNA of arbitrary length and sequence. We use it to study the sequence-dependent rigidity of DNA and to compute sequence-dependent apparent and dynamic persistence lengths. The explicit treatment of the phosphate groups also allows the computation of sequence-dependent groove widths. Moreover, the fine-grained representation of the predicted ground states lets us study the sequence dependence of sugar puckering modes and of BI-BII backbone conformations.
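For reference, the classic relationship that the estimation procedure generalises can be written as below. This is the textbook result for a smooth parametric family of densities, not the generalisation developed in the work; the symbols θ (parameters), w (mean, or ground state) and K (precision, or stiffness, matrix) are introduced here for illustration.

```latex
% Second-order expansion of the relative entropy around \theta
D_{\mathrm{KL}}\left(\rho_{\theta} \,\|\, \rho_{\theta + \delta\theta}\right)
  = \tfrac{1}{2}\, \delta\theta^{\mathsf T} F(\theta)\, \delta\theta
  + O\!\left(\|\delta\theta\|^{3}\right),
\qquad
F_{ij}(\theta) = \mathbb{E}_{\rho_\theta}\!\left[\partial_i \ln \rho_\theta \,\partial_j \ln \rho_\theta\right].

% For a Gaussian with mean w(\theta) and precision (stiffness) matrix K(\theta):
F_{ij}(\theta) = \partial_i w^{\mathsf T} K \,\partial_j w
  + \tfrac{1}{2}\operatorname{tr}\!\left(K^{-1}\partial_i K \, K^{-1}\partial_j K\right).
```

The first term measures sensitivity of the ground state and the second sensitivity of the stiffness, which is why the relative entropy between Gaussians couples errors in both moments.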
