MicroRNAs (miRNAs) comprise a large set of short noncoding RNAs that bind to messenger RNAs (mRNAs) and reduce their translation into functional proteins. Computational prediction of miRNA targets is the first stage in the discovery and validation of new regulatory interactions. However, the incomplete complementarity between animal miRNAs and their targets poses a major obstacle to the identification of functional binding sites. Full complementarity is observed only for the six to eight nucleotides located at the 5’-end of the miRNA (the so-called seed region), but since this condition is not sufficient to ensure gene repression, predictions based on miRNA seed matches alone are known to suffer from high false-positive rates. In this work we sought to increase the specificity of miRNA target prediction by using seed match accessibility and conservation to reject non-functional interactions. Because the molecular details driving miRNA-mRNA recognition are poorly understood, our aim was to minimize the explicit modeling of molecular interactions and to rely instead on rigorous statistical models that compensate for this lack of molecular detail. We used a modular approach in which accessibility and conservation could be used individually or in combination, depending on the information available for a particular query. Accessible binding sites were selected by considering the whole Boltzmann ensemble of secondary structures of the target RNA sequence, as predicted by RNA folding algorithms. Because the accuracy of modeling inter- and intramolecular RNA interactions is still limited, we did not use the extent of accessibility to rank the predictions, as many other algorithms do. Instead, we ranked our predictions according to the extent of over-representation of the accessible motifs. This algorithm showed a remarkable improvement in precision and a significant reduction in computational cost in comparison with other free-energy-based methods.
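The filter-then-rank idea described above can be illustrated with a minimal Python sketch. It is not the algorithm of this work: the function names, the 7-nucleotide seed window (miRNA positions 2 to 8), the accessibility threshold, and the binomial null model with uniform base composition are all assumptions made for illustration. The per-nucleotide unpaired probabilities are taken as a given input (for example, computed beforehand by an RNA folding program) and are not computed here.

```python
from math import comb

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed_match_motif(mirna, start=1, length=7):
    """mRNA motif complementary to the miRNA seed (assumed: nt 2-8, 0-based slice)."""
    seed = mirna.upper().replace("T", "U")[start:start + length]
    # The target site is the reverse complement of the seed.
    return "".join(COMPLEMENT[base] for base in reversed(seed))

def find_sites(utr, motif):
    """0-based start positions of every exact occurrence of motif in utr."""
    utr = utr.upper().replace("T", "U")
    last = len(utr) - len(motif)
    return [i for i in range(last + 1) if utr[i:i + len(motif)] == motif]

def accessible_sites(sites, unpaired_prob, motif_len, threshold=0.5):
    """Keep sites whose mean unpaired probability reaches the (assumed) threshold."""
    kept = []
    for pos in sites:
        window = unpaired_prob[pos:pos + motif_len]
        if sum(window) / motif_len >= threshold:
            kept.append(pos)
    return kept

def overrepresentation_pvalue(n_sites, utr_len, motif_len, p_motif=None):
    """Binomial upper tail P(X >= n_sites) under an assumed uniform-base null model."""
    if p_motif is None:
        p_motif = 0.25 ** motif_len
    trials = utr_len - motif_len + 1
    return sum(
        comb(trials, k) * p_motif ** k * (1 - p_motif) ** (trials - k)
        for k in range(n_sites, trials + 1)
    )

if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"        # example miRNA sequence
    utr = "AAACUACCUCAAACUACCUCA"           # toy target sequence
    motif = seed_match_motif(let7a)
    sites = find_sites(utr, motif)
    # Hypothetical accessibility profile: only the first site is mostly unpaired.
    profile = [0.9 if 3 <= i < 10 else 0.1 for i in range(len(utr))]
    kept = accessible_sites(sites, profile, len(motif))
    score = overrepresentation_pvalue(len(kept), len(utr), len(motif))
    print(motif, sites, kept, score)
```

In this sketch accessibility acts only as a yes/no filter, and the ranking score depends solely on how many accessible motifs remain relative to what the null model would expect; a lower p-value means stronger over-representation.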
Further analysis of the accessibility of a large set of validated targets revealed new details about the nucleation of the miRNA-mRNA pairing. We found that accessibility of nucleotides at the 3’-end of the seed match was a much stronger determinant of site functionality than the accessibility of the nucleotides at the 5’-end. This asymmetry could be interpreted as the preference of the miRNA-AGO complex to nucleate the pairing at the 3’-end of the seed match. Motivated by the successful coupling of accessibility as a filter and over-representation as a ranking criterion, our next step was to introduce a more general model in which miRNA binding sites were filtered by conservation, accessibility, or both, while keeping over-representation as the ranking criterion. The advantages of such a flexible approach were demonstrated by predicting targets of highly and weakly conserved miRNAs using different filter configurations. We showed that site conservation is very useful when the miRNAs are highl
Pierre Vandergheynst, Felix Naef, Cédric Gobet, Francesco Craighero, Mohan Vamsi Nallapareddy