
Concept: Probabilistic context-free grammar

Summary

Grammar theory for modeling symbol strings originated in computational linguistics, where it was developed to understand the structure of natural languages. Probabilistic context-free grammars (PCFGs) were later applied to the probabilistic modeling of RNA structures, almost 40 years after their introduction in computational linguistics.
PCFGs extend context-free grammars in the same way that hidden Markov models extend regular grammars: each production is assigned a probability, and the probability of a derivation (parse) is the product of the probabilities of the productions used in that derivation. These probabilities can be viewed as parameters of the model, and for large problems it is convenient to learn them via machine learning. A probabilistic grammar's validity is constrained by the context of its training dataset.
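The product-of-productions rule above can be illustrated with a small sketch. The grammar, its symbols, and its probabilities below are all made up for illustration; the only requirement is that the probabilities of productions sharing a left-hand side sum to 1.

```python
# Toy PCFG: a map from (left-hand side, right-hand side) to the
# probability of that production.  Productions with the same
# left-hand side sum to 1 (e.g. the two NP rules: 0.6 + 0.4).
pcfg = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("N",)): 0.4,
    ("VP", ("V", "NP")): 1.0,
    ("Det", ("the",)): 1.0,
    ("N", ("dog",)): 0.5,
    ("N", ("cat",)): 0.5,
    ("V", ("sees",)): 1.0,
}

def derivation_probability(productions):
    """Probability of a parse: the product of the probabilities of
    the productions used in that derivation."""
    p = 1.0
    for lhs, rhs in productions:
        p *= pcfg[(lhs, tuple(rhs))]
    return p

# One derivation of "the dog sees cat", listed in leftmost order:
parse = [
    ("S", ["NP", "VP"]),
    ("NP", ["Det", "N"]), ("Det", ["the"]), ("N", ["dog"]),
    ("VP", ["V", "NP"]), ("V", ["sees"]),
    ("NP", ["N"]), ("N", ["cat"]),
]
print(derivation_probability(parse))  # 0.6 * 0.5 * 0.4 * 0.5 ≈ 0.06
```

In a full parser the production probabilities would be the learned parameters; here they are fixed by hand to keep the arithmetic visible.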

Official source

This page is generated automatically and may contain information that is not correct, complete, up to date, or relevant to your search. The same applies to every other page on this site. Be sure to verify the information against official EPFL sources.


Related people (3)

Related publications (12)


Related courses (2)

The objective of this course is to present the main models, formalisms and algorithms necessary for the development of applications in the field of natural language information processing. The concepts introduced during the lectures will be applied during practical sessions.

The Deep Learning for NLP course provides an overview of neural network based methods applied to text. The focus is on models particularly suited to the properties of human language, such as categorical, unbounded, and structured representations, and very large input and output vocabularies.

Related concepts (12)

Natural language processing

Natural language processing (NLP; in French, traitement automatique du langage naturel, TALN) is a multidisciplinary field involving …

Computational linguistics

Computational linguistics is an interdisciplinary field based on symbolic (rule-based) or statistical modeling of natural language, approached from a computer-science perspective.

Viterbi algorithm

The Viterbi algorithm, due to Andrew Viterbi, makes it possible to correct, to some extent, errors that occur during transmission over a noisy channel.
Its use relies on …
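The Viterbi entry above concerns decoding through a noisy channel; the same dynamic-programming idea recovers the most likely hidden-state sequence of an HMM, which is how it is usually met in language processing. A minimal sketch with made-up states and probabilities:

```python
# Viterbi decoding over a toy 2-state HMM.  All states, observations,
# and probabilities below are illustrative, not from any real model.
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for `obs` and its probability."""
    # V[t][s] = (best probability of reaching state s at time t, path so far)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        layer = {}
        for s in states:
            prob, path = max(
                (V[-1][prev][0] * trans_p[prev][s] * emit_p[s][o], V[-1][prev][1])
                for prev in states
            )
            layer[s] = (prob, path + [s])
        V.append(layer)
    prob, path = max(V[-1].values())
    return path, prob

states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

path, prob = viterbi(["walk", "shop", "clean"], states, start_p, trans_p, emit_p)
print(path)  # ['Sunny', 'Rainy', 'Rainy']
```

Keeping only the best predecessor per state at each step is what makes the search linear in the sequence length instead of exponential.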

Related units (2)

Related lectures (4)

In this work, we propose different strategies for efficiently integrating an automated speech recognition module in the framework of a dialogue-based vocal system. The aim is the study of different ways leading to the improvement of the quality and robustness of the recognition. We first concentrate on the choice of the type of acoustic models that should be used for the speech recognition. Our goal is to evaluate the hypothesis that hybrid acoustic models, in which estimation of frame-based phoneme probabilities is made through artificial neural networks, provide performance results similar to the "classical" hidden Markov models using multi-Gaussian estimations, while being more robust in generalization across tasks. We experimentally show that, due to the size of the parameter space to be explored, it is not always practically possible to achieve a performance comparable to that of multi-Gaussian models, and that in fact hybrid models often lead to worse recognition performance. In a second part, we focus on one of the main limitations of state-of-the-art speech recognition: the inadequacy of the one-best approach to yield a hypothesis corresponding to the right transcription. To address this, we explore the solution consisting in producing, during acoustic decoding, a word lattice containing a very large number of hypotheses, which is then filtered by a syntactic analyzer using more sophisticated syntactic models, such as stochastic context-free grammars. The goal of this approach is to yield syntactically correct hypotheses for further processing. More precisely, we study the approach consisting in dynamically tuning the relative importance of the acoustic and language models, resulting in an increase of the lexical and syntactic variability in the word lattice.
We identify and experimentally quantify two important drawbacks of this approach: its high computational cost and the impossibility of guaranteeing that, in practice, the correct solution is indeed present in the lattice. Finally, we study the problem of the inadequacy of generic linguistic resources (language models and phonetic lexica) for yielding robust and efficient recognition results. In this context, we explore the solution consisting in the integration of dynamic phonetic and language models controlled by an associated dialogue model. In this approach, restricted lexicon and language models dependent on the context of the dialogue are used in place of the complete ones. We first experimentally verify that this approach indeed yields a significant increase in speech recognition performance, and we then focus on the problem of producing, for a given application, an adequate dialogue model that can efficiently integrate the speech recognition module. In this perspective, we propose an enhancement of the dialogue model prototyping methodology by integrating speech recognition error simulation within the Wizard-of-Oz dialogue simulation. We show that such an approach enables a more complete prototyping of the dialogue model, guaranteeing a better adequacy of the resulting dialogue model to the targeted vocal application.

The determination of transcriptional regulatory networks is key to the understanding of biological systems. However, the experimental determination of transcriptional regulatory networks in the laboratory remains difficult and time-consuming, while current computational methods to infer these networks (typically from gene-expression data) achieve only modest accuracy. The latter can be attributed in part to the limitations of a single-organism approach. Computational biology has long used comparative and, more generally, evolutionary approaches to extend the reach and accuracy of its analyses. We therefore use an evolutionary approach to the inference of regulatory networks, which enables us to study evolutionary models for these networks as well as to improve the accuracy of inferred networks. Since the regulatory networks evolve along with the genomes, we consider that the regulatory networks for a family of organisms are related to each other through the same phylogenetic tree. These relationships contain information that can be used to improve the accuracy of inferred networks. Advances in the study of evolution of regulatory networks provide evidence to establish evolutionary models for regulatory networks, which is an important component of our evolutionary approach. We use two network evolutionary models, a basic model that considers only the gains and losses of regulatory connections during evolution, and an extended model that also takes into account the duplications and losses of genes. With the network evolutionary models, we design refinement algorithms to make use of the phylogenetic relationships to refine noisy regulatory networks for a family of organisms. These refinement algorithms include: RefineFast and RefineML, which are two-step iterative algorithms, and ProPhyC and ProPhyCC, which are based on a probabilistic phylogenetic model. 
For each algorithm we first design it with the basic network evolutionary model and then generalize it to the extended evolutionary model. All these algorithms are computationally efficient and are supported by extensive experimental results showing that they yield substantial improvement in the quality of the input noisy networks. In particular, ProPhyC and ProPhyCC further improve the performance of RefineFast and RefineML. Besides the four refinement algorithms mentioned above, we also design an algorithm based on transfer learning theory called tree transfer learning (TTL). TTL differs from the previous four refinement algorithms in the sense that it takes the gene-expression data for the family of organisms as input, instead of their inferred noisy networks. TTL then learns the network structures for all the organisms at once, meanwhile taking advantage of the phylogenetic relationships. Although this approach outperforms an inference algorithm used alone, it does not perform better than ProPhyC, which indicates that the ProPhyC framework makes good use of the phylogenetic information.

Pascal Fua, Eduard Trulls Fortuny, Michal Jan Tyszkiewicz

Local feature frameworks are difficult to learn in an end-to-end fashion, due to the discreteness inherent to the selection and matching of sparse keypoints. We introduce DISK (DIScrete Keypoints), a novel method that overcomes these obstacles by leveraging principles from Reinforcement Learning (RL), optimizing end-to-end for a high number of correct feature matches. Our simple yet expressive probabilistic model lets us keep the training and inference regimes close, while maintaining good enough convergence properties to reliably train from scratch. Our features can be extracted very densely while remaining discriminative, challenging commonly held assumptions about what constitutes a good keypoint, as showcased in Fig. 1, and deliver state-of-the-art results on three public benchmarks.

2020