# Word Sequence Modeling using Deep Learning

Abstract

For a long time, natural language processing (NLP) has relied on generative models with task-specific, manually engineered features. Recently, there has been a resurgence of interest in neural networks in the machine learning community, and these models have obtained state-of-the-art results in various fields such as computer vision, speech processing and natural language processing. The central idea behind these approaches is to learn features and models simultaneously, in an end-to-end manner, while making as few assumptions as possible. In NLP, word embeddings, which map words in a dictionary to a continuous low-dimensional vector space, have proven to be very effective for a large variety of tasks while requiring almost no a priori linguistic assumptions. In this thesis, we investigate continuous representations of segments in a sentence for the purpose of solving NLP tasks that involve complex sentence-level relationships. Our sequence modeling approach is based on neural networks and takes advantage of word embeddings. A first approach models words in context in the form of continuous vector representations, which are used to solve the task of interest. With the use of a compositional procedure, which allows arbitrarily-sized segments to be compressed into continuous vectors, the model is able to consider long-range word dependencies as well. We first validate our approach on the task of bilingual word alignment, which consists of finding word correspondences between a sentence and its translation in another language. Source and target words in context are modeled using convolutional neural networks, yielding representations that are later used to compute alignment scores. An aggregation operation enables unsupervised training for this task. We show that our model outperforms a standard generative model. The model above is then extended to tackle phrase prediction tasks, where phrases rather than single words are to be tagged. These tasks have typically been cast as classic word tagging problems, using special tagging schemes to identify the segment boundaries. The proposed neural model focuses on learning fixed-size representations of arbitrarily-sized chunks of words, which are used to solve the tagging task. A compositional operation is introduced in this work for the purpose of computing these representations. We demonstrate the viability of the proposed representations by evaluating the approach on the multiword expression tagging task. The remainder of this thesis addresses the task of syntactic constituency parsing which, as opposed to the above tasks, aims at producing a structured output, in the form of a tree, for an input sentence. Syntactic parsing is cast as multiple phrase prediction problems that are solved recursively in a greedy manner. An extension using recursive compositional vector representations, which allows lexical information to be propagated from early stages, is explored as well. This approach is evaluated on a standard corpus, obtaining performance comparable to generative models with much shorter computation time. Finally, morphological tags are included as additional features, using a similar composition procedure, to improve the parsing performance for morphologically rich languages. State-of-the-art results were obtained for these tasks and languages.
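
The abstract mentions that phrase prediction is usually cast as word tagging with a special scheme that marks segment boundaries. As a minimal sketch of that idea, the snippet below uses the common BIO scheme (B = first token of a segment, I = inside, O = outside). The abstract does not name the scheme used in the thesis, so BIO is an assumption, and the function name `segments_to_bio`, the label `MWE` and the example sentence are hypothetical.

```python
from typing import List, Tuple


def segments_to_bio(n_tokens: int, segments: List[Tuple[int, int, str]]) -> List[str]:
    """Encode labeled segments as one tag per token.

    Each segment is (start, end, label) with `end` exclusive.
    Tokens that belong to no segment receive the tag "O".
    """
    tags = ["O"] * n_tokens
    for start, end, label in segments:
        tags[start] = f"B-{label}"        # first token of the segment
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"        # remaining tokens of the segment
    return tags


# Hypothetical example: "kicked the bucket" treated as one multiword expression.
tokens = ["He", "kicked", "the", "bucket", "yesterday"]
tags = segments_to_bio(len(tokens), [(1, 4, "MWE")])
print(list(zip(tokens, tags)))
```

With such an encoding, a per-word tagger has to recover segment boundaries indirectly from the tag sequence, which is the indirection the fixed-size chunk representations described above aim to avoid.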

Related concepts (25)

Neural network

A neural network can refer to a neural circuit of biological neurons (sometimes also called a biological neural network), a network of artificial neurons or nodes in the case of an artificial neur

Deep learning

Deep learning (also called deep structured learning or hierarchical learning) is a subfield of artificial intelligence

Convolutional neural network

In machine learning, a convolutional neural network (CNN or ConvNet) is a type of neural network

Related publications (103)

Learning to embed data into a space where similar points lie close together and dissimilar points lie far apart is a challenging machine learning problem. In this dissertation we study two learning scenarios that arise in the context of learning embeddings, and one scenario concerning the efficient estimation of an empirical expectation. We present novel algorithmic solutions and demonstrate their applications on a wide range of data sets.
The first scenario deals with learning from small data with a large number of classes. This setting is common in computer vision problems such as person re-identification and face verification. To address this problem we present a new algorithm called Weighted Approximate Rank Component Analysis (WARCA), which is scalable, robust, non-linear and independent of the number of classes. We empirically demonstrate the performance of our algorithm on 9 standard person re-identification data sets, where we obtain state-of-the-art performance in terms of both accuracy and computational speed.
The second scenario we consider is learning embeddings from sequences. When it comes to learning from sequences, recurrent neural networks have proved to be effective. However, existing recurrent neural networks have many problems that make them data hungry (high sample complexity) and difficult to train. We present a new recurrent neural network called Kronecker Recurrent Units (KRU), which addresses the issues of existing recurrent neural networks through Kronecker matrices. We show its performance on 7 applications, spanning computer vision, language modeling, music modeling and speech recognition.
Most machine learning algorithms are formulated as minimizing an empirical expectation over a finite collection of samples. In this thesis we also investigate the problem of efficiently estimating a weighted average over large data sets. We present a new data structure called Importance Sampling Tree (IST), which permits fast estimation of a weighted average without looking at all the samples. We successfully evaluate our data structure in the training of neural networks, where it is used to efficiently find informative samples.
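
The abstract above says that KRU relies on Kronecker matrices but gives no details. The sketch below therefore illustrates only the Kronecker product itself and why it can reduce parameter counts: two small factors define a much larger matrix. It is not the KRU architecture; the helper `kron` and the example factors are hypothetical, and plain Python lists are used to keep the snippet dependency-free.

```python
from typing import List

Matrix = List[List[float]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product: block (i, j) of the result equals a[i][j] * b."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
         for j in range(len(a[0]) * cols_b)]
        for i in range(len(a) * rows_b)
    ]


# Two 2x2 factors (8 stored numbers) define a 4x4 matrix (16 entries).
a = [[1.0, 2.0], [3.0, 4.0]]
b = [[0.0, 1.0], [1.0, 0.0]]
w = kron(a, b)

stored = sum(len(m) * len(m[0]) for m in (a, b))  # parameters actually kept
dense = len(w) * len(w[0])                        # entries of the full matrix
print(stored, dense)
```

The gap widens quickly with size: a product of k factors of shape 2x2 stores 4k numbers while representing a matrix with 4^k entries.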

A Language Model (LM) is a helpful component of a variety of Natural Language Processing (NLP) systems today. For speech recognition, machine translation, information retrieval, word sense disambiguation and similar tasks, the contribution of an LM is to provide features and indications on the probability of word sequences, their grammaticality and their semantic meaningfulness. What makes language modeling a challenge for machine learning algorithms is the sheer number of possible word sequences: the curse of dimensionality is especially pronounced when modeling natural language. This survey summarizes and groups the literature that has addressed this problem, and examines promising recent research on neural network techniques applied to language modeling in order to overcome this curse and to achieve better generalization over word sequences.
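
To make "the probability of word sequences" concrete, here is a minimal sketch of a count-based bigram model, which factors a sentence probability as a product of P(word | previous word). This is a classical baseline chosen for illustration, not one of the neural techniques the survey covers; the toy corpus and the function names `train_bigram` and `sequence_probability` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def train_bigram(corpus: List[List[str]]) -> Tuple[Counter, Counter]:
    """Count how often each word occurs as a context and each bigram occurs."""
    contexts, bigrams = Counter(), Counter()
    for sentence in corpus:
        padded = ["<s>"] + sentence                  # "<s>" marks sentence start
        contexts.update(padded[:-1])                 # every word that has a successor
        bigrams.update(zip(padded[:-1], padded[1:]))
    return contexts, bigrams


def sequence_probability(sentence: List[str], contexts: Counter, bigrams: Counter) -> float:
    """Unsmoothed maximum-likelihood estimate: product of P(word | previous word)."""
    padded = ["<s>"] + sentence
    prob = 1.0
    for prev, word in zip(padded[:-1], padded[1:]):
        if contexts[prev] == 0:                      # context never observed
            return 0.0
        prob *= bigrams[(prev, word)] / contexts[prev]
    return prob


corpus = [["the", "cat", "sleeps"], ["the", "dog", "sleeps"], ["a", "cat", "runs"]]
contexts, bigrams = train_bigram(corpus)

p_seen = sequence_probability(["the", "cat", "sleeps"], contexts, bigrams)
p_unseen = sequence_probability(["a", "dog", "runs"], contexts, bigrams)
print(p_seen, p_unseen)
```

The second sentence uses only known words, yet it receives probability zero because the bigrams "a dog" and "dog runs" never occur in the corpus. This sparsity is one face of the curse of dimensionality described above, and it is the kind of generalization gap that continuous word representations are meant to close.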