Neural machine translation (NMT) and text generation have recently reached very high levels of quality. However, both areas share a problem: in order to reach these levels, they require massive amounts of data, and when such data is not available, they lack generalization ability. This is the main problem we address in this thesis: how can we increase the generalization abilities of these models when they are trained in low-resource settings? We propose various regularization techniques to address this problem.

In Part I of the thesis we study the impact of training the weights of a model so that they settle in flatter regions of the parameter space, by indirectly guiding them with a more heavily regularized training regime than would be usual in better-resourced settings. We pursue an empirical approach to this question.

Firstly, without directly measuring the landscape of the loss function in the parameter space, we show that in low-resource settings NMT systems benefit from more aggressive regularization, which can be achieved by modifying several hyper-parameters, and we show that a combination of these factors improves scores more than any single factor. Our explanation is that a less precise optimizer, due to increased regularization, is more likely to fall into flatter regions of the loss landscape, leading to more robust systems. We test this hypothesis on a series of low-resource datasets and observe quality improvements of 3-6 BLEU points.

Secondly, we propose a cost-effective method to directly estimate the flatness of the neighborhood of a point in the parameter space (a model checkpoint) using random perturbations and interpolation. We propose several flatness metrics and compare them.

Thirdly, we propose a method to directly train a system towards flatter regions by looking ahead at variations of the loss function before performing gradient descent.

In Part II we show that the use of auxiliary and synthetic data for NMT, which is another way to perform regularization, also improves quality in low-resource settings. Firstly, we simplify a complex state-of-the-art pipeline for low-resource translation with no loss in performance.

Secondly, we propose a fixed-schedule multitask training regime, with improvements of 1-3 BLEU points.

Thirdly, we design a novel self-paced learning algorithm that balances languages in a multilingual many-to-one regime by measuring the variation of the model weights throughout training.

Fourthly, we show that many-to-many systems improve over well-optimized unidirectional systems.

In Part III we present an approach to text generation for a low-resource domain, poetry. Firstly, we train a language model (LM) and design a rule-based algorithm that generates various structures and rhymes based on the user's specifications.

Secondly, we show that synthetic poetry generated by this system helps to fine-tune an LM so that it learns to generate appropriate poems without rule-based constraints. In other words, the use of synthetic data helps an LM trained on a small in-domain dataset to generalize.

Overall, this thesis offers a unified perspective that improves our understanding of the following regularization techniques: hyper-parameter tuning, measurement of loss flatness, multitask training, and the use of auxiliary data. The thesis also shows that these are effective and efficient strategies for improving low-resource neural machine translation and text generation.
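To make the flatness-estimation idea from Part I concrete, the following is a minimal sketch of one way a checkpoint's neighborhood could be probed with random weight perturbations. It assumes a PyTorch model and an `eval_loss` function (both hypothetical names) and is an illustration under those assumptions, not the implementation used in the thesis.

```python
# Minimal sketch (not the thesis implementation): estimate the flatness of a
# checkpoint's neighborhood by perturbing the weights along random directions
# of increasing radius. `model` (a torch.nn.Module) and `eval_loss` (returns a
# float validation loss for the current weights) are assumed to exist.
import copy
import torch

def flatness_profile(model, eval_loss, radii=(0.1, 0.5, 1.0), n_directions=5):
    """Average loss increase when the weights move a distance `r` along random
    normalized directions; flatter checkpoints show smaller increases."""
    base_state = copy.deepcopy(model.state_dict())
    base_loss = eval_loss(model)
    profile = {}
    for r in radii:
        deltas = []
        for _ in range(n_directions):
            # Random direction over the floating-point tensors only.
            direction = {k: torch.randn_like(v) for k, v in base_state.items()
                         if v.dtype.is_floating_point}
            norm = torch.sqrt(sum((d ** 2).sum() for d in direction.values()))
            perturbed = {k: (v + r * direction[k] / norm) if k in direction else v
                         for k, v in base_state.items()}
            model.load_state_dict(perturbed)
            deltas.append(eval_loss(model) - base_loss)
        profile[r] = sum(deltas) / len(deltas)
    model.load_state_dict(base_state)  # restore the original checkpoint
    return profile
```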
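The look-ahead training idea from Part I can be illustrated, assuming a standard PyTorch training loop, by a step that first ascends the loss within a small radius and only then applies a gradient update at the original weights, in the spirit of sharpness-aware minimization. The names `loss_fn`, `optimizer`, and `rho` are illustrative assumptions, not the thesis method.

```python
# Minimal sketch of a "look ahead, then descend" step: compute the gradient,
# move the weights a small step of size rho up that gradient, recompute the
# gradient at the perturbed point, undo the perturbation, and step with the
# perturbed-point gradient.
import torch

def lookahead_step(model, batch, loss_fn, optimizer, rho=0.05):
    # 1) Gradient at the current weights.
    loss = loss_fn(model, batch)
    loss.backward()
    grad_norm = torch.norm(torch.stack(
        [p.grad.norm() for p in model.parameters() if p.grad is not None]))
    # 2) Look ahead: perturb each weight up the gradient within radius rho.
    eps = {}
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps[p] = e
    optimizer.zero_grad()
    # 3) Gradient at the perturbed point, then undo the perturbation and step.
    loss_fn(model, batch).backward()
    with torch.no_grad():
        for p, e in eps.items():
            p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```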
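One possible shape for the self-paced language balancing described in Part II is sketched below: per-language sampling probabilities are derived from how much the model weights changed after recent updates on batches of that language. The scoring rule here (a softmax over average weight deltas) is an assumption made purely for illustration, not the algorithm proposed in the thesis.

```python
# Hypothetical sketch of self-paced language balancing for many-to-one NMT:
# track a recent history of weight-variation magnitudes per source language
# and sample the next training language proportionally to those scores.
import math
import random
from collections import defaultdict

class LanguageSampler:
    def __init__(self, languages, temperature=1.0, history=100):
        self.languages = list(languages)
        self.temperature = temperature
        self.history = history
        self.deltas = defaultdict(list)  # language -> recent weight variations

    def record(self, language, weight_delta):
        """weight_delta: e.g. the L2 norm of (weights_after - weights_before)."""
        buf = self.deltas[language]
        buf.append(weight_delta)
        if len(buf) > self.history:
            buf.pop(0)

    def sample(self):
        # Softmax over average recent weight variation per language.
        scores = []
        for lang in self.languages:
            buf = self.deltas[lang]
            avg = sum(buf) / len(buf) if buf else 1.0
            scores.append(avg / self.temperature)
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        return random.choices(self.languages, weights=weights, k=1)[0]
```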