Different senses of a source word must often be rendered by different words in the target language when performing machine translation (MT). Selecting the correct translation of a polysemous word requires taking its context of use into account. However, state-of-the-art MT algorithms generally work on a sentence-by-sentence basis and ignore information from other sentences. In this thesis, we address this problem by studying novel contextual approaches that reduce source word ambiguity in order to improve translation quality. The thesis consists of two parts: the first part is devoted to methods that correct ambiguous word translations by enforcing consistency across sentences, and the second part investigates sense-aware MT systems that address the ambiguity problem for each word.
In the first part, we propose to reduce word ambiguity by exploiting lexical consistency, starting from the one-sense-per-discourse hypothesis: if a polysemous word appears multiple times in a discourse, its occurrences are likely to share the same sense. We first improve the translation of a polysemous noun (Y) when a previous occurrence of that noun as the head of a compound noun phrase (XY) is available in the text. Experiments on two language pairs show that the translations of the targeted polysemous nouns are significantly improved.
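As a rough illustration of this consistency step (not the exact procedure used in the thesis), the sketch below caches the translation chosen for the head noun of a compound XY and reuses it for a later standalone occurrence of Y. All function and field names here are hypothetical.

```python
# Minimal sketch of the compound-pair consistency idea (hypothetical names):
# when a noun Y was previously seen as the head of a compound XY, reuse the
# head's translation for a later standalone occurrence of Y.

def build_compound_cache(tagged_sentences):
    """Record the translation chosen for the head noun of each compound XY.

    `tagged_sentences` is assumed to be an iterable of lists of
    (token, pos, translation) triples produced by an earlier MT pass.
    """
    cache = {}
    for sentence in tagged_sentences:
        for (tok, pos, trans), (next_tok, next_pos, next_trans) in zip(
            sentence, sentence[1:]
        ):
            # Crude compound detection: two adjacent nouns, second one is the head.
            if pos == "NOUN" and next_pos == "NOUN":
                cache.setdefault(next_tok.lower(), next_trans)
    return cache


def enforce_head_translation(sentence, cache):
    """Replace the translation of a standalone noun Y if a translation of its
    compound head was cached earlier in the discourse."""
    fixed = []
    for tok, pos, trans in sentence:
        if pos == "NOUN" and tok.lower() in cache:
            trans = cache[tok.lower()]
        fixed.append((tok, pos, trans))
    return fixed
```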
As compound pairs XY/Y appear quite infrequently in texts, we extend our work by analysing the repetition of nouns that are not compounds. We propose a method to decide whether two occurrences of the same noun in a source text should be translated consistently. We design a classifier that predicts translation consistency from syntactic and semantic features, and we integrate its output into MT. Experiments on two language pairs show that our method closes up to 50% of the gap in BLEU scores between the baseline and an oracle classifier.
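A minimal sketch of such a consistency classifier is shown below, assuming scikit-learn and a handful of invented pair-level features; the thesis defines its own syntactic and semantic feature set.

```python
# Illustrative consistency classifier: feature names are invented for the
# example, and labels indicate whether the two occurrences of a noun were
# translated identically in the reference.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def extract_features(occ1, occ2):
    """Features describing a pair of occurrences of the same source noun."""
    return {
        "same_syntactic_role": occ1["dep"] == occ2["dep"],
        "sentence_distance": abs(occ1["sent_id"] - occ2["sent_id"]),
        "shared_context_words": len(set(occ1["context"]) & set(occ2["context"])),
    }


def train_consistency_classifier(pairs, labels):
    """pairs: list of (occ1, occ2) dicts; labels: 1 if the two occurrences
    should share a translation, else 0."""
    model = make_pipeline(DictVectorizer(sparse=False),
                          LogisticRegression(max_iter=1000))
    model.fit([extract_features(a, b) for a, b in pairs], labels)
    return model
```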
In the second part of the thesis, we design sense-aware MT systems that automatically select the correct translations of ambiguous words by performing word sense disambiguation (WSD). We demonstrate that WSD can improve MT by widening the source context considered when modeling the senses of potentially ambiguous words. We first design three adaptive clustering algorithms, based respectively on k-means, the Chinese restaurant process, and random walks. For phrase-based statistical MT (SMT), we integrate the sense knowledge as an additional feature through a factored model and show that the combination improves translation from English into five other languages.
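The following sketch illustrates the general clustering-plus-factor idea, using k-means for simplicity: context vectors of an ambiguous word's occurrences are clustered into senses, and the cluster label is attached as an extra factor in the word|factor notation used by factored phrase-based systems such as Moses. The vector construction and factor naming here are assumptions for the example, not the thesis configuration.

```python
# Cluster occurrence contexts of an ambiguous word into senses and emit a
# "word|factor" annotation per occurrence (illustrative only).
import numpy as np
from sklearn.cluster import KMeans


def assign_sense_factors(word, occurrences, context_vectors, k=3):
    """occurrences: list of (sent_id, token_id) positions of `word`;
    context_vectors: one vector per occurrence, e.g. an average of word
    embeddings around the token."""
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(np.asarray(context_vectors))
    return {
        occ: f"{word}|{word}_sense{label}"
        for occ, label in zip(occurrences, labels)
    }
```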
As sense integration proves promising for SMT, we also transfer this approach to the newer neural MT (NMT) models, which are now the state of the art. However, unlike SMT, where linguistic features are comparatively easy to incorporate, NMT generates words from continuous vector representations, so traditional feature incorporation does not apply. We therefore design a sense-aware NMT model that jointly learns the sense knowledge using an attention-based sense selection mechanism and concatenates the learned sense vectors with word vectors during encoding. This concatenation outperforms several baselines, with significant improvements both overall and on the analysed ambiguous words, for the same language pairs used in our SMT experiments.
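A simplified PyTorch sketch of attention-based sense selection is given below: each token attends over a small set of candidate sense vectors, and the resulting weighted sense vector is concatenated with the word embedding before encoding. The module structure and dimensions are illustrative assumptions, not the exact architecture from the thesis.

```python
# Attention over per-word sense vectors, concatenated with the word embedding.
import torch
import torch.nn as nn


class SenseAwareEmbedding(nn.Module):
    def __init__(self, vocab_size, num_senses, word_dim, sense_dim):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        # One block of `num_senses` sense vectors per vocabulary entry.
        self.sense_emb = nn.Embedding(vocab_size * num_senses, sense_dim)
        self.num_senses = num_senses
        self.query = nn.Linear(word_dim, sense_dim)

    def forward(self, token_ids, context):
        # token_ids: (batch, seq); context: (batch, seq, word_dim), e.g. the
        # output of a context encoder over the surrounding sentences.
        w = self.word_emb(token_ids)                               # (B, S, Dw)
        sense_ids = token_ids.unsqueeze(-1) * self.num_senses + torch.arange(
            self.num_senses, device=token_ids.device
        )                                                          # (B, S, K)
        senses = self.sense_emb(sense_ids)                         # (B, S, K, Ds)
        q = self.query(context).unsqueeze(-2)                      # (B, S, 1, Ds)
        attn = torch.softmax((q * senses).sum(-1), dim=-1)         # (B, S, K)
        sense_vec = (attn.unsqueeze(-1) * senses).sum(-2)          # (B, S, Ds)
        # Concatenate word and selected sense representations for the encoder.
        return torch.cat([w, sense_vec], dim=-1)                   # (B, S, Dw+Ds)
```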
Overall, the thesis shows that lexical consistency and WSD are practical and workable solutions that lead to global improvements in translation quality in the range of 0.2 to 1.5 BLEU points.