Discourse Phenomena in Machine Translation

Abstract

Machine Translation (MT) has made considerable progress over the past two decades, particularly since the introduction of neural network models (NMT). During this time, the research community has mostly focused on modeling and evaluating MT systems at the sentence level. MT models learn to translate from large collections of parallel sentences in different languages. Focusing on sentences is a practical simplification that favors efficiency but discards relevant contextual information. Several studies have shown that the negative impact of this simplification is significant. In particular, discourse dependencies among distant words are ignored, resulting in translations that lack coherence and cohesion.

The main objective of this thesis is to improve MT by including discourse-level constraints. In particular, we focus on the translation of entity mentions. We summarize our contributions in four points. First, we define an evaluation process to assess entity translations (i.e., nouns and pronouns) and propose an automatic metric to measure this phenomenon. Second, we perform a proof of concept and analyze how effective it is to include entity coreference resolution (CR) in translation. We conclude that CR significantly helps pronoun translation and boosts overall translation quality according to human judgment. Third, we focus on discourse connections at the sentence level. We propose enhancing the sequential model to infer long-range connections by incorporating a ‘self-attention’ mechanism that gives direct and selective access to the context. Experiments on different language pairs show that our method outperforms various baselines, and the analysis confirms that the model attends to a broader context and captures syntactic-like structures. Fourth, we formulate the problem of document-level NMT and model inter-sentential connections among words with a hierarchical attention mechanism. Experiments on multiple data sets show significant improvements over two strong baselines and indicate that the source-side and target-side contexts are mutually complementary. Together, these results confirm that discourse information significantly enhances translation quality, fulfilling the main objective of the thesis.
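The abstract does not spell out the mechanisms; as a rough, hypothetical illustration of the kind of ‘self-attention’ referred to, here is the standard scaled dot-product formulation from the Transformer literature in a few lines of NumPy. The function name, shapes, and variables are illustrative assumptions, not the thesis code.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over one sentence.

    X          : (seq_len, d_model) input word representations
    Wq, Wk, Wv : (d_model, d_k) learned projection matrices
    Returns    : (seq_len, d_k) context-mixed representations
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Every position attends to every other position, giving the
    # "direct and selective access to the context" mentioned above.
    scores = (Q @ K.T) / np.sqrt(K.shape[-1])                 # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V
```

A hierarchical variant in the spirit of the fourth contribution would apply one such attention step within each context sentence and a second step over the resulting sentence summaries; the thesis architecture itself is not reproduced here.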

Our secondary objective is to improve the CR task by modeling the underlying connections among entities at the document level. This task is particularly challenging for current neural network models because it requires understanding and reasoning. First, we propose a method to detect entity mentions from partially annotated data. We then propose to model coreference with a graph of entities, encoded as an internal structure of a pre-trained language model. Experiments show that these methods outperform various baselines. CR has the potential to help MT and other text generation tasks by maintaining coherence among entity mentions.
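Purely as an illustration of how coreference links form an entity graph (the sketch below is a generic mention-ranking baseline, not the model proposed in the thesis): each mention is scored against all preceding mentions plus a dummy ‘new entity’ antecedent, and greedy linking yields the clusters.

```python
import numpy as np

def antecedent_scores(mentions):
    """Score every mention against its candidate antecedents.

    mentions : (n, d) array of span representations, e.g. taken from
               a pre-trained encoder (purely illustrative here).
    Returns a list; entry i holds i+1 scores: a dummy 'new entity'
    score of 0 followed by scores for preceding mentions 0..i-1.
    """
    pairwise = mentions @ mentions.T  # simple dot-product affinity
    return [np.concatenate(([0.0], pairwise[i, :i]))
            for i in range(mentions.shape[0])]

def link_to_clusters(score_lists):
    """Greedily attach each mention to its best-scoring antecedent.
    The resulting clusters are the connected components of the
    mention-antecedent graph, i.e. the predicted entities."""
    clusters, cluster_of = [], {}
    for i, scores in enumerate(score_lists):
        best = int(np.argmax(scores))
        if best == 0:                        # dummy wins: new entity
            cluster_of[i] = len(clusters)
            clusters.append([i])
        else:                                # link to mention best - 1
            cluster_of[i] = cluster_of[best - 1]
            clusters[cluster_of[i]].append(i)
    return clusters
```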

Related concepts (45)
Machine translation
Machine translation is the use of either rule-based or probabilistic (i.e., statistical and, most recently, neural network-based) machine learning approaches to the translation of text or speech from one language to another, including the contextual, idiomatic, and pragmatic nuances of both languages. The origins of machine translation can be traced back to the work of Al-Kindi, a ninth-century Arabic cryptographer who developed techniques for systemic language translation, including cryptanalysis, frequency analysis, and probability and statistics, which are used in modern machine translation.
Neural machine translation
Neural machine translation (NMT) is an approach to machine translation that uses an artificial neural network to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model. NMT models require only a fraction of the memory needed by traditional statistical machine translation (SMT) models. Furthermore, unlike conventional translation systems, all parts of the neural translation model are trained jointly (end-to-end) to maximize translation performance.
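As a minimal sketch of the ‘likelihood of a sequence of words’ view, an autoregressive NMT model factors the translation probability with the chain rule; the model interface below is hypothetical, not any particular library's API.

```python
import math

def translation_log_prob(model, source, target):
    """log p(target | source), factored over target positions:
        p(y | x) = prod_t p(y_t | y_<t, x)

    `model.next_word_probs(source, prefix)` is a hypothetical call
    returning a mapping from each vocabulary word to its probability
    given the source sentence and the target prefix generated so far.
    """
    log_p = 0.0
    for t, word in enumerate(target):
        probs = model.next_word_probs(source, target[:t])
        log_p += math.log(probs[word])
    return log_p
```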
Named-entity recognition
Named-entity recognition (NER) (also known as (named) entity identification, entity chunking, and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc. Most research on NER/NEE systems has been structured as taking an unannotated block of text, such as this one: Jim bought 300 shares of Acme Corp.
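Off-the-shelf toolkits make this concrete. For example, assuming spaCy and its small English model are installed, running NER over the sample sentence above takes a few lines; the exact entities returned depend on the model.

```python
import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("Jim bought 300 shares of Acme Corp.")

for ent in doc.ents:
    # Typically something like: Jim PERSON, 300 CARDINAL, Acme Corp. ORG
    print(ent.text, ent.label_)
```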
Related publications (97)

Infusing structured knowledge priors in neural models for sample-efficient symbolic reasoning

Mattia Atzeni

The ability to reason, plan and solve highly abstract problems is a hallmark of human intelligence. Recent advancements in artificial intelligence, propelled by deep neural networks, have revolutionized disciplines like computer vision and natural language ...
EPFL, 2024

Transformer Models for Vision

Jean-Baptiste Francis Marie Juliette Cordonnier

The recent developments of deep learning cover a wide variety of tasks such as image classification, text translation, playing Go, and folding proteins. All these successful methods depend on a gradient-based learning algorithm to train a model on massive a ...
EPFL, 2023

Dense Image-based Predictions for Comics Analysis

Deblina Bhattacharjee

Dense image-based prediction methods have advanced tremendously in recent years. Their remarkable development has been possible due to the ample availability of real-world imagery. While these methods work well on photographs, their abilities do not genera ...
EPFL, 2023
Related MOOCs (16)
Neuronal Dynamics - Computational Neuroscience of Single Neurons
The activity of neurons in the brain and the code used by these neurons is described by mathematical neuron models at different levels of detail.
Neuroscience Reconstructed: Cell Biology
This course will provide the fundamental knowledge in neuroscience required to understand how the brain is organised and how function at multiple scales is integrated to give rise to cognition and behaviour.
