Publication

Multilingual Annotation and Disambiguation of Discourse Connectives for Machine Translation

Andrei Popescu-Belis, Thomas Meyer
2011
Conference paper
Abstract

Many discourse connectives can signal several types of relations between sentences. Their automatic disambiguation, i.e. the labeling of the correct sense of each occurrence is important for discourse parsing, but could also be helpful to machine translation. We describe new approaches for improving the accuracy of manual annotation of three discourse connectives (two English, one French) by using parallel corpora. An appropriate set of labels for each connective can be found using information from their translations. Our results for automatic disambiguation are state-of-the-art, at up to 85% accuracy using surface features. Using feature analysis, contextual features are shown to be useful across languages and connectives.

About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Related concepts (30)
Word-sense disambiguation
Word-sense disambiguation (WSD) is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious/automatic but can often come to conscious attention when ambiguity impairs clarity of communication, given the pervasive polysemy in natural language. In computational linguistics, it is an open problem that affects other computer-related writing, such as discourse, improving relevance of search engines, anaphora resolution, coherence, and inference.
Text corpus
In linguistics and natural language processing, a corpus (: corpora) or text corpus is a dataset, consisting of natively digital and older, digitalized, language resources, either annotated or unannotated. Annotated, they have been used in corpus linguistics for statistical hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory. In search technology, a corpus is the collection of documents which is being searched.
Machine translation
Machine translation is use of either rule-based or probabilistic (i.e. statistical and, most recently, neural network-based) machine learning approaches to translation of text or speech from one language to another, including the contextual, idiomatic and pragmatic nuances of both languages. History of machine translation The origins of machine translation can be traced back to the work of Al-Kindi, a ninth-century Arabic cryptographer who developed techniques for systemic language translation, including cryptanalysis, frequency analysis, and probability and statistics, which are used in modern machine translation.
Show more
Related publications (37)

Discourse Phenomena in Machine Translation

Lesly Sadiht Miculicich Werlen

Machine Translation (MT) has made considerable progress in the past two decades, particularly after the introduction of neural network models (NMT). During this time, the research community has mostly focused on modeling and evaluating MT systems at the se ...
EPFL2021

Subword Mapping and Anchoring across Languages

Andrei Popescu-Belis

State-of-the-art multilingual systems rely on shared vocabularies that sufficiently cover all considered languages. To this end, a simple and frequently used approach makes use of subword vocabularies constructed jointly over several languages. We hypothes ...
Assoc Computational Linguistics-Acl2021

OdiEnCorp 2.0: Odia-English Parallel Corpus for Machine Translation

Petr Motlicek

The preparation of parallel corpora is a challenging task, particularly for languages that suffer from under-representation in the digital world. In a multi-lingual country like India, the need for such parallel corpora is stringent for several low-resourc ...
European Language Resources Association (ELRA)2020
Show more

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.