Various methods for the evaluation of machine translation have been employed. This article focuses on evaluating the output of machine translation rather than on performance or usability evaluation.
Round-trip translation
A typical way for lay people to assess machine translation quality is to translate a text from a source language into a target language and then back into the source language with the same engine. Although this may intuitively seem like a good method of evaluation, round-trip translation has been shown to be a "poor predictor of quality". The reason is straightforward: a round trip does not test one system but two. It tests the engine's language pair for translating into the target language and, separately, its language pair for translating back out of it. An error in either direction can distort the result, and an error in one direction can be hidden by the other.
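The point that a round trip exercises two systems can be made concrete with a minimal sketch. It assumes, purely for illustration, that a translation engine is any function from string to string. The word-for-word toy engines, their small French lexicons, and the crude `word_overlap` score are hypothetical stand-ins, not real MT systems or a standard evaluation metric.

```python
from typing import Callable, Dict

# Assumption: a translation engine is modelled as a plain string -> string function.
Translator = Callable[[str], str]


def word_engine(lexicon: Dict[str, str]) -> Translator:
    """Hypothetical toy engine: word-for-word lookup; unknown words pass through."""
    def translate(text: str) -> str:
        return " ".join(lexicon.get(word, word) for word in text.lower().split())
    return translate


def round_trip(text: str, forward: Translator, backward: Translator) -> str:
    """Source -> target with `forward`, then target -> source with `backward`."""
    return backward(forward(text))


def word_overlap(reference: str, candidate: str) -> float:
    """Crude score: fraction of reference words that also occur in the candidate."""
    ref_words = reference.lower().split()
    cand_words = set(candidate.lower().split())
    if not ref_words:
        return 0.0
    return sum(word in cand_words for word in ref_words) / len(ref_words)


source = "the cat sleeps"
forward = word_engine({"the": "le", "cat": "chat", "sleeps": "dort"})
good_back = word_engine({"le": "the", "chat": "cat", "dort": "sleeps"})
# Same forward output, but this backward engine reads "chat" as the English word
# and renders it as "talk".
bad_back = word_engine({"le": "the", "chat": "talk", "dort": "sleeps"})

for name, backward in [("good backward", good_back), ("bad backward", bad_back)]:
    result = round_trip(source, forward, backward)
    print(f"{name}: {result!r} score={word_overlap(source, result):.2f}")
```

The forward translation is identical in both runs, yet the score drops from 1.00 to 0.67 because only the backward engine changed. The round-trip score on its own therefore cannot say which direction was at fault.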
Consider the following examples, from Somers (2005), of round-trip translation from English into Italian and into Portuguese:
{|
!Original text
| Select this link to look at our home page.
|-
!Translated
| Selezioni questo collegamento per guardare il nostro Home Page.
|-
!Translated back
| Selections this connection in order to watch our Home Page.
|}
{|
!Original text
| Tit for tat
|-
!Translated
| Melharuco para o tat
|-
!Translated back
| Tit for tat
|}
In the first example, where the text is translated into Italian and then back into English, the English text is significantly garbled, but the Italian is a serviceable translation. In the second example, the text translated back into English is perfect, but the Portuguese translation is meaningless: the program took "tit" to mean a tit (bird) destined "for the tat", and left "tat", a word it did not understand, untranslated.
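The second example reflects a general property: if the backward step exactly undoes the forward step, the round trip reproduces the input no matter how poor the intermediate translation is. The sketch below assumes a hypothetical word-for-word toy lexicon loosely modelled on the Portuguese example (it drops the article "o" for simplicity); it is not the engine Somers tested, and `translate` and `invert` are illustrative helpers, not a real MT API.

```python
from typing import Dict


def translate(text: str, lexicon: Dict[str, str]) -> str:
    """Word-for-word lookup; words missing from the lexicon pass through unchanged."""
    return " ".join(lexicon.get(word, word) for word in text.lower().split())


def invert(lexicon: Dict[str, str]) -> Dict[str, str]:
    """Reverse every entry, giving a backward table that undoes the forward one.

    This is only lossless when the forward lexicon is one-to-one.
    """
    return {target: source for source, target in lexicon.items()}


# Hypothetical toy forward lexicon: renders "tit" as the bird and leaves the
# unknown word "tat" untranslated, as the engine in the second example did.
forward = {"tit": "melharuco", "for": "para", "tat": "tat"}
backward = invert(forward)

source = "tit for tat"
target = translate(source, forward)   # a meaningless "translation"
back = translate(target, backward)    # ...that nevertheless round-trips perfectly

print(target)          # melharuco para tat
print(back == source)  # True
```

Because `backward` is built as the exact inverse of `forward`, any one-to-one forward lexicon, however wrong, yields a perfect round trip. This is why a clean back-translation says little about the quality of the target-language text.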
While round-trip translation may be useful to generate a "surplus of fun," the methodology is deficient for serious study of machine translation quality.
This section covers two large-scale evaluation studies that have had a significant impact on the field: the ALPAC study of 1966 and the ARPA study.
Machine translation refers to the raw translation of a text carried out entirely by one or more computer programs. When the material is an audio conversation, whether live or recorded, the term automatic transcription is used. A human translator does not intervene to correct errors in the text during translation, only before and/or after it. Machine translation is distinguished from computer-assisted translation, in which the translation is partly manual, possibly carried out interactively with the machine.