Reference-based vs. task-based evaluation of human language technology

This paper starts from the ISO distinction among three types of evaluation procedures (internal, external, and in use) and proposes matching these types to the three types of human language technology (HLT) systems: analysis, generation, and interactive. The paper explains why internal evaluation is not suitable for measuring the qualities of HLT systems, and shows that reference-based external evaluation is best suited to ‘analysis’ systems and task-based evaluation to ‘interactive’ systems, while ‘generation’ systems can be subject to both types of evaluation. In particular, it shows some limitations of reference-based external evaluation in the case of generation systems. Finally, the paper shows that contextual evaluation, as illustrated by the FEMTI framework for machine translation (MT) evaluation, is an effective way of bringing reference-based evaluation closer to the users of a system.
