How Comparable are Parallel Corpora? Measuring the Distribution of General Vocabulary and Connectives

In this paper, we question the homogeneity of a large parallel corpus by measuring the similarity between various sub-parts. We compare results obtained using a general measure of lexical similarity based on χ² and by counting the number of discourse connectives. We argue that discourse connectives provide a more sensitive measure, revealing differences that are not visible with the general measure. We also provide evidence for the existence of specific characteristics defining translated texts as opposed to non-translated ones, due to a universal tendency for explicitation.
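
As a rough illustration of the two kinds of measure contrasted above, the following sketch computes a χ² statistic over the most frequent words of two sub-corpora and a simple connective rate per 1,000 tokens. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the `top_n` cutoff, and the token-level matching of connectives are all hypothetical choices made for this example, and both token lists are assumed to be non-empty.

```python
from collections import Counter


def chi_squared(tokens_a, tokens_b, top_n=500):
    """Chi-squared statistic over the top_n most frequent words of the joint corpus.

    Assumption: this is one common formulation of chi-squared corpus
    comparison, not necessarily the exact measure used in the paper.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    n_a, n_b = len(tokens_a), len(tokens_b)
    joint = counts_a + counts_b
    stat = 0.0
    for word, total in joint.most_common(top_n):
        # Expected counts if both sub-corpora were drawn from the same distribution
        exp_a = total * n_a / (n_a + n_b)
        exp_b = total * n_b / (n_a + n_b)
        stat += (counts_a[word] - exp_a) ** 2 / exp_a
        stat += (counts_b[word] - exp_b) ** 2 / exp_b
    return stat


def connective_rate(tokens, connectives, per=1000):
    """Occurrences of single-word connectives per `per` tokens (hypothetical helper)."""
    hits = sum(1 for token in tokens if token in connectives)
    return per * hits / len(tokens)
```

A lower χ² value indicates more similar word distributions. Matching connectives token by token is a simplification: it ignores multi-word connectives and words that are only sometimes used as connectives.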
