A parallel text is a text placed alongside its translation or translations. Parallel text alignment is the identification of the corresponding sentences in both halves of the parallel text. The Loeb Classical Library and the Clay Sanskrit Library are two examples of dual-language series of texts. Reference Bibles may contain the original languages and a translation, or several translations by themselves, for ease of comparison and study; Origen's Hexapla (Greek for "sixfold") placed six versions of the Old Testament side by side. A famous example is the Rosetta Stone, whose discovery allowed the Ancient Egyptian language to begin being deciphered.
Large collections of parallel texts are called parallel corpora (see text corpus). Alignments of parallel corpora at sentence level are prerequisite for many areas of linguistic research.
During translation, sentences can be split, merged, deleted, inserted or reordered by the translator. This makes alignment a non-trivial task.
Parallel texts may be used in language education.
Parallel corpora can be classified into four main categories:
A parallel corpus contains translations of the same document in two or more languages, aligned at least at the sentence level. These tend to be rarer than less-comparable corpora.
A noisy parallel corpus contains bilingual sentences that are not perfectly aligned or have poor quality translations. Nevertheless, most of its contents are bilingual translations of a specific document.
A comparable corpus is built from non-sentence-aligned and untranslated bilingual documents, but the documents are topic-aligned.
A quasi-comparable corpus includes very heterogeneous and non-parallel bilingual documents that may or may not be topic-aligned.
Large corpora used as training sets for machine translation algorithms are usually extracted from large bodies of similar sources, such as databases of news articles written in the first and second languages describing similar events.
However, extracted fragments may be noisy, with extra elements inserted in each corpus.
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
This is a methodological PhD course focused on the history and description of one case study (building, drawing or projects) and the construction of its historical broader context.
D'une part, le cours aborde: (1) la notion d'algorithme et de représentation de l'information, (2) l'échantillonnage d'un signal et la compression de données et (3) des aspects
liés aux systèmes: ordi
This course introduces the foundations of information retrieval, data mining and knowledge bases, which constitute the foundations of today's Web-based distributed information systems.
Statistical machine translation (SMT) was a machine translation approach, that superseded the previous, rule-based approach because it required explicit description of each and every linguistic rule, which was costly, and which often did not generalize to other languages. Since 2003, the statistical approach itself has been gradually superseded by the deep learning-based neural network approach. The first ideas of statistical machine translation were introduced by Warren Weaver in 1949, including the ideas of applying Claude Shannon's information theory.
Google Translate is a multilingual neural machine translation service developed by Google to translate text, documents and websites from one language into another. It offers a website interface, a mobile app for Android and iOS, as well as an API that helps developers build browser extensions and software applications. As of 2022, Google Translate supports languages at various levels; it claimed over 500 million total users , with more than 100 billion words translated daily, after the company stated in May 2013 that it served over 200 million people daily.
Translation is the communication of the meaning of a source-language text by means of an equivalent target-language text. The English language draws a terminological distinction (which does not exist in every language) between translating (a written text) and interpreting (oral or signed communication between users of different languages); under this distinction, translation can begin only after the appearance of writing within a language community.
Text Reuse reveals meaningful reiterations of text in large corpora. Humanities researchers use text reuse to study, e.g., the posterior reception of influential texts or to reveal evolving publication practices of historical media. This research is often ...
2023
The recent developments of deep learning cover a wide variety of tasks such as image classification, text translation, playing go, and folding proteins.All these successful methods depend on a gradient-based learning algorithm to train a model on massive a ...
Natural Language Processing (NLP) has become increasingly utilized to provide adaptivity in educational applications. However, recent research has highlighted a variety of biases in pre-trained language models. While existing studies investigate bias in di ...