Machine translation
Machine translation is the use of either rule-based approaches or probabilistic (i.e. statistical and, most recently, neural network-based) machine learning approaches to translate text or speech from one language to another, including the contextual, idiomatic and pragmatic nuances of both languages.
History of machine translation
The origins of machine translation can be traced back to the work of Al-Kindi, a ninth-century Arabic cryptographer who developed techniques for systematic language translation, including cryptanalysis, frequency analysis, and probability and statistics, techniques that are used in modern machine translation.
Translation
Translation is the communication of the meaning of a source-language text by means of an equivalent target-language text. The English language draws a terminological distinction (which does not exist in every language) between translating (a written text) and interpreting (oral or signed communication between users of different languages); under this distinction, translation can begin only after the appearance of writing within a language community.
English language
English is a West Germanic language in the Indo-European language family. It originated in early medieval England and, today, is the most spoken language in the world and the third most spoken native language, after Mandarin Chinese and Spanish. English is the most widely learned second language and is either the official language or one of the official languages in 59 sovereign states. There are more people who have learned English as a second language than there are native speakers.
Europarl Corpus
The Europarl Corpus is a corpus (set of documents) that consists of the proceedings of the European Parliament from 1996 to 2012. In its first release in 2001, it covered eleven official languages of the European Union (Danish, Dutch, English, Finnish, French, German, Greek, Italian, Portuguese, Spanish, and Swedish). With the political expansion of the EU, the official languages of the ten new member states have been added to the corpus data.
Verb
A verb is a word (part of speech) that in syntax generally conveys an action (bring, read, walk, run, learn), an occurrence (happen, become), or a state of being (be, exist, stand). In the usual description of English, the basic form, with or without the particle to, is the infinitive. In many languages, verbs are inflected (modified in form) to encode tense, aspect, mood, and voice. A verb may also agree with the person, gender or number of some of its arguments, such as its subject or object.
Neural machine translation
Neural machine translation (NMT) is an approach to machine translation that uses an artificial neural network to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model. NMT models require only a fraction of the memory needed by traditional statistical machine translation (SMT) models. Furthermore, unlike conventional translation systems, all parts of the neural translation model are trained jointly (end-to-end) to maximize translation performance.
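Predicting "the likelihood of a sequence of words" is usually made concrete with the chain rule. The log-probability of a target sentence y given a source sentence x is the sum of log p(y_t | y_<t, x) over target positions t. The sketch below shows only that scoring step. TOY_MODEL and sequence_log_prob are hypothetical names, and the probabilities are invented stand-ins for what a trained decoder's softmax layer would output for one fixed source sentence.

```python
import math

# Hypothetical conditional distributions p(y_t | y_<t, x) for ONE fixed source
# sentence x, keyed by the target prefix y_<t. A real NMT decoder would compute
# these with a neural network; the numbers here are made up for illustration.
TOY_MODEL = {
    (): {"the": 0.6, "a": 0.4},
    ("the",): {"house": 0.7, "home": 0.3},
    ("a",): {"house": 0.5, "home": 0.5},
    ("the", "house"): {"<eos>": 1.0},
    ("the", "home"): {"<eos>": 1.0},
    ("a", "house"): {"<eos>": 1.0},
    ("a", "home"): {"<eos>": 1.0},
}


def sequence_log_prob(tokens, model):
    """Chain rule: sum of log p(token_t | tokens before t) over the sentence."""
    total = 0.0
    for t, token in enumerate(tokens):
        dist = model.get(tuple(tokens[:t]), {})
        p = dist.get(token, 0.0)
        if p == 0.0:
            return float("-inf")  # the model assigns this sentence zero probability
        total += math.log(p)
    return total


if __name__ == "__main__":
    for candidate in (["the", "house", "<eos>"], ["a", "home", "<eos>"]):
        score = sequence_log_prob(candidate, TOY_MODEL)
        print(" ".join(candidate), round(math.exp(score), 3))
```

The whole sentence is scored by one model rather than by separately trained components. This matches the "single integrated model" described above. Summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow on long sentences.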
Grammatical tense
In grammar, tense is a category that expresses time reference. Tenses are usually manifested by the use of specific forms of verbs, particularly in their conjugation patterns. The main tenses found in many languages include the past, present, and future. Some languages have only two distinct tenses, such as past and nonpast, or future and nonfuture. There are also tenseless languages, like most of the Chinese languages, though they can possess a future and nonfuture system typical of Sino-Tibetan languages.
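The difference between a three-tense system and the two-tense systems mentioned above comes down to how the timeline is partitioned. The sketch below encodes time reference as an offset of event time relative to the moment of speaking: negative means before, 0 means at, and positive means after. The function names are hypothetical, and the model is a deliberate simplification that ignores aspect, mood and context.

```python
# Time reference as the sign of (event time - speech time):
# negative = before speaking, 0 = at the moment of speaking, positive = after.
# Illustrative simplification only: real tense systems interact with aspect,
# mood and context.


def three_way(offset):
    """Past / present / future, as in the common three-tense description."""
    if offset < 0:
        return "past"
    if offset == 0:
        return "present"
    return "future"


def past_nonpast(offset):
    """Two-tense system that groups present and future together."""
    return "past" if offset < 0 else "nonpast"


def future_nonfuture(offset):
    """Two-tense system that groups past and present together."""
    return "future" if offset > 0 else "nonfuture"


if __name__ == "__main__":
    for offset in (-1, 0, 1):
        print(offset, three_way(offset), past_nonpast(offset), future_nonfuture(offset))
```

The same time reference can therefore fall into different categories. For example, a present-time event is "present" in the three-way system, "nonpast" in the first two-tense system, and "nonfuture" in the second.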
Statistical machine translation
Statistical machine translation (SMT) was a machine translation approach that superseded the earlier rule-based approach. The rule-based approach required an explicit description of each and every linguistic rule, which was costly and often did not generalize to other languages. Since 2003, the statistical approach itself has been gradually superseded by the deep learning-based neural network approach. The first ideas of statistical machine translation were introduced by Warren Weaver in 1949, including the idea of applying Claude Shannon's information theory.
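The paragraph above covers SMT's history but not how a system chooses a translation. One classic formulation, the noisy-channel model, treats a foreign sentence f as a distorted version of some English sentence e. It picks the e that maximizes p(e) · p(f | e), combining a language model p(e) with a translation model p(f | e). The sketch below assumes a tiny, explicitly enumerated list of candidates. LANGUAGE_MODEL, TRANSLATION_MODEL and noisy_channel_decode are hypothetical names, and all probabilities are invented. Real SMT decoders search an enormous hypothesis space rather than a fixed list.

```python
# Hypothetical language-model probabilities p(e): how fluent each English
# candidate is on its own. Invented numbers for illustration.
LANGUAGE_MODEL = {
    "the house is small": 0.02,
    "the house is little": 0.005,
    "small is the house": 0.0005,
}

# Hypothetical translation-model probabilities p(f | e) for one fixed foreign
# sentence f: how well each candidate accounts for the foreign words.
TRANSLATION_MODEL = {
    "the house is small": 0.3,
    "the house is little": 0.4,
    "small is the house": 0.3,
}


def noisy_channel_decode(candidates, lm, tm):
    """Return the candidate e maximizing p(e) * p(f | e)."""
    return max(candidates, key=lambda e: lm[e] * tm[e])


if __name__ == "__main__":
    print(noisy_channel_decode(list(LANGUAGE_MODEL), LANGUAGE_MODEL, TRANSLATION_MODEL))
```

Here the translation model alone would prefer "the house is little" (0.4). Once its score is multiplied by the language model's, the more fluent "the house is small" wins (0.02 × 0.3 = 0.006 versus 0.005 × 0.4 = 0.002).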
Verb phrase
In linguistics, a verb phrase (VP) is a syntactic unit composed of a verb and its arguments except the subject of an independent clause or coordinate clause. Thus, in the sentence A fat man quickly put the money into the box, the words quickly put the money into the box constitute a verb phrase; it consists of the verb put and its arguments, but not the subject a fat man. A verb phrase is similar to what is considered a predicate in traditional grammars.
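The example sentence can be written as a constituency tree, and the verb phrase is then the subtree labelled VP. The sketch below uses a hand-built, simplified bracketing chosen for illustration (for instance, the adverb is placed inside the VP). It is not the output of any particular parser or grammar. SENTENCE, words and find_first are hypothetical names.

```python
# A node is a tuple (label, child, child, ...); a leaf is a plain string (a word).
# Simplified, hand-built analysis of the example sentence, for illustration only.
SENTENCE = (
    "S",
    ("NP", ("Det", "A"), ("Adj", "fat"), ("N", "man")),
    ("VP",
        ("Adv", "quickly"),
        ("V", "put"),
        ("NP", ("Det", "the"), ("N", "money")),
        ("PP", ("P", "into"), ("NP", ("Det", "the"), ("N", "box")))),
)


def words(node):
    """Return the words (leaves) under a node, left to right."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node[1:]:
        result.extend(words(child))
    return result


def find_first(node, label):
    """Depth-first search for the first constituent with the given label."""
    if isinstance(node, str):
        return None
    if node[0] == label:
        return node
    for child in node[1:]:
        found = find_first(child, label)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    print(" ".join(words(find_first(SENTENCE, "VP"))))  # the verb phrase
    print(" ".join(words(find_first(SENTENCE, "NP"))))  # the subject, outside the VP
```

Because the subject NP is a sibling of the VP rather than part of it, collecting the VP's leaves yields quickly put the money into the box without a fat man.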
Nominative–accusative alignment
In linguistic typology, nominative–accusative alignment is a type of morphosyntactic alignment in which subjects of intransitive verbs are treated like subjects of transitive verbs, and are distinguished from objects of transitive verbs in basic clause constructions. Nominative–accusative alignment can be coded by case-marking, verb agreement and/or word order. It has a wide global distribution and is the most common alignment system among the world's languages (including English).
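The grouping described above can be shown with case-marking on English personal pronouns. The sketch uses the conventional typological shorthand S (sole argument of an intransitive verb), A (subject of a transitive verb) and O (object of a transitive verb); these labels are introduced here and do not appear in the text above. The names Role, nom_acc_case and pronoun are hypothetical, and only the she/her pair is modelled.

```python
from enum import Enum


class Role(Enum):
    S = "subject of an intransitive verb"
    A = "subject of a transitive verb"
    O = "object of a transitive verb"


def nom_acc_case(role):
    """Nominative-accusative alignment: S patterns with A; O is marked differently."""
    return "accusative" if role is Role.O else "nominative"


# English pronouns still show this case distinction (only one pair modelled).
SHE = {"nominative": "she", "accusative": "her"}


def pronoun(role):
    """Pick the form of 'she' that a clause role requires."""
    return SHE[nom_acc_case(role)]


if __name__ == "__main__":
    print(pronoun(Role.S), "left.")         # intransitive subject
    print(pronoun(Role.A), "saw him.")      # transitive subject
    print("He saw", pronoun(Role.O) + ".")  # transitive object
```

The intransitive subject in "She left" and the transitive subject in "She saw him" take the same form. The object in "He saw her" takes a different one, which is the pattern that defines this alignment type.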