Kamusi Pre:D – Lexicon-based source-side predisambiguation for MT and other text processing applications

Kamusi has been developing a system that analyzes texts on the source side and presents users with sense-specific dictionary options. As with a spellchecker, the user selects the intended meaning of each term. We then use a multilingual lexical database to bridge each selected sense to matching vocabulary in other languages. When the system is paired with FreeLing, additional preprocessing is possible for several languages. Integration with the MT systems Moses and Apertium is planned but has not yet been undertaken. The treatment of multiword expressions (MWEs) is a central concern: each MWE is lexicalized in the Kamusi database and marked for separability, with a definition and translation equivalents (of one or more words) in other languages. When the initial term of an MWE appears in the source text, Pre:D queries the database and scans the sentence for all MWEs that could follow, so that the user can select the relevant MWE rather than its component words. Users can also submit a missing sense or MWE for inclusion in the lexicon. Named entities can likewise be identified, either from data sources or by users, and rendered appropriately across languages. With users' consent, we will also use sense-tagged sentences for machine learning. A prototype of the core system is already functional.
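The MWE lookup described above can be illustrated with a minimal sketch. It is not Kamusi's implementation or database schema: the `MWE` record, `build_index`, `match_positions`, `find_mwe_candidates` and the two sample entries are hypothetical, the lexicon is a tiny in-memory dictionary, and tokenization is a naive regex over lowercased surface forms (so an inflected form such as "gave up" would not match without lemmatization). Entries are indexed by their initial word; when that word occurs, each candidate MWE is checked contiguously if it is non-separable, or in order with intervening words allowed if it is separable.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class MWE:
    """Hypothetical lexicon entry for a multiword expression (MWE)."""
    words: tuple[str, ...]   # component words in canonical order
    separable: bool          # may other tokens intervene between components?
    definition: str
    translations: dict[str, str] = field(default_factory=dict)


def build_index(entries: list[MWE]) -> dict[str, list[MWE]]:
    """Index MWEs by their initial word, so each token triggers one lookup."""
    index: dict[str, list[MWE]] = {}
    for entry in entries:
        index.setdefault(entry.words[0], []).append(entry)
    return index


def match_positions(tokens: list[str], start: int, mwe: MWE) -> list[int] | None:
    """Return the token positions covered by `mwe` from `start`, or None."""
    if not mwe.separable:
        # Non-separable: all components must be adjacent.
        end = start + len(mwe.words)
        return list(range(start, end)) if tuple(tokens[start:end]) == mwe.words else None
    # Separable: remaining components must occur later in the sentence, in order
    # (first occurrence wins; no maximum gap is enforced in this sketch).
    positions = [start]
    cursor = start + 1
    for word in mwe.words[1:]:
        try:
            cursor = tokens.index(word, cursor)
        except ValueError:
            return None
        positions.append(cursor)
        cursor += 1
    return positions


def find_mwe_candidates(sentence: str, index: dict[str, list[MWE]]) -> list[tuple[MWE, list[int]]]:
    """Scan a sentence and list every indexed MWE whose components are present."""
    tokens = re.findall(r"\w+", sentence.lower())
    candidates = []
    for i, token in enumerate(tokens):
        for mwe in index.get(token, []):
            positions = match_positions(tokens, i, mwe)
            if positions is not None:
                candidates.append((mwe, positions))
    return candidates


# Hypothetical sample lexicon.
LEXICON = build_index([
    MWE(("give", "up"), separable=True, definition="stop trying",
        translations={"fr": "abandonner"}),
    MWE(("kick", "the", "bucket"), separable=False, definition="die (informal)",
        translations={"fr": "casser sa pipe"}),
])

if __name__ == "__main__":
    for mwe, positions in find_mwe_candidates("They would never give the old plan up.", LEXICON):
        print(" ".join(mwe.words), positions, mwe.translations["fr"])
```

Running the script prints `give up [3, 7] abandonner`, i.e. the separable MWE spanning tokens 3 and 7. By contrast, "kick the old bucket" yields no candidate, because that entry is marked non-separable. In an interactive tool, each candidate would be offered to the user next to the senses of the individual words.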
