Language resource management – Lexical markup framework (LMF; ISO 24613:2008) is the International Organization for Standardization (ISO) ISO/TC37 standard for natural language processing (NLP) and machine-readable dictionary (MRD) lexicons. Its scope is the standardization of principles and methods relating to language resources in the context of multilingual communication.
The goals of LMF are to provide a common model for the creation and use of lexical resources, to manage the exchange of data between and among these resources, and to enable the merging of a large number of individual electronic resources into extensive global electronic resources.
Individual instantiations of LMF can include monolingual, bilingual, or multilingual lexical resources. The same specifications apply to small and large lexicons, to simple and complex lexicons, and to both written and spoken lexical representations. The descriptions range from morphology and syntax to computational semantics and computer-assisted translation. Coverage is not restricted to European languages but extends to all natural languages, and the range of targeted NLP applications is likewise unrestricted. LMF is able to represent most lexicons, including the WordNet, EDR, and PAROLE lexicons.
In the past, lexicon standardization was studied and developed by a series of projects such as GENELEX, EDR, EAGLES, MULTEXT, PAROLE, SIMPLE, and ISLE. The ISO/TC37 national delegations then decided to address standards dedicated to NLP and lexicon representation. Work on LMF started in summer 2003 with a new work item proposal issued by the US delegation. In fall 2003, the French delegation issued a technical proposal for a data model dedicated to NLP lexicons. In early 2004, the ISO/TC37 committee decided to form a common ISO project, with Nicoletta Calzolari (CNR-ILC, Italy) as convenor and Gil Francopoulo (Tagmatica, France) and Monte George (ANSI, USA) as editors.
A machine-readable dictionary (MRD) is a dictionary stored as machine-readable data instead of being printed on paper; it is both an electronic dictionary and a lexical database. Because it is held in electronic form, it can be loaded into a database and queried via application software. It may be a monolingual explanatory dictionary, a multilingual dictionary that supports translation between two or more languages, or a combination of both.
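To make "loaded into a database and queried via application software" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The single-table schema, the column names, and the sample entries are assumptions made for illustration; real MRDs typically use much richer, normalized schemas.

```python
import sqlite3

# Hypothetical minimal schema for a bilingual MRD: one row per sense,
# holding a monolingual definition and a target-language translation.
SCHEMA = """
CREATE TABLE entry (
    headword    TEXT NOT NULL,
    lang        TEXT NOT NULL,  -- source language code
    pos         TEXT,           -- part of speech
    definition  TEXT,           -- monolingual explanation
    translation TEXT,           -- equivalent in the target language
    target_lang TEXT
);
CREATE INDEX idx_headword ON entry(headword, lang);
"""

# Illustrative sample data, not taken from any real dictionary.
ROWS = [
    ("house", "en", "noun", "A building for human habitation.", "maison", "fr"),
    ("house", "en", "verb", "To provide with shelter.", "loger", "fr"),
    ("book", "en", "noun", "A written or printed work.", "livre", "fr"),
]


def load_dictionary(conn: sqlite3.Connection) -> None:
    """Create the schema and load the sample entries."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO entry VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    conn.commit()


def lookup(conn: sqlite3.Connection, headword: str, lang: str = "en"):
    """Return (pos, definition, translation) for every sense of a headword, in insertion order."""
    cur = conn.execute(
        "SELECT pos, definition, translation FROM entry "
        "WHERE headword = ? AND lang = ? ORDER BY rowid",
        (headword, lang),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_dictionary(conn)
    for pos, definition, translation in lookup(conn, "house"):
        print(f"house ({pos}): {definition} -> {translation}")
```

Keeping the monolingual definition and the translation in the same row mirrors the "combination of both" case; a production design would more likely split senses and translations into separate tables.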
In digital lexicography, natural language processing, and digital humanities, a lexical resource is a language resource consisting of data about the lexemes of the lexicon of one or more languages, e.g., in the form of a database. Different standards exist for the machine-readable edition of lexical resources, e.g., Lexical Markup Framework (LMF), an ISO standard for encoding lexical resources that comprises an abstract data model and an XML serialization, and OntoLex-Lemon, an RDF vocabulary for publishing lexical resources as knowledge graphs on the web.
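The following sketch shows what an LMF-style XML serialization might look like, built with Python's standard `xml.etree.ElementTree`. The element names (`LexicalResource`, `GlobalInformation`, `Lexicon`, `LexicalEntry`, `Lemma`, `Sense`, `Definition`, `TextRepresentation`) and the `<feat att="..." val="..."/>` attribute-value convention are modeled loosely on the DTD-based LMF serialization and should be checked against the ISO 24613 text; the helper functions `feat`, `build_lexicon`, and `lemmas` and the sample entries are hypothetical.

```python
import xml.etree.ElementTree as ET


def feat(parent: ET.Element, att: str, val: str) -> ET.Element:
    """Attach an LMF-style attribute-value pair: <feat att="..." val="..."/>."""
    return ET.SubElement(parent, "feat", att=att, val=val)


def build_lexicon(entries, language: str = "eng") -> ET.Element:
    """Build a minimal LMF-style tree from (written_form, pos, definition) tuples.

    Structure (assumed, simplified): LexicalResource > Lexicon > LexicalEntry
    > Lemma / Sense > Definition > TextRepresentation.
    """
    resource = ET.Element("LexicalResource", dtdVersion="16")
    info = ET.SubElement(resource, "GlobalInformation")
    feat(info, "label", "Toy lexicon (illustrative)")
    lexicon = ET.SubElement(resource, "Lexicon")
    feat(lexicon, "language", language)
    for i, (written_form, pos, definition) in enumerate(entries, start=1):
        entry = ET.SubElement(lexicon, "LexicalEntry", id=f"le{i}")
        feat(entry, "partOfSpeech", pos)
        lemma = ET.SubElement(entry, "Lemma")
        feat(lemma, "writtenForm", written_form)
        sense = ET.SubElement(entry, "Sense", id=f"le{i}_s1")
        text_rep = ET.SubElement(ET.SubElement(sense, "Definition"), "TextRepresentation")
        feat(text_rep, "writtenForm", definition)
    return resource


def lemmas(resource: ET.Element) -> list:
    """Return the written forms of all lemmas (ignores definition text)."""
    return [f.get("val") for f in resource.iterfind(".//Lemma/feat[@att='writtenForm']")]


if __name__ == "__main__":
    root = build_lexicon([("clergyman", "commonNoun", "a member of the clergy")])
    ET.indent(root)  # pretty-printing; requires Python 3.9+
    print(ET.tostring(root, encoding="unicode"))
```

Because every property is a generic `feat` pair rather than a dedicated XML attribute, new data categories can be added without changing the element structure, which is one reason this style suits merging heterogeneous lexicons.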
WordNet is a lexical database of semantic relations between words, linking them through relations such as synonymy, hyponymy, and meronymy. Synonyms are grouped into synsets, each with a short definition and usage examples, so WordNet can be seen as a combination and extension of a dictionary and a thesaurus. Although human users can access it through a web browser, its primary use is in automatic text analysis and artificial intelligence applications.
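The synset structure can be pictured with a small in-memory model. This is a toy sketch, not WordNet's data format or the API of any WordNet library: the `Synset` class, the helper functions, and the three-level hierarchy are hypothetical simplifications (the real hypernym chain for *car* is much deeper), and the `car.n.01`-style names only imitate a common naming convention.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of synonymous lemmas sharing one gloss (toy model)."""
    name: str
    lemmas: list[str]
    gloss: str
    hypernyms: list["Synset"] = field(default_factory=list)      # "is-a" links upward
    part_meronyms: list["Synset"] = field(default_factory=list)  # "has-part" links


def hypernym_path(synset: Synset) -> list[str]:
    """Follow the first hypernym link up to the root; return names root-first."""
    path = [synset.name]
    while synset.hypernyms:
        synset = synset.hypernyms[0]
        path.append(synset.name)
    return list(reversed(path))


def synonyms(word: str, synsets: list[Synset]) -> list[str]:
    """All other lemmas that share at least one synset with `word`."""
    result: set[str] = set()
    for s in synsets:
        if word in s.lemmas:
            result.update(s.lemmas)
    result.discard(word)
    return sorted(result)


# Hypothetical, heavily simplified sample data.
entity = Synset("entity.n.01", ["entity"], "that which is perceived or known to exist")
vehicle = Synset("vehicle.n.01", ["vehicle"], "a conveyance that transports people or objects", [entity])
wheel = Synset("wheel.n.01", ["wheel"], "a simple machine consisting of a circular frame")
car = Synset("car.n.01", ["car", "auto", "automobile", "motorcar"],
             "a motor vehicle with four wheels", [vehicle], [wheel])

if __name__ == "__main__":
    print(hypernym_path(car))
    print(synonyms("car", [car, vehicle]))
```

Storing relations as links between synsets rather than between individual words reflects the point above: synonymy is captured by membership in the same synset, while hyponymy and meronymy connect whole synsets.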
Natural language processing is ubiquitous in modern intelligent technologies, serving as a foundation for language translators, virtual assistants, search engines, and many more. In this course, stude
We propose methods to link automatically parsed linguistic data to WordNet. We apply these methods to a trilingual dictionary in Fula, English, and French. Dictionary entry parsing is used to collect the linguistic data. Then we connect it to the Open M ...
Moroccan Darija is a variant of Arabic with many influences. Using the Open Multilingual WordNet (OMW), we compare the lemmas in the Moroccan Darija Wordnet (MDW) with the standard Arabic, French, and Spanish ones. We then compare the lemmas in each synset ...
2018
State-of-the-art automatic speech recognition (ASR) and text-to-speech systems require a pronunciation lexicon that maps each word to a sequence of phones. Manual development of lexicons is costly as it needs linguistic knowledge and human expertise. To fa ...
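As a rough illustration of a pronunciation lexicon that maps each word to a sequence of phones, the sketch below parses a few lines in the plain-text layout used by the CMU Pronouncing Dictionary (word, whitespace, ARPAbet phones with stress digits on vowels, `(1)` marking an alternative pronunciation). The sample entries, the function names, and the choice of this format are assumptions for illustration; the snippet does not reproduce the method of the work described above.

```python
import re

# Illustrative entries in a CMUdict-style layout (not copied from the dictionary).
SAMPLE = """\
;;; comment lines start with three semicolons
PHONE  F OW1 N
READ  R EH1 D
READ(1)  R IY1 D
"""

# Matches a trailing variant marker such as "(1)".
VARIANT = re.compile(r"\(\d+\)$")


def parse_lexicon(text: str) -> dict[str, list[list[str]]]:
    """Map each lowercased word to its list of pronunciations (each a list of phones)."""
    lexicon: dict[str, list[list[str]]] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(";;;"):
            continue  # skip blank lines and comments
        word, *phones = line.split()
        word = VARIANT.sub("", word).lower()  # merge READ and READ(1)
        lexicon.setdefault(word, []).append(phones)
    return lexicon


def strip_stress(phones: list[str]) -> list[str]:
    """Drop the stress digits (0, 1, 2) from vowel phones."""
    return [p.rstrip("012") for p in phones]


if __name__ == "__main__":
    lex = parse_lexicon(SAMPLE)
    for word, prons in lex.items():
        print(word, [" ".join(p) for p in prons])
```

Keeping a list of pronunciations per word matters for homographs such as *read*, where an ASR or TTS system must choose between variants.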