BabelNet is a multilingual lexicalized semantic network and ontology developed at the NLP group of the Sapienza University of Rome. BabelNet was automatically created by linking Wikipedia to the most popular computational lexicon of the English language, WordNet. The integration is done using an automatic mapping and by filling in lexical gaps in resource-poor languages by using statistical machine translation. The result is an encyclopedic dictionary that provides concepts and named entities lexicalized in many languages and connected with large amounts of semantic relations. Additional lexicalizations and definitions are added by linking to free-license wordnets, OmegaWiki, the English Wiktionary, Wikidata, FrameNet, VerbNet and others. Similarly to WordNet, BabelNet groups words in different languages into sets of synonyms, called Babel synsets. For each Babel synset, BabelNet provides short definitions (called glosses) in many languages harvested from both WordNet and Wikipedia.
BabelNet (version 5.0) covers 500 languages. It contains almost 20 million synsets and around 1.4 billion word senses (regardless of their language). Each Babel synset contains 2 synonyms per language, i.e., word senses, on average. The semantic network includes all the lexico-semantic relations from WordNet (hypernymy and hyponymy, meronymy and holonymy, antonymy and synonymy, etc., totaling around 364,000 relation edges) as well as an underspecified relatedness relation from Wikipedia (totaling around 1.3 billion edges). Version 5.0 also associates around 51 million images with Babel synsets and provides a Lemon RDF encoding of the resource, available via a SPARQL endpoint. 2.67 million synsets are assigned domain labels.
BabelNet has been shown to enable multilingual Natural Language Processing applications.
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
This course introduces the foundations of information retrieval, data mining and knowledge bases, which constitute the foundations of today's Web-based distributed information systems.
In natural language processing, entity linking, also referred to as named-entity linking (NEL), named-entity disambiguation (NED), named-entity recognition and disambiguation (NERD) or named-entity normalization (NEN) is the task of assigning a unique identity to entities (such as famous individuals, locations, or companies) mentioned in text. For example, given the sentence "Paris is the capital of France", the idea is to determine that "Paris" refers to the city of Paris and not to Paris Hilton or any other entity that could be referred to as "Paris".
DBpedia (from "DB" for "database") is a project aiming to extract structured content from the information created in the Wikipedia project. This structured information is made available on the World Wide Web. DBpedia allows users to semantically query relationships and properties of Wikipedia resources, including links to other related datasets. In 2008, Tim Berners-Lee described DBpedia as one of the most famous parts of the decentralized Linked Data effort.
Word-sense disambiguation (WSD) is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious/automatic but can often come to conscious attention when ambiguity impairs clarity of communication, given the pervasive polysemy in natural language. In computational linguistics, it is an open problem that affects other computer-related writing, such as discourse, improving relevance of search engines, anaphora resolution, coherence, and inference.
Recognition and identification of real-world entities is at the core of virtually any text mining application. As a matter of fact, referential units such as names of persons, locations and organizations underlie the semantics of texts and guide their inte ...
Moroccan Darija is a variant of Arabic with many influences. Using the Open Multilingual WordNet (OMW), we compare the lemmas in the Moroccan Darija Wordnet (MDW) with the standard Arabic, French and Spanish ones. We then compared the lemmas in each synset ...
We propose methods to link automatically parsed linguistic data to the WordNet. We apply these methods on a trilingual dictionary in Fula, English and French. Dictionary entry parsing is used to collect the linguistic data. Then we connect it to the Open M ...