Concept

Named-entity recognition

Named-entity recognition (NER) is a subtask of information extraction from document corpora. It consists of locating textual objects (a word or a group of words) that can be assigned to classes such as names of persons, names of organizations or companies, place names, quantities, distances, values, dates, and so on.

As an example, take the following sentence, as processed by a named-entity recognition system used in the MUC evaluation campaign: "Henri bought 300 shares of the company AMD in 2006." The system's output marks the entities in this sentence with XML tags that follow the ENAMEX tagging standard.

Most tagging systems use formal grammars combined with statistical models, optionally supplemented by databases (lists of first names, city names or country names, for example). In the major evaluation campaigns, systems based on hand-written grammars obtain the best results. The drawback is that systems of this type sometimes require months of rule-writing work.

Current statistical systems, for their part, use a large amount of pre-annotated data to learn the possible forms of named entities. Here it is no longer necessary to write many rules by hand; instead, a corpus must be annotated to serve as training material. These systems are therefore also very costly in human time. To address this problem, recent initiatives such as DBpedia and Yago seek to provide semantic corpora that can help in designing tagging tools. In the same spirit, some semantic ontologies such as NLGbAse are largely oriented toward tagging.

Since 1998, the annotation of named entities in text has attracted growing interest. Many applications rely on it, for information retrieval or, more generally, for understanding textual documents.
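The rule-plus-gazetteer approach described above can be illustrated with a minimal Python sketch. It is not the system used at MUC: the gazetteers, the regular expressions and the function name `tag` are all hypothetical, and the element and TYPE names only imitate MUC-style markup (the `QUANTITY` type in particular is an illustrative assumption). The sentence is the example above.

```python
import re

# Hypothetical gazetteers (lookup lists); real systems use far larger ones.
FIRST_NAMES = {"Henri"}
ORGANIZATIONS = {"AMD"}


def build_rules():
    """Return (pattern, element, type) rules, highest priority first."""
    names = "|".join(sorted(map(re.escape, FIRST_NAMES)))
    orgs = "|".join(sorted(map(re.escape, ORGANIZATIONS)))
    return [
        (re.compile(rf"\b(?:{names})\b"), "ENAMEX", "PERSON"),
        (re.compile(rf"\b(?:{orgs})\b"), "ENAMEX", "ORGANIZATION"),
        # Assumption: a four-digit number starting with 19 or 20 is a year.
        (re.compile(r"\b(?:19|20)\d{2}\b"), "TIMEX", "DATE"),
        # Any other integer is treated as a quantity.
        (re.compile(r"\b\d+\b"), "NUMEX", "QUANTITY"),
    ]


def tag(text):
    """Wrap every recognized entity in an XML element."""
    spans = []
    for pattern, element, entity_type in build_rules():
        for match in pattern.finditer(text):
            # Keep a match only if it does not overlap a span that was
            # already accepted by a higher-priority rule.
            if all(match.end() <= s or match.start() >= e
                   for s, e, _, _ in spans):
                spans.append((match.start(), match.end(),
                              element, entity_type))

    pieces, position = [], 0
    for start, end, element, entity_type in sorted(spans):
        pieces.append(text[position:start])
        pieces.append(
            f'<{element} TYPE="{entity_type}">{text[start:end]}</{element}>'
        )
        position = end
    pieces.append(text[position:])
    return "".join(pieces)


print(tag("Henri bought 300 shares of the company AMD in 2006"))
```

Rule order encodes priority: the year rule runs before the generic number rule, so "2006" is tagged as a date rather than a quantity. A statistical system would replace these hand-written rules with a model trained on an annotated corpus.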

Official source

https://fr.wikipedia.org/wiki/Reconnaissance_d'entités_nommées

About this result

This page is automatically generated and may contain information that is not correct, complete, up to date or relevant to your search. The same applies to every other page on this site. Please make sure to verify the information with EPFL's official sources.

Related courses (8)

CS-423: Distributed information systems

This course introduces the foundations of information retrieval, data mining and knowledge bases, which constitute the foundations of today's Web-based distributed information systems.

DH-405: Foundations of digital humanities

This course gives an introduction to the fundamental concepts and methods of the Digital Humanities, both from a theoretical and applied point of view. The course introduces the Digital Humanities cir

HUM-369: Digital humanities

Digital Humanities is a discipline at the crossroads of information science and the humanities and social sciences. In this course, students discover this field of research ...


Related publications (31)

Infusing structured knowledge priors in neural models for sample-efficient symbolic reasoning

Mattia Atzeni

The ability to reason, plan and solve highly abstract problems is a hallmark of human intelligence. Recent advancements in artificial intelligence, propelled by deep neural networks, have revolutionized disciplines like computer vision and natural language ...

EPFL, 2024

Examining European Press Coverage of the Covid-19 No-Vax Movement: An NLP Framework

Daniel Gatica-Perez

This paper examines how the European press dealt with the no-vax reactions against the Covid-19 vaccine and the dis- and misinformation associated with this movement. Using a curated dataset of 1786 articles from 19 European newspapers on the anti-vaccine ...

ASSOC COMPUTING MACHINERY, 2023

impresso Text Reuse at Scale. An interface for the exploration of text reuse data in semantically enriched historical newspapers

Maud Ehrmann, Matteo Romanello

Text Reuse reveals meaningful reiterations of text in large corpora. Humanities researchers use text reuse to study, e.g., the posterior reception of influential texts or to reveal evolving publication practices of historical media. This research is often ...

2023


Related units (2)

Digital Humanities Laboratory (Laboratoire d'humanités digitales)

Distributed Information Systems Laboratory (Laboratoire de systèmes d'information répartis)


Related lectures (32)

Entity disambiguation

Explores entity disambiguation techniques, including NER, Viterbi and GPT models, with an emphasis on prompt design and in-context learning.

Semantic Web & information extraction

Explores the Semantic Web, ontologies, information extraction, keyphrases, named entities and knowledge bases.

Entity and information extraction

Explores knowledge extraction from text, covering key concepts such as keyphrase extraction and named-entity recognition.



Related people (4)

Maud Ehrmann

Maud Ehrmann is a research scientist at EPFL’s Digital Humanities Laboratory led by Prof. Frédéric Kaplan. She holds a PhD in Computational Linguistics from the Paris Diderot University (Paris 7) and has been engaged in a large number of scientific projects centred on information extraction and text analysis, both for present-time and historical documents. Her main research interests span Natural Language Processing and Digital Humanities and include, among others, historical text annotation, historical data processing and representation, named entity recognition, and multilingual linguistic resources creation. Her current work at the DHLAB focuses on ‘impresso - Media Monitoring of the Past’, an SNF Sinergia project she initiated with Marten Düring (C2DH) and Simon Clematide (ICL) and which aims at enabling critical analysis of historical newspapers. In addition to the overall project management, her contributions to this project include system design and data management, annotation and benchmarking, and named entity processing. She also participates in the activities of the Venice Time Machine, working particularly on information extraction and knowledge representation tasks. Previously, she worked on the Garzoni project, where she supervised and contributed to the development of a web-based transcription and annotation interface, in collaboration with Orlin Topalov, and built a linked data-based historical knowledge base. She also contributed to the Le Temps Digital Archives project. Prior to joining the DHLAB, she worked at the Linguistics Computing Laboratory at the Sapienza University of Rome with Roberto Navigli, where she worked on the BabelNet resource (a very large multilingual encyclopaedic dictionary and semantic network) and contributed to the LIDER project.
Before that, she worked for four years at the European Commission’s Joint Research Centre in Ispra, Italy, as a member of the OPTIMA unit (now Text and Data mining unit), which develops innovative and application-oriented solutions for retrieving and extracting information from the Internet with a focus on high multilinguality. Together with Erik van der Goot, Ralf Steinberger, Hristo Tanev, Leo della Rocca and many others, she contributed to the development of the Europe Media Monitor (EMM). Prior to that, she worked at the Xerox Europe Research Centre in Grenoble, France (now Naver Labs Europe) in the Parsing & Semantics group led by Frédérique Segond, first as a PhD candidate supported through a CIFRE grant under the supervision of Caroline Brun and Bernard Victorri, then as a post-doctoral researcher. There her research focused mainly on the automatic processing and fine-grained analysis of entities of interest, specifically named entities and temporal expressions.

Hervé Bourlard

Matteo Romanello

Karl Aberer

Co-Founder of LinkAlong Sarl, 2017. Vice-president EPFL for Information Systems, 2012-2016. Director of the Swiss National Centre for Mobile Information and Communication Systems NCCR MICS (mics.ch), 2005-2012. Member of the Swiss Research and Technology Council SWTR, consulting the Swiss Federal government, 2004-2011.


Related concepts (11)

Information extraction

Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents and other electronically represented sources. In most cases this activity concerns processing human language texts by means of natural language processing (NLP). Recent activities in multimedia document processing like automatic annotation and content extraction out of images/audio/video/documents could be seen as information extraction. Due to the difficulty of the problem, current approaches to IE (as of 2010) focus on narrowly restricted domains.

Semantic annotation

Semantic annotation is the operation of linking the content of a text to entities in an ontology. For example, for the sentence "Paris is the capital of France.", the correct annotation of Paris would be the city of Paris and not Paris Hilton. Semantic annotation is a more detailed but less exact variant of the named-entity method, because named entities describe only the category of the entity (Paris is a city, without linking it to the correct Wikipedia page).

Annotation (computing)

In programming, an annotation is an element that adds metadata to source code. Depending on the programming language and on the programmer's choice, annotations may be accessible only at compile time, present only in the compiled file, or even accessible at run time. This technique is an alternative to configuration files, which are often written in formats such as XML or YAML.
