Text mining | EPFL Graph Search

Related courses (28)

This course teaches the basic techniques, methodologies, and practical skills required to draw meaningful insights from a variety of data, with the help of the most acclaimed software tools in the dat

CS-423: Distributed information systems

This course introduces the foundations of information retrieval, data mining and knowledge bases, which constitute the foundations of today's Web-based distributed information systems.

CS-431: Introduction to natural language processing

The objective of this course is to present the main models, formalisms and algorithms necessary for the development of applications in the field of natural language information processing. The concept

Related lectures (32)

Text data analysis: classification and dimensionality reduction

Explores the classification of text data, focusing on methods such as naive Bayes and on dimensionality reduction techniques such as principal component analysis.

Digital humanities laboratory: practices

Introduces students to digital humanities through hands-on exercises, such as working on a large ongoing project.

Information extraction: methods and applications

Explores information extraction methods, including traditional and embedding-based approaches, supervised learning, distant supervision, and taxonomy induction.

Related publications (29)

Post-correction of Historical Text Transcripts with Large Language Models: An Exploratory Study

Frédéric Kaplan, Maud Ehrmann, Matteo Romanello, Emanuela Boros, Sven-Nicolas Yoann Najem

The quality of automatic transcription of heritage documents, whether from printed, manuscripts or audio sources, has a decisive impact on the ability to search and process historical texts. Although significant progress has been made in text recognition ( ...

Association for Computational Linguistics, 2024

From scattered sources to comprehensive technology landscape : A recommendation-based retrieval approach

Karl Aberer, Chi Thang Duong

Mapping the technology landscape is crucial for market actors to take informed investment decisions. However, given the large amount of data on the Web and its subsequent information overload, manually retrieving information is a seemingly ineffective and ...

Elsevier, 2023

An Ordinal Latent Variable Model of Conflict Intensity

Robert West

Measuring the intensity of events is crucial for monitoring and tracking armed conflict. Advances in automated event extraction have yielded massive data sets of "who did what to whom" micro-records that enable data-driven approaches to monitoring confl ...

Association for Computational Linguistics (ACL), 2023

Related people (3)

Karl Aberer

Co-Founder of LinkAlong Sarl, 2017. Vice-president EPFL for Information Systems, 2012–2016. Director of the Swiss National Centre for Mobile Information and Communication Systems NCCR MICS (mics.ch), 2005–2012. Member of the Swiss Research and Technology Council SWTR, consulting the Swiss Federal government, 2004–2011.

Samy Bengio

Related units (4)

Distributed Information Systems Laboratory

IDIAP Laboratory

Digital Humanities Laboratory

Related concepts (23)

Information extraction

Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents and other electronically represented sources. In most cases this activity concerns processing human language texts by means of natural language processing (NLP). Recent activities in multimedia document processing, such as automatic annotation and content extraction from images, audio, video, and documents, can also be seen as information extraction. Due to the difficulty of the problem, current approaches to IE (as of 2010) focus on narrowly restricted domains.

SPSS

SPSS (Statistical Package for the Social Sciences) is a software package used for statistical analysis. It is also the name of the company that sells it (SPSS Inc). In 2009, the company decided to rename its products PASW, for Predictive Analytics Software, and was acquired by IBM for 1.24 billion dollars. The first version of SPSS went on sale in 1968, and it is one of the programs used for statistical analysis in the social sciences.

Named-entity recognition

Named-entity recognition is a subtask of information extraction from document corpora. It consists of finding textual objects (that is, a word or a group of words) that can be categorized into classes such as names of persons, names of organizations or companies, names of places, quantities, distances, values, dates, and so on. As an example, consider the following text, of the kind tagged by a named-entity recognition system used during the MUC evaluation campaign: Henri bought 300 shares of the company AMD in 2006.
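The tagging described above can be illustrated with a minimal rule-based sketch. It is not the system used in the MUC campaign: the gazetteers `PERSONS` and `ORGANIZATIONS`, the regular expressions, the label names, and the bracketed output format are all assumptions chosen for illustration, and real recognizers typically rely on trained statistical or neural models.

```python
import re

# Hypothetical toy gazetteers; a real system would use far larger resources or a trained model.
PERSONS = {"Henri"}
ORGANIZATIONS = {"AMD"}


def _alternation(words):
    """Build a whole-word pattern matching any of the given words."""
    return re.compile(r"\b(?:" + "|".join(map(re.escape, sorted(words))) + r")\b")


# Rules are tried in order; earlier rules win when two matches overlap,
# so a four-digit year is labeled DATE before the generic number rule sees it.
RULES = [
    ("PERSON", _alternation(PERSONS)),
    ("ORGANIZATION", _alternation(ORGANIZATIONS)),
    ("DATE", re.compile(r"\b(?:19|20)\d{2}\b")),
    ("QUANTITY", re.compile(r"\b\d+\b")),
]


def find_entities(text):
    """Return non-overlapping (start, end, label) spans sorted by position."""
    spans = []
    for label, pattern in RULES:
        for match in pattern.finditer(text):
            start, end = match.span()
            # Keep the match only if it does not overlap an already claimed span.
            if all(end <= s or start >= e for s, e, _ in spans):
                spans.append((start, end, label))
    return sorted(spans)


def annotate(text):
    """Wrap each detected entity in an inline [LABEL text] bracket."""
    out, cursor = [], 0
    for start, end, label in find_entities(text):
        out.append(text[cursor:start])
        out.append(f"[{label} {text[start:end]}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


sentence = "Henri bought 300 shares of the company AMD in 2006."
print(annotate(sentence))
# [PERSON Henri] bought [QUANTITY 300] shares of the company [ORGANIZATION AMD] in [DATE 2006].
```

Ordering the rules resolves the ambiguity between a year and a plain number; a learned model would instead use context to make that decision.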
