Lecture

Impresso: Newspaper Archives Exploration

In course

Digital Humanities is a discipline at the intersection of information science and the humanities and social sciences. In this course, students discover this field of research…

Description

This lecture delves into the exploration of historical newspaper archives using the Impresso project, focusing on OCR quality assessment, named entity processing, topic modeling, and text reuse analysis. It covers challenges in data preprocessing, search based on named entities, and the evaluation of digital interfaces for historical research.

Instructors (3)

Maud Ehrmann

Maud Ehrmann is a research scientist at EPFL’s Digital Humanities Laboratory, led by Prof. Frédéric Kaplan. She holds a PhD in Computational Linguistics from Paris Diderot University (Paris 7) and has been engaged in a large number of scientific projects centred on information extraction and text analysis, both for present-time and historical documents. Her main research interests span Natural Language Processing and Digital Humanities and include, among others, historical text annotation, historical data processing and representation, named entity recognition, and multilingual linguistic resources creation. Her current work at the DHLAB focuses on ‘impresso - Media Monitoring of the Past’, an SNF Sinergia project she initiated with Marten Düring (C2DH) and Simon Clematide (ICL), which aims at enabling critical analysis of historical newspapers. In addition to overall project management, her contributions to this project include system design and data management, annotation and benchmarking, and named entity processing. She also participates in the activities of the Venice Time Machine, working particularly on information extraction and knowledge representation tasks. Previously, she worked on the Garzoni project, where she supervised and contributed to the development of a web-based transcription and annotation interface, in collaboration with Orlin Topalov, and built a linked data-based historical knowledge base. She also contributed to the Le Temps Digital Archives project. Prior to joining the DHLAB, she worked at the Linguistics Computing Laboratory at the Sapienza University of Rome with Roberto Navigli, where she worked on the BabelNet resource - a very large multilingual encyclopaedic dictionary and semantic network - and contributed to the LIDER project.
Before that, she worked for four years at the European Commission’s Joint Research Centre in Ispra, Italy, as a member of the OPTIMA unit (now the Text and Data Mining unit), which develops innovative and application-oriented solutions for retrieving and extracting information from the Internet, with a focus on high multilinguality. Together with Erik van der Goot, Ralf Steinberger, Hristo Tanev, Leo della Rocca and many others, she contributed to the development of the Europe Media Monitor (EMM). Earlier, she worked at the Xerox Europe Research Centre in Grenoble, France (now Naver Labs Europe) in the Parsing & Semantics group led by Frédérique Segond, first as a PhD candidate supported through a CIFRE grant under the supervision of Caroline Brun and Bernard Victorri, then as a post-doctoral researcher. There, her research focused mainly on the automatic processing and fine-grained analysis of entities of interest, specifically named entities and temporal expressions.


Related lectures (32)

Digital Humanities and Digitized Press

Explores digitized press archives, text data processing challenges, methodologies for group work, and the societal impact of digital humanities.

Digital History and Digitized Press

Delves into the 'digital turn' in history, examining historical research using digitized newspapers and exploring text reuse, word embeddings, and data visualization.

Urban Digital History: Lausanne Time Machine

Emphasizes the importance of incompatibilities to prevent conflicts of interest and maintain democracy, advocating for a constitutional law on the matter.

Entity Disambiguation

Explores entity disambiguation techniques, including NER, Viterbi algorithm, and GPT models, emphasizing prompt design and in-context learning.

Coreference Resolution: Models and Evaluation

Explores coreference resolution models, challenges in scoring spans, graph refinement techniques, state-of-the-art results, and the impact of pretrained Transformers.

Isabella Di Lenardo

Ph.D. in Theories and Art History. Isabella di Lenardo is a senior scientist in Digital Humanities and Urban History. Her research focuses on digital tools and methods applied to Urban History. She is an expert in ancient cartography, city representations, and cadastral sources interpreted through digital modeling, extraction and analysis systems. She is also interested in network analysis that questions the production and circulation of artistic and architectural knowledge in Europe from the 16th to the 18th centuries, in particular North-South relationships and influences. Collaborative projects with European institutions and work on various initiatives supported by the European Commission have allowed her to specialize in science applied to heritage, developing in particular skills for the enhancement of Cultural Heritage through digital tools. She leads projects in collaboration with the Bibliothèque nationale de France (Paris), Institut National d’Histoire de l’Art (Paris), Ecole nationale des chartes (Paris), Université Paris I Panthéon-Sorbonne, Centre Allemand d’Histoire de l’Art (Paris), Musée du Louvre, Archives Nationales de Paris, Bibliothèque Historique de la Ville de Paris, and Réunion des Musées Nationaux. She supervised all the urban modeling simulations for the Venice Time Machine project and acted as content curator for all the exhibitions (Venice Biennale, Grand Palais-Paris, Datasquare - ArtLab-Lausanne, Museo Correr-Venice). She was head of the “Replica” project, which covered the digitization of 1 million photos of artworks in the Fondazione Giorgio Cini (Venice) and visual pattern extraction through a search engine for visual similarities. She has a Ph.D. in Art History and also studied Archaeology and Urban Studies. She has published essays and articles about Venetian Art and Urban History and participated in many art exhibitions in European museums.
She coordinated summer schools and workshops on digital tools for Art History and Urban History. She attended conferences on digital methods for History at universities and heritage institutions.