Publication

Lausanne Historical Censuses Dataset HTR 35k

Lucas Arnaud André Rappo, Rémi Guillaume Petitpierre, Marion Kramer
2023
Dataset

Abstract

This training dataset comprises 34,913 manually transcribed text segments. It is intended for handwritten text recognition (HTR) of historical sources, typically tabular records such as censuses. The dataset is based on a sample of 83 pages from the 19th-century censuses of Lausanne, Switzerland (1805-1898). The documents are primarily written in French, although many Germanic names and toponyms also appear.

The training data are formatted on the model of the Bentham dataset. The format therefore simply consists of a list of JPEG images, one per text segment, and their corresponding transcription, stored in a txt file. Files follow the naming convention 'yyyy-ppp-n', where 'yyyy' stands for the year of publication of the census and 'ppp' for the page number. The digitized documents are provided by the Archives of the City of Lausanne.

Please note that the annotation and extraction methodology, as well as the complete performance evaluation, including the HTR benchmark and the post-correction performance, are published in: Petitpierre R., Rappo L., Kramer M. (2023). An end-to-end pipeline for historical censuses processing. International Journal on Document Analysis and Recognition (IJDAR). doi: 10.1007/s10032-023-00428-9

The tabular dataset resulting from the automatic extraction is also available on Zenodo: Petitpierre R., Rappo L., Kramer M., di Lenardo I. (2023). 1805-1898 Census Records of Lausanne : a Long Digital Dataset for Demographic History. Zenodo. doi: 10.5281/zenodo.7711640
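As an illustration of the 'yyyy-ppp-n' naming convention, the sketch below pairs each segment image with its transcription and parses the file name into year, page, and a third number. It rests on several assumptions that the dataset description does not confirm. Each image is assumed to sit next to a `.txt` file with the same stem, all in a single directory. Images are assumed to use a `.jpg` or `.jpeg` suffix. Transcriptions are assumed to be UTF-8. The trailing 'n' is assumed to be a segment index on the page. The names `Segment`, `parse_stem`, and `load_segments` are hypothetical helpers, not part of the dataset.

```python
import re
from dataclasses import dataclass
from pathlib import Path

# 'yyyy-ppp-n' stem. Page and 'n' accept any number of digits so that
# unpadded names would also match. The meaning of 'n' is ASSUMED here
# (segment index on the page); the dataset description does not define it.
STEM_RE = re.compile(r"^(?P<year>\d{4})-(?P<page>\d+)-(?P<n>\d+)$")
IMAGE_SUFFIXES = {".jpg", ".jpeg"}  # assumed image extensions


@dataclass(frozen=True)
class Segment:
    year: int    # year of publication of the census
    page: int    # page number
    n: int       # assumed segment index on the page
    image: Path  # path to the segment image
    text: str    # manual transcription


def parse_stem(stem: str) -> tuple[int, int, int]:
    """Split a 'yyyy-ppp-n' file stem into (year, page, n)."""
    m = STEM_RE.match(stem)
    if m is None:
        raise ValueError(f"unexpected file name: {stem!r}")
    return int(m["year"]), int(m["page"]), int(m["n"])


def load_segments(root: Path) -> list[Segment]:
    """Pair every image in `root` with a same-stem .txt file (assumed layout)."""
    segments: list[Segment] = []
    for image in root.iterdir():
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # ignore transcripts and unrelated files here
        transcript = image.with_suffix(".txt")
        if not transcript.is_file():
            continue  # skip images that have no transcription
        year, page, n = parse_stem(image.stem)
        text = transcript.read_text(encoding="utf-8").strip()
        segments.append(Segment(year, page, n, image, text))
    # Sort numerically, so that segment 10 comes after segment 2.
    segments.sort(key=lambda s: (s.year, s.page, s.n))
    return segments
```

For example, `load_segments(Path("lausanne_htr"))` would return the segments of a hypothetical local copy in census, page, and segment order. Sorting on the parsed integers rather than on file names avoids the lexicographic ordering in which '1805-001-10' would precede '1805-001-2'.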

Official source

https://infoscience.epfl.ch/record/301983?ln=fr

About this result

This page is generated automatically and may contain information that is not correct, complete, up to date, or relevant to your search. The same applies to all other pages on this site. Please verify the information against EPFL's official sources.


Ontological proximity

Information engineering

Natural language processing

Related publications


Subjective performance evaluation of bitrate allocation strategies for MPEG and JPEG Pleno point cloud compression

Post-correction of Historical Text Transcripts with Large Language Models: An Exploratory Study

1805-1898 Census Records of Lausanne : a Long Digital Dataset for Demographic History
