Publication

Optical Character Recognition of 19th Century Classical Commentaries: the Current State of Affairs

Abstract

Together with critical editions and translations, commentaries are one of the main genres of publication in literary and textual scholarship, and have a centuries-long tradition. Yet the exploitation of thousands of digitized historical commentaries has hitherto been hindered by the poor quality of Optical Character Recognition (OCR), especially on commentaries to Greek texts. In this paper, we evaluate the performance of two pipelines suitable for the OCR of historical classical commentaries. Our results show that Kraken + Ciaconna reaches a substantially lower character error rate (CER) than Tesseract/OCR-D on commentary sections with a high density of polytonic Greek text (average CER 7% vs. 13%), while Tesseract/OCR-D is slightly more accurate than Kraken + Ciaconna on text sections written predominantly in Latin script (average CER 8.2% vs. 8.4%). As part of this paper, we also release GT4HistComment, a small dataset with OCR ground truth for 19th-century classical commentaries, and Pogretra, a large collection of training data and pre-trained models for a wide variety of ancient Greek typefaces.
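The comparison above is expressed as character error rate. A common definition of CER is the character-level edit distance between the OCR output and the ground truth, divided by the length of the ground truth. The following is a minimal sketch of that definition, not the evaluation code used in the paper. The helper names `levenshtein` and `cer` are hypothetical. The choice of Unicode normalization form (NFC by default here) is also an assumption. It matters for polytonic Greek, where an accented letter can be stored either as one precomposed code point or as a base letter followed by combining diacritics.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    # Keep only one row of the dynamic-programming table at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str, form: str = "NFC") -> float:
    """Hypothetical CER helper: edit distance / reference length.

    Both strings are Unicode-normalized first (assumption: NFC by default),
    so precomposed and decomposed spellings of the same polytonic Greek
    letter compare as equal.
    """
    ref = unicodedata.normalize(form, reference)
    hyp = unicodedata.normalize(form, hypothesis)
    if not ref:
        raise ValueError("reference must be non-empty")
    return levenshtein(ref, hyp) / len(ref)
```

For example, if the ground truth is λόγος (`"\u03bb\u03cc\u03b3\u03bf\u03c2"`) and the OCR output drops the accent (`"\u03bb\u03bf\u03b3\u03bf\u03c2"`), `cer(...)` returns 0.2 under NFC: one substituted character out of five. Under `form="NFD"`, the accent becomes a separate combining mark, so the same error counts as one deletion out of six characters, or about 0.167. CER figures for Greek-heavy text are therefore only comparable when they use the same normalization.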

Ontological neighbourhood
Related concepts (32)
Greek alphabet
The Greek alphabet has been used to write the Greek language since the late 9th or early 8th century BC. It is derived from the earlier Phoenician alphabet, and was the earliest known alphabetic script to have distinct letters for vowels as well as consonants. In Archaic and early Classical times, the Greek alphabet existed in many local variants, but, by the end of the 4th century BC, the Euclidean alphabet, with 24 letters, ordered from alpha to omega, had become standard and it is this version that is still used for Greek writing today.
Romanization of Greek
Romanization of Greek is the transliteration (letter-mapping) and/or transcription (sound-mapping) of text from the Greek alphabet into the Latin alphabet. The conventions for writing and romanizing Ancient Greek and Modern Greek differ markedly. The sound of the English letter B (/b/) was written as β in ancient Greek but is now written as the digraph μπ, while the modern β sounds like the English letter V (/v/) instead.
Greek diacritics
Greek orthography has used a variety of diacritics starting in the Hellenistic period. The more complex polytonic orthography (πολυτονικό σύστημα γραφής), which includes five diacritics, notates Ancient Greek phonology. The simpler monotonic orthography (μονοτονικό σύστημα γραφής), introduced in 1982, corresponds to Modern Greek phonology, and requires only two diacritics. Polytonic orthography is the standard system for Ancient Greek and Medieval Greek.
Related publications (8)

Optical Character Recognition of 19th Century Classical Commentaries: the Current State of Affairs

Matteo Romanello, Sven-Nicolas Yoann Najem

2021

The transferability of handwriting skills: from the Cyrillic to the Latin alphabet

Pierre Dillenbourg, Thibault Lucien Christian Asselborn, Wafa Monia Benkaouar Johal

Do handwriting skills transfer when a child writes in two different scripts, such as the Latin and Cyrillic alphabets? Are our measures of handwriting skills intrinsically bound to one alphabet or will a child who faces handwriting difficulties in one scri ...
2021

Il demone dell'analogia: Ovvero affinità e divergenze fra il compagno aristotele e noi

Nicola Braghieri

This laconic discourse uses the Aristotelian authority to define the role of the analogical procedure in the government of the architectural composition. Respecting its ambiguous balance between mathematical method and attitude of the imagination, analogy ...
2021