Text embedded in images and videos represents a rich source of information for content-based indexing and retrieval applications. In this paper, we present a new method for localizing and recognizing text in complex images and videos. Text localization is performed with a two-step approach that combines the speed of a focusing step with the strength of a machine-learning-based text verification step. The experiments conducted show that the support vector machine is more appropriate for the verification task than the more commonly used neural networks. To perform text recognition on the localized regions, we propose a new multi-hypotheses method. Assuming different models of the text image, several segmentation hypotheses are produced. They are processed by an optical character recognition (OCR) system, and the result is selected from the generated strings according to a confidence value computed using language modeling and OCR statistics. Experiments show that this approach leads to much better results than the conventional method, which tries to improve a single segmentation algorithm. The whole system has been tested on several hours of video and showed good performance when integrated in a sports video annotation system and a video indexing system within the framework of two European projects.
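The selection step of the multi-hypotheses method can be illustrated with a minimal Python sketch. The abstract does not give the actual confidence formula, so everything here is an assumption made for illustration: the character-bigram language model, the add-one smoothing, the way the OCR score is combined with the language model score, and all names (`train_bigram_model`, `lm_log_prob`, `select_hypothesis`, `ocr_weight`) are hypothetical and not taken from the paper. The OCR strings and their confidences are made-up inputs standing in for the output of a real OCR system.

```python
import math
from collections import Counter


def train_bigram_model(corpus):
    """Count character bigrams in a small reference word list (toy language model)."""
    bigrams = Counter()
    unigrams = Counter()
    for word in corpus:
        padded = "^" + word.lower() + "$"  # mark word start and end
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    return bigrams, unigrams


def lm_log_prob(text, model, alphabet_size=64):
    """Average log-probability per character transition, with add-one smoothing."""
    bigrams, unigrams = model
    padded = "^" + text.lower() + "$"
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        total += math.log((bigrams[(a, b)] + 1) / (unigrams[a] + alphabet_size))
    return total / (len(padded) - 1)


def select_hypothesis(hypotheses, model, ocr_weight=1.0):
    """Pick the string with the highest combined confidence.

    hypotheses: list of (string, mean OCR character confidence in (0, 1]).
    The combination rule (sum of log scores) is an assumption for this sketch.
    """
    def confidence(hypothesis):
        text, ocr_conf = hypothesis
        return lm_log_prob(text, model) + ocr_weight * math.log(ocr_conf)

    return max(hypotheses, key=confidence)[0]


# Hypothetical OCR outputs for one text region, one per segmentation hypothesis.
model = train_bigram_model(["goal", "score", "final", "match"])
hypotheses = [("g0a1", 0.60), ("goal", 0.55), ("qcal", 0.50)]
best = select_hypothesis(hypotheses, model)
print(best)
```

In this toy setup the language model term outweighs the slightly higher OCR confidence of the misread string, which is the kind of behaviour the abstract attributes to combining language modeling with OCR statistics.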