Publications related to Text Detection and Recognition in Images and Videos

Infusing structured knowledge priors in neural models for sample-efficient symbolic reasoning

The ability to reason, plan and solve highly abstract problems is a hallmark of human intelligence. Recent advancements in artificial intelligence, propelled by deep neural networks, have revolutionized disciplines like computer vision and natural language ...

EPFL, 2024

Aggregating Spatial and Photometric Context for Photometric Stereo

David Honzátko

Photometric stereo, a computer vision technique for estimating the 3D shape of objects through images captured under varying illumination conditions, has been a topic of research for nearly four decades. In its general formulation, photometric stereo is an ...

EPFL, 2024

Performing and Detecting Backdoor Attacks on Face Recognition Algorithms

Alexander Carl Unnervik

The field of biometrics, and especially face recognition, has seen widespread adoption in the last few years, from access control on personal devices such as phones and laptops to automated border controls such as those in airports. The stakes are increasingly ...

EPFL, 2024

Post-correction of Historical Text Transcripts with Large Language Models: An Exploratory Study

Frédéric Kaplan, Maud Ehrmann, Matteo Romanello, Emanuela Boros, Sven-Nicolas Yoann Najem

The quality of automatic transcription of heritage documents, whether from printed, manuscript or audio sources, has a decisive impact on the ability to search and process historical texts. Although significant progress has been made in text recognition ( ...

The Association for Computational Linguistics, 2024

Driving and suppressing the human language network using large language models

Martin Schrimpf

Transformer models such as GPT generate human-like language and are predictive of human brain responses to language. Here, using functional-MRI-measured brain responses to 1,000 diverse sentences, we first show that a GPT-based encoding model can predict t ...

Berlin, 2024

Fast and Future: Towards Efficient Forecasting in Video Semantic Segmentation

Evann Pierre Guy Courdier

Deep learning has revolutionized the field of computer vision, a success largely attributable to the growing size of models, datasets, and computational power. Simultaneously, a critical pain point arises as several computer vision applications are deployed ...

EPFL, 2024

Explainable Fault Diagnosis of Oil-Immersed Transformers: A Glass-Box Model

Wenlong Liao, Yi Zhang, Zhe Yang

Recently, remarkable progress has been made in the application of machine learning (ML) techniques (e.g., neural networks) to transformer fault diagnosis. However, the diagnostic processes employed by these techniques often suffer from a lack of interpreta ...

Piscataway, 2024

Incorporating Projective Geometry into Deep Learning

Michal Jan Tyszkiewicz

In this thesis we explore the applications of projective geometry, a mathematical theory of the relation between 3D scenes and their 2D images, in modern learning-based computer vision systems. This is an interesting research question which contradicts the ...

EPFL, 2024

Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange

Sabine Süsstrunk, Mathieu Salzmann, Tong Zhang, Yi Wu

In the realm of point cloud scene understanding, particularly in indoor scenes, objects are arranged following human habits, resulting in objects of certain semantics being closely positioned and displaying notable inter-object correlations. This can creat ...

2024

Deep Learning Theory Through the Lens of Diagonal Linear Networks

Scott William Pesme

In this PhD manuscript, we explore optimisation phenomena which occur in complex neural networks through the lens of 2-layer diagonal linear networks. This rudimentary architecture, which consists of a two layer feedforward linear network with a diagonal ...

EPFL, 2024
