Publication

Overcoming Asynchrony in Audio-Visual Speech Recognition

Related publications (29)

Novel Methods For Detection And Analysis Of Atypical Aspects In Speech

Atypical aspects in speech concern speech that deviates from what is commonly considered normal or healthy. In this thesis, we propose novel methods for detection and analysis of these aspects, e.g. to monitor the temporary state of a speaker, diseases tha ...

EPFL, 2023

On matching data and model in LF-MMI-based dysarthric speech recognition

Enno Hermann

In light of steady progress in machine learning, automatic speech recognition (ASR) is entering more and more areas of our daily life, but people with dysarthria and other speech pathologies are left behind. Their voices are underrepresented in the trainin ...

EPFL, 2023

Explanation of Face Recognition via Saliency Maps

Touradj Ebrahimi, Yuhang Lu

Despite the significant progress in recent years, deep face recognition is often treated as a "black box" and has been criticized for lacking explainability. It becomes increasingly important to understand the characteristics and decisions of deep face rec ...

2023

How to Boost Face Recognition with StyleGAN?

Nikita Durasov

State-of-the-art face recognition systems require vast amounts of labeled training data. Given the priority of privacy in face recognition applications, the data is limited to celebrity web crawls, which have issues such as limited numbers of identities. O ...

IEEE Computer Soc, 2023

Recognizing distant faces

Lukas Vogelsang, Marin Vogelsang

As an 'early alerting' sense, one of the primary tasks for the human visual system is to recognize distant objects. In the specific context of facial identification, this ecologically important task has received surprisingly little attention. Most studies ...

PERGAMON-ELSEVIER SCIENCE LTD, 2023

Gemini: Elastic SNARKs for Diverse Environments

Alessandro Chiesa

We introduce a new class of succinct arguments, that we call elastic. Elastic SNARKs allow the prover to allocate different resources (such as memory and time) depending on the execution environment and the statement to prove. The resulting output is indep ...

SPRINGER INTERNATIONAL PUBLISHING AG, 2022

Dynamic Personalized Ranking

Jérémie Rappaz

Personalized ranking methods are at the core of many systems that learn to produce recommendations from user feedbacks. Their primary objective is to identify relevant items from very large vocabularies and to assist users in discovering new content. These ...

EPFL, 2022

Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering

Mathew Magimai Doss, Eklavya Sarkar

Voice activity detection (VAD) is an important pre-processing step for speech technology applications. The task consists of deriving segment boundaries of audio signals which contain voicing information. In recent years, it has been shown that voice source ...

ISCA, 2022

Novel Methods For Detection And Analysis Of Atypical Aspects In Speech

Julian David Fritsch

EPFL, 2023

Sparse Autoencoders for Speech Modeling and Recognition

Selen Hande Kabil

Speech recognition-based applications upon the advancements in artificial intelligence play an essential role to transform most aspects of modern life. However, speech recognition in real-life conditions (e.g., in the presence of overlapping speech, varyin ...

EPFL, 2023

A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers

Petr Motlicek, Juan Pablo Zuluaga Gomez, Amrutha Prasad

In this paper we propose a novel virtual simulation-pilot engine for speeding up air traffic controller (ATCo) training by integrating different state-of-the-art artificial intelligence (AI)-based tools. The virtual simulation-pilot engine receives spoken ...

MDPI, 2023