Low-Dimensional Motion Features for Audio-Visual Speech Recognition
Related publications (47)
The role of audio–visual speech synchrony for speaker diarisation is investigated on the multiparty meeting domain. We measured both mutual information and canonical correlation on different sets of audio and video features. As acoustic features we conside ...
This report presents a semi-supervised method to jointly extract audio-visual sources from a scene. It consists of applying a supervised method to segment the video signal, followed by an automatic process to properly separate the audio track. This approach ...
Multimodal signal processing analyzes a physical phenomenon through several types of measures, or modalities. This leads to the extraction of higher-quality and more reliable information than that obtained from single-modality signals. The advantage is two ...
When and where is visual motion processed in the human brain? This question is highly relevant considering the importance of motion for our perception of the dynamical world surrounding us. In the present work we studied motion processing, firstly through ...
The aim of this thesis is to build a system able to automatically and robustly track human motion in 3-D starting from monocular input. To this end, two approaches are introduced, which tackle two different types of motion: the first is useful to analyze ac ...
In this paper we propose two alternatives to overcome the natural asynchrony of modalities in Audio-Visual Speech Recognition. We first investigate the use of asynchronous statistical models based on Dynamic Bayesian Networks with different levels of async ...
We present a probabilistic approach to learn robust models of human motion through imitation. The association of Hidden Markov Model (HMM), Gaussian Mixture Regression (GMR) and dynamical systems allows us to extract redundancies across multiple demonstrat ...
With the increase in cheap commercially available sensors, recording meetings is becoming an increasingly practical option. With this trend comes the need to summarize the recorded data in semantically meaningful ways. Here, we investigate the task of auto ...
In this paper, we propose a new Distributed Video Coding (DVC) architecture where motion estimation is performed both at the encoder and decoder, effectively combining global and local motion models. We show that the proposed approach improves significantl ...
We propose a novel non-linear video diffusion approach which is able to focus on parts of a video sequence that are relevant for applications in audio-visual analysis. The diffusion process is controlled by a diffusion coefficient based on an estimate of th ...