Integrating audio and vision for robust automatic gender recognition
Related publications (36)
Graph Chatbot
Chat with Graph Search
Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.
DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.
A multimodal probabilistic framework is proposed for the problem of finding the active speaker in a video sequence. We localize the current speaker's mouth in the image by using the video and the audio channels together. We propose a novel visual feature t ...
This report presents a new method to confront the Blind Audio Source Separation (BASS) problem, by means of audio and visual information. In a given mixture, we are able to locate the video sources first and, posteriorly, recover each source signal, only w ...
For parametric stereo and multi-channel audio coding, it has been proposed to use level difference, time difference, and coherence cues between audio channels to represent the perceptual spatial features of stereo and multi-channel audio signals. In practi ...
A wide range of techniques for coding a single speech or audio signal channel have been developed over the last few decades. In addition to pure redundancy reduction, sophisticated source and receiver models have been considered for reducing the bitrate. O ...
The IDIAP Smart Meeting Room is a meeting room equipped with synchronised, multi-channel audio-visual recording facilities. This document presents a detailed description of the room with particular emphasis on the acquisition equipment and the components u ...
Audio-visual speech recognition promises to improve the performance of speech recognizers, especially when the audio is corrupted, by adding information from the visual modality, more specifically, from the video of the speaker. However, the number of visu ...