Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.
DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.
For parametric stereo and multi-channel audio coding, it has been proposed to use level difference, time difference, and coherence cues between audio channels to represent the perceptual spatial features of stereo and multi-channel audio signals. In practi ...
A multimodal probabilistic framework is proposed for the problem of finding the active speaker in a video sequence. We localize the current speaker's mouth in the image by using the video and the audio channels together. We propose a novel visual feature t ...
In this paper, we describe additional experiments based on a novel audio coding technique that uses an autoregressive model to approximate an audio signal's Hilbert envelope. This technique is performed over long segments (1000 ms) in critical-band-sized s ...
This paper presents an algorithm to correlate audio and visual data generated by the same physical phenomenon. According to psychophysical experiments, temporal synchrony strongly contributes to integrate cross-modal information in humans. Thus, we define ...
We address the problem of distant speech acquisition in multi-party meetings, using multiple microphones and cameras. Microphone array beamforming techniques present a potential alternative to close-talking microphones by providing speech enhancement throu ...
Given two video sequences (IS1, IS2), a composite video sequence (15) can be generated which includes visual elements (A, B, 21) from each of the given sequences, suitably synchronized and represented in a chosen focal plane. For example, given two video s ...
In most watermarking systems, masking models, inherited from data compression algorithms, are used to preserve fidelity by controlling the perceived distortion resulting from adding the watermark to the original signal. So far, little attention has been pa ...
We present a method that exploits an information theoretic framework to extract optimal audio features with respect to the video features. A simple measure of mutual information between the resulting audio features and the video ones allows to detect the a ...
This paper presents a new approach toward automatic annotation of meetings in terms of speaker identities and their locations. This is achieved by segmenting the audio recordings using two independent sources of information : magnitude spectrum analysis an ...
A wide range of techniques for coding a single speech or audio signal channel have been developed over the last few decades. In addition to pure redundancy reduction, sophisticated source and receiver models have been considered for reducing the bitrate. O ...