AV16.3: an Audio-Visual Corpus for Speaker Localization and Tracking
For parametric stereo and multi-channel audio coding, it has been proposed to use level difference, time difference, and coherence cues between audio channels to represent the perceptual spatial features of stereo and multi-channel audio signals. In practi ...
Wave field synthesis (WFS) is a prevalent approach to multiple-loudspeaker sound reproduction for an extended listening area. Although powerful as a theoretical concept, its deployment is hampered by practical limitations due to diffraction, aliasing, and ...
Assessing the quality of a speaker localization or tracking algorithm on a few short examples is difficult, especially when the ground-truth is absent or not well defined. One step towards systematic performance evaluation of such algorithms is to provide ...
Based on binaural signals, i.e. the signals observed at the two ears, a listener can localize and recognize different sound sources and then focus on one of these. For decades, researchers have tried to invent a machine that can do the same under ...
This paper proposes a technique that segments audio into speaker turns based on speaker location, essentially implementing a discrete source tracking system. In many multi-party conversations, such as meetings or teleconferences, the location of participants is re ...
Multimodal signal processing analyzes a physical phenomenon through several types of measures, or modalities. This leads to the extraction of higher-quality and more reliable information than that obtained from single-modality signals. The advantage is two ...
Recent advances in computer technology have pushed the computing power of modern computer systems to unprecedented levels; modern processors have moved from the conventional, scalar type of architecture to more sophisticated, parallel ones, combining fast pro ...
A wide range of techniques for coding a single speech or audio signal channel have been developed over the last few decades. In addition to pure redundancy reduction, sophisticated source and receiver models have been considered for reducing the bitrate. O ...
A perceptually motivated spatial decomposition for two-channel stereo audio signals, capturing the information about the virtual sound stage, is proposed. The spatial decomposition allows audio signals to be re-synthesized for playback over other sound systems ...
We present a probabilistic methodology for audio-visual (AV) speaker tracking, using an uncalibrated wide-angle camera and a microphone array. The algorithm fuses 2-D object shape and audio information via importance particle filters (I-PFs), allowing for ...