MULTI-MODAL SPEAKER DIARIZATION OF REAL-WORLD MEETINGS USING COMPRESSED-DOMAIN VIDEO FEATURES
Graph Chatbot
Chattez avec Graph Search
Posez n’importe quelle question sur les cours, conférences, exercices, recherches, actualités, etc. de l’EPFL ou essayez les exemples de questions ci-dessous.
AVERTISSEMENT : Le chatbot Graph n'est pas programmé pour fournir des réponses explicites ou catégoriques à vos questions. Il transforme plutôt vos questions en demandes API qui sont distribuées aux différents services informatiques officiellement administrés par l'EPFL. Son but est uniquement de collecter et de recommander des références pertinentes à des contenus que vous pouvez explorer pour vous aider à répondre à vos questions.
We revisit the problem of blocking artifacts and their suppression in generic frame-based speech/audio applications. We provide a perceptual characterization of the artifacts by using dynamic auditory models. We propose some short-time-Fourier-transform-ba ...
Multimodal signal processing analyzes a physical phenomenon through several types of measures, or modalities. This leads to the extraction of higher-quality and more reliable information than that obtained from single-modality signals. The advantage is two ...
Acoustic echo control and noise suppression is an important part of any "handsfree" telecommunication system, such as telephony or audio or video conferencing systems. Bandwidth and computational complexity constraints have prevented that stereo or multi-c ...
This report presents a semi-supervised method to jointly extract audio-visual sources from a scene. It consist of applying a supervised method to segment the video signal followed by an automatic process to properly separate the audio track. This approach ...
Acoustic echo control and noise suppression is an important part of any "handsfree" telecommunication system, such as telephony or audio or video conferencing systems. Bandwidth and computational complexity constraints have prevented that stereo or multi-c ...
Audio-visual speech recognition promises to improve the performance of speech recognizers, especially when the audio is corrupted, by adding information from the visual modality, more specifically, from the video of the speaker. However, the number of visu ...
Audio-visual speaker diarisation is the task of estimating ``who spoke when'' using audio and visual cues. In this paper we propose the combination of an audio diarisation system with psychology inspired visual features, reporting experiments on multiparty ...
This book constitutes the thoroughly refereed post-proceedings of the 4th International Workshop on Machine Learning for Multimodal Interaction, MLMI 2007, held in Brno, Czech Republic, in June 2007. The 25 revised full papers presented together with 1 inv ...
This paper proposes an analysis technique for wide-band audio applications based on the predictability of the temporal evolution of Quadrature Mirror Filter (QMF) sub-band signals. The input audio signal is first decomposed into 64 sub-band signals using Q ...
This paper proposes a technique for wide-band audio applications based on the predictability of the temporal evolution of Quadrature Mirror Filter (QMF) sub-band signals. An input audio signal is first decomposed into 64 frequency sub-band signals using QM ...