Speech Enhancement and Recognition in Meetings with an Audio-Visual Sensor Array
Related publications (49)
Speaker diarization is originally defined as the task of determining “who spoke when” given an audio track and no other prior knowledge of any kind. The following article shows a multi-modal approach where we improve a state-of-the-art speaker diarizati ...
The generation of subjectively diffuse sound fields is an essential part of creating pleasing synthetic sound fields using loudspeaker playback. A number of studies have been published presenting subjective evaluations of the diffuse sound field reproducti ...
Acoustic echo control and noise suppression is an important part of any "handsfree" telecommunication system, such as telephony or audio or video conferencing systems. Bandwidth and computational complexity constraints have prevented that stereo or multi-c ...
With the increase in cheap commercially available sensors, recording meetings is becoming an increasingly practical option. With this trend comes the need to summarize the recorded data in semantically meaningful ways. Here, we investigate the task of auto ...
This paper proposes a joint verification-localization structure based on split-band analysis of speech signal and the mixed voicing level. To address the problems in reverberant acoustic environments, a new fundamental frequency estimation algorithm is pro ...
Multimodal signal processing analyzes a physical phenomenon through several types of measures, or modalities. This leads to the extraction of higher-quality and more reliable information than that obtained from single-modality signals. The advantage is two ...
In this paper, we address a beamforming application based on the capture of far-field speech data from a single speaker in a real meeting room. After the position of the speaker is estimated by a speaker tracking system, we construct a subband-domain beamf ...
The tetrahedral microphone capsule arrangement in a Soundfield microphone captures a so-called A-format signal which is then converted to a corresponding B-format signal. The phase differences between the A-format signal channels due to non-coincidence of ...
This paper presents our approach for automatic speech recognition (ASR) of overlapping speech. Our system consists of two principal components: a speech separation component and a feature estimation component. In the speech separation phase, we first estima ...