Speech Enhancement and Recognition in Meetings with an Audio-Visual Sensor Array
Graph Chatbot
Chat with Graph Search
Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.
DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.
The generation of subjectively diffuse sound fields is an essential part of creating pleasing synthetic sound fields using loudspeaker playback. A number of studies have been published presenting subjective evaluations of the diffuse sound field reproducti ...
Acoustic echo control and noise suppression is an important part of any "handsfree" telecommunication system, such as telephony or audio or video conferencing systems. Bandwidth and computational complexity constraints have prevented that stereo or multi-c ...
This paper proposes a joint verification-localization structure based on split-band analysis of speech signal and the mixed voicing level. To address the problems in reverberant acoustic environments, a new fundamental frequency estimation algorithm is pro ...
With the increase in cheap commercially available sensors, recording meetings is becoming an increasingly practical option. With this trend comes the need to summarize the recorded data in semantically meaningful ways. Here, we investigate the task of auto ...
Speaker diarization is originally defined as the task of de- termining “who spoke when” given an audio track and no other prior knowledge of any kind. The following article shows a multi-modal approach where we improve a state- of-the-art speaker diarizati ...
In this paper, we address a beamforming application based on the capture of far-field speech data from a single speaker in a real meeting room. After the position of the speaker is estimated by a speaker tracking system, we construct a subband-domain beamf ...
Multimodal signal processing analyzes a physical phenomenon through several types of measures, or modalities. This leads to the extraction of higher-quality and more reliable information than that obtained from single-modality signals. The advantage is two ...
The tetrahedral microphone capsule arrangement in a Soundfield microphone captures a so-called A-format signal which is then converted to a corresponding B-format signal. The phase differences between the A-format signal channels due to non-coincidence of ...
This paper presents our approach for automatic speech recognition (ASR) of overlapping speech. Our system consists of two principal components: a speech separation component and a feature estmation component. In the speech separation phase, we first estima ...
Acoustic echo control and noise suppression is an important part of any "handsfree" telecommunication system, such as telephony or audio or video conferencing systems. Bandwidth and computational complexity constraints have prevented that stereo or multi-c ...