Speech Enhancement and Recognition in Meetings with an Audio-Visual Sensor Array
Related publications (49)
This paper investigates the use of microphone arrays to acquire and recognise speech in meetings. Meetings pose several interesting problems for speech processing, as they consist of multiple competing speakers within a small space, typically around a tabl ...
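As a rough illustration of the array-processing step such systems rely on, the sketch below implements plain delay-and-sum beamforming toward an assumed speaker position; the array geometry, sampling rate, and function names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, target_pos, fs=16000, c=343.0):
    """Steer a microphone array toward target_pos by aligning and averaging channels.

    signals:       (num_mics, num_samples) array of synchronised channels
    mic_positions: (num_mics, 3) microphone coordinates in metres
    target_pos:    (3,) assumed speaker position in metres
    """
    num_mics, num_samples = signals.shape
    # Propagation distance from the target to each microphone, and the
    # resulting relative delays (seconds) with respect to the closest mic.
    dists = np.linalg.norm(mic_positions - target_pos, axis=1)
    delays = (dists - dists.min()) / c
    out = np.zeros(num_samples)
    for m in range(num_mics):
        # Fractional-delay alignment via the FFT shift theorem.
        spec = np.fft.rfft(signals[m])
        freqs = np.fft.rfftfreq(num_samples, d=1.0 / fs)
        spec *= np.exp(2j * np.pi * freqs * delays[m])  # advance channel m by its delay
        out += np.fft.irfft(spec, n=num_samples)
    return out / num_mics
```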
Speaker turn detection is an important task for many speech processing applications. However, accurate segmentation can be hard to achieve if there are multiple concurrent speakers (overlap), as is typically the case in multi-party conversations. In such c ...
This paper presents an overview of an online audio indexing system, which creates a searchable index of speech content embedded in digitized audio files. This system is based on our recently proposed offline audio segmentation techniques. As the data arrives ...
This report describes the processing algorithms and gives an overview of the hardware for the small microphone array unit in the IM2.RTMAP (Real-time Microphone Array Processing) project. The algorithms include techniques for speech enhancement, speaker lo ...
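One common building block for the speaker-localisation part of such a unit is GCC-PHAT time-delay estimation between a microphone pair; the minimal sketch below is a generic illustration under assumed parameters, not the report's actual algorithm.

```python
import numpy as np

def gcc_phat(sig, ref, fs=16000, max_tau=None):
    """Estimate the time delay of `sig` relative to `ref` with GCC-PHAT."""
    n = sig.shape[0] + ref.shape[0]
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12              # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Re-centre the correlation so lag 0 sits in the middle of the window.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift   # lag in samples
    return shift / float(fs)                    # delay in seconds
```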
We present a probabilistic methodology for audio-visual (AV) speaker tracking, using an uncalibrated wide-angle camera and a microphone array. The algorithm fuses 2-D object shape and audio information via importance particle filters (I-PFs), allowing for ...
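The sketch below gives a generic flavour of particle-filter fusion of audio and visual evidence about a speaker's image location: it simply adds the two log-likelihoods in the weight update under a conditional-independence assumption, whereas the paper's importance particle filter uses audio to shape the proposal. All names and parameters here are assumptions.

```python
import numpy as np

def av_particle_filter_step(particles, weights, audio_loglik, video_loglik, motion_std=5.0):
    """One predict/update/resample step fusing audio and visual evidence.

    particles:     (N, 2) image-plane positions (x, y)
    weights:       (N,) current normalised weights
    audio_loglik:  callable (N, 2) -> (N,) log-likelihood from the microphone array
    video_loglik:  callable (N, 2) -> (N,) log-likelihood from the shape/colour model
    """
    n = particles.shape[0]
    # Predict: random-walk motion model.
    particles = particles + np.random.normal(scale=motion_std, size=particles.shape)
    # Update: combine the two modalities by adding their log-likelihoods.
    logw = np.log(weights + 1e-300) + audio_loglik(particles) + video_loglik(particles)
    logw -= logw.max()
    weights = np.exp(logw)
    weights /= weights.sum()
    # Resample (multinomial) to avoid weight degeneracy.
    idx = np.random.choice(n, size=n, p=weights)
    return particles[idx], np.full(n, 1.0 / n)
```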
The challenge of automatic speech recognition (ASR) increases when speaker variability is encountered. Being able to automatically use different acoustic models according to speaker type might help to increase the robustness of ASR. We present a system tha ...
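A minimal sketch of the kind of model-selection logic such a system implies is shown below; the speaker categories, the `classify_speaker` helper, and the `default` fallback are hypothetical placeholders, not details from the paper.

```python
from typing import Callable, Dict

def recognise(features, classify_speaker: Callable, models: Dict[str, Callable]):
    """Pick an acoustic model based on an automatic speaker-type decision, then decode.

    classify_speaker: returns a label such as "adult_male", "adult_female", "child"
                      (placeholder categories; the real classes depend on the system)
    models:           maps each label to a decoder built on the matching acoustic model
    """
    speaker_type = classify_speaker(features)
    decoder = models.get(speaker_type, models["default"])  # fall back to a generic model
    return decoder(features)
```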
The Multi-Stream automatic speech recognition approach was investigated in this work as a framework for Audio-Visual data fusion and speech recognition. This method presents many potential advantages for such a task. It particularly allows for synchronous ...
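A standard way multi-stream systems recombine modalities is a weighted log-linear combination of per-stream likelihoods; the sketch below shows only that step, under assumed stream weights, and omits the synchronous recombination machinery the approach actually involves.

```python
import numpy as np

def combine_streams(stream_logliks, weights):
    """Combine per-stream HMM state log-likelihoods with exponent weights.

    stream_logliks: list of (num_frames, num_states) arrays, e.g. [audio, visual]
    weights:        per-stream exponents, typically chosen to sum to 1
    """
    weights = np.asarray(weights, dtype=float)
    combined = np.zeros_like(stream_logliks[0])
    for loglik, w in zip(stream_logliks, weights):
        combined += w * loglik   # product of weighted stream likelihoods, in the log domain
    return combined
```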