Two-level bimodal association for audio-visual speech recognition
Related publications (45)
Speaker detection is an important component of a speech-based user interface. Audiovisual speaker detection, speech and speaker recognition, or speech synthesis, for example, find multiple applications in human-computer interaction, multimedia content indexin ...
State-of-the-art automatic speech recognition (ASR) techniques are typically based on hidden Markov models (HMMs) for the modeling of temporal sequences of feature vectors extracted from the speech signal. At the level of each HMM state, Gaussian mixture m ...
The goal of this thesis is to develop and design new feature representations that can improve the automatic speech recognition (ASR) performance in clean as well as noisy conditions. One of the main shortcomings of the fixed scale (typically 20-30 ms long ana ...
This paper proposes a simple, computationally efficient 2-mixture model approach to discriminate between speech and background noise at the magnitude spectrogram level. It is directly derived from observations on real data, and can be used in a fully unsup ...
This paper proposes a Distant Speech Recognition system based on a novel speaker Localization and Beamforming (SRLB) algorithm. To localize the speaker, an algorithm based on Steered Response Power by utilizing harmonic structures of the speech signal is propos ...
This paper proposes a simple, computationally efficient 2-mixture model approach to discriminate between speech and background noise at the magnitude spectrogram level. It is directly derived from observations on real data, and can be used in a full ...
This paper proposes a simple, computationally efficient 2-mixture model approach to discrimination between speech and background noise. It is directly derived from observations on real data, and can be used in a fully unsupervised manner, with the EM algor ...
A road traffic noise prediction model (ASJ MODEL-1998) has been integrated with a road traffic simulator (AVENUE) to produce the Dynamic areawide Road traffic NoisE simulator-DRONE. This traffic-noise-GIS based integrated tool is upgraded to predict noise ...
We propose a novel configuration for a Brillouin distributed sensor based on Brillouin optical time domain analysis. This new configuration eliminates many intensity noise issues found in previous schemes. Resolution of 3.5 m all over a 47 km single-mode fi ...
SPIE, Bellingham, WA 98227-0010, United States, 2007