Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.
DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.
Using phone posterior probabilities has been increasingly explored for improving automatic speech recognition (ASR) systems. In this paper, we propose two approaches for hierarchically enhancing these phone posteriors, by integrating long acoustic context, ...
Preparation of a lexicon for speech recognition systems can be a significant effort in languages where the written form is not exactly phonetic. On the other hand, in languages where the written form is quite phonetic, some common words are often mispronou ...
We describe a kernel wrapper, a Mercer kernel for the task of phoneme sequence recognition which is based on operations with the Gaussian kernel, and suitable for any sequence kernel classifier. We start by presenting a kernel-based algorithm for phoneme s ...
This paper proposes a novel grapheme-to-phoneme (G2P) conversion approach where first the probabilistic relation between graphemes and phonemes is captured from acoustic data using Kullback-Leibler divergence based hidden Markov model (KL-HMM) system. Then ...
Speech sounds can be characterized by articulatory features. Articulatory features are typically estimated using a set of multilayer perceptrons (MLPs), i.e., a separate MLP is trained for each articulatory feature. In this paper, we investigate multitask ...
Speech sounds can be characterized by articulatory features. Articulatory features are typically estimated using a set of multilayer perceptrons (MLPs), i.e., a separate MLP is trained for each articulatory feature. In this report, we investigate multitask ...
In this paper we propose two alternatives to overcome the natural asynchrony of modalities in Audio-Visual Speech Recognition. We first investigate the use of asynchronous statistical models based on Dynamic Bayesian Networks with different levels of async ...
This chapter introduces a discriminative method for detecting and spotting keywords in spoken utterances. Given a word represented as a sequence of phonemes and a spoken utterance, the keyword spotter predicts the best time span of the phoneme sequence in ...
In this paper, we propose a simple approach to jointly model both grapheme and phoneme information using Kullback-Leibler divergence based HMM (KL-HMM) system. More specifically, graphemes are used as subword units and phoneme posterior probabilities estim ...
The state-of-the-art automatic speech recognition (ASR) systems typically use phonemes as subword units. In this work, we present a novel grapheme-based ASR system that jointly models phoneme and grapheme information using Kullback-Leibler divergence-based ...