In this paper we propose a method to segment and recognize text embedded in video and images. We model the gray-level distribution in text images as a mixture of Gaussians, and then assign each pixel to one of the Gaussian layers. The assignment is bas ...
In this report, we build on our previous work on speaker clustering, where the number of speakers and the segmentation boundaries are unknown a priori. We employ an ergodic HMM with minimum duration topology for this purpose. Starting from a large number of ...
In previous work, we presented a case study using an estimated pitch value as the conditioning variable in conditional Gaussians that showed the utility of hiding the pitch values in certain situations or in modeling it independently of the hidden state in ...
The purpose of this paper is to investigate the behavior of HMM2 models for the recognition of noisy speech. It has previously been shown that HMM2 is able to model dynamically important structural information inherent in the speech signal, often correspon ...
Pitch and energy are two fundamental features of speech and are important in human speech recognition. However, when incorporated as features in automatic speech recognition (ASR), they usually cause a significant degradation in recognition pe ...
In this paper, we present an HMM2-based method for speaker normalization. Introduced as an extension of the Hidden Markov Model (HMM), HMM2 differs from the regular HMM in its emission density modeling, which is done by a set of state-de ...
Automatic Speech Recognition systems typically use smoothed spectral features as acoustic observations. In recent studies, it has been shown that complementing these standard features with auxiliary information could improve the performance of the system. ...
We recently proposed an approach for user-customized password speaker verification that combines a hybrid HMM/ANN model (used for utterance verification) with a GMM (used for speaker verification). In this paper, we extend our in ...