Developing and Enhancing Posterior Based Speech Recognition Systems
In the classical quickest detection problem, one must detect as quickly as possible when a Brownian motion without drift "changes" into a Brownian motion with positive drift. The change occurs at an unknown "disorder" time with exponential distribution. Th ...
Automatic speech recognition (ASR) is a fascinating area of research towards realizing human-machine interactions. After more than 30 years of exploitation of Gaussian Mixture Models (GMMs), state-of-the-art systems currently rely on Deep Neural Network (DN ...
Overlapping speech has been identified as one of the main sources of errors in diarization of meeting room conversations. Therefore, overlap detection has become an important step prior to speaker diarization. Studies on conversational analysis have shown ...
We propose a tractable equilibrium model for pricing defaultable bonds that are subject to contagion risk. Contagion arises because agents with 'fragile beliefs' are uncertain about both the underlying state of the economy and the posterior probabilities a ...
In i-vector based speaker recognition systems, back-end classifiers are trained to factor out nuisance information and retain only the speaker identity. As a result, variabilities arising due to gender, language and accent (among many others) are suppress ...
In recent works, the use of phone class-conditional posterior probabilities (posterior features) directly as features provided successful results in template-based ASR systems. Moreover, it has been shown that these features tend to be sparse and orthogona ...
We study the fundamental problem of learning an unknown, smooth probability function via pointwise Bernoulli tests. We provide a scalable algorithm for efficiently solving this problem with rigorous guarantees. In particular, we prove the convergence rate ...
We hypothesize that optimal deep neural networks (DNN) class-conditional posterior probabilities live in a union of low-dimensional subspaces. In real test conditions, DNN posteriors encode uncertainties which can be regarded as a superposition of unstruct ...
In the context of hybrid HMM/MLP Automatic Speech Recognition (ASR), this paper describes an investigation into a new type of stochastic phone space transformation, which maps "source" phone (or phone HMM state) posterior probabilities (as obtained at the ...