Publication

Speech recognition with auxiliary information

2003
EPFL thesis
Abstract

Automatic speech recognition (ASR) is a very challenging problem due to the wide variety of the data that it must be able to deal with. Being the standard tool for ASR, hidden Markov models (HMMs) have proven to work well for ASR when there are controls over the variety of the data. Being relatively new to ASR, dynamic Bayesian networks (DBNs) are more generic models with algorithms that are more flexible than those of HMMs. Various assumptions can be changed without modifying the underlying algorithm and code, unlike in HMMs; these assumptions relate to the variables to be modeled, the statistical dependencies between these variables, and the observations which are available for certain of the variables. The main objective of this thesis, therefore, is to examine some areas where DBNs can be used to change HMMs' assumptions so as to have models that are more robust to the variety of data that ASR must deal with. HMMs model the standard observed features by jointly modeling them with a hidden discrete state variable and by having certain restraints placed upon the states and features. Some of the areas where DBNs can generalize this modeling framework of HMMs involve the incorporation of even more "auxiliary" variables to help the modeling which HMMs typically can only do with the two variables under certain restraints. The DBN framework is more flexible in how this auxiliary variable is introduced in different ways. First, this auxiliary information aids the modeling due to its correlation with the standard features. As such, in the DBN framework, we can make it directly condition the distribution of the standard features. Second, some types of auxiliary information are not strongly correlated with the hidden state. So, in the DBN framework we may want to consider the auxiliary variable to be conditionally independent of the hidden state variable. Third, as auxiliary information tends to be strongly correlated with its previous values in time, I show DBNs using discretized auxiliary variables that model the evolution of the auxiliary information over time. Finally, as auxiliary information can be missing or noisy in using a trained system, the DBNs can do recognition using just its prior distribution, learned on auxiliary information observations during training. I investigate these different advantages of DBN-based ASR using auxiliary information involving articulator positions, estimated pitch, estimated rate-of-speech, and energy. I also show DBNs to be better at incorporating auxiliary information than hybrid HMM/ANN ASR, using artificial neural networks (ANNs). I show how auxiliary information is best introduced in a time-dependent manner. Finally, DBNs with auxiliary information are better able than standard HMM approaches to handling noisy speech; specifically, DBNs with hidden energy as auxiliary information -- that conditions the distribution of the standard features and which is conditionally independent of the state -- are more robust to noisy speech than HMMs are.

About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Related concepts (33)
Hidden Markov model
A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process — call it — with unobservable ("hidden") states. As part of the definition, HMM requires that there be an observable process whose outcomes are "influenced" by the outcomes of in a known way.
Speech recognition
Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.
Markov model
In probability theory, a Markov model is a stochastic model used to model pseudo-randomly changing systems. It is assumed that future states depend only on the current state, not on the events that occurred before it (that is, it assumes the Markov property). Generally, this assumption enables reasoning and computation with the model that would otherwise be intractable. For this reason, in the fields of predictive modelling and probabilistic forecasting, it is desirable for a given model to exhibit the Markov property.
Show more
Related publications (59)

Hidden Markov Modeling Over Graphs

Ali H. Sayed, Mert Kayaalp, Stefan Vlaski, Virginia Bordignon

This work proposes a multi-agent filtering algorithm over graphs for finite-state hidden Markov models (HMMs), which can be used for sequential state estimation or for tracking opinion formation over dynamic social networks. We show that the difference fro ...
IEEE2022

On quantifying the quality of acoustic models in hybrid DNN-HMM ASR

Hervé Bourlard, Afsaneh Asaei, Pranay Dighe

We propose an information theoretic framework for quantitative assessment of acoustic models used in hidden Markov model (HMM) based automatic speech recognition (ASR). The HMM backend expects that (i) the acoustic model yields accurate state conditional e ...
ELSEVIER2020

On Vacuous and Non-Vacuous Generalization Bounds for Deep Neural Networks.

Konstantinos Pitas

Deep neural networks have been empirically successful in a variety of tasks, however their theoretical understanding is still poor. In particular, modern deep neural networks have many more parameters than training data. Thus, in principle they should over ...
EPFL2020
Show more
Related MOOCs (2)
Neuronal Dynamics - Computational Neuroscience of Single Neurons
The activity of neurons in the brain and the code used by these neurons is described by mathematical neuron models at different levels of detail.
Neuronal Dynamics - Computational Neuroscience of Single Neurons
The activity of neurons in the brain and the code used by these neurons is described by mathematical neuron models at different levels of detail.

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.