Visual Speech Recognition Using PCA Networks and LSTMs in a Tandem GMM-HMM System
This thesis deals with exploiting the low-dimensional multi-subspace structure of speech towards the goal of improving acoustic modeling for automatic speech recognition (ASR). Leveraging the parsimonious hierarchical nature of speech, we hypothesize that ...
With ever greater computational resources and more accessible software, deep neural networks have become ubiquitous across industry and academia.
Their remarkable ability to generalize to new samples defies the conventional view, which holds that complex, ...
In this paper, we propose a novel temporal spiking recurrent neural network (TSRNN) to perform robust action recognition in videos. The proposed TSRNN employs a novel spiking architecture which utilizes the local discriminative features from high-confidenc ...
The performance of speaker recognition systems has considerably improved in the last decade. This is mainly due to the development of Gaussian mixture model-based systems and in particular to the use of i-vectors. These systems handle relatively well noise ...
In hidden Markov model (HMM)-based automatic speech recognition (ASR) systems, modeling the statistical relationship between the acoustic speech signal and the HMM states that represent linguistically motivated subword units such as phonemes is a crucial st ...
Towards the goal of improving acoustic modeling for automatic speech recognition (ASR), this work investigates the modeling of senone subspaces in deep neural network (DNN) posteriors using low-rank and sparse modeling approaches. While DNN posteriors are ...
In this paper, we explore various approaches for semi-supervised learning in an end-to-end automatic speech recognition (ASR) framework. The first step in our approach involves training a seed model on the limited amount of labelled data. Additional u ...
This paper addresses the problem of automatic facial expression recognition in videos, where the goal is to predict discrete emotion labels best describing the emotions expressed in short video clips. Building on a pre-trained convolutional neural network ...
Feature extraction is a key step in many machine learning and signal processing applications. For speech signals in particular, it is important to derive features that contain both the vocal characteristics of the speaker and the content of the speech. In ...
Vocal tract length normalisation (VTLN) is well established as a speaker adaptation technique that can work with very little adaptation data. It is also well known that VTLN can be cast as a linear transform in the cepstral domain. Building on this latter ...