Hierarchical Integration of Phonetic and Lexical Knowledge in Phone Posterior Estimation
Graph Chatbot
Chat with Graph Search
Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.
DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.
Standard automatic speech recognition (ASR) systems follow a divide and conquer approach to convert speech into text. Alternately, the end goal is achieved by a combination of sub-tasks, namely, feature extraction, acoustic modeling and sequence decoding, ...
State-of-the-art acoustic models for Automatic Speech Recognition (ASR) are based on Hidden Markov Models (HMM) and Deep Neural Networks (DNN) and often require thousands of hours of transcribed speech data during training. Therefore, building multilingual ...
Second-order pooling, a.k.a. bilinear pooling, has proven effective for deep learning based visual recognition. However, the resulting second-order networks yield a final representation that is orders of magnitude larger than that of standard, first-order ...
2018
, ,
Automatic Speech Recognition (ASR), as the assistance of speech communication between pilots and air-traffic controllers, can significantly reduce the complexity of the task and increase the reliability of transmitted information. ASR application can lead ...
This thesis deals with exploiting the low-dimensional multi-subspace structure of speech towards the goal of improving acoustic modeling for automatic speech recognition (ASR). Leveraging the parsimonious hierarchical nature of speech, we hypothesize that ...
State-of-the-art automatic speech recognition (ASR) and text-to-speech systems require a pronunciation lexicon that maps each word to a sequence of phones. Manual development of lexicons is costly as it needs linguistic knowledge and human expertise. To fa ...
This paper addresses the problem of automatic facial expression recognition in videos, where the goal is to predict discrete emotion labels best describing the emotions expressed in short video clips. Building on a pre-trained convolutional neural network ...
In this paper, we propose a novel temporal spiking recurrent neural network (TSRNN) to perform robust action recognition in videos. The proposed TSRNN employs a novel spiking architecture which utilizes the local discriminative features from high-confidenc ...
2019
, ,
Automatic visual speech recognition is an interesting problem in pattern recognition especially when audio data is noisy or not readily available. It is also a very challenging task mainly because of the lower amount of information in the visual articulati ...
Visual speech recognition is a challenging research problem with a particular practical application of aiding audio speech recognition in noisy scenarios. Multiple camera setups can be beneficial for the visual speech recognition systems in terms of improv ...