Visual Speech Recognition Using PCA Networks and LSTMs in a Tandem GMM-HMM System
Graph Chatbot
Chat with Graph Search
Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.
DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.
Humans are able to learn and compose complex, yet beautiful, pieces of music as seen in e.g. the highly complicated works of J.S. Bach. However, how our brain is able to store and produce these very long temporal sequences is still an open question. Long s ...
Speaker diarization is the task of identifying ``who spoke when'' in an audio stream containing multiple speakers. This is an unsupervised task as there is no a priori information about the speakers. Diagnostical studies on state-of-the-art diarization sys ...
Any biometric recognizer is vulnerable to spoofing attacks and hence voice biometric, also called automatic speaker verification (ASV), is no exception; replay, synthesis, and conversion attacks all provoke false acceptances unless countermeasures are used ...
Manual transcription of audio databases for the development of automatic speech recognition (ASR) systems is a costly and time-consuming process. In the context of deriving acoustic models adapted to a specific application, or in low-resource scenarios, it ...
This paper presents Subspace Gaussian Mixture Model (SGMM) approach employed as a probabilistic generative model to estimate speaker vector representations to be subsequently used in the speaker verification task. SGMMs have already been shown to significa ...
Speaker diarization is the task of identifying “who spoke when” in an audio stream containing multiple speakers. This is an unsupervised task as there is no a priori information about the speakers. Diagnostical studies on state-of-the-art diarization syste ...
Manual transcription of audio databases for the development of automatic speech recognition (ASR) systems is a costly and time-consuming process. In the context of deriving acoustic models adapted to a specific application, or in low-resource scenarios, it ...
Statistical speech recognition has been cast as a natural realization of the compressive sensing and sparse recovery. The compressed acoustic observations are sub-word posterior probabilities obtained from a deep neural network (DNN). Dictionary learning a ...
This paper presents Subspace Gaussian Mixture Model (SGMM) approach employed as a probabilistic generative model to estimate speaker vector representations to be subsequently used in the speaker verification task. SGMMs have already been shown to significa ...
Smart sensors, such as smart meters or smart phones, are nowadays ubiquitous. To be "smart", however, they need to process their input data with limited storage and computational resources. In this paper, we convert the stream of sensor data into a stream ...