Publication

Visual Speech Recognition Using PCA Networks and LSTMs in a Tandem GMM-HMM System

Related publications (153)

Unsupervised Speech/Non-speech Detection for Automatic Speech Recognition in Meeting Rooms

Daniel Gatica-Perez, Petr Motlicek

The goal of this work is to provide robust and accurate speech detection for automatic speech recognition (ASR) in meeting room settings. The solution is based on computing the long-term modulation spectrum and examining a specific frequency range for dominant ...

2007
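The abstract above describes speech detection from the long-term modulation spectrum. As a rough illustration only, the Python sketch below does three things:

- computes a frame-energy envelope;
- takes the envelope's magnitude DFT over a long window;
- flags speech when most of the modulation energy falls in a low band.

Several choices are assumptions for illustration, not the settings or code of Gatica-Perez and Motlicek: the 2 to 8 Hz band (chosen around the roughly 4 Hz syllable rate of speech), the 0.5 threshold, and all function names.

```python
import math

def frame_energies(signal, frame_len, hop):
    """Log energy of each analysis frame, i.e. the signal's energy envelope."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(math.log(sum(x * x for x in frame) + 1e-10))
    return energies

def modulation_spectrum(envelope, frame_rate):
    """Magnitude DFT of the mean-removed envelope as (frequency_hz, magnitude) pairs."""
    n = len(envelope)
    mean = sum(envelope) / n
    centred = [e - mean for e in envelope]
    spectrum = []
    for k in range(1, n // 2 + 1):
        re = sum(c * math.cos(2 * math.pi * k * t / n) for t, c in enumerate(centred))
        im = -sum(c * math.sin(2 * math.pi * k * t / n) for t, c in enumerate(centred))
        spectrum.append((k * frame_rate / n, math.hypot(re, im)))
    return spectrum

def is_speech(envelope, frame_rate, band=(2.0, 8.0), threshold=0.5):
    """Hypothetical decision rule: speech if the band holds most modulation energy."""
    spec = modulation_spectrum(envelope, frame_rate)
    total = sum(m * m for _, m in spec)
    in_band = sum(m * m for f, m in spec if band[0] <= f <= band[1])
    return total > 0 and in_band / total >= threshold

# Usage: one second of envelope at an assumed 100 frames per second.
rate = 100
speech_like = [math.sin(2 * math.pi * 4 * t / rate) for t in range(rate)]
noise_like = [math.sin(2 * math.pi * 25 * t / rate) for t in range(rate)]
print(is_speech(speech_like, rate), is_speech(noise_like, rate))  # expected: True False
```

A plain DFT is used to keep the sketch dependency-free. A real detector would use an FFT over sliding windows.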

Robust overlapping speech recognition based on neural networks

John David Scott Dines, Weifeng Li

We address issues for improving hands-free speech recognition performance in the presence of multiple simultaneous speakers using multiple distant microphones. In this paper, a log spectral mapping is proposed to estimate the log mel-filterbank outputs of ...

IDIAP, 2007
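The abstract refers to estimating log mel-filterbank outputs. The following sketch shows one common way to compute that quantity from a power spectrum. It uses triangular filters spaced evenly on the mel scale, with the widely used mel formula 2595 log10(1 + f/700). The filter count, the FFT size, and the helper names are hypothetical. The paper's log spectral mapping itself is not reproduced here.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced in mel; rows are filters, columns FFT bins."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    hz_points = [mel_to_hz(m) for m in mel_points]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = hz_points[j - 1], hz_points[j], hz_points[j + 1]
        row = []
        for f in bin_freqs:
            if left <= f <= centre:
                row.append((f - left) / (centre - left))    # rising edge
            elif centre < f <= right:
                row.append((right - f) / (right - centre))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_mel_energies(power_spectrum, bank, floor=1e-10):
    """Log of each filter's weighted sum of the power spectrum (floored to avoid log(0))."""
    return [math.log(max(sum(w * p for w, p in zip(row, power_spectrum)), floor))
            for row in bank]

# Usage: 20 filters for a 512-point FFT at 16 kHz (assumed values).
bank = mel_filterbank(20, 512, 16000)
print(log_mel_energies([1.0] * 257, bank))
```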

Correcting Confusion Matrices for Phone Recognizers

Modern speech recognition has many ways of quantifying the misrecognitions a speech recognizer makes. Error analysis in modern speech recognition makes extensive use of the Levenshtein algorithm to find the distance between the labeled target and the recognize ...

IDIAP, 2007
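The abstract mentions using the Levenshtein algorithm to compare the labeled target with the recognizer output. The sketch below computes the edit distance and backtraces one optimal alignment. It then counts aligned (reference, hypothesis) pairs, which is the raw material of a phone confusion matrix.

Using `None` to mark insertions and deletions is a convention chosen here. The function names are hypothetical. The paper's correction method is not shown.

```python
from collections import Counter

def align(reference, hypothesis):
    """Levenshtein alignment; returns (distance, list of (ref, hyp) pairs).

    A pair (r, None) is a deletion and (None, h) is an insertion.
    """
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = edit distance between reference[:i] and hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or substitution
                             cost[i - 1][j] + 1,        # deletion
                             cost[i][j - 1] + 1)        # insertion
    # Backtrace from the bottom-right corner to recover one optimal alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1]):
            pairs.append((reference[i - 1], hypothesis[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((reference[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hypothesis[j - 1]))
            j -= 1
    pairs.reverse()
    return cost[n][m], pairs

def confusion_counts(alignments):
    """Accumulate aligned (ref, hyp) pairs over many utterances."""
    counts = Counter()
    for pairs in alignments:
        counts.update(pairs)
    return counts

# Usage: one substitution (ae -> ah) and one insertion (s).
distance, pairs = align(["k", "ae", "t"], ["k", "ah", "t", "s"])
print(distance, pairs)  # expected: 2 [('k', 'k'), ('ae', 'ah'), ('t', 't'), (None, 's')]
```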

Unsupervised Speech/Non-speech Detection for Automatic Speech Recognition in Meeting Rooms

Daniel Gatica-Perez, Petr Motlicek

IDIAP, 2006

Further Applications of Sector-Based Detection and Short-Term Clustering

Guillaume Lathoud

This paper presents an effective implementation of detection-localization of multiple speech sources with microphone arrays. In particular, the Scaled Conjugate Gradient descent is used for fast and precise localization, within a pre-detected volume of spa ...

IDIAP, 2006

Mutual information eigenlips for audio-visual speech recognition

Jean-Philippe Thiran, Ivana Arsic de Heras Ciechomska

This paper proposes an application of an information-theoretic approach to finding the most informative subset of eigenfeatures to be used for audio-visual speech recognition tasks. The state-of-the-art visual feature extraction methods in the area of speech ...

IEEE, 2006
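The abstract describes selecting the most informative eigenfeatures with an information-theoretic criterion. As a simplified stand-in, the sketch below does the following:

- quantizes each feature column into equal-width bins;
- ranks the columns by empirical mutual information I(X;Y), in bits, with the class labels.

The features are assumed to be precomputed eigenfeature (PCA) coefficients. The binning scheme, the plain MI ranking, and the function names are assumptions. The paper's exact selection criterion may differ.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical I(X;Y) in bits for two equally long sequences of discrete values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

def quantize(values, n_bins):
    """Map real values to equal-width bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant feature: everything in bin 0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def rank_features(feature_columns, labels, n_bins=4):
    """Feature indices ordered by decreasing mutual information with the labels."""
    scores = [mutual_information(quantize(col, n_bins), labels) for col in feature_columns]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Usage: a constant feature versus one that separates the two classes.
print(rank_features([[5, 5, 5, 5], [0.1, 0.2, 0.9, 1.0]], [0, 0, 1, 1]))  # expected: [1, 0]
```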

Automatic genre classification of music content

Giorgio Zoia, Nicolas Scaringella

This paper reviews the state-of-the-art in automatic genre classification of music collections through three main paradigms: expert systems, unsupervised classification, and supervised classification. The paper discusses the importance of music genres with ...

2006

Audio-Visual Speech Recognition with a Hybrid SVM-HMM System

Jean-Philippe Thiran, Mihai Gurban

Traditional speech recognition systems use Gaussian mixture models to obtain the likelihoods of individual phonemes, which are then used as state emission probabilities in hidden Markov models representing the words. In hybrid systems, the Gaussian mixture ...

EUSIPCO, 2005
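The abstract describes GMMs supplying state emission probabilities to an HMM. The sketch below illustrates that traditional setup:

- a per-state GMM log-likelihood;
- the log-domain forward algorithm, which sums over all state paths.

For brevity, observations are scalar and the GMMs are one-dimensional. All names are hypothetical. In a hybrid system, the `emission` callback would be replaced by another scorer (for example, SVM outputs converted to likelihoods). How the paper performs that conversion is not shown here.

```python
import math

def log_gaussian(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(values):
    """Numerically stable log(sum(exp(v))); handles all -inf inputs."""
    m = max(values)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def gmm_log_likelihood(x, mixture):
    """log p(x | state) for a 1-D GMM given as (weight, mean, variance) triples."""
    return logsumexp([math.log(w) + log_gaussian(x, mu, var) for w, mu, var in mixture])

def forward_log_prob(observations, log_init, log_trans, emission):
    """Log-domain forward algorithm: log p(observations | HMM).

    emission(state, x) must return log p(x | state).
    """
    n_states = len(log_init)
    alpha = [log_init[s] + emission(s, observations[0]) for s in range(n_states)]
    for x in observations[1:]:
        alpha = [logsumexp([alpha[r] + log_trans[r][s] for r in range(n_states)]) + emission(s, x)
                 for s in range(n_states)]
    return logsumexp(alpha)

# Usage: a two-state left-to-right HMM with GMM emissions (made-up parameters).
states = [[(1.0, 0.0, 1.0)], [(0.6, 3.0, 0.5), (0.4, 4.0, 0.5)]]
log_init = [0.0, -math.inf]
log_trans = [[math.log(0.7), math.log(0.3)], [-math.inf, 0.0]]
print(forward_log_prob([0.1, 2.9, 3.8], log_init, log_trans,
                       lambda s, x: gmm_log_likelihood(x, states[s])))
```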

Using pitch frequency information in speech recognition

Hervé Bourlard

Automatic Speech Recognition systems typically use smoothed spectral features as acoustic observations. Recent studies have shown that complementing these standard features with pitch frequency could improve the performance of the system. ...

IDIAP, 2003
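The abstract describes complementing spectral features with pitch frequency. The sketch below shows one possible setup. It estimates F0 from the autocorrelation peak within an assumed 60 to 400 Hz search range, returns 0.0 for frames with no positive peak (for example, silence), and appends the estimate to a frame's feature vector.

The autocorrelation method, the search range, and the function names are assumptions. They are not the pitch extractor used in the paper.

```python
import math

def estimate_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Crude F0 estimate in Hz from the autocorrelation peak within [f_min, f_max]."""
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        r = sum(frame[t] * frame[t + lag] for t in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else 0.0

def append_pitch(features, frame, sample_rate):
    """Return the spectral feature vector with the pitch estimate appended."""
    return list(features) + [estimate_pitch(frame, sample_rate)]

# Usage: a 50 ms, 200 Hz tone sampled at 8 kHz.
sr = 8000
frame = [math.sin(2 * math.pi * 200 * t / sr) for t in range(400)]
print(estimate_pitch(frame, sr))
```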

Using pitch frequency information in speech recognition

Hervé Bourlard

2003