This paper proposes an information-theoretic approach for finding the most informative subset of eigenfeatures for audio-visual speech recognition tasks. State-of-the-art visual feature extraction methods for speechreading rely on pixel-based methods, geometry-based methods, or a combination of the two. However, there is no common rule defining how these features should be selected with respect to the chosen set of audio cues, or how well they represent the classes of the uttered speech. Our main objective is to exploit the complementarity of the audio and visual sources and to select meaningful visual descriptors by means of mutual information. We focus on the principal component projections of mouth-region images and apply the proposed method so that only the cues with the highest mutual information with the word classes are retained. The algorithm is tested through various speech recognition experiments on a chosen audio-visual dataset. The resulting recognition rates are compared to those obtained with conventional principal component analysis, and promising results are shown.
Hervé Bourlard, Afsaneh Asaei, Pranay Dighe
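
As a rough illustration of the selection step the abstract describes, the sketch below projects flattened mouth-region images onto principal components and keeps only the projections with the highest estimated mutual information with the word-class labels. This is a minimal approximation under stated assumptions, not the paper's implementation: the function name `select_informative_eigenfeatures`, the use of scikit-learn's `mutual_info_classif` as a stand-in mutual-information estimator, and the values of `n_components` and `n_keep` are all assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import mutual_info_classif


def select_informative_eigenfeatures(mouth_images, word_labels,
                                     n_components=64, n_keep=16):
    """Keep the PCA projections most informative about the word classes.

    mouth_images: array of shape (n_samples, height, width)
    word_labels:  integer class label per sample
    NOTE: names and defaults here are illustrative assumptions.
    """
    # Flatten each mouth-region image into a pixel vector.
    X = mouth_images.reshape(len(mouth_images), -1)

    # Project onto the leading principal components (the eigenfeatures).
    pca = PCA(n_components=n_components)
    Z = pca.fit_transform(X)

    # Estimate the mutual information between each projection and the
    # word-class labels (stand-in estimator, not the paper's).
    mi = mutual_info_classif(Z, word_labels)

    # Retain only the n_keep projections with the highest estimated MI.
    keep = np.argsort(mi)[::-1][:n_keep]
    return Z[:, keep], keep
```

In the setting the abstract describes, the retained projections would then serve as the visual feature stream to be combined with the chosen audio cues for recognition.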