Publication

Multi-pose lipreading and audio-visual speech recognition

Abstract

In this article, we study the adaptation of visual and audio-visual speech recognition systems to non-ideal visual conditions. We focus on overcoming the effects of a changing pose of the speaker, a problem encountered in natural situations where the speaker moves freely and does not keep a frontal pose relative to the camera. To handle these situations, we introduce a pose normalization block in a standard system and generate virtual frontal views from non-frontal images. The proposed method is inspired by pose-invariant face recognition and relies on linear regression to find an approximate mapping between images from different poses. We integrate the proposed pose normalization block at different stages of the speech recognition system and quantify the loss of performance related to pose changes and pose normalization techniques. In audio-visual experiments, we also analyze the integration of the audio and visual streams. We show that an audio-visual system should account for non-frontal poses and normalization techniques in the weight assigned to the visual stream in the classifier.
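The abstract's pose normalization idea can be illustrated with a minimal sketch: given paired non-frontal and frontal training images (flattened to feature vectors), a linear mapping is fit by least squares and then applied to new non-frontal images to produce virtual frontal views. All variable names, dimensions, and the synthetic data below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired training set: each row is a flattened image feature vector.
n_pairs, dim = 200, 64
X_nonfrontal = rng.normal(size=(n_pairs, dim))

# Synthetic ground-truth mapping used only to create paired frontal views.
true_map = rng.normal(size=(dim, dim))
X_frontal = X_nonfrontal @ true_map

# Least-squares estimate of the pose mapping W, so that X_frontal ≈ X_nonfrontal @ W.
W, *_ = np.linalg.lstsq(X_nonfrontal, X_frontal, rcond=None)

# Generate a "virtual frontal view" for a new non-frontal sample.
x_new = rng.normal(size=(1, dim))
virtual_frontal = x_new @ W
print(virtual_frontal.shape)
```

In a real system the mapping would be learned per pose (or per pose cluster) from registered mouth-region images, and the normalized views would then be fed to the visual front end of the recognizer.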
