Multi-pose lipreading and audio-visual speech recognition
Related publications (71)
Graph Chatbot
Chat with Graph Search
Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.
DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.
The Multi-Stream automatic speech recognition approach was investigated in this work as a framework for Audio-Visual data fusion and speech recognition. This method presents many potential advantages for such a task. It particularly allows for synchronous ...
This report provides an overview of important concepts in the field of information fusion, followed by a review of literature pertaining to audio-visual person identification & verification. Several recent adaptive and non-adaptive techniques for reaching ...
This paper presents a novel method to correlate audio and visual data generated by the same physical phenomenon, based on sparse geometric representation of video sequences. The video signal is modeled as a sum of geometric primitives evolving through time ...
The purpose of this paper is to unify several of the state-of-the-art score normalization techniques applied to text-independent speaker verification systems. We propose a new framework for this purpose. The two well-known Z- and T-normalization techniques ...
The Multi-Stream automatic speech recognition approach was investigated in this work as a framework for Audio-Visual data fusion and speech recognition. This method presents many potential advantages for such a task. It particularly allows for synchronous ...
The paper describes an animation approach where synthetic vision is used for navigation by a digital actor. The vision is the only channel of information between the actor and its environment and offers a universal approach to pass the necessary informatio ...
The purpose of this paper is to unify several of the state-of-the-art score normalization techniques applied to text-independent speaker verification systems. We propose a new framework for this purpose. The two well-known Z- and T-normalization techniques ...
We show that complex visual tasks, such as position- and size-invariant shape recognition and navigation in the environment, can be tackled with simple architectures generated by a coevolutionary process of active vision and feature selection. Behavioral m ...
Complex visual hallucinations are a well-known feature of electrical stimulation or epileptic discharge in the temporal lobe. It has been proposed that these visual hallucinations result from an electrical interference with the ventral visual processing st ...
The authors describe two different implementations of time-dependent d. functional theory (TDDFT) for use in excited state mol. dynamics simulations. One is based on the linear response formulation (LR-TDDFT), whereas the other uses a time propagation sche ...