Blind Audiovisual Source Separation Using Sparse Redundant Representations
In this paper we consider the problem of automatically extracting geometric lip features for multi-modal speaker identification. The use of visual information from the mouth region can be of great importance for improving the speaker ide ...
This report provides an overview of important concepts in the field of information fusion, followed by a review of literature pertaining to audio-visual person identification and verification. Several recent adaptive and non-adaptive techniques for reaching ...
Quality assessment is a central issue in the design, implementation, and performance testing of all systems. Digital signal processing systems generally deal with visual information that is meant for human consumption. An image, a video, or a 3D model may ...
Accessing, organizing, and manipulating home videos present technical challenges due to their unrestricted content and lack of storyline. In this paper, we present a methodology to discover cluster structure in home videos, which uses video shots as the un ...
In the world of video, and especially in video processing, important steps are currently being taken. Research projects around the world are tackling all kinds of tracking problems and the segmentation of video images. Others are working on feature extraction, image en ...
Given two video sequences, a composite video sequence can be generated which includes visual elements from each of the given sequences, suitably synchronized and represented in a chosen focal plane. For example, given two video sequences with each showing ...
We present a method that exploits an information theoretic framework to extract optimal audio features with respect to the video features. A simple measure of mutual information between the resulting audio features and the video ones makes it possible to detect the a ...
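To make the idea of scoring audio-video association by mutual information concrete, here is a minimal sketch. It is not the paper's actual method: the histogram-based estimator, the bin count, and all variable names are my own assumptions, used only to illustrate that a mutual information measure separates a synchronized audio feature from an unrelated one.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate of the mutual information I(X;Y)
    between two 1-D feature sequences (in nats)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                  # joint distribution
    px = pxy.sum(axis=1, keepdims=True)        # marginal of X
    py = pxy.sum(axis=0, keepdims=True)        # marginal of Y
    nz = pxy > 0                               # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
video = rng.normal(size=2000)                       # e.g. a mouth-motion feature
audio_sync = video + 0.3 * rng.normal(size=2000)    # audio feature tied to the video
audio_noise = rng.normal(size=2000)                 # unrelated audio feature

mi_sync = mutual_information(audio_sync, video)
mi_noise = mutual_information(audio_noise, video)
```

A real system would compute such features per frame and per region; the synchronized pair should score markedly higher than the unrelated one.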
Given two video sequences (IS1, IS2), a composite video sequence (15) can be generated which includes visual elements (A, B, 21) from each of the given sequences, suitably synchronized and represented in a chosen focal plane. For example, given two video s ...
This paper presents a novel method to correlate audio and visual data generated by the same physical phenomenon, based on a sparse geometric representation of video sequences. The video signal is modeled as a sum of geometric primitives evolving through time ...
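The sparse decompositions mentioned above are typically computed with a greedy pursuit over a redundant dictionary. The following is a rough 1-D sketch of matching pursuit with Gaussian atoms; the dictionary design and all names are my own illustrative assumptions, not the geometric video primitives the paper uses.

```python
import numpy as np

def gaussian_atom(n, center, width):
    """Unit-norm Gaussian bump of length n."""
    t = np.arange(n)
    g = np.exp(-0.5 * ((t - center) / width) ** 2)
    return g / np.linalg.norm(g)

def matching_pursuit(signal, dictionary, n_iter=5):
    """Greedy matching pursuit: at each step pick the atom most
    correlated with the residual and subtract its projection."""
    residual = signal.copy()
    coeffs = []
    for _ in range(n_iter):
        corr = dictionary @ residual
        k = int(np.argmax(np.abs(corr)))
        coeffs.append((k, corr[k]))
        residual = residual - corr[k] * dictionary[k]
    return coeffs, residual

n = 256
# Redundant dictionary: atoms at many positions and several scales.
atoms = np.array([gaussian_atom(n, c, w)
                  for c in range(0, n, 8) for w in (4.0, 8.0, 16.0)])
# A signal that is an exact sparse sum of two well-separated atoms.
signal = 3.0 * gaussian_atom(n, 64, 8.0) - 2.0 * gaussian_atom(n, 176, 16.0)
coeffs, residual = matching_pursuit(signal, atoms, n_iter=5)
```

Because the two atoms barely overlap, the pursuit recovers them in the first two iterations and the residual energy collapses; in the audiovisual setting the atoms are 2-D geometric primitives tracked over time rather than 1-D bumps.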
Visual information, in the form of images and video, comes from the interaction of light with objects. Illumination is a fundamental element of visual information. Detecting and interpreting illumination effects is part of our everyday life visual experien ...