Publication

A multimodal approach to extract optimized audio features for speaker detection

Related publications (46)

Audiovisual Diarization Of People In Video Content

Audio-Visual People Diarization (AVPD) is an original framework that simultaneously improves audio, video, and audiovisual diarization results. Following a literature review of people diarization for both audio and video content and their limitations, whic ...

2014

Audio Novelty-Based Segmentation of Music Concerts

Hervé Lissek, Patrick Marmaroli, Dalia Salem Hassan Fahmy El Badawy

The Swiss Federal Institute of Technology in Lausanne (EPFL) is in the process of digitizing an exceptional collection of audio and video recordings of the Montreux Jazz Festival (MJF) concerts. Since 1967, five thousand hours of both audio and video have ...

2013

Audio-Visual Object Extraction using Graph Cuts

Pierre Vandergheynst, Anna Llagostera Casanovas

We propose a novel method to automatically extract the audio-visual objects that are present in a scene. First, the synchrony between related events in audio and video channels is exploited to identify the possible locations of the sound sources. Video reg ...

Institute of Electrical and Electronics Engineers, 2012

Audio-Visual Fusion

Anna Llagostera Casanovas

The perception that we have about the world is influenced by elements of diverse nature. Indeed humans tend to integrate information coming from different sensory modalities to better understand their environment. Following this observation, scientists hav ...

EPFL, 2011

Video Quality for Face Detection, Recognition and Tracking

Many distributed multimedia applications rely on video analysis algorithms for automated video and image processing. Little is known, however, about the minimum video quality required to ensure an accurate performance of these algorithms. In an attempt to ...

2011

Cooperative video streaming on smartphones

Christina Fragouli, Lorenzo Keller

Video applications are increasingly popular over smartphones. However, in current cellular systems, the downlink data rate fluctuates and the loss rate can be quite high. We are interested in the scenario where a group of smartphone users, within proximity ...

2011

Semi-supervised Extraction of Audio-Visual Sources

Patricia Calatayud Martinez

This report presents a semi-supervised method to jointly extract audio-visual sources from a scene. It consists of applying a supervised method to segment the video signal followed by an automatic process to properly separate the audio track. This approach ...

2010

An Information Theoretic Approach to Speaker Diarization of Meeting Recordings

Deepu Vijayasenan

In this thesis we investigate a non-parametric approach to speaker diarization for meeting recordings based on an information theoretic framework. The problem is formulated using the Information Bottleneck (IB) principle. Unlike other approaches where the ...

EPFL, 2010

Anthropic Correction of Information Estimates and Its Application to Neural Coding

Michael Christoph Gastpar

Information theory has been used as an organizing principle in neuroscience for several decades. Estimates of the mutual information (MI) between signals acquired in neurophysiological experiments are believed to yield insights into the structure of the un ...

2010

Audio–Visual Synchronisation for Speaker Diarisation

Hervé Bourlard

The role of audio–visual speech synchrony for speaker diarisation is investigated on the multiparty meeting domain. We measured both mutual information and canonical correlation on different sets of audio and video features. As acoustic features we conside ...

2010