Concept

Mastering

Related publications (30)

Multimodal Speaker Localization from Omnidirectional Videos

Jean-Philippe Thiran, Mihai Gurban, Pascal Reuse

The use of omnidirectional cameras for videoconferencing promises to simplify the hardware setup necessary for large groups of participants. We investigate the use of a multimodal speaker detection algorithm on audio-visual sequences captured with such a c ...

2009

Learning bimodal structure in audio-visual data

Pierre Vandergheynst, Gianluca Monaci

A novel model is presented to learn bimodally informative structures from audio-visual signals. The signal is represented as a sparse sum of audio-visual kernels. Each kernel is a bimodal function consisting of synchronous snippets of an audio waveform an ...

2009

Multimodal feature extraction and fusion for audio-visual speech recognition

Mihai Gurban

Multimodal signal processing analyzes a physical phenomenon through several types of measures, or modalities. This leads to the extraction of higher-quality and more reliable information than that obtained from single-modality signals. The advantage is two ...

EPFL, 2009

Associating Audio-Visual Activity Cues in a Dominance Estimation Framework

Daniel Gatica-Perez, Yan Huang

We address the problem of both estimating the dominant person in a meeting from a single audio source and identifying them visually in a multi-camera setting. We use a speaker diarization algorithm to perform speaker segmentation and clustering, representi ...

Idiap, 2008

Associating Audio-Visual Activity Cues in a Dominance Estimation Framework

Daniel Gatica-Perez, Yan Huang

2008

Clustering And Segmenting Speakers And Their Locations In Meetings

Guillaume Lathoud, Jitendra Ajmera

This paper presents a new approach toward automatic annotation of meetings in terms of speaker identities and their locations. This is achieved by segmenting the audio recordings using two independent sources of information: magnitude spectrum analysis an ...

2004

Parametric coding of spatial audio

Christof Faller

A wide range of techniques for coding a single speech or audio signal channel have been developed over the last few decades. In addition to pure redundancy reduction, sophisticated source and receiver models have been considered for reducing the bitrate. O ...

EPFL, 2004

Clustering And Segmenting Speakers And Their Locations In Meetings

Guillaume Lathoud, Jitendra Ajmera

IDIAP, 2003

The VidTIMIT Database

This communication describes the multi-modal VidTIMIT database, which can be useful for research involving mono- or multi-modal speech recognition or person authentication. It is comprised of video and corresponding audio recordings of 43 volunteers, recit ...

IDIAP, 2002

Reproduction couleur par trames irrégulières et semi-régulières

In the printing industry, one of the most common methods for reproducing halftone images using bilevel printing devices is clustered-dot ordered dithering. The images produced using this method are quite faithful to the original and are visually pleasing. ...

EPFL, 1995