Publication

Associating Audio-Visual Activity Cues in a Dominance Estimation Framework

Daniel Gatica-Perez, Yan Huang
2008
Conference paper
Abstract

We address the problem of both estimating the dominant person in a meeting from a single audio source and identifying that person visually in a multi-camera setting. We use a speaker diarization algorithm to perform speaker segmentation and clustering, yielding clusters that represent when each speaker spoke. Using a greedy ordered audio-visual association algorithm, we investigate using the speaker clusters to find the corresponding person in one of the video channels. The problem is difficult for two reasons. First, the speaker diarization output is noisy (e.g. for participants who speak little) and often produces a number of clusters that differs from the true number of participants. Second, personal visual activity arises from natural upper-torso motion, which can include highly deformable pose changes and perspective distortion, and is computed through computationally efficient coarse features. Our results on almost 2 hours of audio-visual data from 4-participant meetings show a strong correlation between the estimated speaker diarization and visual activity features, enabling the identification of the most dominant person as a pair of audio-visual channels.
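The abstract does not spell out the association algorithm, so the following is only a minimal sketch of what a greedy ordered audio-visual association could look like. It assumes (these are illustrative assumptions, not details from the paper) that each speaker cluster and each video channel is summarized as an activity time series of equal length, that clusters are processed in decreasing order of total speaking time, and that the match score is the Pearson correlation. The function name `greedy_ordered_association` and the example arrays are hypothetical.

```python
import numpy as np


def greedy_ordered_association(audio, video):
    """Map each speaker cluster to one video channel, greedily.

    audio: array of shape (n_clusters, T), speaking activity per cluster.
    video: array of shape (n_channels, T), visual activity per channel.
    Returns a dict {cluster_index: channel_index}, inserted in processing order.
    Assumes no row is constant (a constant row has undefined correlation).
    """
    audio = np.asarray(audio, dtype=float)
    video = np.asarray(video, dtype=float)
    n_clusters = audio.shape[0]

    # Cross-correlation block: rows are clusters, columns are video channels.
    corr = np.corrcoef(audio, video)[:n_clusters, n_clusters:]

    # Assumed ordering: clusters with the most speaking activity go first.
    order = np.argsort(-audio.sum(axis=1), kind="stable")

    free_channels = set(range(video.shape[0]))
    mapping = {}
    for cluster in order:
        if not free_channels:
            break  # more clusters than channels: the rest stay unassigned
        best = max(sorted(free_channels), key=lambda ch: corr[cluster, ch])
        mapping[int(cluster)] = best
        free_channels.remove(best)
    return mapping


# Toy data: cluster 0 talks the most and co-varies with video channel 1.
example_audio = [[1, 1, 0, 0, 1, 1, 0, 0],
                 [0, 0, 1, 0, 0, 0, 0, 0]]
example_video = [[0.1, 0.0, 0.9, 0.1, 0.0, 0.1, 0.1, 0.0],
                 [0.8, 0.9, 0.1, 0.0, 0.7, 0.9, 0.2, 0.1]]
example_mapping = greedy_ordered_association(example_audio, example_video)
```

Under these assumptions, the first entry of the returned mapping pairs the most talkative cluster with its best-matching video channel, which is one way to read "the most dominant person as a pair of audio-visual channels". Because diarization may yield more or fewer clusters than participants, the sketch simply leaves surplus clusters or channels unassigned.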

Related concepts (31)
Correlation (statistics)
In probability and statistics, the correlation between several random or statistical variables is a notion of association that contradicts their independence. This correlation is very often reduced to the linear correlation between quantitative variables, that is, the fitting of one variable against the other by an affine relationship obtained through linear regression. To quantify it, one computes a linear correlation coefficient, the quotient of their covariance by the product of their standard deviations.
Pearson correlation coefficient
In statistics, the Pearson correlation coefficient (PCC) is a correlation coefficient that measures linear correlation between two sets of data. It is the ratio between the covariance of two variables and the product of their standard deviations; thus, it is essentially a normalized measurement of the covariance, such that the result always has a value between −1 and 1. As with covariance itself, the measure can only reflect a linear correlation of variables, and ignores many other types of relationships or correlations.
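The definition above (covariance divided by the product of the standard deviations) can be sketched in a few lines of plain Python. The function name `pearson_r` is a hypothetical choice for this illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of centered products; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A perfectly increasing linear relationship such as `pearson_r([1, 2, 3, 4], [2, 4, 6, 8])` gives a value of 1 up to floating-point rounding, and reversing one sequence gives -1.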
Intraclass correlation
In statistics, the intraclass correlation, or the intraclass correlation coefficient (ICC), is a descriptive statistic that can be used when quantitative measurements are made on units that are organized into groups. It describes how strongly units in the same group resemble each other. While it is viewed as a type of correlation, unlike most other correlation measures, it operates on data structured as groups rather than data structured as paired observations.
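As an illustration of a statistic that works on grouped data, here is a minimal sketch of one common estimator, the one-way random-effects ICC computed from between-group and within-group mean squares. The text above does not name a specific estimator, so this choice, the function name `icc_oneway`, and the requirement that all groups have the same size are assumptions made for the example.

```python
def icc_oneway(groups):
    """One-way random-effects ICC for equally sized groups of measurements."""
    n = len(groups)          # number of groups
    k = len(groups[0])       # measurements per group
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 groups, each with the same size >= 2")

    grand_mean = sum(sum(g) for g in groups) / (n * k)
    group_means = [sum(g) / k for g in groups]

    # Mean square between groups and mean square within groups.
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (n * (k - 1))

    denom = ms_between + (k - 1) * ms_within
    if denom == 0:
        raise ValueError("ICC is undefined when all measurements are equal")
    return (ms_between - ms_within) / denom
```

When members of each group are identical but the groups differ, as in `icc_oneway([[1, 1], [2, 2], [3, 3]])`, the within-group mean square is zero and the estimate is 1, matching the idea that units in the same group resemble each other as strongly as possible.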
Related publications (102)

The hindbrain and cortico-reticular pathway in adolescent idiopathic scoliosis

Bénédicte Marie Maréchal

AIM: To characterise the corticoreticular pathway (CRP) in a case-control cohort of adolescent idiopathic scoliosis (AIS) patients using high-resolution slice-accelerated readout-segmented echo-planar diffusion tensor imaging (DTI) to enhance the discri ...
W B Saunders Co Ltd, 2024

Saliency prediction in 360° architectural scenes: Performance and impact of daylight variations

Marilyne Andersen, Sabine Süsstrunk, Caroline Karmann, Bahar Aydemir, Kynthia Chamilothori, Seungryong Kim

Saliency models are image-based prediction models that estimate human visual attention. Such models, when applied to architectural spaces, could pave the way for design decisions where visual attention is taken into account. In this study, we tested the pe ...
2023

Towards a multiscale point cloud structural similarity metric

Touradj Ebrahimi

Point clouds are effective data structures for the representation of three-dimensional media and hence adopted in a wide range of practical applications. In many cases, the portrayed data is expected to be visualized by humans. After acquisition, point c ...
2023
Related MOOCs (6)
Digital Signal Processing I
Basic signal processing concepts, Fourier analysis and filters. This module can be used as a starting point or a basic refresher in elementary DSP.
Digital Signal Processing II
Adaptive signal processing, A/D and D/A. This module provides the basic tools for adaptive filtering and a solid mathematical framework for sampling and quantization.
Digital Signal Processing III
Advanced topics: this module covers real-time audio processing (with examples on a hardware board), image processing and communication system design.
