We describe the design of a system consisting of several state-of-the-art real-time audio and video processing components that enable multimodal stream manipulation (e.g., automatic online editing for multiparty videoconferencing applications) in open, unconstrained environments. The underlying algorithms are designed to allow multiple people to enter, interact with, and leave the observable scene without constraints. They comprise continuous localisation of audio objects and its application to spatial audio object coding; detection and tracking of faces; estimation of head poses and visual focus of attention; detection and localisation of verbal and paralinguistic events; and the association and fusion of these different events. Combined, these components represent multimodal streams as audio objects and semantic video objects and provide semantic information to stream manipulation systems (such as a virtual director). Various experiments have been performed to evaluate the system. The results demonstrate the effectiveness of the proposed design, the individual algorithms, and the benefit of fusing different modalities in this scenario.
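The paper does not include code, but the association step it describes (linking localised speech events to tracked faces so that downstream tools such as a virtual director know who is speaking) can be illustrated with a minimal sketch. All names below (AudioEvent, VideoEvent, SemanticObject, fuse, and the thresholds) are hypothetical and chosen for illustration only; the actual system's algorithms are more sophisticated.

```python
# Illustrative sketch only: all class names, fields, and thresholds here are
# hypothetical, not the paper's implementation. It shows one simple way to
# associate audio speech events with face tracks by time and direction.

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AudioEvent:
    t: float            # timestamp in seconds
    azimuth: float      # estimated direction of arrival, in degrees
    is_speech: bool     # verbal event vs. other paralinguistic/non-speech event


@dataclass
class VideoEvent:
    t: float
    face_id: int        # track identifier from the face tracker
    azimuth: float      # horizontal position of the face in the scene, in degrees
    head_yaw: float     # head pose, usable as a proxy for visual focus of attention


@dataclass
class SemanticObject:
    """A fused audio-visual object exposed to the stream manipulation layer."""
    face_id: int
    speaking: bool
    attention_target: Optional[int] = None


def fuse(audio: List[AudioEvent], video: List[VideoEvent],
         max_angle_gap: float = 15.0, max_time_gap: float = 0.2) -> List[SemanticObject]:
    """Mark a face track as 'speaking' if a speech event is close in time and angle."""
    objects = []
    for v in video:
        speaking = any(
            a.is_speech
            and abs(a.t - v.t) <= max_time_gap
            and abs(a.azimuth - v.azimuth) <= max_angle_gap
            for a in audio
        )
        objects.append(SemanticObject(face_id=v.face_id, speaking=speaking))
    return objects


if __name__ == "__main__":
    audio = [AudioEvent(t=1.0, azimuth=30.0, is_speech=True)]
    video = [VideoEvent(t=1.05, face_id=7, azimuth=28.0, head_yaw=5.0),
             VideoEvent(t=1.05, face_id=9, azimuth=-40.0, head_yaw=0.0)]
    for obj in fuse(audio, video):
        print(obj)
```

In this toy example, only face track 7 is marked as speaking, since it is the only one spatially and temporally consistent with the detected speech event; a virtual director could then use such objects to select the active speaker's view.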