Publication

A multimodal approach to extract optimized audio features for speaker detection

Jean-Philippe Thiran, Murat Kunt, Torsten Butz, Patricia Besson
2005
Conference paper

Abstract

We present a method that uses an information-theoretic framework to extract audio features that are optimal with respect to the video features. A simple measure of the mutual information between the resulting audio features and the video features then makes it possible to detect the active speaker among several candidates. The results show that our method exploits the speech information shared by the audio and video signals to recover their common source.
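
The abstract does not specify the mutual information estimator or the optimizer, so the following Python sketch only illustrates the general idea under stated assumptions. Mutual information is estimated with a plug-in histogram estimator. The "optimized" audio feature is assumed to be a linear projection of a multi-dimensional per-frame audio feature vector, with weights found by plain random search. The active speaker is taken to be the candidate whose time-aligned, per-frame video feature shares the most information with its optimized audio feature. All function names, parameters, and the synthetic data are hypothetical, and this is a stand-in rather than the authors' method.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def discretize(x: Sequence[float], bins: int) -> List[int]:
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0] * len(x)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in x]


def mutual_information(x: Sequence[float], y: Sequence[float], bins: int = 8) -> float:
    """Plug-in (histogram) estimate of I(X;Y) in nats."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    xb, yb = discretize(x, bins), discretize(y, bins)
    joint, px, py = Counter(zip(xb, yb)), Counter(xb), Counter(yb)
    # Sum over occupied cells of p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts.
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def project(frames: Sequence[Sequence[float]], w: Sequence[float]) -> List[float]:
    """Collapse each multi-dimensional audio frame f to one scalar feature, the dot product w . f."""
    return [sum(wi * fi for wi, fi in zip(w, f)) for f in frames]


def optimize_projection(frames, video, trials=200, bins=8, seed=0) -> Tuple[List[float], float]:
    """Random search over unit vectors w for the projection maximizing I(w . a; v).

    Hypothetical stand-in optimizer, not the one used in the paper.
    """
    rng = random.Random(seed)
    dim = len(frames[0])
    best_w, best_mi = [1.0] + [0.0] * (dim - 1), -1.0
    for _ in range(trials):
        w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in w)) or 1.0
        w = [c / norm for c in w]
        mi = mutual_information(project(frames, w), video, bins)
        if mi > best_mi:
            best_w, best_mi = w, mi
    return best_w, best_mi


def detect_speaker(frames, candidates: Dict[str, Sequence[float]], **kwargs) -> Tuple[str, Dict[str, float]]:
    """Optimize the audio feature against each candidate's video feature; pick the max-MI candidate."""
    scores = {name: optimize_projection(frames, video, **kwargs)[1] for name, video in candidates.items()}
    return max(scores, key=scores.get), scores


def synthetic_demo(n: int = 400, seed: int = 1):
    """Hypothetical toy data: candidate 'A' shares a source with audio component 0, 'B' does not."""
    rng = random.Random(seed)
    source = [rng.gauss(0.0, 1.0) for _ in range(n)]
    frames = [[s + 0.1 * rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for s in source]
    candidates = {"A": source, "B": [rng.gauss(0.0, 1.0) for _ in range(n)]}
    return frames, candidates


if __name__ == "__main__":
    frames, candidates = synthetic_demo()
    speaker, scores = detect_speaker(frames, candidates)
    print(speaker, {name: round(mi, 3) for name, mi in scores.items()})
```

On the toy data, candidate "A" should obtain the higher score because its video feature and the first audio component share a common source. Each candidate gets its own optimized projection, and the same seed is reused so that every candidate is searched over the same set of directions. Random search was chosen here only for brevity. A gradient-based or entropy-based optimizer could replace it without changing the detection step.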

Official source

https://infoscience.epfl.ch/record/87276?ln=en

A Functional Perspective on Information Measures

AI-based telepresence for broadcast applications

Mutual Information Disentangles Interactions from Changing Environments