A method that exploits an information-theoretic framework to extract optimized audio features using video information is presented. A simple measure of mutual information (MI) between the resulting audio and video features allows the detection of the active speaker among different candidates. This method involves the optimization of an MI-based objective function. No approximation is needed to solve this optimization problem, neither for the estimation of the probability density functions (pdfs) of the features, nor for the cost function itself. The pdfs are estimated from the samples using a non-parametric approach. The challenging optimization problem is solved using a global method: the Differential Evolution algorithm. Two information-theoretic optimization criteria are compared, and their ability to extract audio features specific to speech is discussed. Using these speech-specific audio features, the candidates' video features are then classified as belonging to the "speaker" or "non-speaker" class, resulting in a speaker detection scheme. Our method achieves a speaker detection rate of 100% on home-grown test sequences, and of 85% on the most commonly used sequences.
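The pipeline described in the abstract (estimate MI from samples, optimize the audio features with Differential Evolution, then pick the candidate with the highest MI) can be illustrated with a minimal sketch. It is not the authors' implementation. Several things here are assumptions made for illustration only: the audio feature is taken to be a linear combination of three synthetic audio channels, the pdfs are estimated with a plain histogram rather than whatever non-parametric estimator the paper uses, the optimizer is a basic DE/rand/1/bin variant, and all names (`mutual_information`, `project`, `optimized_mi`, `demo`) and parameter values are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of MI, in nats, between two 1-D sample lists."""
    n = len(x)

    def bin_index(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # constant signal: everything lands in bin 0
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = bin_index(x), bin_index(y)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(c / n * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def differential_evolution(cost, bounds, pop_size=20, generations=60,
                           f=0.7, cr=0.9, seed=0):
    """Basic DE/rand/1/bin minimizer; returns (best_vector, best_cost)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [cost(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate is mutated
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    lo, hi = bounds[d]
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    trial.append(min(max(v, lo), hi))  # clip to the bounds
                else:
                    trial.append(pop[i][d])
            trial_cost = cost(trial)
            if trial_cost <= costs[i]:  # greedy one-to-one selection
                pop[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]


def project(weights, audio):
    """Assumed feature extractor: one linear combination per audio frame."""
    return [sum(w * a for w, a in zip(weights, frame)) for frame in audio]


def optimized_mi(audio, video, seed=0):
    """Maximize MI(audio feature, video feature) over the projection weights."""
    def cost(w):
        return -mutual_information(project(w, audio), video)

    bounds = [(-1.0, 1.0)] * len(audio[0])
    weights, best_cost = differential_evolution(cost, bounds,
                                                generations=30, seed=seed)
    return weights, -best_cost


def demo(seed=0, n=300):
    """Synthetic test: candidate 0 drives the audio, candidate 1 does not."""
    rng = random.Random(seed)
    speaker = [rng.gauss(0, 1) for _ in range(n)]
    bystander = [rng.gauss(0, 1) for _ in range(n)]
    # Channel 0 is a noisy copy of the speaker's video feature; the rest is noise.
    audio = [[s + 0.2 * rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)]
             for s in speaker]
    results = [optimized_mi(audio, video, seed) for video in (speaker, bystander)]
    scores = [mi for _, mi in results]
    return results[0][0], scores, scores.index(max(scores))


weights, scores, detected = demo()
print("weights for candidate 0:", [round(w, 2) for w in weights])
print("MI per candidate (nats):", [round(s, 3) for s in scores])
print("detected speaker index:", detected)
```

In this toy setting the candidate whose video feature shares the most information with the optimized audio feature is labeled the speaker. A histogram estimate is biased upward for small samples, so the absolute MI values should only be compared between candidates evaluated with the same sample size and bin count.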
Patricia Besson, Torsten Butz, Murat Kunt, Jean-Philippe Thiran
Ivana Arsic de Heras Ciechomska, Ninoslav Marina, Jean-Philippe Thiran
Patricia Besson, Murat Kunt, Vlad Popovici, Jean-Philippe Thiran, Jean-Marc Vesin