This paper introduces a novel approach for extracting speaker embeddings from audio mixtures of multiple overlapping voices. The approach is based on a multi-task neural network that first extracts a latent feature for each direction. This feature is then used both to detect sound sources and to identify speakers. Unlike traditional approaches, the proposed method does not rely on explicit sound source separation. Instead, the network learns from data to extract the features best suited to the sounds arriving from each direction. Experiments on audio recordings of overlapping sound sources show that the proposed approach outperforms a traditional beamforming-based method.
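The abstract describes the architecture only at a high level. As a rough illustration, the Python sketch below mimics the data flow it outlines. A shared layer produces one latent feature per direction. A detection head scores whether a source is present in that direction. A second head maps the same latent feature to a unit-norm speaker embedding, so embeddings are read off per direction without producing separated signals. Everything concrete here is an assumption and not the authors' model: the sizes (`NUM_DIRECTIONS`, `INPUT_DIM`, `LATENT_DIM`, `EMBED_DIM`), the hypothetical helpers `MultiTaskModel`, `extract_speaker_embeddings` and `identify`, the single-layer heads with random, untrained weights, the random stand-in for per-direction input features, the 0.5 detection threshold, and cosine-similarity matching against enrolled speakers.

```python
import math
import random

# Hypothetical sizes: the abstract specifies none of these.
NUM_DIRECTIONS = 8  # discretized directions scored by the network
INPUT_DIM = 6       # per-direction input features derived from the mixture
LATENT_DIM = 4      # size of the shared latent feature per direction
EMBED_DIM = 3       # size of the speaker embedding


def random_matrix(rows, cols, rng):
    """Random (untrained) weight matrix stored as a list of rows."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def relu(v):
    return [max(0.0, x) for x in v]


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


class MultiTaskModel:
    """Toy shared encoder with a detection head and a speaker-embedding head (hypothetical)."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w_shared = random_matrix(LATENT_DIM, INPUT_DIM, rng)
        self.w_detect = random_matrix(1, LATENT_DIM, rng)
        self.w_embed = random_matrix(EMBED_DIM, LATENT_DIM, rng)

    def forward(self, per_direction_inputs):
        """Return (source probability, speaker embedding) for every direction."""
        outputs = []
        for x in per_direction_inputs:
            latent = relu(matvec(self.w_shared, x))                 # shared latent feature
            p_source = sigmoid(matvec(self.w_detect, latent)[0])    # task 1: source detection
            embedding = l2_normalize(matvec(self.w_embed, latent))  # task 2: speaker embedding
            outputs.append((p_source, embedding))
        return outputs


def extract_speaker_embeddings(model, per_direction_inputs, threshold=0.5):
    """Keep embeddings only where a source is detected; no separated signals are produced."""
    return [
        (direction, embedding)
        for direction, (p_source, embedding) in enumerate(model.forward(per_direction_inputs))
        if p_source > threshold
    ]


def identify(embedding, enrolled):
    """Name of the enrolled speaker with the highest cosine similarity (assumes unit-norm vectors)."""
    return max(enrolled, key=lambda name: sum(a * b for a, b in zip(embedding, enrolled[name])))


if __name__ == "__main__":
    rng = random.Random(1)
    # Random stand-in for per-direction features computed from a multichannel mixture.
    mixture = [[rng.gauss(0.0, 1.0) for _ in range(INPUT_DIM)] for _ in range(NUM_DIRECTIONS)]
    model = MultiTaskModel(seed=0)
    for direction, embedding in extract_speaker_embeddings(model, mixture):
        print(direction, [round(x, 3) for x in embedding])
```

The point of the sketch is the shared latent feature: both tasks read from the same per-direction representation, so detection decides *where* to look and the embedding head supplies *who* is there, without an intermediate separation stage. A trained system would learn all three weight matrices jointly from data.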
Wulfram Gerstner, Stanislaw Andrzej Wozniak, Ana Stanojevic, Giovanni Cherubini, Angeliki Pantazi