Publication

EMPLOYMENT OF SUBSPACE GAUSSIAN MIXTURE MODELS IN SPEAKER RECOGNITION

Petr Motlicek, Subhadeep Dey
2015
Rapport ou document de travail

Résumé

This paper presents Subspace Gaussian Mixture Model (SGMM) approach employed as a probabilistic generative model to estimate speaker vector representations to be subsequently used in the speaker verification task. SGMMs have already been shown to significantly outperform traditional HMM/GMMs in Automatic Speech Recognition (ASR) applications. An extension to the basic SGMM framework allows to robustly estimate low-dimensional speaker vectors and exploit them for speaker adaptation. We propose a speaker verification framework based on low-dimensional speaker vectors estimated using SGMMs, trained in ASR manner using manual transcriptions. To test the robustness of the system, we evaluate the proposed approach with respect to the state-of-the-art i-vector extractor on the NIST SRE 2010 evaluation set and on four different length-utterance conditions: 3sec-10sec, 10 sec-30 sec, 30 sec-60 sec and full (untruncated) utterances. Experimental results reveal that while i-vector system performs better on truncated 3sec to 10sec and 10 sec to 30 sec utterances, noticeable improvements are observed with SGMMs especially on full length-utterance durations. Eventually, the proposed SGMM approach exhibits complementary properties and can thus be efficiently fused with i-vector based speaker verification system.

Source officielle

https://infoscience.epfl.ch/record/209103?ln=fr

À propos de ce résultat

Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.

EMPLOYMENT OF SUBSPACE GAUSSIAN MIXTURE MODELS IN SPEAKER RECOGNITION

Graph Chatbot

Chattez avec Graph Search

Sparse Autoencoders for Speech Modeling and Recognition

Validating Automatic Speech Recognition and Understanding for Pre-Filling Radar Labels-Increasing Safety While Reducing Air Traffic Controllers' Workload

Fair Voice Biometrics: Impact of Demographic Imbalance on Group Fairness in Speaker Recognition

Sparse Autoencoders for Speech Modeling and Recognition

Validating Automatic Speech Recognition and Understanding for Pre-Filling Radar Labels-Increasing Safety While Reducing Air Traffic Controllers' Workload

Fair Voice Biometrics: Impact of Demographic Imbalance on Group Fairness in Speaker Recognition