Template-matching for text-dependent speaker verification

In the last decade, i-vector and Joint Factor Analysis (JFA) approaches to speaker modeling have become ubiquitous in automatic speaker recognition. Both techniques compute posterior probabilities, using either Gaussian Mixture Models (GMM) or Deep Neural Networks (DNN), as a prior step to estimating i-vectors or speaker factors. GMMs implicitly model the phonetic information in acoustic features, while DNNs explicitly model phonetic/linguistic units. For text-dependent speaker verification, DNN-based systems have considerably outperformed GMM-based systems on fixed-phrase tasks. However, both approaches ignore phone sequence information. In this paper, we aim to exploit this information by using Dynamic Time Warping (DTW) with speaker-informative features. These features are obtained from i-vector models extracted over short speech segments, also called online i-vectors. Probabilistic Linear Discriminant Analysis (PLDA) is further used to project online i-vectors onto a speaker-discriminative subspace. The proposed DTW approach achieves a relative improvement of at least 74% in equal error rate on the RSR corpus over other state-of-the-art approaches, including i-vector and JFA.
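The template-matching idea can be illustrated with a minimal DTW sketch. It is not the paper's implementation: the function names `cosine_distance` and `dtw_distance` are hypothetical, the plain lists of floats stand in for PLDA-projected online i-vectors, and both the cosine local distance and the normalization by the combined sequence length are assumptions that the abstract does not specify.

```python
import math


def cosine_distance(u, v):
    """Local distance between two feature vectors (assumed: 1 - cosine similarity)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat a zero vector as maximally uninformative
    return 1.0 - dot / (norm_u * norm_v)


def dtw_distance(template, test):
    """Align two feature sequences with DTW and return a length-normalized cost.

    template, test: lists of equal-dimension feature vectors, standing in
    for sequences of online i-vectors (an assumption of this sketch).
    """
    n, m = len(template), len(test)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # cost[i][j] = minimal accumulated cost of aligning the first i template
    # frames with the first j test frames.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = cosine_distance(template[i - 1], test[j - 1])
            # Allowed steps: advance the template only, the test only, or both.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )

    # Normalizing by n + m (an assumed convention) keeps scores comparable
    # across utterances of different durations.
    return cost[n][m] / (n + m)
```

In a verification setting, the score of a test utterance against an enrollment template of the same phrase would be compared with a threshold: a lower DTW cost suggests the same speaker saying the same phrase. Because the alignment is monotonic in time, the order of the frames, and therefore the phone sequence, affects the score.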


