Towards the goal of improving acoustic modeling for automatic speech recognition (ASR), this work investigates the modeling of senone subspaces in deep neural network (DNN) posteriors using low-rank and sparse modeling approaches. While DNN posteriors are typically very high-dimensional, recent studies have shown that the true class information is actually embedded in low-dimensional subspaces. Thus, a matrix of all posteriors belonging to a particular senone class is expected to have very low rank. In this paper, we exploit Principal Component Analysis and Compressive Sensing based dictionary learning for low-rank and sparse modeling of senone subspaces, respectively. Our hypothesis is that the principal components of the DNN posterior space (termed eigen-posteriors in this work) and Compressive Sensing dictionaries can act as suitable models to extract the well-structured low-dimensional latent information and discard the undesirable high-dimensional unstructured noise present in the posteriors. The enhanced DNN posteriors thus obtained are used as soft targets for training better acoustic models to improve ASR. In this context, our approach also enables improving distant speech recognition by mapping far-field acoustic features to low-dimensional senone subspaces learned from near-field features. Experiments are performed on the AMI Meeting corpus in both close-talk (IHM) and far-field (SDM) microphone settings, where acoustic models trained using enhanced DNN posteriors outperform conventional hard-target-based hybrid DNN-HMM systems. An information-theoretic analysis is also presented to show how the low-rank and sparse enhancements modify the DNN posterior space to better match the assumptions of the hidden Markov model (HMM) backend.
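To illustrate the low-rank enhancement idea described above, the following is a minimal sketch (not the paper's implementation) of PCA-based enhancement of a per-senone posterior matrix: posteriors belonging to one senone class are stacked into a matrix, projected onto the top principal components (the "eigen-posteriors"), and reconstructed, discarding the residual as unstructured noise. The function name, the toy data, and the choice of rank are illustrative assumptions.

```python
import numpy as np

def enhance_posteriors(P, rank):
    """Low-rank enhancement of a senone-class posterior matrix via PCA/SVD.

    P    : (n_frames, n_senones) DNN posteriors whose frames are all
           aligned to the same senone class (hypothetical toy setup).
    rank : number of principal components ("eigen-posteriors") to keep.
    """
    mean = P.mean(axis=0, keepdims=True)
    # SVD of the mean-centered matrix gives the principal components.
    U, s, Vt = np.linalg.svd(P - mean, full_matrices=False)
    # Keep only the top `rank` components; the discarded residual is
    # treated as high-dimensional unstructured noise.
    P_low = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mean
    # Re-normalize so each frame is again a valid posterior distribution,
    # suitable for use as a soft target when retraining the acoustic model.
    P_low = np.clip(P_low, 1e-8, None)
    return P_low / P_low.sum(axis=1, keepdims=True)

# Toy example: noisy 4-senone posteriors concentrated on senone 0.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.array([20.0, 1.0, 1.0, 1.0]), size=50)
P_enh = enhance_posteriors(P, rank=1)
```

In the paper's pipeline, the sparse variant would replace the SVD step with a Compressive Sensing dictionary learned per senone class; the enhanced posteriors then serve as soft targets in place of the conventional hard alignments.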