Segment-level training of ANNs based on acoustic confidence measures for hybrid HMM/ANN Speech Recognition

We show that confidence measures estimated from local posterior probabilities can serve as objective functions for training ANNs in hybrid HMM-based speech recognition systems. This leads to a segment-level training paradigm that overcomes a limitation of frame-level updates, which ignore the sequence structure of speech. We propose measures that train at the state and phone segment levels while still decoding in the conventional framework. Experimental results on multiple corpora show that such training not only yields better-performing systems but also gives additional improvements when combined with sequence-discriminative training. These techniques generalise across front-ends and model architectures, and efficiently handle the effect of segment duration variations on ANN training.
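As a rough illustration of what a segment-level objective built from local posteriors might look like, the sketch below averages the frame-level log posteriors of the aligned class over each segment and negates the mean over segments. This is only one plausible form, assumed here for illustration; the abstract does not specify the actual measures. The function names, the `(start, end, cls)` segment format, and the flooring constant are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical segment format: (start_frame, end_frame_exclusive, class_index).
Segment = Tuple[int, int, int]


def segment_confidences(
    posteriors: Sequence[Sequence[float]],
    segments: Sequence[Segment],
    floor: float = 1e-12,
) -> List[float]:
    """Duration-normalised log posterior of the aligned class per segment.

    Assumption: confidence = (1 / D) * sum_t log p(class | x_t), where D is
    the segment length in frames. Dividing by D keeps long and short
    segments on a comparable scale.
    """
    scores = []
    for start, end, cls in segments:
        if end <= start:
            raise ValueError("segment must contain at least one frame")
        # Floor the posterior to avoid log(0) for frames with zero probability.
        log_sum = sum(
            math.log(max(posteriors[t][cls], floor)) for t in range(start, end)
        )
        scores.append(log_sum / (end - start))
    return scores


def segment_level_loss(
    posteriors: Sequence[Sequence[float]],
    segments: Sequence[Segment],
) -> float:
    """Negative mean segment confidence; lower is better."""
    scores = segment_confidences(posteriors, segments)
    return -sum(scores) / len(scores)


if __name__ == "__main__":
    # Three frames, two classes; one 2-frame segment and one 1-frame segment.
    post = [[0.5, 0.5], [0.5, 0.5], [1.0, 0.0]]
    segs = [(0, 2, 0), (2, 3, 0)]
    print(segment_confidences(post, segs))
    print(segment_level_loss(post, segs))
```

In this form every segment contributes equally to the loss regardless of its duration, whereas a frame-level cross-entropy would weight a segment in proportion to its number of frames.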

