Synthetic References for Template-based ASR using Posterior Features

Recently, the use of phoneme class-conditional probabilities as features (posterior features) for template-based ASR has been proposed. These features have been found to generalize well to unseen data and to yield better systems than standard spectral-based features. In this paper, motivated by the high quality of current text-to-speech systems and the robustness of posterior features to undesired variability, we investigate the use of synthetic speech to generate reference templates. Using synthetic speech in template-based ASR not only addresses the issue of in-domain data collection but also allows the vocabulary to be expanded. Using 75- and 600-word task-independent and speaker-independent setups on the Phonebook database, we investigate different synthetic voices produced by the Festival HTS-based synthesizer trained on the CMU ARCTIC databases. Our study shows that synthetic speech templates can yield performance comparable to that of natural speech templates, especially with synthetic voices that have high intelligibility.
