On the Adequacy of Baseform Pronunciations and Pronunciation Variants

Hervé Bourlard
2004
Report or working paper

Abstract

This paper presents an approach to automatically extract pronunciation variants and evaluate their "stability" (i.e., the adequacy of the model to accommodate this variability), based on multiple pronunciations of each lexicon word and knowledge of a reference baseform pronunciation. Most approaches to modelling pronunciation variability in speech recognition infer a pronunciation graph that includes all pronunciation variants (through an ergodic HMM), usually followed by smoothing (e.g., Bayesian) of the resulting graph. The approach presented here differs in (1) the way the models are inferred and (2) the way the smoothing (i.e., keeping the best variants) is done. In our case, the pronunciation variants are inferred by slowly "relaxing" a (usually left-to-right) baseform model towards a fully ergodic model. The more stable the baseform model is, the less the inferred model will diverge from it. Hence, for each pronunciation model so generated, we evaluate its adequacy by calculating the Levenshtein distance of the new model with respect to the baseform, as well as its confidence measure (based on some posterior estimation), and the models with the lowest Levenshtein distance and highest confidence are preserved. On a large telephone speech database (Phonebook), we show the relationship between this "stability" measure and recognition performance, and we finally show that automatically adding a few pronunciation variants to the less stable words is enough to significantly improve recognition rates.
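The selection step described in the abstract can be illustrated with a minimal sketch. It assumes that pronunciations are represented as lists of phone symbols and that each inferred variant already carries a confidence score in [0, 1]. The function names, the thresholds `max_distance` and `min_confidence`, and the way the two criteria are combined are hypothetical choices for illustration; the abstract does not specify them, and the inference of variants by relaxing the HMM is not shown.

```python
from typing import List, Sequence, Tuple


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two phone sequences, with unit costs."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, phone_a in enumerate(a, start=1):
        curr = [i]
        for j, phone_b in enumerate(b, start=1):
            cost = 0 if phone_a == phone_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def select_variants(
    baseform: Sequence[str],
    candidates: List[Tuple[List[str], float]],
    max_distance: int = 2,        # hypothetical threshold
    min_confidence: float = 0.5,  # hypothetical threshold
) -> List[Tuple[List[str], int, float]]:
    """Keep variants that are close to the baseform and have high confidence.

    Returns (phones, distance, confidence) tuples, sorted by increasing
    distance and then by decreasing confidence.
    """
    kept = []
    for phones, confidence in candidates:
        distance = levenshtein(phones, baseform)
        if distance <= max_distance and confidence >= min_confidence:
            kept.append((phones, distance, confidence))
    kept.sort(key=lambda item: (item[1], -item[2]))
    return kept


# Illustrative data only: a made-up baseform and made-up candidate variants.
baseform = ["t", "ah", "m", "ey", "t", "ow"]
candidates = [
    (["t", "ah", "m", "aa", "t", "ow"], 0.8),  # one substitution
    (["t", "m", "ey", "t", "ow"], 0.6),        # one deletion
    (["p", "ah", "t", "ey", "d", "ow"], 0.9),  # three substitutions
]
selected = select_variants(baseform, candidates)
```

With these made-up inputs, the third candidate is rejected because it lies too far from the baseform despite its high confidence, which reflects the idea that both criteria must hold for a variant to be preserved.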

Official source

https://infoscience.epfl.ch/record/83063?ln=fr

