Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
Recently, the use of phoneme class-conditional probabilities as features (posterior features) for template-based ASR has been proposed. These features have been found to generalize well to unseen data and yield better systems than standard spectral-based features. In this paper, motivated by the high quality of current text-to-speech systems and the robustness of posterior features toward undesired variability, we investigate the use of synthetic speech to generate reference templates. The use of synthetic speech in template-based ASR not only allows to address the issue of in-domain data collection but also expansion of vocabulary. Using 75- and 600-word task-independent and speaker-independent setup on Phonebook database, we investigate different synthetic voices produced by the Festival HTS-based synthesizer trained on CMU ARCTIC databases. Our study shows that synthetic speech templates can yield performance comparable to the natural speech templates, especially with synthetic voices that have high intelligibility.
Jean-Pierre Hubaux, Juan Ramón Troncoso-Pastoriza, Sylvain Chatel, Apostolos Pyrgelis