In light of steady progress in machine learning, automatic speech recognition (ASR) is entering more and more areas of our daily life, but people with dysarthria and other speech pathologies are left behind. Their voices are underrepresented in the training data and so different from typical speech that ASR systems fail to recognise them. This thesis aims to adapt both the acoustic models and the training data of ASR systems in order to better handle dysarthric speech.

We first build state-of-the-art acoustic models based on sequence-discriminative lattice-free maximum mutual information (LF-MMI) training that serve as baselines for the following experiments. We propose the dynamic combination of models trained on either control, dysarthric, or both groups of speakers to address the acoustic variability of dysarthric speech. Furthermore, we combine models trained with either phoneme or grapheme acoustic units in order to implicitly handle pronunciation variants.

Second, we develop a framework to analyse the acoustic space of ASR training data and its discriminability. We observe that these discriminability measures are strongly linked to subjective intelligibility ratings of dysarthric speakers and to ASR performance.

Finally, we compare a range of data augmentation methods, including voice conversion and speech synthesis, for creating artificial dysarthric training data for ASR systems. With our analysis framework, we find that these methods reproduce characteristics of natural dysarthric speech.
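To make the model-combination idea above more concrete, the following is a minimal, hypothetical sketch of score-level combination of two acoustic models (one trained on control speech, one on dysarthric speech). The function name, interpolation scheme, and array shapes are illustrative assumptions, not the thesis' actual implementation.

```python
# Hypothetical sketch: combine per-frame log-likelihoods from two acoustic
# models via log-linear interpolation. All names and shapes are assumptions
# made for illustration only.
import numpy as np

def combine_log_likelihoods(loglik_control, loglik_dysarthric, weight=0.5):
    """Interpolate two models' frame-level scores.

    loglik_*: arrays of shape (num_frames, num_states) with per-frame
    log-likelihoods; weight in [0, 1] shifts emphasis towards the
    dysarthric-trained model.
    """
    return (1.0 - weight) * loglik_control + weight * loglik_dysarthric

# Toy usage with random scores for 10 frames and 100 HMM states.
rng = np.random.default_rng(0)
scores_control = rng.normal(size=(10, 100))
scores_dysarthric = rng.normal(size=(10, 100))
combined = combine_log_likelihoods(scores_control, scores_dysarthric, weight=0.7)
print(combined.shape)  # (10, 100)
```

In practice the combined scores would be passed to the decoder in place of a single model's output; how the weight is chosen per utterance or speaker is exactly the kind of question the thesis addresses.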
Subrahmanya Pavankumar Dubagunta