Êtes-vous un étudiant de l'EPFL à la recherche d'un projet de semestre?
Travaillez avec nous sur des projets en science des données et en visualisation, et déployez votre projet sous forme d'application sur Graph Search.
Recognising dysarthric speech is a challenging problem as it differs in many aspects from typical speech, such as speaking rate and pronunciation. In the literature the focus so far has largely been on handling these variabilities in the framework of HMM/GMM and cross-entropy based HMM/DNN systems. This paper focuses on the use of state-of-the-art sequence-discriminative training, in particular lattice-free maximum mutual information (LF-MMI), for improving dysarthric speech recognition. Through a systematic investigation on the Torgo corpus we demonstrate that LF-MMI performs well on such atypical data and compensates much better for the low speaking rates of dysarthric speakers than conventionally trained systems. This can be attributed to inherent aspects of current speech recognition training regimes, like frame subsampling and speed perturbation, which obviate the need for some techniques previously adopted specifically for dysarthric speech.