Publication

Novel Methods for Incorporating Prior Knowledge for Automatic Speech Assessment

Abstract

A speech signal conveys several kinds of information, such as the message and the speaker's identity, emotional state and social state. Automatic speech assessment is a broad area that refers to using automatic methods to predict human judgements about the different kinds of information conveyed in speech, such as the intelligibility of the spoken message and the dialect and fluency of the speaker. Unlike other speech technology areas, such as automatic speech recognition, text-to-speech synthesis and automatic speaker recognition, automatic speech assessment is an emerging direction of research. One of the challenges in this field is that no single method or framework scales across diverse speech assessment tasks. This thesis therefore takes a broader outlook and focuses on incorporating prior knowledge into diverse data-driven speech assessment problems.

First, we focus on the development of end-to-end acoustic modelling methods for non-verbal cue-based speech assessment. More precisely, we develop neural network-based methods that integrate prior knowledge about speech production to learn to assess speech from the raw waveform. We validate the developed methods through investigations on several speech assessment tasks, viz. dialect identification, depression detection and speech fluency rating prediction.

Second, we focus on advancing a recently proposed phone posterior feature-based intelligibility estimation technique. Specifically, to enhance phone posterior probability estimation, we propose two novel approaches that incorporate linguistic segment-level knowledge during the training of neural networks through the estimation of confidence measures. We validate the two proposed approaches through automatic speech recognition and dysarthric speech intelligibility assessment studies.

Finally, in the context of privacy preservation, we develop a signal processing-based speech pseudonymization approach that alters voice source information and vocal tract system information based on prior knowledge to obfuscate the speaker identity while retaining intelligibility, i.e. the phones and words remain recognizable. We validate the proposed pseudonymization approach through listening experiments and automatic evaluations.

