Novel Methods for Incorporating Prior Knowledge for Automatic Speech Assessment

Subrahmanya Pavankumar Dubagunta
2021
EPFL thesis

Abstract

The speech signal conveys several kinds of information, such as the message, the speaker's identity, and the speaker's emotional and social state. Automatic speech assessment is a broad area that refers to using automatic methods to predict human judgements about the different kinds of information conveyed in speech, such as the intelligibility of the spoken message or the dialect and fluency of the speaker. Unlike other speech technology areas, such as automatic speech recognition, text-to-speech synthesis and automatic speaker recognition, automatic speech assessment is an emerging direction of research. One of the challenges in this field is that no single method or framework scales across diverse speech assessment tasks. This thesis therefore takes a broader outlook and focuses on incorporating prior knowledge into diverse data-driven speech assessment problems.

First, we focus on developing end-to-end acoustic modelling methods for speech assessment based on non-verbal cues. More precisely, we develop neural network-based methods that integrate prior knowledge about speech production to learn to assess speech directly from the raw waveform. We validate these methods on several speech assessment tasks, namely dialect identification, depression detection and speech fluency rating prediction.

Second, we focus on advancing a recently proposed intelligibility estimation technique based on phone posterior features. Specifically, to improve phone posterior probability estimation, we propose two novel approaches that incorporate linguistic segment-level knowledge during neural network training through the estimation of confidence measures. We validate both approaches through automatic speech recognition and dysarthric speech intelligibility assessment studies.

Finally, in the context of privacy preservation, we develop a signal processing-based speech pseudonymization approach that, guided by prior knowledge, alters voice source and vocal tract system information to obfuscate the speaker's identity while retaining intelligibility, i.e., the phones and words remain recognizable. We validate the proposed pseudonymization approach through listening experiments and automatic evaluations.
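To make the idea of phone posterior features more concrete, the sketch below shows one generic way to compare a test utterance against a reference utterance. Each frame is a vector of phone posterior probabilities. Frames are compared with a symmetric Kullback-Leibler divergence, and the two sequences are aligned with dynamic time warping (DTW) so that differences in speaking rate are tolerated. This is an illustrative assumption, not the thesis's formulation. The function names `symmetric_kl` and `dtw_distance` and the normalization by `n + m` are hypothetical choices made for this example. A lower score means the test posteriors are closer to the reference.

```python
import math
from typing import Sequence

# Hypothetical representation: one frame = probabilities over phone classes.
Posterior = Sequence[float]


def symmetric_kl(p: Posterior, q: Posterior, eps: float = 1e-10) -> float:
    """KL(p||q) + KL(q||p), written as sum((p_i - q_i) * log(p_i / q_i))."""
    total = 0.0
    for pi, qi in zip(p, q):
        # Clip to avoid log(0) for near-zero posteriors.
        pi, qi = max(pi, eps), max(qi, eps)
        total += (pi - qi) * math.log(pi / qi)
    return total


def dtw_distance(ref: Sequence[Posterior], test: Sequence[Posterior]) -> float:
    """DTW alignment cost between two posterior sequences (assumed local distance: symmetric KL)."""
    n, m = len(ref), len(test)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = symmetric_kl(ref[i - 1], test[j - 1])
            # Allowed steps: insertion, deletion, diagonal match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Hypothetical normalization by total sequence length.
    return cost[n][m] / (n + m)


if __name__ == "__main__":
    ref = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]
    slow = [ref[0], ref[0], ref[1]]  # same phones, spoken more slowly
    garbled = [[0.1, 0.1, 0.8], [0.1, 0.1, 0.8]]
    print(dtw_distance(ref, slow), dtw_distance(ref, garbled))
```

The DTW step matters here. A slower rendition of the same phone sequence aligns at zero cost, while a sequence whose posteriors favour different phones receives a positive score.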

Official source

https://infoscience.epfl.ch/record/288398?ln=fr
