Atypical aspects of speech concern speech that deviates from what is commonly considered normal or healthy. In this thesis, we propose novel methods for the detection and analysis of these aspects, e.g. to monitor the temporary state of a speaker, diseases that manifest in speech, or people who have trouble producing speech. To overcome data scarcity, most methods in this thesis rely on auxiliary resources; to meet clinicians' requirements, prior knowledge and explainability are taken into account.

In the first part of this thesis, we augment convolutional neural network (CNN) based methods that directly assess atypical speech. To induce prior knowledge about atypical speech into CNNs, we present findings in the context of Alzheimer's disease detection and severity estimation: we demonstrate that filtering the waveforms to focus on voice-source-related frequencies and increasing the input segment length to capture prosody both have beneficial effects. Additionally, we explore incorporating phonetic knowledge into CNNs by fine-tuning models trained for articulation prediction on continuous sleepiness estimation. Furthermore, we propose methods for detecting and estimating breathing impairments in people with Parkinson's disease, comparing hand-crafted features that model voice-source information with embeddings extracted from CNNs, and find that both are well-suited to the task.

The second part of this thesis presents a novel method for intelligibility assessment of people with dysarthria. Intelligibility is a clinical measure of the severity of dysarthria and is typically assessed as an aggregate over a set of utterances by a speaker. We emulate such subjective listening tests by performing utterance verification with phonetic features on each of a speaker's utterances, aggregating the resulting scores into the speaker's intelligibility score, and demonstrating this scheme's robustness through several variations. The same scheme was applied to emulate a human listening test in which listeners had to differentiate between recordings made before and after lip filler surgery. Finally, the intelligibility assessment scheme is extended to pronunciation feedback: expected pronunciation is modeled by training one hidden Markov model per phoneme on healthy speech. Given a prompt and the corresponding dysarthric utterance, we can estimate how much each phoneme deviates from its expected pronunciation and provide a phoneme-level assessment.
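As a minimal sketch of the kind of waveform preprocessing described for the CNN inputs, the snippet below low-pass filters a recording to emphasise voice-source-related low frequencies and cuts or pads it to a fixed segment length. The 1 kHz cutoff, 16 kHz sampling rate, and 8 s segment length are illustrative assumptions, not the thesis settings.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def preprocess_segment(waveform: np.ndarray, sr: int = 16000,
                       cutoff_hz: float = 1000.0, segment_s: float = 8.0) -> np.ndarray:
    """Low-pass filter a waveform and cut/pad it to a fixed-length CNN input segment."""
    sos = butter(4, cutoff_hz, btype="low", fs=sr, output="sos")  # 4th-order Butterworth low-pass
    filtered = sosfiltfilt(sos, waveform)                         # zero-phase filtering
    n = int(segment_s * sr)
    if len(filtered) >= n:                                        # long enough: truncate
        return filtered[:n]
    return np.pad(filtered, (0, n - len(filtered)))               # too short: zero-pad
```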
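The aggregation step of the intelligibility assessment can be pictured as follows. This is a sketch under assumptions, not the thesis pipeline: each utterance is assumed to receive a verification score (e.g. a log-likelihood ratio between the claimed prompt and a background model), and the speaker's intelligibility score is the mean over all of that speaker's utterances.

```python
import numpy as np

def utterance_verification_score(ll_prompt: float, ll_background: float) -> float:
    """Verification score for one utterance: how much better the claimed prompt fits."""
    return ll_prompt - ll_background

def speaker_intelligibility(per_utterance_scores: list[float]) -> float:
    """Aggregate per-utterance verification scores into a speaker-level intelligibility score."""
    return float(np.mean(per_utterance_scores))
```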
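For the pronunciation-feedback extension, a minimal sketch of per-phoneme hidden Markov models is given below. It assumes hmmlearn Gaussian HMMs over feature frames and a per-frame log-likelihood drop relative to a healthy-speech reference as the deviation measure; these choices (model type, three states, the reference value) are illustrative assumptions rather than the thesis implementation.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_phoneme_models(healthy_segments: dict[str, list[np.ndarray]]) -> dict[str, GaussianHMM]:
    """Fit one HMM per phoneme on feature frames extracted from healthy speech."""
    models = {}
    for phoneme, segments in healthy_segments.items():
        X = np.vstack(segments)               # all frames stacked: (n_frames, n_features)
        lengths = [len(s) for s in segments]  # frame count of each segment
        models[phoneme] = GaussianHMM(n_components=3, covariance_type="diag",
                                      n_iter=50).fit(X, lengths)
    return models

def deviation_score(models: dict[str, GaussianHMM], phoneme: str,
                    test_segment: np.ndarray, healthy_ref_ll: float) -> float:
    """Phoneme-level deviation: drop in per-frame log-likelihood versus a healthy reference.

    healthy_ref_ll is an assumed reference value (e.g. the average per-frame
    log-likelihood of held-out healthy speech for this phoneme).
    """
    ll = models[phoneme].score(test_segment) / len(test_segment)  # per-frame log-likelihood
    return healthy_ref_ll - ll  # larger value = further from expected pronunciation
```

Given a prompt, its phoneme-level alignment yields one segment per phoneme, so such a score can be reported for each phoneme of a dysarthric utterance as feedback.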