Atypical aspects in speech concern speech that deviates from what is commonly considered normal or healthy. In this thesis, we propose novel methods for the detection and analysis of these aspects, e.g. to monitor the temporary state of a speaker, diseases that manifest in speech, or people who have trouble producing speech. To overcome data scarcity, most methods in this thesis depend on auxiliary resources; to meet clinicians' requirements, prior knowledge and explainability are taken into account.

In the first part of this thesis, we augment methods that aim to directly assess atypical speech with convolutional neural networks (CNNs). With the goal of inducing prior knowledge about atypical speech into CNNs, we present findings in the context of Alzheimer's disease detection and severity estimation: we demonstrate that filtering the waveforms to focus on voice-source-related frequencies and increasing the input segment length to capture prosody both have beneficial effects. Additionally, we explore incorporating phonetic knowledge into CNNs by fine-tuning CNN-based models trained for articulation prediction on continuous sleepiness estimation. Furthermore, we propose methods for detecting and estimating breathing impairments in people with Parkinson's disease; we compare hand-crafted features that model voice-source information with embeddings extracted from CNNs and find both well-suited to the task.

The second part of this thesis presents a novel method for assessing the intelligibility of people with dysarthria. Intelligibility is a clinical measure of the severity of dysarthria, typically assessed as an aggregate over a set of utterances by a speaker. We emulate such subjective listening tests by performing utterance verification using phonetic features on all of a speaker's utterances, aggregating the resulting scores into the speaker's intelligibility score, and we demonstrate this scheme's robustness through several variations.
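The aggregation step of such a scheme can be illustrated with a minimal sketch: per-utterance verification scores (one per utterance a speaker produced) are pooled into a single speaker-level intelligibility score. The function name, the scores, and mean pooling are illustrative assumptions, not the thesis's actual implementation.

```python
# Hypothetical sketch: pooling utterance-level verification scores
# into a speaker-level intelligibility score. Mean pooling is one
# simple choice of aggregation; the scores below are made up.

def speaker_intelligibility(utterance_scores):
    """Aggregate per-utterance verification scores (higher = closer
    to the expected pronunciation) into one speaker-level score."""
    if not utterance_scores:
        raise ValueError("need at least one utterance score")
    return sum(utterance_scores) / len(utterance_scores)

scores = [0.82, 0.75, 0.91, 0.60]   # one verification score per utterance
print(round(speaker_intelligibility(scores), 3))  # → 0.77
```

A robustness-oriented variation could swap the mean for a median or a trimmed mean, which is less sensitive to a few outlier utterances.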
The same scheme was applied to emulate a human listening test in which listeners had to differentiate between recordings made before and after lip filler surgery. We further extend the intelligibility assessment scheme to pronunciation feedback: expected pronunciation is modeled by training one hidden Markov model per phoneme on healthy speech. Given a prompt and the corresponding dysarthric utterance, we can estimate by how much each phoneme deviates from its expected pronunciation and provide a phoneme-level assessment.
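The deviation scoring behind such phoneme-level feedback can be sketched as follows. For brevity, a single diagonal Gaussian stands in for the per-phoneme HMM, and all data, class names, and thresholds are illustrative assumptions rather than the thesis's implementation: a model of the "expected" phoneme is fit on healthy-speech feature frames, and a low average log-likelihood of a dysarthric realisation under that model signals a large deviation.

```python
import numpy as np

# Illustrative sketch: score how much a phoneme realisation deviates
# from its expected (healthy-speech) pronunciation. A diagonal Gaussian
# replaces the per-phoneme HMM to keep the example self-contained.

class PhonemeModel:
    def fit(self, frames):                      # frames: (N, D) healthy features
        self.mean = frames.mean(axis=0)
        self.var = frames.var(axis=0) + 1e-6    # floor variance for stability
        return self

    def avg_loglik(self, frames):               # average per-frame log-likelihood
        z = (frames - self.mean) ** 2 / self.var
        ll = -0.5 * (np.log(2 * np.pi * self.var) + z).sum(axis=1)
        return ll.mean()

rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 1.0, size=(500, 13))  # healthy frames for one phoneme
model = PhonemeModel().fit(healthy)

typical = rng.normal(0.0, 1.0, size=(40, 13))   # realisation close to expected
deviant = rng.normal(3.0, 1.0, size=(40, 13))   # strongly shifted realisation

# Lower log-likelihood under the healthy model = larger deviation,
# which can be reported back to the speaker as phoneme-level feedback.
print(model.avg_loglik(typical) > model.avg_loglik(deviant))  # → True
```

In the actual scheme, a forced alignment of the prompt against the utterance would first segment the speech into phonemes, and each segment would be scored against the corresponding phoneme's healthy-speech model.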