Many pathologies cause impairments in the speech production mechanism resulting in reduced speech intelligibility and communicative ability. To assist the clinical diagnosis, treatment and management of speech disorders, automatic pathological speech assessments are indispensable. Such automatic assessments provide reliable, objective, and cost-effective assessment in contrast to subjective and time-consuming auditory-perceptual analyses performed by clinicians.

Among crucial automatic analyses for developing potential computer-aided tools are speech pathology detection, i.e., discriminating between normal and pathological speech, and speech intelligibility assessment, i.e., predicting an intelligibility index correlated with the percentage of words correctly understood by human listeners. The goal of this thesis is to propose novel data-driven approaches to aid the development of a clinical assistive tool for automatic pathological speech assessment with two purposes, i.e., pathological speech detection and intelligibility assessment.

First, we focus on the development of novel machine learning approaches to address the pathological speech detection task. Motivated by the clinical evidence on spectro-temporal distortions associated with pathological speech, we propose a subspace-based speech pathology detection approach that relies on analyzing subspaces spanned by the dominant spectral or temporal patterns of speech. Although the temporal subspace-based approach yields a high performance, it requires time-alignment and having access to phonetically-balanced utterances from all speakers. To avoid the time-alignment and also to assess the efficacy of deep learning approaches for such a task, we propose analyzing pairwise distance matrices computed from speech representations using convolutional neural networks. Furthermore, to be able to achieve pathological speech detection without requiring constraints on the phonetic content, we propose different supervised representation learning approaches using convolutional neural networks to learn robust and relevant feature representations. We demonstrate the effectiveness of the proposed approaches through different experiments across different databases.

Second, we focus on developing reliable automatic pathological speech intelligibility measures overcoming several drawbacks of the state-of-the-art measures while outperforming them. We first propose a measure based on short-time objective intelligibility assessment. Further, we provide a solution to ensure its applicability across scenarios with different phonetic content across speakers. We also propose intelligibility measures based on analyzing speech subspaces. The subspace-based intelligibility measures are applicable to different scenarios while overcoming the drawbacks of the previously described measure. We validate the performance of the proposed measures across languages and diseases.

Finally, insights are provided on a potential clinical assistive tool for pathological speech detection and intelligibility assessment. To this end, we jointly validate the applicability of two of the previously described approaches, i.e., temporal subspace-based speech pathology detection and short-time objective intelligibility assessment. As our approaches for both tasks achieve a high performance independently of the language and disease, we confirm the possibility of developing such a multi-purpose clinical assistive tool.
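
As a rough illustration of what "analyzing subspaces spanned by the dominant spectral patterns of speech" could look like in practice, the sketch below extracts a low-rank spectral basis from a spectrogram-like matrix and compares two such bases. The abstract does not say how the subspaces are computed or compared, so the use of a singular value decomposition, the principal-angle distance, the function names `dominant_subspace` and `subspace_distance`, the rank of 5, and the random stand-in data are all assumptions made for this example, not the thesis method.

```python
import numpy as np


def dominant_subspace(spectrogram, rank):
    """Return an orthonormal basis for the `rank` dominant spectral patterns.

    `spectrogram` is assumed to have shape (n_freq_bins, n_frames).
    """
    # Left singular vectors are ordered by decreasing singular value.
    left_vectors, _, _ = np.linalg.svd(spectrogram, full_matrices=False)
    return left_vectors[:, :rank]


def subspace_distance(basis_a, basis_b):
    """Distance between two subspaces based on their principal angles."""
    # Singular values of A^T B are the cosines of the principal angles.
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    angles = np.arccos(np.clip(cosines, 0.0, 1.0))
    return float(np.sqrt(np.sum(angles ** 2)))


# Random non-negative matrices stand in for real magnitude spectrograms.
rng = np.random.default_rng(0)
reference = np.abs(rng.standard_normal((64, 200)))
test = np.abs(rng.standard_normal((64, 200)))

rank = 5  # hypothetical number of dominant patterns
reference_basis = dominant_subspace(reference, rank)
test_basis = dominant_subspace(test, rank)

d_same = subspace_distance(reference_basis, reference_basis)
d_diff = subspace_distance(reference_basis, test_basis)
print(f"distance to itself: {d_same:.2e}, distance to other: {d_diff:.3f}")
```

In a detection setting, such a distance between a speaker's subspace and a reference subspace could serve as a feature for a classifier, but the choice of reference and classifier is left open here.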