Êtes-vous un étudiant de l'EPFL à la recherche d'un projet de semestre?
Travaillez avec nous sur des projets en science des données et en visualisation, et déployez votre projet sous forme d'application sur Graph Search.
Change in voice quality (VQ) is one of the first precursors of Parkinson's disease (PD). Specifically, impacted phonation and articulation causes the patient to have a breathy, husky-semiwhisper and hoarse voice. A goal of this paper is to characterize a VQ spectrum -- the composition of non-modal phonations -- of voice in PD. The paper relates non-modal healthy phonations: breathy, creaky, tense, falsetto and harsh, with disordered phonation in PD. First, statistics are learned to differentiate the modal and non-modal phonations. Statistics are computed using phonological posteriors, the probabilities of phonological features inferred from the speech signal using a deep learning approach. Second, statistics of disordered speech are learned from PD speech data comprising 50 patients and 50 healthy controls. Third, Euclidean distance is used to calculate similarity of non-modal and disordered statistics, and the inverse of the distances is used to obtain the composition of non-modal phonation in PD. Thus, pathological voice quality is characterised using healthy non-modal voice quality ``base/eigenspace''. The obtained results are interpreted as the voice of an average patient with PD and can be characterised by the voice quality spectrum composed of 30% breathy voice, 23% creaky voice, 20% tense voice, 15% falsetto voice and 12% harsh voice. In addition, the proposed features were applied for prediction of the dysarthria level according to the Frenchay assessment score related to the larynx, and significant improvement is obtained for reading speech task. The proposed characterisation of VQ might also be applied to other kinds of pathological speech.