In communication systems, it is crucial to estimate the perceived quality of audio and speech. The industrial standards for many years have been PESQ, 3QUEST, and POLQA, which are intrusive methods. This restricts the possibilities of using these metrics i ...
In hidden Markov model (HMM) based automatic speech recognition (ASR) system, modeling the statistical relationship between the acoustic speech signal and the HMM states that represent linguistically motivated subword units such as phonemes is a crucial st ...
Speech is the most natural means of communication for humans. Therefore, since the beginning of computers it has been a goal to interact with machines via speech. While there have been gradual improvements in this field over the decades, and with recent dr ...
The goal of this thesis is to improve current state-of-the-art techniques in speaker verification
(SV), typically based on â identity-vectorsâ (i-vectors) and deep neural network (DNN), by exploiting diverse (phonetic) information extracted using variou ...
In the literature, the task of dysarthric speech intelligibility assessment has been approached through development of different low-level feature representations, subspace modeling, phone confidence estimation or measurement of automatic speech recognitio ...
IEEE Institute of Electrical and Electronics Engineers2021
We propose a simultaneous noise filtering and phase unwrapping algorithm. Spatial evolution of phase is modeled as an autoregressive Gaussian Markov random field. Accordingly, phase value at a pixel is related to phase values at surrounding pixels in a pro ...
Competitive state-of-the-art automatic pathological speech intelligibility measures typically rely on regression training on a large number of features, require a large amount of healthy speech training data, or are applicable only to phonetically balanced ...
A new software for modeling pathological speech signals is presented in this paper. The software is called NeuroSpeech. This software enables the analysis of pathological speech signals considering different speech dimensions: phonation, articulation, pros ...
Subword modeling for zero-resource languages aims to learn low-level representations of speech audio without using transcriptions or other resources from the target language (such as text corpora or pronunciation dictionaries). A good representation should ...
Language independent query-by-example spoken term detection (QbE-STD) is the problem of retrieving audio documents from an archive, which contain a spoken query provided by a user. This is usually casted as a hypothesis testing and pattern matching problem ...