Phonetic aware techniques for Speaker Verification
Graph Chatbot
Chat with Graph Search
Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.
DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.
In hidden Markov model (HMM) based automatic speech recognition (ASR) system, modeling the statistical relationship between the acoustic speech signal and the HMM states that represent linguistically motivated subword units such as phonemes is a crucial st ...
Although current trends in speech processing consider deep learning through data-driven technologies, many potential applications exhibit lack of training or development data. Therefore, considerably light signal processing techniques are still of interest ...
Language independent query-by-example spoken term detection (QbE-STD) is the problem of retrieving audio documents from an archive, which contain a spoken query provided by a user. This is usually casted as a hypothesis testing and pattern matching problem ...
This thesis deals with exploiting the low-dimensional multi-subspace structure of speech towards the goal of improving acoustic modeling for automatic speech recognition (ASR). Leveraging the parsimonious hierarchical nature of speech, we hypothesize that ...
State-of-the-art acoustic models for Automatic Speech Recognition (ASR) are based on Hidden Markov Models (HMM) and Deep Neural Networks (DNN) and often require thousands of hours of transcribed speech data during training. Therefore, building multilingual ...
The performance of speaker recognition systems has considerably improved in the last decade. This is mainly due to the development of Gaussian mixture model-based systems and in particular to the use of i-vectors. These systems handle relatively well noise ...
Recognising dysarthric speech is a challenging problem as it differs in many aspects from typical speech, such as speaking rate and pronunciation. In the literature the focus so far has largely been on handling these variabilities in the framework of HMM/G ...
The recent increase in social media based propaganda, i.e., ‘fake news’, calls for automated methods to detect tampered content. In this paper, we focus on detecting tampering in a video with a person speaking to a camera. This form of manipulation is easy ...
Vocal tract length normalisation (VTLN) is well established as a speaker adaptation technique that can work with very little adaptation data. It is also well known that VTLN can be cast as a linear transform in the cepstral domain. Building on this latter ...
Modeling directly raw waveforms through neural networks for speech processing is gaining more and more attention. Despite its varied success, a question that remains is: what kind of information are such neural networks capturing or learning for different ...