Building Blocks of Assistant Based Speech Recognition for Air Traffic Management Applications
Related publications (33)
Graph Chatbot
Chat with Graph Search
Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.
DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.
This thesis deals with exploiting the low-dimensional multi-subspace structure of speech towards the goal of improving acoustic modeling for automatic speech recognition (ASR). Leveraging the parsimonious hierarchical nature of speech, we hypothesize that ...
In hidden Markov model (HMM) based automatic speech recognition (ASR) system, modeling the statistical relationship between the acoustic speech signal and the HMM states that represent linguistically motivated subword units such as phonemes is a crucial st ...
The goal of this thesis is to improve current state-of-the-art techniques in speaker verification
(SV), typically based on âidentity-vectorsâ (i-vectors) and deep neural network (DNN), by exploiting diverse (phonetic) information extracted using variou ...
Modeling directly raw waveform through neural networks for speech processing is gaining more and more attention. Despite its varied success, a question that remains is: what kind of information are such neural networks capturing or learning for different t ...
The speech signal conveys information on different time scales from short (20--40 ms) time scale or segmental, associated to phonological and phonetic information to long (150--250 ms) time scale or supra segmental, associated to syllabic and prosodic info ...
Air Navigation Service Provider (ANSPs) replace paper flight strips through different digital solutions. The instructed commands from an air traffic controller (ATCOs) are then available in computer readable form. However, those systems require manual cont ...
The speech signal conveys information on different time scales from short (20--40 ms) time scale or segmental, associated to phonological and phonetic information to long (150--250 ms) time scale or supra segmental, associated to syllabic and prosodic info ...
The speech signal conveys information on different time scales from short (20–40 ms) time scale or segmental, associated to phonological and phonetic information to long (150–250 ms) time scale or supra segmental, associated to syllabic and prosodic inform ...
This thesis deals with signal-based methods that predict how listeners perceive speech quality in telecommunications. Such tools, called objective quality measures, are of great interest in the telecommunications industry to evaluate how new or deployed sy ...
Speaker diarization is the task of identifying ``who spoke when'' in an audio stream containing multiple speakers. This is an unsupervised task as there is no a priori information about the speakers. Diagnostical studies on state-of-the-art diarization sys ...