Spectro-Temporal Activity Pattern (STAP) Features for Noise Robust ASR
Recent breakthroughs in deep learning often rely on representation learning and knowledge transfer. In recent years, unsupervised and self-supervised techniques for learning speech representation were developed to foster automatic speech recognition. Up to ...
EUROPEAN ASSOC SIGNAL SPEECH & IMAGE PROCESSING-EURASIP, 2021
A speech signal conveys several kinds of information, such as a message, the speaker's identity, and the speaker's emotional and social state. Automatic speech assessment is a broad area that refers to using automatic methods to predict human judg ...
Voice communication is the main channel to exchange information between pilots and Air-Traffic Controllers (ATCos). Recently, several projects have explored the employment of speech recognition technology to automatically extract spoken key information suc ...
Matching of a test signal to a reference word hypothesis forms the core of many speech processing problems, including objective speech intelligibility assessment. This paper first shows that the comparison of two speech signals can be formulated as matchin ...
A functional time series is a temporally ordered sequence of not necessarily independent random curves. While the statistical analysis of such data has been traditionally carried out under the assumption of completely observed functional data, it may well ha ...
In communication systems, it is crucial to estimate the perceived quality of audio and speech. The industrial standards for many years have been PESQ, 3QUEST, and POLQA, which are intrusive methods. This restricts the possibilities of using these metrics i ...
Sampling has always been at the heart of signal processing providing a bridge between the analogue world and discrete representations of it, as our ability to process data in continuous space is quite limited. Furthermore, sampling plays a key part in unde ...
Auditory perception is an essential part of a robotic system in Human-Robot Interaction (HRI), and creating an artificial auditory perception system that is on par with that of humans has been a long-standing goal for researchers. In fact, this is a challenging res ...
Respiration is an essential and primary mechanism for speech production. We first inhale and then produce speech while exhaling. When we run out of breath, we stop speaking and inhale. Though this process is involuntary, speech production involves a system ...
Stochastic Maximum Likelihood (SML) is a popular direction of arrival (DOA) estimation technique in array signal processing. It is a parametric method that jointly estimates signal and instrument noise by maximum likelihood, achieving excellent statistical ...