Non-linear Spectral Contrast Stretching for In-car Speech Recognition
Graph Chatbot
Chat with Graph Search
Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.
DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.
Multimodal signal processing analyzes a physical phenomenon through several types of measures, or modalities. This leads to the extraction of higher-quality and more reliable information than that obtained from single-modality signals. The advantage is two ...
Class-specific classifiers for audio, visual and audio-visual speech recognition systems are developed and compared with traditional classifiers. We use state- of-the-art feature extraction methods and show the benefits of a class-specific classifier on ea ...
Humans perceive their surrounding environment in a multimodal manner by using multi-sensory inputs combined in a coordinated way. Various studies in psychology and cognitive science indicate the multimodal nature of human speech production and perception. ...
Multilingual speech recognition obviously involves numerous research challenges, including common phoneme sets, adaptation on limited amount of training data, as well as mixed language recognition (common in many countries, like Switzerland). In this latte ...
In this paper, we present a novel feature normalization method in the log-scaled spectral domain for improving the noise robustness of speech recognition front-ends. In the proposed scheme, a non-linear contrast stretching is added to the outputs of log me ...
Multilingual speech recognition obviously involves numerous research challenges, including common phoneme sets, adaptation on limited amount of training data, as well as mixed language recognition (common in many countries, like Switzerland). In this latte ...
We address issues for improving hands-free speech recognition performance in the presence of multiple simultaneous speakers using multiple distant microphones. In this paper, a log spectral mapping is proposed to estimate the log mel-filterbank outputs of ...
Speaker detection is an important component of a speech-based user interface. Audiovisual speaker detection, speech and speaker recognition or speech synthesis for example find multiple applications in human-computer interaction, multimedia content indexin ...
We present a feature selection method based on information theoretic measures, targeted at multimodal signal processing, showing how we can quantitatively assess the relevance of features from different modalities. We are able to find the features with the ...
Most state-of-the-art automatic speech recognition (ASR) systems deal with noise in the environment by extracting noise robust features which are subsequently modelled by a Hidden Markov Model (HMM). A limitation of this feature-based approach is that the ...