Error Correcting Posterior Combination for Robust Multi-Band Speech Recognition
Graph Chatbot
Chat with Graph Search
Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.
DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.
Accurate detection and segmentation of spontaneous multi-party speech is crucial for a variety of applications, including speech acquisition and recognition, as well as higher-level event recognition. However, the highly sporadic nature of spontaneous spee ...
In this report, we propose a discriminative decoder for phoneme recognition, i.e. the identification of the uttered phoneme sequence from a speech recording. This task is solved as a 3 step process: a phoneme classifier first classifies each accoustic fram ...
Standard ASR systems typically use phoneme as the subword units. Preliminary studies have shown that the performance of the ASR system could be improved by using grapheme as additional subword units. In this paper, we investigate such a system where the wo ...
This report describes the handwriting recognition demo on Linux and Windows. In the demo, we develop a prototype system of office management that provides pen-driven access to IDIAP people information, such as names, telephone numbers and nationalities thr ...
Automatic Speech Recognition systems typically use smoothed spectral features as acoustic observations. In recent studies, it has been shown that complementing these standard features with auxiliary information could improve the performance of the system. ...
Pitch and energy are two fundamental features describing speech, having importance in human speech recognition. However, when incorporated as features in automatic speech recognition (ASR), they usually result in a significant degradation on recognition pe ...
In this article, we compare aural and automatic speaker recognition in the context of forensic analyses, using a Bayesian framework for the interpretation of evidence. We use perceptual tests performed by non-experts and compare their performance with that ...
Standard ASR systems typically use phoneme as the subword units. Preliminary studies have shown that the performance of the ASR system could be improved by using grapheme as additional subword units. In this paper, we investigate such a system where the wo ...
Pitch and energy are two fundamental features describing speech, having importance in human speech recognition. However, when incorporated as features in automatic speech recognition (ASR), they usually result in a significant degradation on recognition pe ...
Using a low-level representation of images, like matching pursuit, we introduce a new way of describing objects through a general description using a translation, rotation, and isotropic scale invariant dictionary of basis functions. We then use this descr ...