Detection and Recognition of Number Sequences in Spoken Utterances
Graph Chatbot
Chat with Graph Search
Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.
DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.
Modern speech recognition has many ways of quantifying the misrecognitions a speech recognizer makes. The errors in modern speech recognition makes extensive use of the Levenshtein algorithm to find the distance between the labeled target and the recognize ...
Visual behavior recognition is currently a highly active research area. This is due both to the scientific challenge posed by the complexity of the task, and to the growing interest in its applications, such as automated visual surveillance, human-computer ...
Humans perceive their surrounding environment in a multimodal manner by using multi-sensory inputs combined in a coordinated way. Various studies in psychology and cognitive science indicate the multimodal nature of human speech production and perception. ...
Using phone posterior probabilities has been increasingly explored for improving automatic speech recognition (ASR) systems. In this paper, we propose two approaches for hierarchically enhancing these phone posteriors, by integrating long acoustic context, ...
Recent years have seen an increasing interest in sparseness constraints for image classification and object recognition, probably motivated by the evidence of sparse representations internal in the primate visual cortex. It is still unclear, however, whethe ...
This chapter introduces a discriminative method for detecting and spotting keywords in spoken utterances. Given a word represented as a sequence of phonemes and a spoken utterance, the keyword spotter predicts the best time span of the phoneme sequence in ...
This thesis proposes a Chamfer-based method for human body pose detection that combines silhouette matching, motion information, and statistical relevance estimates in an original way. We demonstrate that our method can not only detect people but also reco ...
In this paper we investigate the detection and recognition of sequences of numbers in spoken utterances. This is done in two steps: first, the entire utterance is decoded assuming that only numbers were spoken. In the second step, non-number segments (garb ...
Developing new techniques for human-computer interaction is very challenging. Vision-based techniques have the advantage of being unobtrusive and hands are a natural device that can be used for more intuitive interfaces. But in order to use hands for inter ...
Phone posteriors has recently quite often used (as additional features or as local scores) to improve state-of-the-art automatic speech recognition (ASR) systems. Usually, better phone posterior estimates yield better ASR performance. In the present paper ...