Multimodal feature extraction and fusion for audio-visual speech recognition
Graph Chatbot
Chat with Graph Search
Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.
DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.
The present thesis is concerned with the development and evaluation (in terms of accuracy and utility) of systems using hand postures and hand gestures for enhanced Human-Computer Interaction (HCI). In our case, these systems are based on vision techniques ...
Mobile service robots are going to play an increasing role in the society of humans. Voice-enabled interaction with service robots becomes very important, if such robots are to be deployed in real-world environments and accepted by the vast majority of pot ...
In this paper, we present a novel feature normalization method in the log-scaled spectral domain for improving the noise robustness of speech recognition front-ends. In the proposed scheme, a non-linear contrast stretching is added to the outputs of log me ...
In this paper, we present a novel feature normalization method in the log-scaled spectral domain for improving the noise robustness of speech recognition front-ends. In the proposed scheme, a non-linear contrast stretching is added to the outputs of log me ...
We address the problem of distant speech acquisition in multi-party meetings, using multiple microphones and cameras. Microphone array beamforming techniques present a potential alternative to close-talking microphones by providing speech enhancement throu ...
We present a methodology of reliability estimation in the multimodal biometric verification scenario. Reliability estimation has shown to be an efficient and accurate way of predicting and correcting erroneous classification decisions in both unimodal (spe ...
Local state or phone posterior probabilities are often investigated as local scores (e.g., hybrid HMM/ANN systems) or as transformed acoustic features (e.g., ``Tandem'') to improve speech recogni tion systems. In this paper, we present initial results towa ...
Local state or phone posterior probabilities are often investigated as local scores (e.g., hybrid HMM/ANN systems) or as transformed acoustic features (e.g., ``Tandem'') to improve speech recogni tion systems. In this paper, we present initial results towa ...
This paper proposes an application of information theoretic approach for finding the most informative subset of eigenfeatures to be used for audio-visual speech recognition tasks. The state-of-the-art visual feature extraction methods in the area of speech ...
The present thesis is concerned with the development and evaluation (in terms of accuracy and utility) of systems using hand postures and hand gestures for enhanced Human-Computer Interaction (HCI). In our case, these systems are based on vision techniques ...