Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.
DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.
This paper proposes a joint verification-localization structure based on split-band analysis of speech signal and the mixed voicing level. To address the problems in reverberant acoustic environments, a new fundamental frequency estimation algorithm is pro ...
[...] Primates are social animals whose communication is based on their conspecifics' vocalizations and facial expressions. Although a lot of work to date has studied the unimodal representation of vocal or facial information, little is known about the way ...
The ability to correctly interpret emotional signals from others is crucial for successful social interaction. Previous neuroimaging studies showed that voice-sensitive auditory areas [1-3] activate to a broad spectrum of vocally expressed emotions more th ...
The goal of this thesis is to develop and design new feature representations that can improve the automatic speech recognition (ASR) performance in clean as well noisy conditions. One of the main shortcomings of the fixed scale (typically 20-30 ms long ana ...
This paper reports a study on short-time subharmonic pitch breaks in vocal fold vibration, which are found to be a common feature of the human voice in spoken language. The observed pitch breaks correspond to a change in periodicity of the electrolaryngogr ...
The existence of highly developed and fully automated telephone networks in industrialized countries and the pervasiveness of speech communication technology have resulted in the ever-increasing use of the human voice as an instrument in the commission of ...
The motor actions that can be witnessed as a virtuoso musician performs can be so fast, so accomplished, so precise, as to seem somehow superhuman. The musician has to produce the movements, monitor those they have already made and the subsequent result, c ...
Humans perceive their surrounding environment in a multimodal manner by using multi-sensory inputs combined in a coordinated way. Various studies in psychology and cognitive science indicate the multimodal nature of human speech production and perception. ...
Speech recognition applications embedded on a PDA are already available on the market. The usual hardware for this kind of systems is a single microphone mounted on the PDA, giving good results within quiet environments. Though, the recognition rate falls ...
One of the difficulties in Automatic Speech Recognizer (ASR) is the pronunciation variability. Each word (modeled by a baseline phonetic transcription in the ASR dictionary) can be pronounced in many different ways depending on many complex qualitative and ...