Privacy-Sensitive Audio Features for Conversational Speech Processing
Human vision has evolved to make sense of a world in which elements almost never appear in isolation. Surprisingly, the recognition of an element in a visual scene is strongly limited by the presence of other nearby elements, a phenomenon known as visual c ...
Speaker recognition systems are playing a key role in modern online applications. Though the susceptibility of these systems to discrimination according to group fairness metrics has been recently studied, their assessment has been mainly focused on the di ...
Neural Networks require large amounts of memory and compute to process high resolution images, even when only a small part of the image is actually informative for the task at hand. We propose a method based on a differentiable Top-K operator to select the ...
State-of-the-art acoustic models for Automatic Speech Recognition (ASR) are based on Hidden Markov Models (HMM) and Deep Neural Networks (DNN) and often require thousands of hours of transcribed speech data during training. Therefore, building multilingual ...
Deep neural networks often have millions of parameters. This can hinder their deployment to low-end devices, not only due to high memory requirements but also because of increased latency at inference. We propose a novel model compression method that gener ...
A common problem for automatic speech recognition systems is how to recognize words that they did not see during training. Currently there is no established method of evaluating different techniques for tackling this problem. We propose using the CommonVoi ...
To address the open vocabulary problem in the context of end-to-end automatic speech recognition (ASR), we experiment with subword segmentation approaches, specifically byte-pair encoding and the unigram language model. Such approaches are attractive in genera ...
Deepfake videos, where a person’s face is automatically swapped with a face of someone else, are becoming easier to generate with more realistic results. In response to the threat such manipulations can pose to our trust in video evidence, several large da ...
HMMs were among the first models to be applied to sign recognition and have become the baseline models due to their success in modeling sequential and multivariate data. Despite the extensive use of HMMs for sign recognition, determining the HMM ...
Language-independent query-by-example spoken term detection (QbE-STD) is the problem of retrieving, from an archive, the audio documents that contain a spoken query provided by a user. This is usually cast as a hypothesis testing and pattern matching problem ...