ASAP: a Dataset of Aligned Scores and Performances for Piano Transcription
Graph Chatbot
Chat with Graph Search
Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.
DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.
Person identification using audio (speech) and visual (facial appearance, static or dynamic) modalities, either independently or jointly, is a thoroughly investigated problem in pattern recognition. In this work, we explore a novel task : person identifica ...
When designing an audio processing system, the target tasks often influence the choice of a data representation or transformation. Low-level time-frequency representations such as the short-time Fourier transform (STFT) are popular, because they offer a me ...
The aim of the invention is to provide inter-channel level differences ICLD related to audio signals for hearing aids. This aim is achieved by a method for computing ICLD from a first and second audio source signals, the first source signal being wired wit ...
Person identification using audio (speech) and visual (facial appearance, static or dynamic) modalities, either independently or jointly, is a thoroughly investigated problem in pattern recognition. In this work, we explore a novel task : person identifica ...
In this contribution, we present a method to compensate for long duration data gaps in audio signals, in particular music. To achieve this task, a similarity graph is constructed, based on a short-time Fourier analysis of reliable signal segments, e.g. the ...
An encoder/decoder for multi-channel audio data, and in particular for audio reproduction through wave field synthesis. The encoder comprises a two-dimensional filter-bank to the multi-channel signal, in which the channel index is treated as an independent ...
This report presents a semi-supervised method to jointly extract audio-visual sources from a scene. It consist of applying a supervised method to segment the video signal followed by an automatic process to properly separate the audio track. This approach ...
The Swiss Federal Institute of Technology in Lausanne (EPFL) is in the process of digitizing an exceptional collection of audio and video recordings of the Montreux Jazz Festival (MJF) concerts. Since 1967, five thousand hours of both audio and video have ...
The speaker diarization system developed at the International Computer Science Institute (ICSI) has played a prominent role in the speaker diarization community, and many researchers in the rich transcription community have adopted methods and techniques d ...
In the last couple of years more and more multimodal corpora have been created. Recently many of these corpora have also included RGB-D sensors' data. However, there is to our knowledge no publicly available corpus, which combines accurate gaze-tracking, a ...