Short-term spectral features – and most notably Mel-Frequency Cepstral Coefficients (MFCCs) – are the most widely used descriptors of audio signals and are deployed in a majority of state-of-the-art Music Information Retrieval (MIR) systems. These descript ...
Information access within meeting recordings, potentially transcribed and augmented with other media, is facilitated by the use of meeting browsers. To evaluate their performance through a shared benchmark task, users are asked to discriminate between true ...
Speaker diarization is originally defined as the task of determining “who spoke when” given an audio track and no other prior knowledge of any kind. The following article shows a multi-modal approach where we improve a state-of-the-art speaker diarizati ...
We revisit the problem of blocking artifacts and their suppression in generic frame-based speech/audio applications. We provide a perceptual characterization of the artifacts by using dynamic auditory models. We propose some short-time-Fourier-transform-ba ...
In this work we consider an ad-hoc audio conferencing system based on VoIP services in which the participants connect to the conference using mobile communication devices with wireless connectivity. To overcome possible quality problems in the wireless lin ...
Audio-visual speaker diarisation is the task of estimating “who spoke when” using audio and visual cues. In this paper we propose the combination of an audio diarisation system with psychology-inspired visual features, reporting experiments on multiparty ...
We present a scalable medium bit-rate wide-band audio coding technique based on frequency domain linear prediction (FDLP). FDLP is an efficient method for representing the long-term amplitude modulations of speech/audio signals using autoregressive models. ...
The tetrahedral microphone capsule arrangement in a Soundfield microphone captures a so-called A-format signal which is then converted to a corresponding B-format signal. The phase differences between the A-format signal channels due to non-coincidence of ...
A quantitative measure of relevance is proposed for the task of constructing visual feature sets which are at the same time relevant and compact. A feature's relevance is given by the amount of information that it contains about the problem, while compactn ...
We address the problem of both estimating the dominant person in a meeting from a single audio source and identifying them visually in a multi-camera setting. We use a speaker diarization algorithm to perform speaker segmentation and clustering, representi ...