Publication

Robust Audio Segmentation

Related publications (168)

Further Applications of Sector-Based Detection and Short-Term Clustering

Guillaume Lathoud

This paper presents an effective implementation of detection-localization of multiple speech sources with microphone arrays. In particular, the Scaled Conjugate Gradient descent is used for fast and precise localization, within a pre-detected volume of spa ...

IDIAP, 2006

Spatio-Temporal Analysis of Spontaneous Speech with Microphone Arrays

Guillaume Lathoud

Accurate detection, localization and tracking of multiple moving speakers permits a wide spectrum of applications. Techniques are required that are versatile, robust to environmental variations, and not constraining for non-technical end-users. Based on di ...

IDIAP, 2006

Spatio-Temporal Analysis of Spontaneous Speech with Microphone Arrays

Guillaume Lathoud

École Polytechnique Fédérale de Lausanne, 2006

Unsupervised Speech/Non-speech Detection for Automatic Speech Recognition in Meeting Rooms

Daniel Gatica-Perez, Petr Motlicek

The goal of this work is to provide robust and accurate speech detection for automatic speech recognition (ASR) in meeting room settings. The solution is based on computing the long-term modulation spectrum and examining a specific frequency range for dominant ...

IDIAP, 2006

Audio Coding Based on Long Temporal Contexts

Petr Motlicek, Hynek Hermansky

We describe a novel audio coding technique designed to be used at medium bit-rates. Unlike classical state-of-the-art audio coders that are based on short-term spectra, our approach uses relatively long temporal segments of audio signal in critical-band- ...

IDIAP, 2006

Infinite Models for Speaker Clustering

Fabio Valente

In this paper we propose the use of infinite models for the clustering of speakers. Speaker segmentation is obtained through a Dirichlet Process Mixture (DPM) model, which can be interpreted as a flexible model with an infinite a priori number of components. ...

2006

Infinite Models for Speaker Clustering

Fabio Valente

IDIAP, 2006

Combination of Acoustic Classifiers based on Dempster-Shafer Theory of evidence

Hynek Hermansky, Fabio Valente

In this paper we investigate the combination of neural-net-based classifiers using the Dempster-Shafer Theory of Evidence. Under some assumptions, the combination rule resembles a product-of-errors rule observed in human speech perception. Different combinations are te ...

IDIAP, 2006

Using more informative posterior probabilities for speech recognition

Hervé Bourlard, Samy Bengio, Hamed Ketabdar

In this paper, we present initial investigations towards boosting posterior probability based speech recognition systems by estimating more informative posteriors taking into account acoustic context (e.g., the whole utterance), as well as possible prior i ...

2006

Automatic genre classification of music content

Giorgio Zoia, Nicolas Scaringella

This paper reviews the state-of-the-art in automatic genre classification of music collections through three main paradigms: expert systems, unsupervised classification, and supervised classification. The paper discusses the importance of music genres with ...

2006