Publication

Robust audio segmentation

Related publications (173)

Gradient-based spectral visualization of CNNs using raw waveforms

Sébastien Marcel, Hannah Muckenhirn

Modeling raw waveforms directly with neural networks for speech processing is gaining more and more attention. Despite its varied success, a question that remains is: what kind of information are such neural networks capturing or learning for different t ...

Idiap, 2018

Visual speech recognition

Marina Zimmermann

Speech is the most natural means of communication for humans. Therefore, since the beginning of computers it has been a goal to interact with machines via speech. While there have been gradual improvements in this field over the decades, and with recent dr ...

EPFL, 2018

Inpainting of Long Audio Segments With Similarity Graphs

Nathanaël Perraudin

We present a novel method for the compensation of long duration data loss in audio signals, in particular music. The concealment of such signal defects is based on a graph that encodes signal structure in terms of time-persistent spectral similarity. A sui ...

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2018

On Learning Vocal Tract System Related Speaker Discriminative Information from Raw Signal Using CNNs

Sébastien Marcel, Hannah Muckenhirn

In a recent work, we have shown that speaker verification systems can be built where both features and classifiers are directly learned from the raw speech signal with convolutional neural networks (CNNs). In this framework, the training phase also decides ...

ISCA-INT SPEECH COMMUNICATION ASSOC, 2018

Decentralized clustering for node-variant graph filtering with graph diffusion LMS

Ali H. Sayed, Roula Nassif

In this work, we consider the problem of estimating the coefficients of linear shift-invariant FIR graph filters. We assume hybrid node-varying graph filters where the network is decomposed into clusters of nodes and, within each cluster, all nodes have th ...

IEEE, 2018

Combining the SNR Spectrum with a Cochlear Model

Philip Neil Garner

The SNR spectrum was previously introduced as a natural consequence of using cepstral normalisation in speech recognition; it is closely related to the articulation index of Fletcher. Motivated initially by a theoretical difficulty in frequency warping, ...

Idiap, 2018

Template-matching for text-dependent speaker verification

Petr Motlicek, Subhadeep Dey

In the last decade, i-vector and Joint Factor Analysis (JFA) approaches to speaker modeling have become ubiquitous in the area of automatic speaker recognition. Both of these techniques involve the computation of posterior probabilities, using either Gauss ...

2017

Intonation Modelling for Speech Synthesis and Emphasis Preservation

Pierre-Edouard Jean Charles Honnet

Speech-to-speech translation is a framework which recognises speech in an input language, translates it to a target language and synthesises speech in this target language. In such a system, variations in the speech signal which are inherent to natural hum ...

EPFL, 2017

Perceptual Information Loss due to Impaired Speech Production

Hervé Bourlard, Milos Cernak, Afsaneh Asaei

Phonological classes define articulatory-free and articulatory-bound phone attributes. A deep neural network is used to estimate the probability of phonological classes from the speech signal. In theory, a unique combination of phone attributes form a phonem ...

2017

Sparse Pronunciation Codes for Perceptual Phonetic Information Assessment

Hervé Bourlard, Milos Cernak, Afsaneh Asaei, Dhananjay Ram

Speech is a complex signal produced by a highly constrained articulation machinery. Neuro and psycholinguistic theories assert that speech can be decomposed into molecules of structured atoms. Although characterization of the atoms is controversial, the ex ...

2017