Denoising and Raw-waveform Networks for Weakly-Supervised Gender Identification on Noisy Speech

This paper presents a raw-waveform neural network and uses it, together with a denoising network, for clustering in weakly supervised learning scenarios under extreme noise conditions. Specifically, we consider language-independent Automatic Gender Recognition (AGR) across a varied set of noise conditions and Signal-to-Noise Ratios (SNRs). We formulate denoising as a source separation task and train the system with a discriminative criterion in order to enhance output SNRs. A denoising Recurrent Neural Network (RNN) is first trained on a small subset (roughly one-fifth) of the data to learn a speech-specific mask. The denoised speech signal is then fed directly as input to a raw-waveform convolutional neural network (CNN) trained on denoised speech. We evaluate the standalone performance of the denoiser in terms of various signal-to-noise measures and discuss its contribution towards robust AGR. The combined pipeline achieves absolute improvements of 11.06% and 13.33% over the i-vector SVM baseline system for the 0 dB and -5 dB SNR conditions, respectively. We further analyse the information captured by the first CNN layer for both noisy and denoised speech.
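The abstract names three ingredients of the denoiser: a speech-specific mask, a discriminative training criterion, and evaluation with signal-to-noise measures. The sketch below shows one common way to realise each, not the paper's reported implementation. It assumes the RNN outputs separate speech and noise magnitude estimates per time-frequency bin, that the mask is a ratio mask built from those two outputs, and that the discriminative criterion takes a form used in the RNN source-separation literature: squared error to each target minus a small weight `gamma` times the error to the competing source. The function names and the value of `gamma` are hypothetical.

```python
import math
from typing import List

# Magnitude spectrogram stored as frames x frequency bins (assumed layout).
Matrix = List[List[float]]


def soft_mask(speech_est: Matrix, noise_est: Matrix, eps: float = 1e-8) -> Matrix:
    """Assumed ratio mask |S| / (|S| + |N|) from the network's two magnitude outputs."""
    return [[s / (s + n + eps) for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(speech_est, noise_est)]


def apply_mask(mask: Matrix, mixture: Matrix) -> Matrix:
    """Element-wise product of the mask with the noisy mixture magnitude."""
    return [[m * x for m, x in zip(m_row, x_row)]
            for m_row, x_row in zip(mask, mixture)]


def sq_dist(a: Matrix, b: Matrix) -> float:
    """Sum of squared differences between two spectrograms."""
    return sum((x - y) ** 2
               for a_row, b_row in zip(a, b) for x, y in zip(a_row, b_row))


def discriminative_loss(speech_hat: Matrix, noise_hat: Matrix,
                        speech: Matrix, noise: Matrix, gamma: float = 0.05) -> float:
    """Hypothetical discriminative criterion: fit each target, and subtract a
    gamma-weighted term that rewards each estimate for differing from the other source."""
    fit = sq_dist(speech_hat, speech) + sq_dist(noise_hat, noise)
    cross = sq_dist(speech_hat, noise) + sq_dist(noise_hat, speech)
    return fit - gamma * cross


def snr_db(reference: List[float], estimate: List[float]) -> float:
    """Output SNR in dB: 10 * log10(signal energy / residual energy)."""
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error == 0.0:
        return float("inf")  # perfect reconstruction
    return 10.0 * math.log10(signal / error)


if __name__ == "__main__":
    # Toy example; treating the mixture magnitude as speech + noise ignores phase.
    speech = [[1.0, 0.0], [0.5, 2.0]]
    noise = [[0.0, 1.0], [0.5, 0.0]]
    mixture = [[1.0, 1.0], [1.0, 2.0]]
    denoised = apply_mask(soft_mask(speech, noise), mixture)
    print(denoised)
    print(discriminative_loss(speech, noise, speech, noise))
    print(snr_db([10.0, 0.0], [9.0, 0.0]))  # 100 / 1 energy ratio -> 20 dB
```

With perfect speech and noise estimates the mask recovers the clean speech magnitude almost exactly, and the loss reduces to the negative cross term. In practice `gamma` is kept small so that the penalty on similarity does not outweigh the reconstruction error.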

A Multimodal Dataset for Automatic Edge-AI Cough Detection

How Does Pre-Trained Wav2Vec 2.0 Perform On Domain-Shifted ASR? An Extensive Benchmark On Air Traffic Control Communications

An Efficient Signal-to-noise Approximation for Eccentric Inspiraling Binaries