Publications related to Mel-frequency cepstrum

Trustworthy speaker recognition with minimal prior knowledge using neural networks

The performance of speaker recognition systems has considerably improved in the last decade. This is mainly due to the development of Gaussian mixture model-based systems and in particular to the use of i-vectors. These systems handle relatively well noise ...

EPFL, 2019

Towards directly modeling raw speech signal for speaker verification using CNNs

Sébastien Marcel, Hannah Muckenhirn

Speaker verification systems traditionally extract and model cepstral features or filter bank energies from the speech signal. In this paper, inspired by the success of neural network-based approaches to model directly raw speech signal for applications su ...

IEEE, 2018

Modified group delay feature based total variability space modelling for speaker recognition

In this paper, modified group delay (MODGD) features are used to model target speakers in the Total Variability Space (TVS) framework for speaker recognition. MODGD based features have been shown to improve speaker recognition performance owing to the abil ...

2015

Robust Log-Energy Estimation and its Dynamic Change Enhancement for In-car Speech Recognition

Hervé Bourlard, Weifeng Li

The log-energy parameter, typically derived from a full-band spectrum, is a critical feature commonly used in automatic speech recognition (ASR) systems. However, log-energy is difficult to estimate reliably in the presence of background noise. In this pap ...

IEEE-Inst Electrical Electronics Engineers Inc, 2013

Wordless Sounds: Robust Speaker Diarization using Privacy-Preserving Audio Representations

Hervé Bourlard, Daniel Gatica-Perez, Sree Hari Krishnan Parthasarathi

This paper investigates robust privacy-sensitive audio features for speaker diarization in multiparty conversations: i.e., a set of audio features having low linguistic information for speaker diarization in a single and multiple distant microphone scenario ...

2013

Unified Framework of Feature Based Adaptation for Statistical Speech Synthesis and Recognition

Lakshmi Babu Saheer

The advent of statistical parametric speech synthesis has paved new ways to a unified framework for hidden Markov model (HMM) based text to speech synthesis (TTS) and automatic speech recognition (ASR). The techniques and advancements made in the field of ...

EPFL, 2013

Phase AutoCorrelation (PAC) features for noise robust speech recognition

Mathew Magimai Doss, Hynek Hermansky, Hemant Misra, Shajith Ikbal

In this paper, we introduce a new class of noise robust features derived from an alternative measure of autocorrelation representing the phase variation of speech signal frame over time. These features, referred to as Phase AutoCorrelation (PAC) features i ...

2012

A Fast Parts-based Approach to Speaker Verification using Boosted Slice Classifiers

Sébastien Marcel, Mathew Magimai Doss, Anindya Roy

Speaker verification on portable devices like smartphones is gradually becoming popular. In this context, two issues need to be considered: 1) such devices have relatively limited computation resources, and 2) they are liable to be used everywhere, possibl ...

2012

Multi-parametric source-filter separation of speech and prosodic voice restoration

Olaf Schleusing

In this thesis, methods and models are developed and presented aiming at the estimation, restoration and transformation of the characteristics of human speech. During a first period of the thesis, a concept was developed that allows restoring prosodic voic ...

EPFL, 2012

Boosting Localized Features for Speaker and Speech Recognition

Anindya Roy

In this thesis, we propose a novel approach for speaker and speech recognition involving localized, binary, data-driven features. The proposed approach is largely inspired by similar localized approaches in the computer vision domain. The success of these ...

EPFL, 2011

Privacy-Sensitive Audio Features for Conversational Speech Processing

Sree Hari Krishnan Parthasarathi

The work described in this thesis takes place in the context of capturing real-life audio for the analysis of spontaneous social interactions. Towards this goal, we wish to capture conversational and ambient sounds using portable audio recorders. Analysis ...

EPFL, 2011

Cepstral normalisation and the signal to noise ratio spectrum in automatic speech recognition

Philip Neil Garner

Cepstral normalisation in automatic speech recognition is investigated in the context of robustness to additive noise. In this paper, it is argued that such normalisation leads naturally to a speech feature based on signal to noise ratio rather than absolu ...

2011