Publication

Boosting Localized Features for Speaker and Speech Recognition

Anindya Roy
2011
EPFL thesis
Abstract

In this thesis, we propose a novel approach to speaker and speech recognition based on localized, binary, data-driven features. The approach is largely inspired by similar localized approaches in computer vision. The success of these existing approaches, coupled with their proven advantages of robustness and computational efficiency, motivated us to apply these ideas to the speech domain. Our approach is distinct from the standard approach to speaker and speech recognition based on cepstral features. It starts with a large set of simple localized features, each of which looks at a very small part of a spectro-temporal representation of speech. Each feature is binary-valued. The most discriminative of these features are selected by boosting and combined to form the final classifier.

Two systems are developed within this general framework: a speaker recognition system and a speech recognition system. The speaker recognition system is evaluated under a wide range of experimental conditions, using clean speech, noisy speech, and speech data collected from mobile phones. The system performs reliably in each condition, with performance comparable to that of standard systems using cepstral features and Gaussian Mixture Models. At the same time, it requires significantly fewer floating-point operations than these systems. For the speech recognition system, we integrate our localized features into a Hidden Markov Model framework using multilayer perceptrons. Continuous speech recognition studies on standard databases show that these features perform as well as cepstral features. It is also found that the fusion of these features with cepstral features, at both the feature level and the decision level, leads to improved performance.

Apart from this, minor contributions include an audio-visual person recognition system developed using the same general approach of localized features, which extends its applicability. Finally, a new (but related) class of localized features was developed for robust face detection.
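
To make the feature-selection step in the abstract more concrete, the following is a minimal sketch of boosting over binary localized features. It is not the thesis's implementation. Several details are assumptions made for illustration: each feature is taken to compare the values at two cells of a small spectro-temporal patch, the boosting variant is plain discrete AdaBoost, the data are synthetic, and the names `pair_feature`, `train_adaboost`, and `predict` are hypothetical.

```python
import itertools
import math
import random


def pair_feature(patch, a, b):
    """Assumed binary feature: 1 if the value at cell a exceeds the value at cell b."""
    return 1 if patch[a[0]][a[1]] > patch[b[0]][b[1]] else 0


def train_adaboost(patches, labels, candidates, rounds):
    """Discrete AdaBoost over binary features; labels are +1 or -1.

    Returns a list of (alpha, cell_a, cell_b, polarity) weak classifiers.
    """
    n = len(patches)
    weights = [1.0 / n] * n
    # Precompute every candidate feature's output, mapped from {0, 1} to {-1, +1}.
    outputs = [[2 * pair_feature(p, a, b) - 1 for p in patches]
               for a, b in candidates]
    model = []
    for _ in range(rounds):
        best = None
        for idx, out in enumerate(outputs):
            err = sum(w for w, o, y in zip(weights, out, labels) if o != y)
            # Flipping the feature's sign turns a weighted error of err into 1 - err.
            for polarity, e in ((1, err), (-1, 1.0 - err)):
                if best is None or e < best[0]:
                    best = (e, idx, polarity)
        err, idx, polarity = best
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        a, b = candidates[idx]
        model.append((alpha, a, b, polarity))
        # Increase the weight of misclassified examples, then renormalize.
        weights = [w * math.exp(-alpha * y * polarity * o)
                   for w, o, y in zip(weights, outputs[idx], labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, patch):
    """Sign of the weighted vote of the selected binary features."""
    score = sum(alpha * polarity * (2 * pair_feature(patch, a, b) - 1)
                for alpha, a, b, polarity in model)
    return 1 if score >= 0 else -1


# Synthetic 4x4 "spectro-temporal patches": class +1 has extra energy at
# cell (1, 2), class -1 has extra energy at cell (2, 1).
random.seed(0)
patches, labels = [], []
for i in range(40):
    patch = [[random.random() for _ in range(4)] for _ in range(4)]
    label = 1 if i % 2 == 0 else -1
    r, c = (1, 2) if label == 1 else (2, 1)
    patch[r][c] += 1.0
    patches.append(patch)
    labels.append(label)

cells = [(r, c) for r in range(4) for c in range(4)]
candidates = list(itertools.combinations(cells, 2))  # all unordered cell pairs

model = train_adaboost(patches, labels, candidates, rounds=3)
accuracy = sum(predict(model, p) == y for p, y in zip(patches, labels)) / len(labels)
print("selected features:", [(a, b, pol) for _, a, b, pol in model])
print("training accuracy:", accuracy)
```

At test time, only the selected comparisons need to be evaluated for each patch, which fits the abstract's point about a low floating-point operation count.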
