Êtes-vous un étudiant de l'EPFL à la recherche d'un projet de semestre?
Travaillez avec nous sur des projets en science des données et en visualisation, et déployez votre projet sous forme d'application sur GraphSearch.
Multi-stream approaches have proven to be very successful in speech recognition tasks and to a certain extent in speaker authentication tasks. In this study we propose a noise-robust multi-stream text-independent speaker authentication system. This system has two steps: first train the stream experts under clean conditions and then train the combination mechanism to merge the scores of the stream experts under both clean and noisy conditions. The idea here is to take advantage of the rather predictable reliability and diversity of streams under different conditions. Hence, noise-robustness is mainly due to the combination mechanism. This two-step approach offers several practical advantages: the stream experts can be trained in parallel (e.g., by using several machines); heterogeneous types of features can be used and the resultant system can be robust to different noise types (wide bands or narrow bands) as compared to sub-streams. An important finding is that a trade-off is often necessary between the overall good performance under all conditions (clean and noisy) and good performance under clean conditions. To reconcile this trade-off, we propose to give more emphasis or prior to clean conditions, thus, resulting in a combination mechanism that does not deteriorate under clean conditions (as compared to the best stream) yet is robust to noisy conditions.
Chargement
Chargement
Chargement
Chargement
Chargement
Samy Bengio, Norman Hoon Thian Poh
Samy Bengio, Hervé Bourlard, Katrin Weber
multi-style'' training. More specifically, we will motivate and briefly describe new approaches based on multi-stream and subband ASR. These approaches extend the standard hidden Markov model (HMM) based approach by assuming that the different (frequency) streams representing the speech signal are processed by different (independent)
experts'', each expert focusing on a different characteristic of the signal, and that the different stream likelihoods (or posteriors) are combined at some (temporal) stage to yield a global recognition output. As a further extension to multi-stream ASR, we will finally introduce a new approach, referred to as HMM2, where the HMM emission probabilities are estimated via state specific feature based HMMs responsible for merging the stream information and modeling their possible correlation.