Publication

Exploiting contextual information for speech/non-speech detection

Hynek Hermansky, Sree Hari Krishnan Parthasarathi
2008
Report or working paper
Abstract

In this paper, we investigate the effect of temporal context on speech/non-speech detection (SND). We show that even a simple feature such as full-band energy, when used with a sufficiently long context, is promising enough to merit further investigation. In experimental evaluations on the test data set using the F-measure, the proposed method achieves absolute gains of 4.4% and 5.4% over a state-of-the-art multi-layer perceptron (MLP) based SND system and a simple energy-threshold SND method, respectively. The optimal context length was found to be 1000 ms. Further numerical optimization yields an additional improvement of 3.37% absolute, for total absolute gains of 7.77% and 8.77% over the MLP-based and energy-based methods, respectively. ROC-based performance evaluation also shows promising performance for the proposed method, particularly in low-SNR conditions.
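The abstract does not say how the temporal context is combined with the energy feature, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes that context is applied as a centered moving average over per-frame log energy, that detection is a fixed threshold on the result, and that the F-measure is the usual harmonic mean of frame-level precision and recall. All function names, the frame and hop sizes, and the threshold are hypothetical.

```python
import math


def frame_log_energy(samples, frame_len, hop):
    """Full-band log energy (dB) of each frame of a mono signal."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        mean_power = sum(x * x for x in frame) / frame_len
        # Small floor avoids log10(0) on silent frames.
        energies.append(10.0 * math.log10(mean_power + 1e-12))
    return energies


def add_context(values, context_frames):
    """Average each value over a centered window of about context_frames frames.

    ASSUMPTION: a plain moving average stands in for whatever context
    mechanism the paper actually uses. Windows are truncated at the edges.
    """
    half = context_frames // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def detect_speech(values, threshold):
    """Label a frame as speech when its (contextual) energy exceeds threshold."""
    return [v > threshold for v in values]


def f_measure(predicted, reference):
    """Frame-level F-measure, with speech as the positive class."""
    tp = sum(1 for p, r in zip(predicted, reference) if p and r)
    fp = sum(1 for p, r in zip(predicted, reference) if p and not r)
    fn = sum(1 for p, r in zip(predicted, reference) if not p and r)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)
```

Under these assumptions, a 10 ms hop would make the reported 1000 ms context correspond to a `context_frames` value of roughly 100. The threshold would need to be tuned on held-out data.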
