Speech/Music Discrimination using Entropy and Dynamism Features in a HMM Classification Framework
Graph Chatbot
Chat with Graph Search
Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.
DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.
In this paper we present a new approach towards high performance speech/music segmentation on realistic tasks related to the automatic transcription of broadcast news. In the approach presented here, the local probability density function (PDF) estimators ...
State-of-the-art automatic speech recognition (ASR) techniques are typically based on hidden Markov models (HMMs) for the modeling of temporal sequences of feature vectors extracted from the speech signal. At the level of each HMM state, Gaussian mixture m ...
In this paper, we present an HMM2 based method for speaker normalization. Introduced as an extension of Hidden Markov Model (HMM), HMM2 differentiates itself from the regular HMM in terms of the emission density modeling, which is done by a set of state-de ...
In this paper, we present an HMM2 based method for speaker normalization. Introduced as an extension of Hidden Markov Model (HMM), HMM2 differentiates itself from the regular HMM in terms of the emission density modeling, which is done by a set of state-de ...
One of the difficulties in Automatic Speech Recognizer (ASR) is the pronunciation variability. Each word (modeled by a baseline phonetic transcription in the ASR dictionary) can be pronounced in many different ways depending on many complex qualitative and ...
In this report, we present preliminary experiments towards automatic inference and evaluation of pronunciation models based on multiple utterances of each lexicon word and their given baseline pronunciation model (baseform phonetic transcription). In the p ...
This paper describes a complete system for audio-visual recognition of continuous speech including robust lip tracking, visual feature extraction, noise-robust acoustic feature extraction, and sensor integration. An appearance based model of the articulato ...
In this paper, we present a new approach towards high performance speech/music discrimination on realistic tasks related to the automatic transcription of broadcast news. In the approach presented here, the (local) Probability Density Function (PDF) estima ...
In this paper we present a new approach towards high performance speech/music segmentation on realistic tasks related to the automatic transcription of broadcast news. In the approach presented here, the local probability density function (PDF) estimators ...
In general, entropy gives us a measure of the number of bits required to represent some information. When applied to probability mass function (PMF), entropy can also be used to measure the ``peakiness'' of a distribution. In this paper, we propose using t ...