Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
In this paper, we discuss a family of new Automatic Speech Recognition (ASR) approaches, which somewhat deviate from the usual ASR approaches but which have recently been shown to be more robust to nonstationary noise, without requiring specific adaptation or multi-style'' training. More specifically, we will motivate and briefly describe new approaches based on multi-stream and subband ASR. These approaches extend the standard hidden Markov model (HMM) based approach by assuming that the different (frequency) streams representing the speech signal are processed by different (independent)
experts'', each expert focusing on a different characteristic of the signal, and that the different stream likelihoods (or posteriors) are combined at some (temporal) stage to yield a global recognition output. As a further extension to multi-stream ASR, we will finally introduce a new approach, referred to as HMM2, where the HMM emission probabilities are estimated via state specific feature based HMMs responsible for merging the stream information and modeling their possible correlation.
Petr Motlicek, Hynek Hermansky, Sriram Ganapathy, Amrutha Prasad
Petr Motlicek, Hynek Hermansky, Sriram Ganapathy, Amrutha Prasad