We propose a kernel-based model for large-vocabulary continuous speech recognition. Continuous speech recognition is formulated as the problem of finding the best phoneme sequence and its best time span, where the phonemes are generated from all permissible word sequences. A non-probabilistic score is assigned to every pair of phoneme sequence and time-span sequence, according to a kernel-based acoustic model and a kernel-based language model. The acoustic model is defined in terms of segments, where each segment corresponds to a whole phoneme, and it generalizes Segmental Models to the non-probabilistic setting. The language model is based on the discriminative language model recently proposed by Roark et al. (2007). We devise a loss function based on the word error rate and present a large-margin training procedure for the kernel models that aims to minimize this loss. Finally, we discuss practical issues in implementing the kernel-based continuous speech recognition model by presenting an efficient iterative algorithm and considering the decoding process. We conclude the chapter with a brief discussion of the model's limitations and future work. This chapter does not present any experimental results.
Florent Gérard Krzakala, Lenka Zdeborová, Hugo Chao Cui
Wulfram Gerstner, Johanni Michael Brea, Georgios Iatropoulos