To address the open-vocabulary problem in end-to-end automatic speech recognition (ASR), we experiment with subword segmentation approaches, specifically byte-pair encoding and the unigram language model. Such approaches are attractive for morphologically rich languages in general, and for German in particular. We propose a technique that computes the tokenization rate of an utterance transcription, in the spirit of the out-of-vocabulary (OOV) rate that would be used for closed vocabularies. We show that this tokenization rate can then be used to rank evaluation utterances by recognition difficulty. Using this technique, we show that the optimal choice of subword segmentation method depends on the expected tokenization rate of the domain. We further show that a hybrid solution exists and can lead to improved performance. For the ASR model, we employ wav2letter, a fully convolutional sequence-to-sequence encoder architecture with time-depth separable convolution blocks, together with lexicon-free beam-search decoding using an n-gram subword language model.
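As a rough illustration of the tokenization-rate idea, the sketch below uses the SentencePiece library to segment a transcription with a BPE or unigram model and measure how heavily it is tokenized. The exact formulation from the paper is not reproduced here: defining the tokenization rate as subword tokens per word is an assumption, as are the model file names and vocabulary size.

```python
# Minimal sketch (not the paper's exact formulation): compute a tokenization rate
# for an utterance transcription, assumed here to be the average number of subword
# tokens produced per word, so 1.0 means no word was split.
import sentencepiece as spm

# Assumed: subword models trained beforehand on the training transcriptions, e.g.
#   spm.SentencePieceTrainer.train(input="train_text.txt", model_prefix="bpe",
#                                  vocab_size=5000, model_type="bpe")
#   spm.SentencePieceTrainer.train(input="train_text.txt", model_prefix="unigram",
#                                  vocab_size=5000, model_type="unigram")

def tokenization_rate(sp: spm.SentencePieceProcessor, transcription: str) -> float:
    """Subword tokens per word for one utterance transcription (assumed definition)."""
    words = transcription.split()
    if not words:
        return 0.0
    pieces = sp.encode(transcription, out_type=str)
    return len(pieces) / len(words)

if __name__ == "__main__":
    bpe = spm.SentencePieceProcessor(model_file="bpe.model")        # hypothetical paths
    uni = spm.SentencePieceProcessor(model_file="unigram.model")
    utterance = "die Donaudampfschifffahrtsgesellschaft fährt heute"
    # Higher tokenization rate -> more heavily segmented transcription, which the
    # abstract suggests correlates with recognition difficulty; such scores could be
    # used to rank evaluation utterances.
    print("BPE rate:    ", tokenization_rate(bpe, utterance))
    print("Unigram rate:", tokenization_rate(uni, utterance))
```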