COMPARISON OF SUBWORD SEGMENTATION METHODS FOR OPEN-VOCABULARY END-TO-END SPEECH RECOGNITION

To address the open-vocabulary problem in the context of end-to-end automatic speech recognition (ASR), we experiment with subword segmentation approaches, specifically byte-pair encoding and the unigram language model. Such approaches are attractive in general for morphologically rich languages, and in particular for German. We propose a technique that computes the tokenization rate of an utterance transcription, in the spirit of the out-of-vocabulary (OOV) metric that would be used for closed vocabularies. We show that this tokenization rate can then be used to rank evaluation utterances in terms of recognition difficulty. Using this technique, we show that the optimal choice of subword segmentation technique depends on the expected tokenization rate of the domain. We further show that a hybrid solution exists and can lead to improved performance. For the ASR model, we employ wav2letter, a fully convolutional sequence-to-sequence encoder architecture using time-depth separable convolution blocks and lexicon-free beam-search decoding with an n-gram subword language model.
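
The abstract does not give the exact definition of the tokenization rate, so the following is only a minimal sketch under stated assumptions. It assumes the rate is the fraction of words in a transcription that have to be split into more than one subword, and it uses a toy greedy longest-match segmenter in place of byte-pair encoding or the unigram language model. The function names and the example vocabulary are hypothetical and are not taken from the paper.

```python
def segment(word, vocab):
    """Greedy longest-match segmentation (a stand-in, NOT BPE or unigram LM).

    Falls back to single characters when no vocabulary entry matches.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first; a single character always succeeds.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


def tokenization_rate(utterance, vocab):
    """Assumed definition: share of words split into more than one subword."""
    words = utterance.split()
    if not words:
        return 0.0
    split_words = sum(1 for w in words if len(segment(w, vocab)) > 1)
    return split_words / len(words)


def rank_by_difficulty(utterances, vocab):
    """Sort utterances from highest to lowest assumed tokenization rate."""
    return sorted(utterances,
                  key=lambda u: tokenization_rate(u, vocab),
                  reverse=True)


# Hypothetical subword vocabulary and utterances, for illustration only.
vocab = {"das", "haus", "ist", "alt", "tür"}
utterances = ["das haus ist alt", "das haustür ist alt"]
for u in rank_by_difficulty(utterances, vocab):
    print(f"{tokenization_rate(u, vocab):.2f}  {u}")
```

Under this assumed definition, an utterance whose words all appear whole in the subword vocabulary has a rate of zero, which mirrors an OOV rate of zero for a closed vocabulary. Compound words that must be broken into pieces raise the rate and move the utterance up the difficulty ranking.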

Infusing structured knowledge priors in neural models for sample-efficient symbolic reasoning

Performing and Detecting Backdoor Attacks on Face Recognition Algorithms

Aggregating Spatial and Photometric Context for Photometric Stereo
