In this work, we investigate whether wav2vec 2.0 self-supervised pretraining helps mitigate the overfitting issues of connectionist temporal classification (CTC) training and reduce its performance gap with flat-start lattice-free MMI (E2E-LFMMI) for automatic speech recognition with limited training data. To that end, we use the pretrained wav2vec 2.0 BASE model and fine-tune it on three different datasets, covering out-of-domain (Switchboard) and cross-lingual (Babel) scenarios. Our results show that for supervised adaptation of the wav2vec 2.0 model, both E2E-LFMMI and CTC achieve similar results, significantly outperforming the baselines trained only with supervised data. Fine-tuning the wav2vec 2.0 model with E2E-LFMMI and CTC, we obtain the following relative WER improvements over the supervised baseline trained with E2E-LFMMI. We obtain relative improvements of 40% and 44% on the clean set and 64% and 58% on the test set of Librispeech (100h), respectively. On Switchboard (300h) we obtain relative improvements of 33% and 35%, respectively. Finally, for the Babel languages, we obtain relative improvements of 26% and 23% on Swahili (38h) and 18% and 17% on Tagalog (84h), respectively.
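For illustration only, the sketch below shows one way to fine-tune the pretrained wav2vec 2.0 BASE model with a CTC objective; it is not the authors' implementation and assumes the HuggingFace transformers API, with the character vocabulary size, learning rate, and data preparation left as placeholders.

```python
# Minimal sketch of CTC fine-tuning of the self-supervised wav2vec 2.0 BASE
# checkpoint (assumption: HuggingFace `transformers`; dataset loading and the
# target-language character vocabulary are prepared elsewhere).
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2FeatureExtractor

# Load the pretrained (self-supervised only) BASE model and add a CTC head.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-base",
    vocab_size=32,               # hypothetical size of the output unit inventory
    ctc_loss_reduction="mean",
)
model.freeze_feature_encoder()   # keep the convolutional front-end fixed

feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1,
    sampling_rate=16000,
    padding_value=0.0,
    do_normalize=True,
)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)  # illustrative LR

def train_step(waveform, label_ids):
    """One CTC fine-tuning step on a single (audio, transcript) pair."""
    inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
    labels = torch.tensor([label_ids])                 # token ids of the transcript
    out = model(input_values=inputs.input_values, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```

The E2E-LFMMI fine-tuning reported in the paper replaces the CTC loss above with a lattice-free MMI objective over the same encoder; that objective is not part of the transformers library and is therefore not sketched here.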