Articulatory features (AFs) provide language-independent attributes by exploiting speech production knowledge. This paper proposes a cross-lingual automatic speech recognition (ASR) approach based on AFs. Various neural network (NN) architectures are explored to extract cross-lingual AFs, and their performance is studied. The architectures include the multi-layer perceptron (MLP), the convolutional NN (CNN), and the long short-term memory recurrent NN (LSTM). In our cross-lingual setup, only the source language (English, representing a well-resourced language) is used to train the AF extractors. AFs are then generated for the target language (Mandarin, representing an under-resourced language) using the trained extractors. The frame-classification accuracy indicates that the LSTM can transfer knowledge from the well-resourced to the under-resourced language through robust cross-lingual AFs. The final ASR system is built using traditional approaches (e.g. hybrid models), combining AFs with conventional MFCCs. The results demonstrate that the cross-lingual AFs improve performance on the under-resourced ASR task even though the source and target languages come from different language families. Overall, the proposed cross-lingual ASR approach provides a slight improvement over the monolingual LF-MMI and cross-lingual (acoustic model adaptation-based) ASR systems.
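The step of combining AFs with MFCCs can be pictured as a frame-wise concatenation. The sketch below is only an illustration under stated assumptions, not the paper's implementation: the extractor is a hypothetical single linear layer with a softmax standing in for the trained MLP, CNN, or LSTM, its weights are random rather than trained on English, and the dimensions (13 MFCCs, 8 AF classes) and the names `extract_afs` and `combine_features` are made up for the example. The abstract does not state how the two feature streams are combined, so concatenation is itself an assumption.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def extract_afs(mfcc, weights, bias):
    """Hypothetical stand-in for a trained AF extractor.

    A real extractor would be an MLP, CNN, or LSTM trained on the
    source language; here it is one linear layer followed by a softmax,
    returning per-frame posteriors over AF classes.
    """
    return softmax(mfcc @ weights + bias, axis=1)


def combine_features(mfcc, af_posteriors):
    """Concatenate MFCCs and AF posteriors frame by frame (assumed scheme)."""
    if mfcc.shape[0] != af_posteriors.shape[0]:
        raise ValueError("MFCC and AF streams must have the same number of frames")
    return np.concatenate([mfcc, af_posteriors], axis=1)


rng = np.random.default_rng(0)
n_frames, n_mfcc, n_af = 100, 13, 8  # illustrative sizes, not from the paper

# Placeholder parameters; in the cross-lingual setup these would come
# from training on the source language only.
source_weights = rng.normal(size=(n_mfcc, n_af))
source_bias = np.zeros(n_af)

# Placeholder target-language MFCC frames.
target_mfcc = rng.normal(size=(n_frames, n_mfcc))

# Apply the source-trained extractor to target-language frames,
# then append the AF posteriors to the MFCCs.
target_afs = extract_afs(target_mfcc, source_weights, source_bias)
combined = combine_features(target_mfcc, target_afs)

print(combined.shape)  # (100, 21)
```

In a hybrid system, the combined frames would then serve as input features for the acoustic model in place of MFCCs alone.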
Alexander Mathis, Alberto Silvio Chiappa, Alessandro Marin Vargas, Axel Bisi