Articulatory features (AFs) provide language-independent attributes by exploiting speech production knowledge. This paper proposes a cross-lingual automatic speech recognition (ASR) approach based on AF methods. Various neural network (NN) architectures are explored to extract cross-lingual AFs, and their performance is studied. The architectures include the multi-layer perceptron (MLP), convolutional NN (CNN), and long short-term memory recurrent NN (LSTM). In our cross-lingual setup, only the source language (English, representing a well-resourced language) is used to train the AF extractors. AFs are then generated for the target language (Mandarin, representing an under-resourced language) using the trained extractors. The frame-classification accuracy indicates that the LSTM is able to transfer knowledge from the well-resourced to the under-resourced language through robust cross-lingual AFs. The final ASR system is built using traditional approaches (e.g. hybrid models), combining AFs with conventional MFCCs. The results demonstrate that the cross-lingual AFs improve performance on the under-resourced ASR task even though the source and target languages come from different language families. Overall, the proposed cross-lingual ASR approach provides a slight improvement over the monolingual LF-MMI and cross-lingual (acoustic model adaptation-based) ASR systems.
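A minimal sketch of the overall idea, not the paper's actual implementation: an LSTM maps acoustic frames to articulatory-attribute posteriors, is trained on the source language, and is then reused on the target language, whose AF posteriors are appended to the conventional MFCCs. All layer sizes, the attribute-inventory size (23), and the 39-dimensional MFCC input are illustrative assumptions.

```python
# Hypothetical LSTM articulatory-feature (AF) extractor for cross-lingual ASR.
import torch
import torch.nn as nn

class AFExtractor(nn.Module):
    def __init__(self, n_mfcc=39, n_attributes=23, hidden=256, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_mfcc, hidden, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, n_attributes)

    def forward(self, x):                  # x: (batch, frames, n_mfcc)
        h, _ = self.lstm(x)
        return self.out(h)                 # per-frame AF logits

# Train on the well-resourced source language (English), then run the frozen
# extractor on the under-resourced target language (Mandarin) and concatenate
# the AF posteriors with the MFCCs as input to the hybrid acoustic model.
extractor = AFExtractor()
mfcc = torch.randn(1, 200, 39)             # 200 dummy frames of 39-dim MFCCs
with torch.no_grad():
    af = torch.sigmoid(extractor(mfcc))    # cross-lingual AF posteriors
tandem_features = torch.cat([mfcc, af], dim=-1)  # MFCC + AF front end
```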