State-of-the-art solutions to query-by-example spoken term detection (QbE-STD) rely on bottleneck feature representations of the query and the audio document. Here, we present a study of QbE-STD performance using several monolingual as well as multilingual bottleneck features extracted from feedforward networks. In contrast to previous work, we use multitask learning to train the multilingual networks, which perform significantly better than the concatenated monolingual features. Additionally, we propose to employ residual networks (ResNet) to estimate the bottleneck features and show significant improvements over the corresponding feedforward network based features. The neural networks are trained on the GlobalPhone corpus, and QbE-STD experiments are performed on the very challenging QUESST 2014 database.
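To make the multitask idea concrete, the following is a minimal sketch of a feedforward network whose hidden and bottleneck layers are shared across languages, with one output layer per language. It is not the architecture from the study: the class name `MultitaskBottleneckNet`, the layer sizes, the language labels, and the random untrained weights are all assumptions chosen for illustration, and the ResNet variant is not shown.

```python
import math
import random


def dense(n_in, n_out, rng):
    """Return (weights, bias) for a layer: n_out rows of n_in random weights, zero bias."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def affine(layer, x):
    """Apply W x + b to the input vector x."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    peak = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(a - peak) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


class MultitaskBottleneckNet:
    """Hypothetical network: shared layers plus one softmax head per language."""

    def __init__(self, n_in, n_hidden, n_bottleneck, classes_per_language, seed=0):
        rng = random.Random(seed)
        # Layers shared by every language (the part that multitask training would update jointly).
        self.hidden = dense(n_in, n_hidden, rng)
        self.bottleneck = dense(n_hidden, n_bottleneck, rng)
        self.post = dense(n_bottleneck, n_hidden, rng)
        # One language-specific output layer per training language.
        self.heads = {
            lang: dense(n_hidden, n_classes, rng)
            for lang, n_classes in classes_per_language.items()
        }

    def bottleneck_features(self, x):
        """Language-independent feature vector for one acoustic frame."""
        return affine(self.bottleneck, relu(affine(self.hidden, x)))

    def posteriors(self, x, lang):
        """Class posteriors from the head belonging to the given language."""
        h = relu(affine(self.post, self.bottleneck_features(x)))
        return softmax(affine(self.heads[lang], h))


# Assumed sizes: 39-dimensional input frames, a 30-dimensional bottleneck, two languages.
net = MultitaskBottleneckNet(39, 64, 30, {"lang_a": 40, "lang_b": 45})
frame = [0.1] * 39
features = net.bottleneck_features(frame)
```

In a setup like this, only the output of `bottleneck_features` would be used to represent queries and audio documents at search time; the per-language heads exist only to provide training targets.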
Alexander Mathis, Alberto Silvio Chiappa, Alessandro Marin Vargas, Axel Bisi
Wulfram Gerstner, Stanislaw Andrzej Wozniak, Ana Stanojevic, Giovanni Cherubini, Angeliki Pantazi