Publication

Cross-lingual Adaptation of a CTC-based multilingual Acoustic Model

Related publications (42)

Neural Network Based End-to-End Query by Example Spoken Term Detection

Hervé Bourlard, Dhananjay Ram

This article focuses on the problem of query by example spoken term detection (QbE-STD) in a zero-resource scenario. State-of-the-art approaches primarily rely on dynamic time warping (DTW) based template matching techniques using phone posterior or bottlene ...

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2020

Comparison of Subword Segmentation Methods for Open-vocabulary ASR using a Difficulty Metric

Philip Neil Garner, Claudiu-Cristian Musat

We experiment with subword segmentation approaches that are widely used to address the open vocabulary problem in the context of end-to-end automatic speech recognition (ASR). For morphologically rich languages such as German which has many rare words main ...

2020

Temporal Spiking Recurrent Neural Network for Action Recognition

Wei Wang, Siyuan Hao

In this paper, we propose a novel temporal spiking recurrent neural network (TSRNN) to perform robust action recognition in videos. The proposed TSRNN employs a novel spiking architecture which utilizes the local discriminative features from high-confidenc ...

2019

Language Independent Query by Example Spoken Term Detection

Dhananjay Ram

Language independent query-by-example spoken term detection (QbE-STD) is the problem of retrieving audio documents from an archive, which contain a spoken query provided by a user. This is usually cast as a hypothesis testing and pattern matching problem ...

EPFL, 2019

Tampered Speaker Inconsistency Detection with Phonetically Aware Audio-visual Features

Sébastien Marcel

The recent increase in social media based propaganda, i.e., ‘fake news’, calls for automated methods to detect tampered content. In this paper, we focus on detecting tampering in a video with a person speaking to a camera. This form of manipulation is easy ...

2019

Overcoming Multi-model Forgetting

Mathieu Salzmann, Anthony Christopher Davison, Martin Jaggi, Yassine Benyahia, Claudiu-Cristian Musat, Kaicheng Yu

We identify a phenomenon, which we refer to as multi-model forgetting, that occurs when sequentially training multiple deep networks with partially-shared parameters; the performance of previously-trained models degrades as one optimizes a subsequent one, ...

JMLR, 2019

Phonetic aware techniques for Speaker Verification

Subhadeep Dey

The goal of this thesis is to improve current state-of-the-art techniques in speaker verification (SV), typically based on “identity-vectors” (i-vectors) and deep neural network (DNN), by exploiting diverse (phonetic) information extracted using variou ...

EPFL, 2018

Fast Language Adaptation Using Phonological Information

Hervé Bourlard, Philip Neil Garner, Sibo Tong

A phoneme-based multilingual connectionist temporal classification (CTC) model is easily extensible to a new language by concatenating parameters of the new phonemes to the output layer. In the present paper, we improve cross-lingual adaptation in the contex ...

ISCA-INT SPEECH COMMUNICATION ASSOC, 2018

Evolution of Neural Network Architectures for Speech Recognition

Hervé Bourlard

Over these last few years, the use of Artificial Neural Networks (ANNs), now often referred to as deep learning or Deep Neural Networks (DNNs), has significantly reshaped research and development in a variety of signal and information processing tasks. Whi ...

ISCA-INT SPEECH COMMUNICATION ASSOC, 2018

Statistically-Motivated Second-order Pooling

Mathieu Salzmann, Kaicheng Yu

Second-order pooling, a.k.a. bilinear pooling, has proven effective for deep learning based visual recognition. However, the resulting second-order networks yield a final representation that is orders of magnitude larger than that of standard, first-order ...

2018