Publication

Efficient Transformer-Based Speech Recognition

Related publications (125)

Musical Source Separation

Musical source separation is a complex topic that has been extensively explored in the signal processing community and has benefited greatly from recent machine learning research. Many deep learning models with impressive source separation quality have bee ...

2020

Fast Transformers with Clustered Attention

François Fleuret, Angelos Katharopoulos, Apoorv Vyas

Transformers have been proven a successful model for a variety of tasks in sequence modeling. However, computing the attention matrix, which is their key component, has quadratic complexity with respect to the sequence length, thus making them prohibitivel ...

2020

pyannote.audio: neural building blocks for speaker diarization

We introduce pyannote.audio, an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build ...

2020

Memory Augmented Neural Model for Incremental Session-based Recommendation

Boi Faltings, Fei Mi

Increasing privacy concerns have stimulated interest in Session-based Recommendation (SR) using no personal data other than what is observed in the current browser session. Existing methods are evaluated in static settings which rarely occur in real- ...

2020

Comparison of Subword Segmentation Methods for Open-vocabulary ASR using a Difficulty Metric

Philip Neil Garner, Claudiu-Cristian Musat

We experiment with subword segmentation approaches that are widely used to address the open vocabulary problem in the context of end-to-end automatic speech recognition (ASR). For morphologically rich languages such as German which has many rare words main ...

2020

A Stochastic Conditioning Scheme for Diverse Human Motion Prediction

Mathieu Salzmann, Fatemehsadat Saleh

Human motion prediction, the task of predicting future 3D human poses given a sequence of observed ones, has been mostly treated as a deterministic problem. However, human motion is a stochastic process: Given an observed sequence of poses, multiple future ...

IEEE, 2020

On the Relationship between Self-Attention and Convolutional Layers

Martin Jaggi, Andreas Loukas, Jean-Baptiste Francis Marie Juliette Cordonnier

Recent trends of incorporating attention mechanisms in vision have led researchers to reconsider the supremacy of convolutional layers as a primary building block. Beyond helping CNNs to handle long-range dependencies, Ramachandran et al. (2019) showed ...

2020

Language model domain adaptation for automatic speech recognition

Petr Motlicek, Amrutha Prasad

This report provides an overview of the work carried out in improving Language Model (LM) development used during the decoding of an Automatic Speech Recognition (ASR) system. The goal of this work is to develop a robust language model that can be adapted ...

Idiap, 2020

Self-attention for Speech Emotion Recognition

Philip Neil Garner, Lorenzo Tarantino

Speech Emotion Recognition (SER) has been shown to benefit from many of the recent advances in deep learning, including recurrent-based and attention-based neural network architectures. Nevertheless, performance still falls short of that of humans. ...

2019

Language Independent Query by Example Spoken Term Detection

Dhananjay Ram

Language independent query-by-example spoken term detection (QbE-STD) is the problem of retrieving audio documents from an archive, which contain a spoken query provided by a user. This is usually cast as a hypothesis testing and pattern matching problem ...

EPFL, 2019