Publication

Exploiting Low-dimensional Structures to Enhance DNN based Acoustic Modeling in Speech Recognition


Publications related to Exploiting Low-dimensional Structures to Enhance DNN based Acoustic Modeling in Speech Recognition | EPFL Graph Search

Leveraging Unlabeled Data to Track Memorization

Patrick Thiran, Mahsa Forouzesh, Hanie Sedghi

Deep neural networks may easily memorize noisy labels present in real-world data, which degrades their ability to generalize. It is therefore important to track and evaluate the robustness of models against noisy label memorization. We propose a metric, ca ...

2023

Sparse Autoencoders for Speech Modeling and Recognition

Selen Hande Kabil

Driven by advances in artificial intelligence, speech recognition-based applications play an essential role in transforming most aspects of modern life. However, speech recognition in real-life conditions (e.g., in the presence of overlapping speech, varyin ...

EPFL, 2023

Autoencoders reloaded

Hervé Bourlard, Selen Hande Kabil

In Bourlard and Kamp (Biol Cybern 59(4):291-294, 1988), it was theoretically proven that autoencoders (AE) with a single hidden layer (previously called "auto-associative multilayer perceptrons") were, in the best case, implementing singular value decomposit ...

SPRINGER, 2022

Controllability and Interpretability in Affective Speech Synthesis

Bastian Schnell

Thanks to deep learning, text-to-speech (TTS) has achieved high audio quality with large databases. At the same time, however, the complex models have lost any ability to control or interpret the generation process. For the big challenge of affective TTS it is infeasi ...

EPFL, 2022

A Structured Dictionary Perspective on Implicit Neural Representations

Pascal Frossard, Guillermo Ortiz Jimenez, Gizem Yüce, Beril Besbinar

Implicit neural representations (INRs) have recently emerged as a promising alternative to classical discretized representations of signals. Nevertheless, despite their practical success, we still do not understand how INRs represent signals. We propose a ...

IEEE COMPUTER SOC, 2022

Serab: A Multi-Lingual Benchmark For Speech Emotion Recognition

Milos Cernak, Pierre Anton Beckmann

Recent developments in speech emotion recognition (SER) often leverage deep neural networks (DNNs). Comparing and benchmarking different DNN models can often be tedious due to the use of different datasets and evaluation protocols. To facilitate the proces ...

IEEE, 2022

Biologically plausible unsupervised learning in shallow and deep neural networks

Bernd Albert Illing

The way our brain learns to disentangle complex signals into unambiguous concepts is fascinating but remains largely unknown. There is evidence, however, that hierarchical neural representations play a key role in the cortex. This thesis investigates biolo ...

EPFL, 2021

Fair Voice Biometrics: Impact of Demographic Imbalance on Group Fairness in Speaker Recognition

Mirko Marras

Speaker recognition systems are playing a key role in modern online applications. Though the susceptibility of these systems to discrimination according to group fairness metrics has been recently studied, their assessment has been mainly focused on the di ...

ISCA-INT SPEECH COMMUNICATION ASSOC, 2021

Multilingual Training and Adaptation in Speech Recognition

Sibo Tong

State-of-the-art acoustic models for Automatic Speech Recognition (ASR) are based on Hidden Markov Models (HMM) and Deep Neural Networks (DNN) and often require thousands of hours of transcribed speech data during training. Therefore, building multilingual ...

EPFL, 2020

On quantifying the quality of acoustic models in hybrid DNN-HMM ASR

Hervé Bourlard, Afsaneh Asaei, Pranay Dighe

We propose an information theoretic framework for quantitative assessment of acoustic models used in hidden Markov model (HMM) based automatic speech recognition (ASR). The HMM backend expects that (i) the acoustic model yields accurate state conditional e ...

ELSEVIER, 2020