Publication

Applying Multi- and Cross-Lingual Stochastic Phone Space Transformations to Non-Native Speech Recognition

Related publications (42)


Neural Network (NN) classifiers can assign extreme probabilities to samples that have not appeared during training (out-of-distribution samples), resulting in erroneous and unreliable predictions. One of the causes for this unwanted behaviour lies in the us ...

IEEE 2020

Deep neural networks have been empirically successful in a variety of tasks; however, their theoretical understanding is still poor. In particular, modern deep neural networks have many more parameters than training data. Thus, in principle they should over ...

EPFL 2020

Language Independent Query by Example Spoken Term Detection

Dhananjay Ram

Language-independent query-by-example spoken term detection (QbE-STD) is the problem of retrieving, from an archive, the audio documents that contain a spoken query provided by a user. This is usually cast as a hypothesis testing and pattern matching problem ...

EPFL 2019


Training deep neural networks with the error backpropagation algorithm is considered implausible from a biological perspective. Numerous recent publications suggest elaborate models for biologically plausible variants of deep learning, typically defining s ...

2019

Template-matching for text-dependent speaker verification

Petr Motlicek, Subhadeep Dey

In the last decade, i-vector and Joint Factor Analysis (JFA) approaches to speaker modeling have become ubiquitous in the area of automatic speaker recognition. Both of these techniques involve the computation of posterior probabilities, using either Gauss ...

2017

Automatic speech recognition (ASR) is a fascinating area of research towards realizing human-machine interactions. After more than 30 years of exploitation of Gaussian Mixture Models (GMMs), state-of-the-art systems currently rely on Deep Neural Network (DN ...

Idiap 2016

Efficient Posterior Exemplar Search Space Hashing Exploiting Class-Specific Sparsity Structures

Hervé Bourlard, Milos Cernak, Afsaneh Asaei

This paper shows that exemplar-based speech processing using class-conditional posterior probabilities admits a highly effective search strategy relying on posteriors' intrinsic sparsity structures. The posterior probabilities are estimated for phonetic an ...

2016

Phonetic and Phonological Posterior Search Space Hashing Exploiting Class-Specific Sparsity Structures

Hervé Bourlard, Milos Cernak, Afsaneh Asaei

Idiap 2016

Standard automatic speech recognition (ASR) systems follow a divide-and-conquer approach to convert speech into text. That is, the end goal is achieved by a combination of sub-tasks, namely, feature extraction, acoustic modeling and sequence decoding, ...

EPFL 2016

Sound Pattern Matching for Automatic Prosodic Event Detection

Hervé Bourlard, Philip Neil Garner, Milos Cernak, Afsaneh Asaei, Pierre-Edouard Jean Charles Honnet

Prosody in speech is manifested by variations of loudness, exaggeration of pitch, and specific phonetic variations of prosodic segments. For example, between stressed and unstressed syllables there are differences in place or manner of articulation, vowels ...

Idiap 2016