COMPARISON OF SUBWORD SEGMENTATION METHODS FOR OPEN-VOCABULARYEND-TO-END SPEECH RECOGNITION
Graph Chatbot
Chattez avec Graph Search
Posez n’importe quelle question sur les cours, conférences, exercices, recherches, actualités, etc. de l’EPFL ou essayez les exemples de questions ci-dessous.
AVERTISSEMENT : Le chatbot Graph n'est pas programmé pour fournir des réponses explicites ou catégoriques à vos questions. Il transforme plutôt vos questions en demandes API qui sont distribuées aux différents services informatiques officiellement administrés par l'EPFL. Son but est uniquement de collecter et de recommander des références pertinentes à des contenus que vous pouvez explorer pour vous aider à répondre à vos questions.
This paper addresses the problem of automatic facial expression recognition in videos, where the goal is to predict discrete emotion labels best describing the emotions expressed in short video clips. Building on a pre-trained convolutional neural network ...
Proprioceptive signals are a critical component of our ability to perform complex movements, identify our posture and adapt to environmental changes. Our movements are generated by a large number of muscles and are sensed via a myriad of different receptor ...
In this paper, we propose a novel temporal spiking recurrent neural network (TSRNN) to perform robust action recognition in videos. The proposed TSRNN employs a novel spiking architecture which utilizes the local discriminative features from high-confidenc ...
Robustness of extracted embeddings in cross-database scenarios is one of the main challenges in text-independent speaker verification (SV) systems. In this paper, we investigate this robustness via performing structural cross-database experiments with or w ...
Learning to embed data into a space where similar points are together and dissimilar points are far apart is a challenging machine learning problem. In this dissertation we study two learning scenarios that arise in the context of learning embeddings and o ...
Learning a good speaker embedding is critical for many speech processing tasks, including recognition, verification, and diarization. To this end, we propose a complementary optimizing goal called intra-class loss to improve deep speaker embed dings learne ...
This thesis deals with exploiting the low-dimensional multi-subspace structure of speech towards the goal of improving acoustic modeling for automatic speech recognition (ASR). Leveraging the parsimonious hierarchical nature of speech, we hypothesize that ...
Second-order pooling, a.k.a. bilinear pooling, has proven effective for deep learning based visual recognition. However, the resulting second-order networks yield a final representation that is orders of magnitude larger than that of standard, first-order ...
Learning to embed data into a space where similar points are together and dissimilar points are far apart is a challenging machine learning problem. In this dissertation we study two learning scenarios that arise in the context of learning embeddings and o ...
Speech Emotion Recognition (SER) has been shown to benefit from many of the recent advances in deep learning, including recurrent based and attention based neural network architectures as well. Nevertheless, performance still falls short of that of humans. ...