In the context of hybrid HMM/MLP Automatic Speech Recognition (ASR), this paper describes an investigation into a new type of stochastic phone space transformation, which maps "source" phone (or phone HMM state) posterior probabilities (as obtained at the ...
Standard automatic speech recognition (ASR) systems use phonemes as subword units. Thus, one of the primary resources required to build a good ASR system is a well-developed phoneme pronunciation lexicon. However, under-resourced languages typically lack su ...
In this paper, we investigate pitch contour modelling in speech synthesis based on segmental units. A convolutional pitch target approximation model is proposed. This model allows joint stochastic modelling of framewise pitch and pitch contour of longer ...
This report presents one month of trainee work on the development of a French Automatic Speech Recognition (ASR) system using the French part of the multilingual GlobalPhone database, GlobalPhone_FR. The purpose of this report is to explain and give the results of the training and testing ...
The thesis work was motivated by the goal of developing personalized speech-to-speech translation and focused on one of its key component techniques: cross-lingual speaker adaptation for text-to-speech synthesis. A personalized speech-to-speech translator ...
This paper investigates robust privacy-sensitive audio features for speaker diarization in multiparty conversations, i.e., a set of audio features carrying little linguistic information, for speaker diarization in a single and multiple distant microphone scenario ...
In this paper, we investigate the use of discriminatively trained acoustic features modeled by Subspace Gaussian Mixture Models (SGMMs) for Rich Transcription meeting recognition. More specifically, we first focus on exploiting various types of complex ...
A common assumption in activity recognition is that the system remains unchanged between its design and its subsequent operation. However, many factors affect the data distribution between two different experimental sessions. One of these factors is the pot ...
One of the main challenges in non-native speech recognition is how to handle the acoustic variability present in multi-accented non-native speech with a limited amount of training data. In this paper, we investigate an approach that addresses this challenge by usi ...
A Language Model (LM) is a helpful component of a variety of Natural Language Processing (NLP) systems today. For speech recognition, machine translation, information retrieval, word sense disambiguation etc., the contribution of an LM is to provide featur ...