DNN-based Speech Synthesis: Importance of input features and training data
The respiratory system is an integral part of human speech production. As a consequence, there is a close relation between respiration and the speech signal, and the produced speech signal carries breathing-pattern-related information. Speech can also be gener ...
In communication systems, it is crucial to estimate the perceived quality of audio and speech. The industrial standards for many years have been PESQ, 3QUEST, and POLQA, which are intrusive methods. This restricts the possibilities of using these metrics i ...
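The snippet above points out that PESQ, 3QUEST, and POLQA are intrusive metrics, meaning they need the clean reference signal alongside the degraded one. A minimal sketch of that constraint, using a plain signal-to-noise ratio as a stand-in for the standardized metrics (the function name, test tones, and noise level below are illustrative assumptions, not part of any cited system):

```python
import math

def intrusive_snr_db(reference, degraded):
    """Intrusive quality estimate: requires BOTH the clean reference and
    the degraded signal, unlike non-intrusive methods that score the
    degraded signal alone. Returns the SNR of the degradation in dB."""
    if len(reference) != len(degraded):
        raise ValueError("signals must be time-aligned and of equal length")
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - d) ** 2 for r, d in zip(reference, degraded))
    if noise_power == 0:
        return float("inf")  # identical signals: no degradation at all
    return 10.0 * math.log10(signal_power / noise_power)

# Illustrative signals: a 440 Hz tone and a copy with a weak 1 kHz interferer.
fs = 8000
ref = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
deg = [r + 0.01 * math.sin(2 * math.pi * 1000 * n / fs) for n, r in enumerate(ref)]
print(round(intrusive_snr_db(ref, deg), 1))  # → 40.0
```

The point of the sketch is the function signature: both signals are required, which is exactly what rules out such metrics in settings where no reference is available (e.g. live monitoring at the receiver), motivating the non-intrusive alternatives the snippet alludes to.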
Deep neural networks (DNNs) are used to reconstruct transmission speckle intensity patterns from the respective reflection speckle intensity patterns generated by illuminated parafilm layers. The dependence of the reconstruction accuracy on the thickness o ...
Recent developments in speech emotion recognition (SER) often leverage deep neural networks (DNNs). Comparing and benchmarking different DNN models can often be tedious due to the use of different datasets and evaluation protocols. To facilitate the proces ...
Thanks to Deep Learning, Text-To-Speech (TTS) has achieved high audio quality with large databases. At the same time, however, these complex models have lost any ability to control or interpret the generation process. For the big challenge of affective TTS it is infeasi ...
Deep neural networks have completely revolutionized the field of machine learning by achieving state-of-the-art results on various tasks ranging from computer vision to protein folding. However, their application is hindered by their large computational and m ...
State-of-the-art acoustic models for Automatic Speech Recognition (ASR) are based on Hidden Markov Models (HMM) and Deep Neural Networks (DNN) and often require thousands of hours of transcribed speech data during training. Therefore, building multilingual ...
In the literature, the task of dysarthric speech intelligibility assessment has been approached through the development of different low-level feature representations, subspace modeling, phone confidence estimation, or measurement of automatic speech recognitio ...
A speech signal conveys several kinds of information, such as the message, the speaker's identity, and the speaker's emotional and social state. Automatic speech assessment is a broad area that refers to using automatic methods to predict human judg ...
Speaker recognition systems are playing a key role in modern online applications. Though the susceptibility of these systems to discrimination according to group fairness metrics has been recently studied, their assessment has been mainly focused on the di ...