Crosslingual Tandem-SGMM: Exploiting Out-Of-Language Data for Acoustic Model and Feature Level Adaptation
Related publications (36)
The speech signal conveys information on different time scales, from the short (20–40 ms) or segmental time scale, associated with phonological and phonetic information, to the long (150–250 ms) or suprasegmental time scale, associated with syllabic and prosodic info ...
We describe here a collection of speech data from bilingual and trilingual speakers of English, French, German and Italian. In the context of speech-to-speech translation (S2ST), this database is designed for several purposes and studies: training CLSA syste ...
This thesis deals with signal-based methods that predict how listeners perceive speech quality in telecommunications. Such tools, called objective quality measures, are of great interest in the telecommunications industry to evaluate how new or deployed sy ...
Since the prosody of a spoken utterance carries information about its discourse function, salience, and speaker attitude, prosody models and prosody generation modules have played a crucial part in text-to-speech (TTS) synthesis systems from the beginni ...
Decoding speech from intracranial recordings serves two main purposes: understanding the neural correlates of speech processing and decoding speech features for targeting speech neuroprosthetic devices. Intracranial recordings have high spatial and tempora ...
Progressive apraxia of speech (PAoS) is a progressive motor speech disorder associated with neurodegenerative disease that impairs phonetic encoding and motor speech planning. Clinical observation and acoustic studies show that duration analysis p ...
One of the key challenges involved in building statistical automatic speech recognition (ASR) systems is modeling the relationship between subword units or “lexical units” and acoustic feature observations. To model this relationship two types of resources ...
Automatic non-native accent assessment has many potential benefits in language learning and speech technologies. The three fundamental challenges in automatic accent assessment are to characterize, model and assess individual variation in speech of the non ...