Improving speech embedding using crossmodal transfer learning with audio-visual data
Related publications (41)
Text autoencoders are commonly used for conditional generation tasks such as style transfer. We propose methods which are plug and play, where any pretrained autoencoder can be used, and only require learning a mapping within the autoencoder's embedding sp ...
Through profiling and matching processes, technology provides individuals with information that is redundant with their previous beliefs, attitudes and preferences. The emergence of informational redundancies encouraged by some technologies is likely to ...
Modeling and predicting student learning is an important task in computer-based education. A large body of work has focused on representing and predicting student knowledge accurately. Existing techniques are mostly based on students' performance and on ti ...
This paper proposes a novel approach to improve speaker modeling using knowledge transferred from face representation. In particular, we are interested in learning a discriminative metric which allows speaker turns to be compared directly, which is benefic ...
Short-term field study involves groups of students working in an off-campus (sometimes international) setting, and often involves working on realistic, open-ended problems, in interaction with a host community. Such learning experiences are intended to dev ...
Collaborative learning flow patterns (CLFPs) encode solutions to recurrent pedagogical problems, which have been successfully applied to the design of learning experiences. However, the pedagogical knowledge encoded in these patterns has seldom been exploi ...
Neuromorphic systems provide brain-inspired methods of computing. In a neuromorphic architecture, inputs are processed by a network of neurons receiving operands through synaptic interconnections, tuned in the process of learning. Neurons act simultaneousl ...
Empirical studies document a positive effect of collaboration on team productivity. However, little has been done to assess how knowledge flows among team members. Our study addresses this issue by exploring unique, rich data on a Swiss funding program prom ...
Learning speaker turn embeddings has shown considerable improvement in situations where conventional speaker modeling approaches fail. However, this improvement is relatively limited when compared to the gain observed in face embedding learning, which has ...
Multimedia databases are growing rapidly in size in the digital age. To increase the value of these data and to enhance the user experience, there is a need to make these videos searchable through automatic indexing. Because people appearing and talking in ...
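Several of the abstracts above describe learning a discriminative metric so that speaker turns can be compared directly in an embedding space. The sketch below is a minimal, hypothetical illustration of one common way such a metric is trained, a triplet margin loss. The snippets do not state that these papers use this exact loss. The function names `euclidean` and `triplet_loss` and the default `margin=0.2` are illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hypothetical triplet margin loss (an assumption, not the papers' stated method).

    `positive` is an embedding of a speaker turn from the same speaker as
    `anchor`; `negative` comes from a different speaker. The loss is zero
    once the same-speaker pair is closer than the different-speaker pair
    by at least `margin`, and grows linearly otherwise.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    # Well-separated triplet: same-speaker pair identical, other speaker far away.
    print(triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]))  # 0.0
    # Violated triplet: the "different speaker" sits exactly on the anchor.
    print(round(triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0]), 3))  # 1.614
```

Minimizing this loss over many triplets pulls embeddings of the same speaker together and pushes different speakers apart. After training, two speaker turns can then be compared with a plain distance threshold.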