pyannote.audio: neural building blocks for speaker diarization
Graph Chatbot
Chat with Graph Search
Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.
DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.
Language independent query-by-example spoken term detection (QbE-STD) is the problem of retrieving audio documents from an archive, which contain a spoken query provided by a user. This is usually casted as a hypothesis testing and pattern matching problem ...
Our brain continuously self-organizes to construct and maintain an internal representation of the world based on the information arriving through sensory stimuli. Remarkably, cortical areas related to different sensory modalities appear to share the same f ...
In this paper, we trace the history of neural networks applied to natural language understanding tasks, and identify key contributions which the nature of language has made to the development of neural network architectures. We focus on the importance of v ...
In this paper, we propose a novel Deep Micro-Dictionary Learning and Coding Network (DDLCN). DDLCN has most of the standard deep learning layers (pooling, fully, connected, input/output, etc.) but the main difference is that the fundamental convolutional l ...
For a long time, natural language processing (NLP) has relied on generative models with task specific and manually engineered features. Recently, there has been a resurgence of interest for neural networks in the machine learning community, obtaining state ...
The goal of the scene labeling task is to assign a class label to each pixel in an image. To ensure a good visual coherence and a high class accu- racy, it is essential for a model to capture long range (pixel) label dependencies in images. In a feed-forwa ...
Automatic transcription of handwritten texts has made important progress in the recent years. This increase in performance, essentially due to new architectures combining convolutional neural networks with recurrent neutral networks, opens new avenues for ...
Automatic transcription of handwritten texts has made important progress in the recent years. This increase in performance, essentially due to new architectures combining convolutional neural networks with recurrent neutral networks, opens new avenues for ...
Human motion prediction, i.e., forecasting future body poses given observed pose sequence, has typically been tackled with recurrent neural networks (RNNs). However, as evidenced by prior work, the resulted RNN models suffer from prediction errors accumula ...
We introduce a variational framework to learn the activation functions of deep neural networks. Our aim is to increase the capacity of the network while controlling an upper-bound of the actual Lipschitz constant of the input-output relation. To that end, ...