Multi-task Neural Network for Robust Multiple Speaker Embedding Extraction
Graph Chatbot
Chat with Graph Search
Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.
DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.
Performing speaker diarization while uniquely identifying the speakers in a collection of audio recordings is a challenging task. Based on our previous work on speaker diarization and linking, we developed a system for diarizing longitudinal TV show data s ...
This paper presents a new approach toward automatic annotation of meetings in terms of speaker identities and their locations. This is achieved by segmenting the audio recordings using two independent sources of information : magnitude spectrum analysis an ...
Inference from data is of key importance in many applications of informatics. The current trend in performing such a task of inference from data is to utilise machine learning algorithms. Moreover, in many applications that it is either required or is pref ...
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for organizing trained and untrained neural networks. In one aspect, a neural network device includes a collection of node assemblies interconnected by betwe ...
Person identification using audio (speech) and visual (facial appearance, static or dynamic) modalities, either independently or jointly, is a thoroughly investigated problem in pattern recognition. In this work, we explore a novel task : person identifica ...
Person identification using audio (speech) and visual (facial appearance, static or dynamic) modalities, either independently or jointly, is a thoroughly investigated problem in pattern recognition. In this work, we explore a novel task : person identifica ...
In the last couple of years more and more multimodal corpora have been created. Recently many of these corpora have also included RGB-D sensors' data. However, there is to our knowledge no publicly available corpus, which combines accurate gaze-tracking, a ...
The IDIAP Smart Meeting Room is a meeting room equipped with synchronised, multi-channel audio-visual recording facilities. This document presents a detailed description of the room with particular emphasis on the acquisition equipment and the components u ...
This communication describes the multi-modal VidTIMIT database, which can be useful for research involving mono- or multi-modal speech recognition or person authentication. It is comprised of video and corresponding audio recordings of 43 volunteers, recit ...
This paper presents a new approach toward automatic annotation of meetings in terms of speaker identities and their locations. This is achieved by segmenting the audio recordings using two independent sources of information : magnitude spectrum analysis an ...