Multimodal person recognition in audio-visual streams
Related publications (149)
Learning to embed data into a space where similar points are together and dissimilar points are far apart is a challenging machine learning problem. In this dissertation we study two learning scenarios that arise in the context of learning embeddings and o ...
This paper describes the Idiap submission to WAT 2019 for the English-Hindi Multi-Modal Translation Task. We have used the state-of-the-art Transformer model and utilized the IITB English-Hindi parallel corpus as an additional data source. Among the differ ...
Synthesizing realistic faces across domains to train deep models has attracted increasing attention in facial expression analysis, as it helps improve expression recognition accuracy despite a small number of real training images. ...
The domain of presentation attacks (PA), including vulnerability studies and detection (PAD), remains largely unexplored in the available scientific literature on biometric vein recognition. Contrary to other modalities that use visual spectral sensors for ca ...
Stéphane Joost (research and teaching associate at LASIG) shares his experience in data publication. It includes three cases with unexpected facets: - Publishers’ requirements: what to do when publishing data is compulsory, whereas the data provider prohib ...
Synopsis: Implement a new way of interacting with your computer via voice control instead of the mouse and keyboard. Level: BS, MS. Description: Google Home and Amazon Alexa are quickly rev ...
In the Internet of Things (IoT), the large volume of data generated by sensors poses significant computational challenges in resource-constrained environments. Most existing machine learning algorithms are unable to train a proper model using a significant ...
While face recognition systems have seen a significant boost in recognition performance in recent years, they are known to be vulnerable to presentation attacks. To date, most of the research in the field of face anti-spoofing or presentation attack ...
In recent years, museums, archives and other cultural institutions have initiated important programs to digitize their collections. Millions of artefacts (paintings, engravings, drawings, ancient photographs) are now represented in digital photographic for ...
In this paper, we introduce our recent studies on human perception in audio event classification. In particular, the pre-trained model VGGish is used as a feature extractor to process audio data, and DenseNet is trained by and used as a feature extractor for o ...