Deep Learning of Human Perception in Audio Event Classification

In this paper, we present our recent studies on human perception in audio event classification. The pre-trained VGGish model is used as a feature extractor for the audio data, and a DenseNet is trained on our electroencephalography (EEG) data and then used as the feature extractor for it. The correlation between audio stimuli and EEG is learned in a shared space. In the experiments, we record the brain activity (EEG signals) of several subjects while they listen to music events from 8 audio categories selected from Google AudioSet. Our experimental results demonstrate that i) audio event classification can be improved by exploiting the power of human perception, and ii) the correlation between audio stimuli and EEG can be learned to complement audio event understanding.
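The abstract does not say how the shared space is learned. As a rough illustration of the idea only, the sketch below uses regularized linear canonical correlation analysis (CCA), a stand-in technique chosen here and not the authors' method, to project paired audio and EEG feature vectors into a common space where matching dimensions are maximally correlated. The data are synthetic: the 128-dimensional audio vectors mimic the size of a VGGish embedding, while the 32-dimensional EEG vectors, the 4-dimensional shared space, the noise level, and all function names are assumptions made for the example.

```python
import numpy as np


def fit_linear_cca(X, Y, k, reg=1e-3):
    """Fit regularized linear CCA on paired rows of X and Y.

    Returns the two feature means, the two projection matrices, and the
    top-k canonical correlations measured on the training data.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    # Regularized covariance and cross-covariance estimates.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Whiten both views with Cholesky factors: T = Lx^{-1} Sxy Ly^{-T}.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    T = np.linalg.solve(Lx, Sxy)
    T = np.linalg.solve(Ly, T.T).T
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    # Map the singular vectors back to the original feature spaces.
    Wx = np.linalg.solve(Lx.T, U[:, :k])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :k])
    return mx, my, Wx, Wy, s[:k]


def project(F, mean, W):
    """Project features into the shared space."""
    return (F - mean) @ W


def paired_correlations(P, Q):
    """Pearson correlation between matching columns of P and Q."""
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    norms = np.linalg.norm(Pc, axis=0) * np.linalg.norm(Qc, axis=0)
    return (Pc * Qc).sum(axis=0) / norms


def make_synthetic_pairs(n, rng, A, B, noise=0.5):
    """Synthetic stand-in data: both views are noisy mixes of one latent."""
    z = rng.standard_normal((n, A.shape[0]))
    audio = z @ A + noise * rng.standard_normal((n, A.shape[1]))
    eeg = z @ B + noise * rng.standard_normal((n, B.shape[1]))
    return audio, eeg


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 128))  # latent -> audio features (128-dim)
B = rng.standard_normal((4, 32))   # latent -> EEG features (size assumed)

audio_train, eeg_train = make_synthetic_pairs(1000, rng, A, B)
audio_test, eeg_test = make_synthetic_pairs(200, rng, A, B)

mx, my, Wx, Wy, train_corr = fit_linear_cca(audio_train, eeg_train, k=4)

# Correlation of held-out audio/EEG pairs after projection.
test_corr = paired_correlations(
    project(audio_test, mx, Wx), project(eeg_test, my, Wy)
)
print("held-out correlation per shared dimension:", np.round(test_corr, 3))
```

In a real pipeline the synthetic arrays would be replaced by features extracted from the audio clips and the corresponding EEG recordings, and a nonlinear or jointly trained projection may well be more appropriate than the linear one shown here.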

