Speech Emotion Recognition (SER) has benefited from many of the recent advances in deep learning, including recurrent and attention-based neural network architectures. Nevertheless, performance still falls short of that of humans. In this work, we investigate whether SER could benefit from the self-attention and global windowing of the transformer model. We show on the IEMOCAP database that this is indeed the case. Finally, we investigate whether using the distribution of possibly conflicting annotations in the training data as soft targets can outperform majority voting, and show that the performance gain grows with the agreement level of the annotators.
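The contrast between majority voting and soft targets can be illustrated with a minimal sketch (not the authors' code; the label set and helper names are hypothetical): majority voting collapses the annotators' votes into one hard label, while soft targets keep the empirical distribution over labels.

```python
from collections import Counter

EMOTIONS = ["angry", "happy", "neutral", "sad"]  # hypothetical label set

def majority_vote(annotations):
    """Hard label: the single most frequent annotation."""
    return Counter(annotations).most_common(1)[0][0]

def soft_targets(annotations):
    """Soft label: the empirical distribution of annotations over EMOTIONS."""
    counts = Counter(annotations)
    total = len(annotations)
    return [counts.get(e, 0) / total for e in EMOTIONS]

# Three annotators disagree on one utterance:
votes = ["happy", "happy", "neutral"]
print(majority_vote(votes))   # the "neutral" vote is discarded entirely
print(soft_targets(votes))    # the disagreement is preserved as a distribution
```

Training against the soft distribution (e.g. with a cross-entropy loss over these targets) lets the model see how confident the annotators were, rather than treating every utterance as an unambiguous example.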
Seyedeh Sahar Sadrizadeh, Hatef Otroshi Shahreza