Êtes-vous un étudiant de l'EPFL à la recherche d'un projet de semestre?
Travaillez avec nous sur des projets en science des données et en visualisation, et déployez votre projet sous forme d'application sur Graph Search.
Emotion recognition in text has become an important research objective. It involves building classifiers capable of detecting human emotions for a specific application, for example, analyzing reactions to product launches, monitoring emotions at sports events, or discerning opinions in political debates. Most successful approaches rely heavily on costly manual annotation. To alleviate this burden, we propose a distant supervision method-Dystemo-for automatically producing emotion classifiers from tweets labeled using existing or easy-to-produce emotion lexicons. The goal is to obtain emotion classifiers that work more accurately for specific applications than available emotion lexicons. The success of this method depends mainly on a novel classifier-Balanced Weighted Voting (BWV)-designed to overcome the imbalance in emotion distribution in the initial dataset, and on novel heuristics for detecting neutral tweets. We demonstrate how Dystemo works using Twitter data about sports events, a fine-grained 20-category emotion model, and three different initial emotion lexicons. Through a series of carefully designed experiments, we confirm that Dystemo is effective both in extending initial emotion lexicons of small coverage to find correctly more emotional tweets and in correcting emotion lexicons of low accuracy to perform more accurately.
Roland John Tormey, Nihat Kotluk