More and more intelligent systems have to interact with humans. In order to communicate efficiently, these systems need to perceive and understand us. A key element of communication is a person's visual focus of attention (VFOA), which is useful to estimate, among other things, addressees and engagement. It is also strongly related to gaze, its continuous counterpart, whose analysis allows the estimation of high-level features such as confidence and tiredness. Beyond communication, interesting statistics can be derived from eye movements themselves; for instance, fixation duration and blink rate were shown to be related to mental health. Thus, the estimation of VFOA, gaze, and eye movements has great potential in a wide range of fields, such as human behavior analysis, human-computer interaction, and psychiatric diagnosis. However, despite recent improvements in sensors and methods, precisely tracking people's gaze and VFOA remains difficult without using intrusive sensors or constraining people's movements, which does not suit applications where users behaving naturally are recorded by remote sensors, as in many human-robot interactions or psychological studies.

This thesis introduces new approaches to improve gaze and VFOA estimation from videos recorded by remote sensors, which could be embedded on robots or be part of a room monitoring system, for example. It proposes an unsupervised and online method to calibrate gaze trackers from attention priors arising in conversation and manipulation, removing the need for a dedicated calibration session. It also proposes a method to estimate the VFOA in setups with an arbitrary and dynamic number of visual targets, by using a fixed-size representation of all the subject- and context-related features. Finally, it proposes an eye movement recognition method to detect saccades and blinks in eye image sequences without the need for accurate gaze traces. These approaches were validated through experiments on recordings of human conversation and object manipulation. Overall, by focusing on improving the usability of VFOA and gaze estimation, this thesis takes a step toward the democratization of these methods and their application to weakly constrained setups relying on cheap sensing devices.
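To make the calibration-from-attention-priors idea concrete, here is a minimal illustrative sketch, not the thesis's actual model: it assumes that, during natural interaction, each frame's uncalibrated gaze estimate can be paired with a prior attention direction (e.g. the manipulated object or the current speaker), and that the tracker error is dominated by a constant angular bias. The function name, parameters, and the robust median-plus-inlier refinement are assumptions made for illustration only.

```python
import numpy as np

def fit_gaze_bias(estimated_gaze, prior_targets, inlier_deg=10.0, iters=3):
    """Fit a constant angular bias correcting a gaze tracker's output.

    estimated_gaze: (N, 2) yaw/pitch angles (degrees) from an uncalibrated tracker.
    prior_targets:  (N, 2) yaw/pitch angles of the attention prior, e.g. the
                    direction of a manipulated object or of the current speaker.
    Returns the bias to subtract from future gaze estimates.
    """
    gaze = np.asarray(estimated_gaze, dtype=float)
    targets = np.asarray(prior_targets, dtype=float)
    residual = gaze - targets
    bias = np.median(residual, axis=0)                  # robust initial estimate
    for _ in range(iters):                              # iteratively discard frames
        err = np.linalg.norm(residual - bias, axis=1)   # where the prior was likely wrong
        inliers = err < inlier_deg
        if not np.any(inliers):
            break
        bias = residual[inliers].mean(axis=0)
    return bias

# Usage sketch with synthetic data: pair gaze estimates with object directions
# recorded during a manipulation phase, then correct subsequent frames.
rng = np.random.default_rng(0)
true_bias = np.array([4.0, -2.5])
targets = rng.uniform(-20, 20, size=(200, 2))
gaze = targets + true_bias + rng.normal(0, 1.5, size=(200, 2))
bias = fit_gaze_bias(gaze, targets)
corrected_gaze = gaze - bias
```

Because the pairs are collected from ordinary behavior rather than from a dedicated calibration routine, such a scheme can run online and be refreshed whenever new attention priors become available.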
Jean-Marc Odobez, Rémy Alain Siegfried