Non-verbal behaviours play an important role in human communication: they can indicate attention, serve as communication cues in interactions, or even reveal higher-level personal constructs. For instance, a head nod, a common non-verbal behaviour, can express agreement or emphasis while a person is listening or speaking. Gaze, another non-verbal behaviour, conveys human attention and can even provide access to thought processes. With the development of the Internet and multimedia, large amounts of visual data, including videos and images, have become accessible, and the demand for automatic video analysis of human behaviour keeps growing. It is therefore important to develop vision-based methods that extract non-verbal behaviours automatically.
In this thesis, we address the recognition of two subtle yet important non-verbal behaviours: head nods and gaze. Head nod detection aims to identify head movements in which the head rotates up and down along the sagittal plane one or several times, while gaze estimation aims to infer the 3D line of sight with respect to a world coordinate system. Both tasks have found applications in areas such as psychology and sociology (social analysis through head nod detection, mental health care through gaze analysis), human-computer and human-robot interaction (behaviour recognition or integration to enable smooth interaction), and virtual reality (rendering improvements that account for the user's gaze direction).
To address these two problems, we first investigated head pose estimation, a task that is fundamental to both head nod detection and gaze estimation. We proposed HeadFusion, an approach for robust 360-degree head pose tracking. It is a model-based method relying on depth information, and it mainly addresses the weakness of 3D morphable model (3DMM) based methods, which usually require frontal or mid-profile poses since the 3DMM covers only the face region. Our approach achieves a complete head representation by combining the strengths of a 3DMM fitted online with a prior-free reconstruction of a full 3D head model, providing support for pose estimation from any viewpoint. In addition, we proposed a symmetry regularizer for accurate 3DMM fitting under partial observations, and exploited visual tracking to handle natural head dynamics with fast accelerations. Extensive experiments show that our method achieves accurate and robust head pose tracking in difficult scenarios.
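To make the model-based idea concrete, the minimal sketch below (in Python with NumPy) shows the rigid alignment step at the core of depth-based pose tracking: given corresponding 3D points on the fitted head model and in the observed depth point cloud, the head pose is the least-squares rigid transform between the two sets, computed here with the Kabsch algorithm. Correspondences are assumed to be given, and all names are illustrative; this is a sketch of the generic technique, not the HeadFusion implementation itself.

    import numpy as np

    def rigid_transform(model_pts, obs_pts):
        """Least-squares rigid transform (R, t) aligning model_pts to obs_pts.

        model_pts, obs_pts: (N, 3) arrays of corresponding 3D points.
        Returns R (3x3) and t (3,) such that R @ model_pts[i] + t ~= obs_pts[i].
        """
        mu_m = model_pts.mean(axis=0)
        mu_o = obs_pts.mean(axis=0)
        # Cross-covariance of the centred point sets.
        H = (model_pts - mu_m).T @ (obs_pts - mu_o)
        U, _, Vt = np.linalg.svd(H)
        # Reflection guard: force det(R) = +1 so R is a proper rotation.
        d = np.sign(np.linalg.det(Vt.T @ U.T))
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        t = mu_o - R @ mu_m
        return R, t

In a full tracker this solve would sit inside an iterative closest point loop, with correspondences re-estimated against the depth map at each iteration.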
Building on the estimated head pose, we designed a head nod detection approach. Compared to previous approaches, it makes two contributions: i) the head rotation dynamics are computed in the head coordinate system instead of the camera coordinate system, leading to pose-invariant gesture dynamics; ii) besides the rotation parameters, we propose a feature related to the head rotation axis, so that nod-like false positives caused by body movements can be eliminated. Experiments demonstrate the robustness of our approach.
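As a sketch of these two ideas, assuming head pose is available as head-to-camera rotation matrices (the function name and interface are hypothetical, not the thesis code), the relative rotation between consecutive frames can be mapped into the head frame and decomposed into an angle and an axis; for a genuine nod the axis stays close to the head's lateral (pitch) axis regardless of camera viewpoint, whereas nod-like body movements tend to yield a different axis.

    import numpy as np

    def head_frame_dynamics(R_prev, R_curr):
        """Relative head rotation between frames, expressed in the head frame.

        R_prev, R_curr: (3, 3) head-to-camera rotation matrices at t-1 and t.
        Returns the rotation angle (radians) and the unit rotation axis in
        head coordinates.
        """
        R_rel = R_prev.T @ R_curr  # motion as seen from the head frame
        # Axis-angle extraction (Rodrigues); angles near pi are not handled
        # in this sketch.
        angle = np.arccos(np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0))
        if np.isclose(angle, 0.0):
            return 0.0, np.zeros(3)
        axis = np.array([R_rel[2, 1] - R_rel[1, 2],
                         R_rel[0, 2] - R_rel[2, 0],
                         R_rel[1, 0] - R_rel[0, 1]]) / (2.0 * np.sin(angle))
        return angle, axis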
We then turned our focus to gaze estimation. To achieve robust remote gaze sensing, we first explored the application of multitask learning to gaze estimation. Concretely, we introduced a Constrained Landmark-Gaze Model (CLGM) that models the joint variation of eye landmark locations (including the iris center) and gaze directions, explicitly relating visual information (landmarks) to the more abstract gaze values.
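The shared-basis idea behind such a joint model can be illustrated with a simple linear, PCA-style sketch: landmark coordinates and gaze angles are concatenated into one vector per sample and share a single low-dimensional basis, so coefficients constrained by one block also determine the other. The array shapes and function names below are hypothetical, and this closed-form toy only illustrates the principle; it stands in for the learned multitask model used in the thesis.

    import numpy as np

    def fit_joint_model(X_lmk, X_gaze, n_components=8):
        """Fit a joint linear basis over concatenated landmark-gaze vectors.

        X_lmk: (N, 2K) eye landmark coordinates (including the iris center).
        X_gaze: (N, 2) gaze angles (yaw, pitch).
        """
        X = np.hstack([X_lmk, X_gaze])
        mean = X.mean(axis=0)
        # Principal directions of the joint landmark-gaze variation.
        _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
        basis = Vt[:n_components]  # (C, 2K + 2)
        return mean, basis

    def gaze_from_landmarks(lmk, mean, basis, n_lmk_dims):
        """Infer gaze by solving for joint coefficients from the landmark block."""
        B_lmk = basis[:, :n_lmk_dims]  # landmark part of the basis
        coeff, *_ = np.linalg.lstsq(B_lmk.T, lmk - mean[:n_lmk_dims], rcond=None)
        recon = mean + coeff @ basis   # reconstruct the full joint vector
        return recon[n_lmk_dims:]      # estimated (yaw, pitch)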