Who's Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

Given a corpus of news items consisting of images accompanied by text captions, we want to find out “who’s doing what”, i.e., associate the names and action verbs in the captions with the faces and body poses of the persons in the images. We present a joint model that simultaneously solves the image-caption correspondences and learns visual appearance models for the face and pose classes occurring in the corpus. These models can then be used to recognize people and actions in novel images without captions. We demonstrate experimentally that our joint ‘face and pose’ model solves the correspondence problem better than earlier models that cover only the face, and that it can recognize people and actions in new, uncaptioned images.
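The abstract does not spell out the model, so what follows is only a deliberately simplified sketch of the general idea of alternating between solving the correspondences and re-learning appearance models. It is not the authors' method. It makes several hypothetical assumptions: each detected person is a (face feature, pose feature) pair, each name and each verb class is modeled by the mean of the features assigned to it under squared Euclidean distance, each caption (name, verb) pair can label at most one person per image, and a person may stay unassigned at a fixed `null_cost`. The names `best_assignment`, `fit` and `recognize` are illustrative only.

```python
from itertools import permutations
from typing import Dict, List, Optional, Sequence, Tuple

Vec = Tuple[float, ...]
Person = Tuple[Vec, Vec]  # hypothetical representation: (face feature, pose feature)
Label = Tuple[str, str]   # hypothetical (name, verb) pair extracted from a caption


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vec]) -> Vec:
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def best_assignment(persons: List[Person], labels: List[Label],
                    face_models: Dict[str, Vec], pose_models: Dict[str, Vec],
                    null_cost: float) -> List[Optional[int]]:
    """Brute-force the cheapest person -> caption-label matching for one image.

    Each label is used at most once; None means the person is left unnamed.
    """
    candidates: List[Optional[int]] = list(range(len(labels))) + [None] * len(persons)
    best: Tuple[Optional[int], ...] = ()
    best_cost = float("inf")
    # dict.fromkeys drops duplicate tuples caused by the repeated None entries
    for choice in dict.fromkeys(permutations(candidates, len(persons))):
        cost = 0.0
        for (face, pose), idx in zip(persons, choice):
            if idx is None:
                cost += null_cost
            else:
                name, verb = labels[idx]
                cost += sq_dist(face, face_models[name]) + sq_dist(pose, pose_models[verb])
        if cost < best_cost:
            best, best_cost = choice, cost
    return list(best)


def fit(corpus: List[Tuple[List[Person], List[Label]]], null_cost: float,
        iterations: int = 10):
    # Initialise each class from every person in images whose caption mentions it
    face_pool: Dict[str, List[Vec]] = {}
    pose_pool: Dict[str, List[Vec]] = {}
    for persons, labels in corpus:
        for name, verb in labels:
            face_pool.setdefault(name, []).extend(f for f, _ in persons)
            pose_pool.setdefault(verb, []).extend(p for _, p in persons)
    face_models = {k: mean(v) for k, v in face_pool.items() if v}
    pose_models = {k: mean(v) for k, v in pose_pool.items() if v}
    assignments: List[List[Optional[int]]] = []
    for _ in range(iterations):
        # Step 1: solve the correspondences with the current appearance models
        assignments = [best_assignment(persons, labels, face_models, pose_models, null_cost)
                       for persons, labels in corpus]
        # Step 2: re-estimate each class model from the persons assigned to it
        face_acc: Dict[str, List[Vec]] = {}
        pose_acc: Dict[str, List[Vec]] = {}
        for (persons, labels), assign in zip(corpus, assignments):
            for (face, pose), idx in zip(persons, assign):
                if idx is not None:
                    name, verb = labels[idx]
                    face_acc.setdefault(name, []).append(face)
                    pose_acc.setdefault(verb, []).append(pose)
        # Classes that received no persons keep their previous model
        face_models.update({k: mean(v) for k, v in face_acc.items()})
        pose_models.update({k: mean(v) for k, v in pose_acc.items()})
    return face_models, pose_models, assignments


def recognize(face: Vec, pose: Vec, face_models: Dict[str, Vec],
              pose_models: Dict[str, Vec]) -> Label:
    """Label a person in an uncaptioned image with the nearest name and verb models."""
    name = min(face_models, key=lambda k: sq_dist(face, face_models[k]))
    verb = min(pose_models, key=lambda k: sq_dist(pose, pose_models[k]))
    return name, verb
```

Here the joint cost adds a face term and a pose term for each candidate (name, verb) pair, so a caption verb can help disambiguate which face carries which name, and vice versa. On a toy corpus where the same captions are ambiguous in one image but not in others, a couple of iterations separate the classes, and `recognize` can then label a person in an uncaptioned image. The brute-force matching only scales to a handful of people per image; a practical variant would replace it with a proper assignment solver or probabilistic inference.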
