Human Detection and Segmentation via Multi-view Consensus

Self-supervised detection and segmentation of foreground objects aims to achieve accuracy without annotated training data. However, existing approaches rely predominantly on restrictive assumptions about appearance and motion. For scenes with dynamic activities and camera motion, we propose a multi-camera framework that embeds geometric constraints as multi-view consistency during training, through coarse 3D localization in a voxel grid and fine-grained offset regression. In this manner, we learn a joint distribution of proposals over multiple views. At inference time, our method operates on single RGB images. We outperform state-of-the-art techniques both on images that visually depart from those of standard benchmarks and on those of the classical Human3.6M dataset.
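To make the geometric idea concrete, the following is a minimal sketch of how a coarse voxel index plus a fine offset could define one 3D location that projects to mutually consistent 2D positions in several calibrated views. It is not the authors' implementation: the function names, the offset parameterization (in voxel units, relative to the voxel center), the grid layout, and the camera parameters are all assumptions chosen for illustration.

```python
import numpy as np

def voxel_to_point(index, offset, grid_min, voxel_size):
    """Coarse voxel index plus fine offset -> 3D point.

    Assumption: the offset is expressed in voxel units relative to the
    voxel center, so values in [-0.5, 0.5] stay inside the voxel.
    """
    index = np.asarray(index, dtype=float)
    offset = np.asarray(offset, dtype=float)
    center = np.asarray(grid_min, dtype=float) + (index + 0.5) * voxel_size
    return center + offset * voxel_size

def camera_matrix(K, R, t):
    """Build a 3x4 pinhole projection matrix P = K [R | t]."""
    return K @ np.hstack([R, np.asarray(t, dtype=float).reshape(3, 1)])

def project(P, X):
    """Project a 3D point X to pixel coordinates with projection matrix P."""
    x = P @ np.append(X, 1.0)   # homogeneous image coordinates
    return x[:2] / x[2]         # perspective division

# Hypothetical intrinsics shared by both cameras.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

# Two hypothetical views: the second camera is shifted 1 unit along +x,
# which corresponds to t = (-1, 0, 0) with an identity rotation.
P1 = camera_matrix(K, np.eye(3), [0.0, 0.0, 0.0])
P2 = camera_matrix(K, np.eye(3), [-1.0, 0.0, 0.0])

# One 3D proposal: voxel (3, 3, 3) with offset (0.5, 0.5, 0.5) in a grid
# that starts at (-2, -2, 2) with 0.5-unit voxels.
point = voxel_to_point((3, 3, 3), (0.5, 0.5, 0.5),
                       grid_min=(-2.0, -2.0, 2.0), voxel_size=0.5)

uv1 = project(P1, point)  # location of the proposal in view 1
uv2 = project(P2, point)  # location of the same proposal in view 2
```

Because both 2D locations derive from a single 3D point, they agree across views by construction, which is the kind of constraint a multi-view consistency term can exploit during training.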

Aggregating Spatial and Photometric Context for Photometric Stereo

Robust machine learning for neuroscientific inference

Self-supervised Dense Representation Learning for Live-Cell Microscopy with Time Arrow Prediction
