Publication

Vision-Based Scene Understanding with Sparsity Promoting Priors

Alexandre Massoud Alahi
2011
EPFL thesis
Abstract

Human beings are interested in understanding their environments and the dynamic content that fills their surroundings. For applications ranging from security to marketing, people have installed networks of cameras to capture the dynamic elements of scenes. In this thesis, we propose a complete real-time system to automatically analyze human behavior from any network of cameras. The proposed system leverages mixed networks of fixed and mobile cameras to locate people, track them, and analyze their trajectories. The mathematical frameworks underlying our proposed methods are based on the following claim: the dynamics of a scene are driven by a small set of causes and can therefore be parameterized by a few degrees of freedom. Every processing block of our system is driven by sparsity-promoting priors, i.e., just a few elements are sufficient to capture the scene dynamics. We first present our multi-view people localization algorithm, which is designed for a network of fixed cameras. An inverse problem with a sparsity constraint is formulated to detect people using the degraded foreground silhouettes extracted by the cameras. To solve this sparsity-driven formulation in a manner appropriate for a real-time implementation, we then propose an approach called "Set Covering Occupancy Object Pursuit" (SCOOP) that outperforms the state of the art. Next, we tackle the data association problem of finding correspondences between located people across time. We implement a graph-based greedy approach to reach real-time tracking performance. Unlike the fixed camera networks considered in the first part of this thesis, mobile cameras are uncalibrated and often monitor fields of view that do not overlap with those of other cameras. We propose a "Cascade of Grids of Image Descriptors" (CaGID) with a sparse search to accurately detect and track objects across uncalibrated cameras with non-overlapping fields of view.
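As a rough illustration of the sparsity-constrained localization idea described above, the following sketch selects a small set of ground-plane locations whose ideal silhouettes explain the observed foreground pixels. It is a generic greedy set-cover heuristic written for this page and is not the SCOOP algorithm from the thesis. All names (`greedy_occupancy`, `atoms`, `foreground`, `min_gain`) and the scoring rule are assumptions made for the example.

```python
def greedy_occupancy(atoms, foreground, min_gain=1):
    """Pick few locations whose silhouettes cover the foreground pixels.

    atoms: dict mapping a candidate location to the set of pixel indices
           its ideal silhouette would light up (hypothetical dictionary).
    foreground: set of pixel indices observed as foreground.
    """
    uncovered = set(foreground)
    chosen = []
    while uncovered:
        best_loc, best_gain = None, 0
        for loc, pixels in atoms.items():
            if loc in chosen:
                continue
            # Reward newly explained foreground pixels and penalize
            # silhouette pixels that fall on the background.
            gain = len(pixels & uncovered) - len(pixels - foreground)
            if gain > best_gain:
                best_loc, best_gain = loc, gain
        # Stop when no remaining atom explains enough new pixels.
        if best_loc is None or best_gain < min_gain:
            break
        chosen.append(best_loc)
        uncovered -= atoms[best_loc]
    return chosen


# Toy example: four candidate locations, two of which match the foreground.
atoms = {"A": {0, 1, 2}, "B": {2, 3, 4}, "C": {5, 6, 7}, "D": {1, 2, 3}}
foreground = {0, 1, 2, 5, 6, 7}
print(greedy_occupancy(atoms, foreground))  # ['A', 'C']
```

The stopping rule plays the role of the sparsity prior here: only locations that explain enough unexplained foreground are kept, so the occupancy estimate stays small.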
We evaluate the ability of such mixed networks of cameras to alert drivers to a potential collision with pedestrians. For this application, a camera mounted in a vehicle collaborates with a network of fixed cameras installed in a city. Finally, the proposed system is evaluated for coaching and marketing purposes. The behavior of people in sports games and stores is analyzed in real time with a graph-based algorithm coined "SpotRank". A probability map inspired by the PageRank algorithm is proposed to rank the most salient 'hot spots' based upon mutual flows. Several public data sets have been used to quantitatively and qualitatively evaluate the performance of our system. To our knowledge, it is the first system to capture the behavior of people in crowded environments and analyze this behavior in real time with sparsity priors.
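To make the PageRank-inspired ranking of hot spots more concrete, here is a minimal sketch of a standard damped PageRank power iteration applied to a matrix of flows between spots. It is not the SpotRank algorithm itself, whose details are not given in this abstract. The function name `rank_spots`, the `flows` matrix layout, and the damping value are assumptions chosen for illustration.

```python
def rank_spots(flows, damping=0.85, iterations=100):
    """Rank spots by a PageRank-style score over observed flows.

    flows[i][j] is the (hypothetical) number of people seen moving
    from spot i to spot j.
    """
    n = len(flows)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new_rank = [(1.0 - damping) / n] * n
        for i, row in enumerate(flows):
            total = sum(row)
            for j in range(n):
                # A spot with no outgoing flow spreads its rank uniformly.
                share = row[j] / total if total > 0 else 1.0 / n
                new_rank[j] += damping * rank[i] * share
        rank = new_rank
    return rank


# Toy example with three spots; most traffic converges on spot 1.
flows = [
    [0, 5, 1],
    [1, 0, 1],
    [4, 6, 0],
]
scores = rank_spots(flows)
hottest = max(range(len(scores)), key=scores.__getitem__)
print(hottest)
```

The scores sum to one and can be read as a probability map over spots, with higher values for spots that receive flow from other well-visited spots.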

