Publication

Human-Centered Scene Understanding via Crowd Counting

Weizhe Liu
2021
Thèse EPFL
Résumé

Human-centered scene understanding is the process of perceiving and analysing a dynamic scene observed through a network of sensors with emphasis on human-related activities. It includes the visual perception of human-related activities from either single image or video sequence. Scene understanding with focus of human-related activities is becoming increasingly popular which results in the demand of algorithms that can efficiently model crowd activity in different real-world scenarios. In this thesis, we exploit human-centered scene understanding through crowd counting. Counting people is a challenging task due to perspective distortion and occlusion. We tackle these problems by developing algorithms to leverage a variety of data modalities including single image, video sequence and scene perspective map. First, we introduce an end-to-end trainable deep architecture for crowd counting that combines features obtained using multiple receptive field sizes and learns the importance of each such feature at each image location. In other words, our approach adaptively encodes the scale of the contextual information required to accurately predict crowd density. This yields an algorithm that outperforms previous crowd counting methods, especially when perspective effects are strong.Second, we explicitly model the scale changes and reason in terms of people per square-meter. We show that feeding the perspective model to the network allows us to enforce global scale consistency and that this model can be obtained on the fly from the drone sensors. In addition, it also enables us to enforce physically-inspired temporal consistency constraints that do not have to be learned. This yields an algorithm that outperforms previous methods in inferring crowd density from a moving drone camera especially when perspective effects are strong.Third, for video sequence, we advocate estimating people flows across image locations between consecutive images and inferring the people densities from these flows instead of directly regressing them. This enables us to impose much stronger constraints encoding the conservation of the number of people. As a result, it significantly boosts performance without requiring a more complex architecture. Furthermore, it allows us to exploit the correlation between people flow and optical flow to further improve the results. We also show that leveraging people conservation constraints in both a spatial and temporal manner makes it possible to train a deep crowd counting model in an active learning setting with much fewer annotations. This significantly reduces the annotation cost while still leading to similar performance to the full supervision case.

À propos de ce résultat
Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.

Graph Chatbot

Chattez avec Graph Search

Posez n’importe quelle question sur les cours, conférences, exercices, recherches, actualités, etc. de l’EPFL ou essayez les exemples de questions ci-dessous.

AVERTISSEMENT : Le chatbot Graph n'est pas programmé pour fournir des réponses explicites ou catégoriques à vos questions. Il transforme plutôt vos questions en demandes API qui sont distribuées aux différents services informatiques officiellement administrés par l'EPFL. Son but est uniquement de collecter et de recommander des références pertinentes à des contenus que vous pouvez explorer pour vous aider à répondre à vos questions.