# Graph cuts in computer vision

Summary

As applied in the field of computer vision, graph cut optimization can be employed to efficiently solve a wide variety of low-level computer vision problems (early vision), such as the stereo correspondence problem, object co-segmentation, and many other computer vision problems that can be formulated in terms of energy minimization. Many of these energy minimization problems can be approximated by solving a maximum flow problem in a graph (and thus, by the max-flow min-cut theorem, define a minimal cut of the graph). Under most formulations of such problems in computer vision, the minimum energy solution corresponds to the maximum a posteriori estimate of a solution. Although many computer vision algorithms involve cutting a graph (e.g., normalized cuts), the term "graph cuts" is applied specifically to those models that employ a max-flow/min-cut optimization (other graph cutting algorithms may be considered graph partitioning algorithms).
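The link between energy minimization and a minimum cut can be illustrated on a toy problem. The following is a minimal sketch, not a reference implementation: it assumes a 1D binary signal, a constant penalty `data_cost` for changing a sample, and a constant penalty `smooth_cost` for each pair of neighbours with different labels. The function names and both parameters are illustrative assumptions, and the max-flow routine is a plain Edmonds-Karp search rather than any of the specialised solvers used in practice.

```python
from collections import deque


def max_flow_min_cut(capacity, source, sink):
    """Edmonds-Karp max-flow on a dict-of-dicts capacity graph.

    Returns (flow value, set of nodes on the source side of a minimum cut).
    """
    # Residual graph: copy the capacities and add zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No augmenting path left: the visited nodes form the source side.
            return flow, set(parent)

        # Find the bottleneck capacity along the path, then push flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def denoise_binary(signal, data_cost=2, smooth_cost=3):
    """Denoise a 0/1 sequence by minimising data + smoothness energy.

    Samples left on the source side of the cut get label 1, the others 0,
    so the capacity of every cut equals the energy of its labelling.
    """
    n = len(signal)
    capacity = {"s": {}, "t": {}}
    for i in range(n):
        capacity[i] = {}
    for i, y in enumerate(signal):
        capacity["s"][i] = data_cost * (y != 0)  # paid if sample i gets label 0
        capacity[i]["t"] = data_cost * (y != 1)  # paid if sample i gets label 1
        if i + 1 < n:
            # Paid once whenever neighbours i and i + 1 get different labels.
            capacity[i][i + 1] = smooth_cost
            capacity[i + 1][i] = smooth_cost
    energy, source_side = max_flow_min_cut(capacity, "s", "t")
    labels = [1 if i in source_side else 0 for i in range(n)]
    return labels, energy


if __name__ == "__main__":
    print(denoise_binary([1, 1, 0, 1, 1, 0, 0, 0]))
```

With these weights, the isolated `0` in the example input is cheaper to flip (one data penalty) than to keep (two smoothness penalties), so the cut relabels it, while the long run of zeros at the end is preserved.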
"Binary" problems (such as denoising



Related people (1)

Related publications (21)

Related concepts (2)

Related units (1)

Image segmentation

Image segmentation is an operation that consists of detecting and grouping pixels according to criteria, notably intensity or spatial criteria, so that the image appears to be made up of regions uni

Computer vision

Computer vision is a scientific field and a branch of artificial intelligence that deals with how computers can acquire a high-level understanding from

Related courses (3)

Computer Vision aims at modeling the world from digital images acquired using video or infrared cameras, and other imaging sensors.
We will focus on images acquired using digital cameras. We will introduce basic processing techniques and discuss their field of applicability.

Introduction to the basic techniques of image processing. Introduction to the development of image-processing software and to prototyping in Java. Application to real-world examples in industrial vision and biomedical imaging.

This advanced course will provide students with the knowledge to tackle the design of privacy-preserving ICT systems. Students will learn about existing technologies to protect privacy, and how to evaluate the protection they provide.

Learning to embed data into a space where similar points are together and dissimilar points are far apart is a challenging machine learning problem. In this dissertation we study two learning scenarios that arise in the context of learning embeddings and one scenario in efficiently estimating an empirical expectation. We present novel algorithmic solutions and demonstrate their applications on a wide range of data-sets.
The first scenario deals with learning from small data with a large number of classes. This setting is common in computer vision problems such as person re-identification and face verification. To address this problem we present a new algorithm called Weighted Approximate Rank Component Analysis (WARCA), which is scalable, robust, non-linear and independent of the number of classes. We empirically demonstrate the performance of our algorithm on 9 standard person re-identification data-sets, where we obtain state-of-the-art performance in terms of accuracy as well as computational speed.
The second scenario we consider is learning embeddings from sequences. When it comes to learning from sequences, recurrent neural networks have proved to be an effective algorithm. However, existing recurrent neural networks have many problems that make them data hungry (high sample complexity) and difficult to train. We present a new recurrent neural network called Kronecker Recurrent Units (KRU), which addresses the issues of existing recurrent neural networks through Kronecker matrices. We show its performance on 7 applications, ranging from problems in computer vision to language modeling, music modeling and speech recognition.
Most machine learning algorithms are formulated as minimizing an empirical expectation over a finite collection of samples. In this thesis we also investigate the problem of efficiently estimating a weighted average over large data-sets. We present a new data-structure called Importance Sampling Tree (IST), which permits fast estimation of a weighted average without looking at all the samples. We successfully evaluate our data-structure in the training of neural networks, in order to efficiently find informative samples.

Representing and reconstructing 3D deformable shapes are two tightly linked problems that have long been studied within the computer vision field. Deformable shapes are truly ubiquitous in the real world, whether it be specific object classes such as humans, garments and animals or more abstract ones such as generic materials deforming under an external force. Practical computer vision algorithms must be able to understand the shapes of objects in the observed scenes to unlock the wide spectrum of much sought-after applications ranging from virtual try-on to automated surgeries.

Automatic shape reconstruction is known to be an ill-posed problem, especially in the common scenario of a single image input. Therefore, modern approaches rely on the deep learning paradigm, which has proven to be extremely effective even for severely under-constrained computer vision problems. We, too, exploit the success of data-driven approaches; however, we also show that generic deep learning models can greatly benefit from being combined with explicit knowledge originating in computational geometry. We analyze the use of various 3D shape representations and we focus on one of them in particular, the atlas-based representation, which turns out to be especially suitable for modeling deformable shapes and which we further improve and extend.

The atlas-based representation models surfaces as an ensemble of continuous functions and thus allows for arbitrary resolution and analytical surface analysis. We identify its major shortcomings, namely patch collapse, patch overlap and strong mapping distortions, and we propose novel regularizers based on analytically computed properties of the reconstructed surfaces. Our approach counteracts the aforementioned drawbacks while yielding higher reconstruction accuracy.

We examine the problems of the atlas-based shape representation more deeply and focus on another design flaw, the global inconsistency of the mappings. While it is not reflected in quantitative metrics, it is detrimental to the visual quality of the reconstructed surfaces. Specifically, we design loss functions encouraging intercommunication among the mappings, which pushes the resulting surface towards a C1 smooth function and thus dramatically improves the visual quality.

Furthermore, we adapt the atlas-based representation so that it can model a full sequence of a deforming object in a temporally consistent way. The goal is to produce a reconstruction in which each surface point always represents the same semantic point on the target GT surface. To achieve such behavior, we note that if each surface point deforms close-to-isometrically, its semantic location likely remains unchanged. Practically, we make use of the Riemannian metric and force it to remain point-wise constant throughout the sequence. The experiments show that our method yields SotA results on the correspondence estimation task.

Finally, we look into the particular problem of monocular texture-less deformable shape reconstruction. We propose a multi-task learning approach which jointly produces a normal map, a depth map and a mesh corresponding to the observed surface. We show that producing multiple different 3D representations of the same objects results in higher reconstruction quality. We acquire a large real-world annotated dataset of texture-less deforming objects and we release it for public use.

Related lectures (4)