**Êtes-vous un étudiant de l'EPFL à la recherche d'un projet de semestre?**

Travaillez avec nous sur des projets en science des données et en visualisation, et déployez votre projet sous forme d'application sur GraphSearch.

Publication# Distributed Compressed Representation of Correlated Image Sets

Résumé

Vision sensor networks and video cameras find widespread usage in several applications that rely on effective representation of scenes or analysis of 3D information. These systems usually acquire multiple images of the same 3D scene from different viewpoints or at different time instants. Therefore, these images are generally correlated through displacement of scene objects. Efficient compression techniques have to exploit this correlation in order to efficiently communicate the 3D scene information. Instead of joint encoding that requires communication between the cameras, in this thesis we concentrate on distributed representation, where the captured images are encoded independently, but decoded jointly to exploit the correlation between images. One of the most important and challenging tasks relies in estimation of the underlying correlation from the compressed correlated images for effective reconstruction or analysis in the joint decoder. This thesis focuses on developing efficient correlation estimation algorithms and joint representation of multiple correlated images captured by various sensing methodologies, e.g., planar, omnidirectional and compressive sensing (CS) sensors. The geometry of the 2D visual representation and the acquisition complexity vary for each sensor type. Therefore, we need to carefully consider the specific geometric nature of the captured images while developing distributed representation algorithms. In this thesis we propose robust algorithms in different scene analysis and reconstruction scenarios. We first concentrate on the distributed representation of omnidirectional images captured by catadioptric sensors. The omnidirectional images are captured from different viewpoints and encoded independently with a balanced rate distribution among the different cameras. They are mapped on the sphere which captures the plenoptic function in its radial form without Euclidean discrepancies. We propose a transform-based distributed coding algorithm, where the spherical images initially undergo a multi-resolution decomposition. The visual information is then split into two correlated partitions. The encoder transmits one partition after entropy coding, as well as the syndrome bits resulting from the Slepian-Wolf encoding of the other partition. The joint decoder estimates a disparity image to take benefit of the correlation between views and uses the syndrome bits to decode the missing information. Such a strategy proves to be beneficial with respect to the independent processing of images and shows only a small performance loss compared to the joint encoding of different views. The encoding complexity in the previous approach is non-negligible due to the visual information processing based on Slepian-Wolf coding and its associated rate parameter estimation. We therefore discard the Slepian-Wolf encoding and propose a distributed coding solution, where the correlated images are encoded independently using transform-based coding solutions (e.g., SPIHT). The central decoder now builds a correlation model from the compressed images, which is used to jointly decode a pair of images. Experimental results demonstrate that the proposed distributed coding solution improves the rate-distortion performance of the separate coding results for both planar and omnidirectional images. However, this improvement is significant only at medium to high bit rates. We therefore propose a rate allocation scheme that identifies and transmits the necessary visual information from each image to improve the correlation estimation accuracy at low bit rate. Experimental results show that for a given bit budget the proposed encoding scheme permits to compute an accurate correlation estimation comparing to the one obtained with SPIHT, JPEG 2000 or JPEG coding schemes. We show however that the improvement in the correlation estimation comes at the price of penalizing the image reconstruction quality; therefore there exists an interesting trade-off between the accurate correlation estimation and image reconstruction as encoding optimization objectives are different in both cases. Next, we further simplify the encoding complexity by replacing the classical imaging sensors with the simple CS sensors, that directly acquire the compressed images in the form of quantized linear measurements. We now concentrate on the particular problem, where one image is selected as the reference and it is used as a side information for the correlation estimation. We propose a geometry-based model to describe the correlation between the visual information in a pair of images. The joint decoder first captures the most prominent visual features in the reconstructed reference image using geometric functions. Since the images are correlated, these features are likely to be present in the other images too, possibly with geometric transformations. Hence, we propose to estimate the correlation model with a regularized optimization problem that locates these features in the compressed images. The regularization terms enforce smoothness of the transformation field, and consistency between the estimated images and the quantized measurements. Experimental results show that the proposed scheme is able to efficiently estimate the correlation between images for several multi-view and video datasets. The proposed scheme is finally shown to outperform DSC schemes based on unsupervised disparity (or motion) learning, as well as independent coding solutions based on JPEG 2000. We then extend the previous scenario to a symmetric decoding problem, where we are interested to estimate the correlation model directly from the quantized linear measurements without explicitly reconstructing the reference images. We first show that the motion field that represents the main source of correlation between images can be described as a linear operator. We further derive a linear relationship between the correlated measurements in the compressed domain. We then derive a regularized cost function to estimate the correlation model directly in the compressed domain using graph-based optimization algorithms. Experimental results show that the proposed scheme estimates an accurate correlation model among images in both multi-view and video imaging scenarios. We then propose a robust data fidelity term that improves the quality of the correlation estimation when the measurements are quantized. Finally, we show by experiments that the proposed compressed correlation estimation scheme is able to compete the solution of a scheme that estimates a correlation model from the reconstructed images without the complexity of image reconstruction. Finally, we study the benefit of using the correlation information while jointly reconstructing the images from the compressed linear measurements. We consider both the asymmetric and symmetric scenarios described previously. We propose joint reconstruction methodologies based on a constrained optimization problem which is solved using effective proximal splitting methods. The constraints included in our framework enforce the reconstructed images to satisfy both the correlation and the quantized measurements consistency objectives. Experimental results demonstrate that the proposed joint reconstruction scheme improves the quality of the decoded images, when compared to a scheme where the images are handled independently. In this thesis we build efficient distributed scene representation algorithms for the multiple correlated images captured in planar, omnidirectional and CS cameras. The coding rate in our symmetric distributed coding solution stays balanced between the encoders and stays close to the joint encoding solutions. Our novel algorithms lead to effective correlation estimation in different sensing and coding scenarios. In addition, we provide innovative solutions for robust correlation estimation from highly compressed images in simple sensing frameworks. Our CS-based joint reconstruction frameworks effectively exploit the inter-view correlation, that permits to achieve high compression gains compared to state-of-the-art independent and distributed coding solutions.

Official source

Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.

Concepts associés

Chargement

Publications associées

Chargement

Concepts associés (70)

Corrélation (statistiques)

En probabilités et en statistique, la corrélation entre plusieurs variables aléatoires ou statistiques est une notion de liaison qui contredit leur indépendance.
Cette corrélation est très souvent

Optimization problem

In mathematics, computer science and economics, an optimization problem is the problem of finding the best solution from all feasible solutions.
Optimization problems can be divided into two catego

Résolution de problème

vignette|Résolution d'un problème mathématique.
La résolution de problème est le processus d'identification puis de mise en œuvre d'une solution à un problème.
Méthodologie
Dans l'ind

Publications associées (112)

Chargement

Chargement

Chargement

Demand has emerged for next generation visual technologies that go beyond conventional 2D imaging. Such technologies should capture and communicate all perceptually relevant three-dimensional information about an environment to a distant observer, providing a satisfying, immersive experience. Camera networks offer a low cost solution to the acquisition of 3D visual information, by capturing multi-view images from different viewpoints. However, the camera's representation of the data is not ideal for common tasks such as data compression or 3D scene analysis, as it does not make the 3D scene geometry explicit. Image-based scene representations fundamentally require a multi-view image model that facilitates extraction of underlying geometrical relationships between the cameras and scene components. Developing new, efficient multi-view image models is thus one of the major challenges in image-based 3D scene representation methods. This dissertation focuses on defining and exploiting a new method for multi-view image representation, from which the 3D geometry information is easily extractable, and which is additionally highly compressible. The method is based on sparse image representation using an overcomplete dictionary of geometric features, where a single image is represented as a linear combination of few fundamental image structure features (edges for example). We construct the dictionary by applying a unitary operator to an analytic function, which introduces a composition of geometric transforms (translations, rotation and anisotropic scaling) to that function. The advantage of this approach is that the features across multiple views can be related with a single composition of transforms. We then establish a connection between image components and scene geometry by defining the transforms that satisfy the multi-view geometry constraint, and obtain a new geometric multi-view correlation model. We first address the construction of dictionaries for images acquired by omnidirectional cameras, which are particularly convenient for scene representation due to their wide field of view. Since most omnidirectional images can be uniquely mapped to spherical images, we form a dictionary by applying motions on the sphere, rotations, and anisotropic scaling to a function that lives on the sphere. We have used this dictionary and a sparse approximation algorithm, Matching Pursuit, for compression of omnidirectional images, and additionally for coding 3D objects represented as spherical signals. Both methods offer better rate-distortion performance than state of the art schemes at low bit rates. The novel multi-view representation method and the dictionary on the sphere are then exploited for the design of a distributed coding method for multi-view omnidirectional images. In a distributed scenario, cameras compress acquired images without communicating with each other. Using a reliable model of correlation between views, distributed coding can achieve higher compression ratios than independent compression of each image. However, the lack of a proper model has been an obstacle for distributed coding in camera networks for many years. We propose to use our geometric correlation model for distributed multi-view image coding with side information. The encoder employs a coset coding strategy, developed by dictionary partitioning based on atom shape similarity and multi-view geometry constraints. Our method results in significant rate savings compared to independent coding. An additional contribution of the proposed correlation model is that it gives information about the scene geometry, leading to a new camera pose estimation method using an extremely small amount of data from each camera. Finally, we develop a method for learning stereo visual dictionaries based on the new multi-view image model. Although dictionary learning for still images has received a lot of attention recently, dictionary learning for stereo images has been investigated only sparingly. Our method maximizes the likelihood that a set of natural stereo images is efficiently represented with selected stereo dictionaries, where the multi-view geometry constraint is included in the probabilistic modeling. Experimental results demonstrate that including the geometric constraints in learning leads to stereo dictionaries that give both better distributed stereo matching and approximation properties than randomly selected dictionaries. We show that learning dictionaries for optimal scene representation based on the novel correlation model improves the camera pose estimation and that it can be beneficial for distributed coding.

Pascal Frossard, Vijayaraghavan Thirumalai

The distributed representation of correlated images is an important challenge in applications such as multi-view imaging in camera networks or low complexity video coding. This paper addresses the problem of distributed coding of images whose correlation is driven by the motion of objects or the positioning of the vision sensors. It concentrates on the problem where images are encoded with compressed linear measurements, which are used for estimation of the correlation between images at decoder. We propose a geometry-based correlation model in order to describe the correlation between pairs of images. We assume that the constitutive components of natural images can be captured by visual features that undergo local transformations (e.g., translation) in different images. These prominent visual features are estimated with a sparse approximation of a reference image by a dictionary of geometric basis functions. The corresponding features in the other images are then identified from the compressed measurements. The correlation model is given by the relative geometric transformations between corresponding features. We thus formulate a regularized optimization problem for the estimation of correspondences where the local transformations between images form a consistent motion or disparity map. Then, we propose an efficient joint reconstruction algorithm that decodes the compressed images such that they stay consistent with the quantized measurements and the correlation model. Experimental results show that the proposed algorithm effectively estimates the correlation between images in video sequences or multi-view data. In addition, the proposed reconstruction strategy provides effective decoding performance that compares advantageously to distributed coding schemes based on disparity or motion learning and to independent coding solution based on JPEG-2000.

2010Pascal Frossard, Vijayaraghavan Thirumalai

The distributed representation of correlated images is an important challenge in applications such as multi-view imaging in camera networks or low complexity video coding. This paper addresses the problem of distributed coding of images whose correlation is driven by the motion of objects or the positioning of the vision sensors. It concentrates on the problem where images are encoded with compressed linear measurements, which are used for estimation of the correlation between images at decoder. We propose a geometry-based correlation model in order to describe the correlation between pairs of images. We assume that the constitutive components of natural images can be captured by visual features that undergo local transformations (e.g., translation) in different images. These prominent visual features are estimated with a sparse approximation of a reference image by a dictionary of geometric basis functions. The corresponding features in the other images are then identified from the compressed measurements. The correlation model is given by the relative geometric transformations between corresponding features. We thus formulate a regularized optimization problem for the estimation of correspondences where the local transformations between images form a consistent motion or disparity map. Then, we propose an efficient joint reconstruction algorithm that decodes the compressed images such that they stay consistent with the quantized measurements and the correlation model. Experimental results show that the proposed algorithm effectively estimates the correlation between images in video sequences or multi-view data. In addition, the proposed reconstruction strategy provides effective decoding performance that compares advantageously to distributed coding schemes based on disparity or motion learning and to independent coding solution based on JPEG-2000.