Learning stereo reconstruction with deep neural networks

Stereo reconstruction is the problem of recovering the 3D structure of a scene from a pair of images acquired from different viewpoints. It has been investigated for decades, and many successful methods have been developed.

The main drawback of these methods is that they typically rely on a single depth cue, such as parallax, defocus blur, or shading, and are therefore less robust than the human visual system, which simultaneously relies on a range of monocular and binocular cues. This is mainly because it is hard to manually design a model that accounts for multiple depth cues. In this work, we address this problem by focusing on deep learning-based stereo methods that can discover a model for multiple depth cues directly from training data with ground truth depth.

The complexity of deep learning-based methods, however, requires very large training sets with ground truth depth, which are often hard or costly to collect. Furthermore, even when training data is available, it is often contaminated with noise, which reduces the effectiveness of supervised learning. In Chapter 3, we show that this problem can be alleviated with weakly supervised learning, which relies on geometric constraints of the problem instead of ground truth depth.

Besides requiring large training sets, deep stereo methods are not as application-friendly as traditional methods: they have a large memory footprint, and their disparity range is fixed at training time. In Chapter 4, we address these problems by introducing a novel network architecture with a bottleneck, which can process large images and utilize more context, and an estimator that makes the network less sensitive to stereo matching ambiguities and applicable to any disparity range without re-training.
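To give a rough sense of how an estimator can be decoupled from the disparity range, the following minimal sketch computes a sub-pixel disparity from a vector of matching scores by taking a softmax-weighted mean in a small window around the best match. This is an illustrative assumption and not necessarily the estimator developed in Chapter 4; the function name `subpixel_disparity`, the `half_window` parameter, and the score convention (higher is better) are hypothetical.

```python
import math


def subpixel_disparity(scores, half_window=2):
    """Estimate a sub-pixel disparity from per-disparity matching scores.

    scores[d] is a matching score for integer disparity d (higher is better).
    Hypothetical sketch: the estimate is a softmax-weighted mean of the
    disparities in a small window around the best-scoring disparity.
    """
    if not scores:
        raise ValueError("scores must be non-empty")

    # Integer disparity with the highest score (first one on ties).
    best = max(range(len(scores)), key=lambda d: scores[d])

    # Window around the best match, clipped to the valid disparity range.
    lo = max(0, best - half_window)
    hi = min(len(scores) - 1, best + half_window)
    window = range(lo, hi + 1)

    # Softmax weights, shifted by the peak score for numerical stability.
    peak = scores[best]
    weights = [math.exp(scores[d] - peak) for d in window]
    total = sum(weights)

    return sum(d * w for d, w in zip(window, weights)) / total


# The same function works for score vectors of any length.
short_range = [0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0]
long_range = [0.0] * 100
long_range[70] = 10.0
print(subpixel_disparity(short_range))
print(subpixel_disparity(long_range))
```

Because the estimate depends only on the scores near the best match, nothing in this sketch is tied to the number of candidate disparities, which is the property the paragraph above refers to.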

Because deep learning-based methods discover depth cues directly from training data, they can be adapted to new data modalities without large modifications. In Chapter 5, we show that our method, developed for a conventional frame-based camera, can be used with a novel event-based camera, which has a higher dynamic range, lower latency, and low power consumption. Instead of sampling the intensity of all pixels at a fixed frequency, this camera asynchronously reports events of significant pixel intensity change. To adapt our method to this new data modality, we propose a novel event sequence embedding module that first aggregates information locally across time, using a novel fully-connected layer for an irregularly sampled continuous domain, and then across the discrete spatial domain.
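The event-based sensing principle described above can be illustrated with a simplified per-pixel model, in which an event is emitted whenever the log intensity drifts from a reference level by more than a threshold. This is a sketch under stated assumptions and does not describe the actual sensor or the embedding module of Chapter 5; the `Event` type, the function `events_for_pixel`, and the threshold value are hypothetical.

```python
import math
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    x: int          # pixel column
    y: int          # pixel row
    t: float        # timestamp of the event
    polarity: int   # +1 for brighter, -1 for darker


def events_for_pixel(
    x: int,
    y: int,
    samples: List[Tuple[float, float]],
    threshold: float = 0.2,
) -> List[Event]:
    """Simulate events for one pixel (hypothetical simplified model).

    samples is a time-ordered list of (timestamp, intensity) pairs with
    positive intensities. An event is emitted each time the log intensity
    moves away from the reference level by at least `threshold`.
    """
    events: List[Event] = []
    if not samples:
        return events

    reference = math.log(samples[0][1])
    for t, intensity in samples[1:]:
        delta = math.log(intensity) - reference
        # A large change can trigger several events of the same polarity.
        while abs(delta) >= threshold:
            polarity = 1 if delta > 0 else -1
            events.append(Event(x, y, t, polarity))
            reference += polarity * threshold
            delta = math.log(intensity) - reference
    return events


# A pixel that brightens and later darkens; static periods produce no events.
samples = [(0.0, 1.0), (1.0, 1.0), (2.0, 2.0), (3.0, 2.0), (4.0, 0.9)]
for event in events_for_pixel(3, 4, samples, threshold=0.5):
    print(event)
```

The output is an irregularly timed sequence per pixel rather than a dense frame, which is why an embedding that aggregates over a continuous time domain is needed before conventional spatial processing.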

One interesting application of stereo is the reconstruction of a planet's surface topography from satellite stereo images. In Chapter 6, we describe a geometric calibration method, as well as mosaicing and stereo reconstruction tools, that we developed as part of this doctoral project for the Color and Stereo Surface Imaging System on board ESA's Trace Gas Orbiter, which orbits Mars. For the calibration, we propose a novel method that relies on starfield images, because the instrument's large focal length and complex optical distortion preclude the use of standard methods. The scientific and practical results of this work are widely used by the scientific community.

