# Learning to Find Good Correspondences

Pascal Fua, Vincent Lepetit, Yuki Ono, Mathieu Salzmann, Eduard Trulls Fortuny, Kwang Moo Yi

2018

Conference paper

Abstract

We develop a deep architecture to learn to find good correspondences for wide-baseline stereo. Given a set of putative sparse matches and the camera intrinsics, we train our network in an end-to-end fashion to label the correspondences as inliers or outliers, while simultaneously using them to recover the relative pose, as encoded by the essential matrix. Our architecture is based on a multi-layer perceptron operating on pixel coordinates rather than directly on the image, and is thus simple and small. We introduce a novel normalization technique, called Context Normalization, which allows us to process each data point separately while embedding global information in it, and also makes the network invariant to the order of the correspondences. Our experiments on multiple challenging datasets demonstrate that our method is able to drastically improve the state of the art with little training data.
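The abstract does not spell out how Context Normalization is computed. The sketch below assumes it standardizes each feature channel over the whole set of correspondences for one image pair, so that every point is still processed on its own but carries statistics of the full set. The function name `context_normalize`, the `eps` constant, and the sample coordinates are illustrative assumptions, not taken from the paper's code.

```python
import math


def context_normalize(features, eps=1e-5):
    """Standardize each feature channel across all correspondences.

    features: list of N rows (one per correspondence), each a list of C floats.
    Assumption: mean and variance are taken over the N rows, per channel.
    """
    n = len(features)
    c = len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(c)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in features) / n for j in range(c)
    ]
    return [
        [(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(c)]
        for row in features
    ]


# Hypothetical putative matches (x1, y1, x2, y2) in normalized image coordinates.
matches = [
    [0.10, -0.40, 0.15, -0.35],
    [-0.60, 0.20, -0.50, 0.25],
    [0.80, 0.70, 0.90, 0.60],
    [-0.30, -0.90, 0.40, 0.95],
]

normalized = context_normalize(matches)

# Reordering the input only reorders the output rows (up to rounding),
# which is the order-invariance property the abstract refers to.
reordered = context_normalize(matches[::-1])[::-1]
max_diff = max(
    abs(a - b) for ra, rb in zip(normalized, reordered) for a, b in zip(ra, rb)
)
print(max_diff < 1e-9)
```

Because the statistics are symmetric functions of the set, any per-point network applied after this step stays insensitive to the order in which the correspondences are listed.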

Related concepts (10)

Deep learning

Deep learning (also called deep structured learning or hierarchical learning) is a subfield of artificial intelligence …

Architecture

Beauvais Cathedral (Saint-Pierre), built entirely of dressed stone, is the most airy and dematerialized example of Gothic architecture, which reaches its technical limits there …

Perceptron

The perceptron is a supervised learning algorithm for binary classifiers (that is, classifiers that separate two classes). It was invented in 1957 by Frank Rosenblatt at the aeronautics laboratory of …

Related publications (40)

In this dissertation, we study visual analysis methods for complex ancient Maya writings. The unit sign of a Maya text is called a glyph, and may have either semantic or syllabic significance. There are over 800 identified glyph categories, and over 1400 variations across these categories. To enable fast manipulation of data by scholars in the Humanities, it is desirable to have automatic visual analysis tools such as glyph categorization, localization, and visualization. Analysis and recognition of glyphs are challenging problems. The same patterns may be observed in different signs but with different compositions. The inter-class variance can thus be very low. Conversely, the intra-class variance can be high, as the visual variants within the same semantic category may differ to a large extent except for some patterns specific to the category. Another related challenge of Maya writings is the lack of a large dataset to study the glyph patterns. Consequently, we study local shape representations, both knowledge-driven and data-driven, over a set of frequent syllabic glyphs as well as other binary shapes, i.e. sketches. This comparative study indicates that a large data corpus and a deep network architecture are needed to learn data-driven representations that can capture the complex compositions of local patterns. To build a large glyph dataset in a short period of time, we study a crowdsourcing approach as an alternative to time-consuming data preparation by experts. Specifically, we work on individual glyph segmentation out of glyph-blocks from the three remaining codices (i.e. folded bark pages painted with a brush). With gradual steps in our crowdsourcing approach, we observe that providing supervision and careful task design are key aspects for non-experts to generate high-quality annotations. This way, we obtain a large dataset (over 9000) of individual Maya glyphs.
We analyze this crowdsourced glyph dataset with both knowledge-driven and data-driven visual representations. First, we evaluate two competitive knowledge-driven representations, namely Histogram of Oriented Shape Context and Histogram of Oriented Gradients. Secondly, thanks to the large size of the crowdsourced dataset, we study visual representation learning with deep Convolutional Neural Networks. We adopt three data-driven approaches: assessing representations from pretrained networks, fine-tuning the last convolutional block of a pretrained network, and training a network from scratch. Finally, we investigate different glyph visualization tasks based on the studied representations. First, we explore the visual structure of several glyph corpora by applying a non-linear dimensionality reduction method, namely t-distributed Stochastic Neighbor Embedding. Secondly, we propose a way to inspect the discriminative parts of individual glyphs according to the trained deep networks. For this purpose, we use the Gradient-weighted Class Activation Mapping method and highlight the network activations as a heatmap visualization over an input image. We assess whether the highlighted parts correspond to distinguishing parts of glyphs in a perceptual crowdsourcing study. Overall, this thesis presents a promising crowdsourcing approach, competitive data-driven visual representations, and interpretable visualization methods that can be applied to explore various other Digital Humanities datasets.

Deep neural networks have been empirically successful in a variety of tasks; however, their theoretical understanding is still poor. In particular, modern deep neural networks have many more parameters than training data. Thus, in principle they should overfit the training samples and exhibit poor generalization to the complete data distribution. Counterintuitively, however, they manage to achieve both high training accuracy and high testing accuracy. One can estimate generalization using a validation set; however, this can be difficult when training samples are limited, and it gives no information about why deep neural networks generalize well. Another approach is to estimate the complexity of the deep neural network. The hypothesis is that if a network with high training accuracy has high complexity it will have memorized the data, while if it has low complexity it will have learned generalizable patterns. In the first part of this thesis we explore Spectral Complexity, a measure of complexity that depends on combinations of norms of the weight matrices of the deep neural network. For a dataset that is difficult to classify, with no underlying model and/or no recurring pattern, for example one where the labels have been chosen randomly, spectral complexity has a large value, reflecting that the network needs to memorize the labels and will not generalize well. Putting back the real labels, the spectral complexity becomes lower, reflecting that some structure is present and the network has learned patterns that might generalize to unseen data. Spectral complexity results in vacuous estimates of the generalization error (the difference between the training and testing accuracy), and we show that it can lead to counterintuitive results when comparing the generalization error of different architectures. In the second part of the thesis we explore non-vacuous estimates of the generalization error.
In Chapter 2 we analyze the case of PAC-Bayes, where a posterior distribution over the weights of a deep neural network is learned using stochastic variational inference, and the generalization error estimate is driven by the KL divergence between this posterior and a prior distribution. We find that a common approximation where the posterior is constrained to be Gaussian with diagonal covariance, known as the mean-field approximation, significantly limits any gains in bound tightness. We find that, if we choose the prior mean to be the random network initialization, the generalization error estimate tightens significantly. In Chapter 3 we explore an existing approach to learning the prior mean, in PAC-Bayes, from the training set. Specifically, we explore differential privacy, which ensures that the training samples contribute only a limited amount of information to the prior, making it distribution-dependent rather than training-set-dependent. In this way the prior should generalize well to unseen data (as it has not memorized individual samples), and at the same time any posterior distribution that is close to it in terms of the KL divergence will also exhibit good generalization.
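To illustrate why the choice of prior mean matters in the mean-field setting described above, the sketch below computes the closed-form KL divergence between two diagonal Gaussians and plugs it into one common McAllester-style form of the PAC-Bayes gap. The weight values, the sample size `n`, the confidence level `delta`, and the function names are all hypothetical; the thesis may use a different bound and a different setup.

```python
import math


def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(Q || P) for Gaussians with diagonal covariance (per-weight std devs)."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total


def pac_bayes_gap(kl, n, delta=0.05):
    """One common McAllester-style bound on the train/test gap (an assumption here)."""
    return math.sqrt((kl + math.log(2 * math.sqrt(n) / delta)) / (2 * n))


# Hypothetical three-weight network: the posterior mean stays close to the
# random initialization, as is often the case after training.
w_init = [0.8, -1.2, 0.5]
w_post = [0.9, -1.0, 0.4]
sigma_post = [0.1, 0.1, 0.1]
sigma_prior = [0.1, 0.1, 0.1]

kl_zero_prior = kl_diag_gaussians(w_post, sigma_post, [0.0, 0.0, 0.0], sigma_prior)
kl_init_prior = kl_diag_gaussians(w_post, sigma_post, w_init, sigma_prior)

n = 10000  # hypothetical number of training samples
print(pac_bayes_gap(kl_zero_prior, n), pac_bayes_gap(kl_init_prior, n))
```

With these toy numbers, the prior centered at the initialization gives a smaller KL term, and therefore a tighter gap, than the zero-mean prior, which matches the qualitative finding reported in the abstract.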