Recurrent Convolutional Neural Networks for Scene Parsing

Scene parsing consists of assigning a label to every pixel in an image according to the class it belongs to. To ensure good visual coherence and high class accuracy, a scene parser must capture long-range dependencies in the image. In a feed-forward architecture, this can be achieved simply by considering a sufficiently large input context patch around each pixel to be labeled. We propose an approach based on a recurrent convolutional neural network, which allows us to consider a large input context while limiting the capacity of the model. Contrary to most standard approaches, our method does not rely on any segmentation technique or task-specific features. The system is trained end-to-end on raw pixels and models complex spatial dependencies at a low inference cost. As the context size increases with the built-in recurrence, the system identifies and corrects its own errors. Our approach yields state-of-the-art performance on both the Stanford Background Dataset and the SIFT Flow Dataset, while remaining very fast at test time.
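The claim that the context grows with the recurrence while the model capacity stays fixed can be illustrated with a little receptive-field arithmetic. The sketch below is not the architecture from the paper: it assumes stride-1 convolutions with no pooling, and the kernel sizes and the function name `receptive_field` are hypothetical, chosen only for illustration. Under those assumptions, each pass through the same stack of layers widens the input patch seen by one output pixel by a fixed amount, and because the layers are reused the number of weights does not change.

```python
def receptive_field(kernel_sizes, passes=1):
    """Side length, in pixels, of the input patch seen by one output pixel.

    Assumes stride-1 convolutions and no pooling, so a layer with kernel
    size k widens the context by k - 1 pixels. `passes` is how many times
    the same stack of layers is applied in sequence.
    """
    growth_per_pass = sum(k - 1 for k in kernel_sizes)
    return 1 + passes * growth_per_pass


# Hypothetical kernel sizes, not taken from the paper.
KERNELS = [5, 5, 3]

for passes in (1, 2, 3):
    side = receptive_field(KERNELS, passes)
    # The same three layers are reused on every pass, so the weight count
    # is the same for all three rows; only the context grows.
    print(f"passes={passes}: context patch is {side}x{side} pixels")
```

With these illustrative sizes, a single-pass feed-forward network would need more or larger layers, and therefore more parameters, to see the same patch that the shared-weight version reaches after three passes.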


