Publication: Saliency-based representations and multi-component classifiers for visual scene recognition

Abstract

Visual scene recognition deals with the problem of automatically recognizing the high-level semantic concept describing a given image as a whole, such as the environment in which the scene is occurring (e.g. a mountain), or the event that is taking place (e.g. a rock climbing event). Scene categories, especially those related to man-made places and events, present high degrees of intra-class variability and inter-class similarity, which in turn require robust and discriminative recognition systems. An additional requirement for potential applications, such as vision-based spatial reasoning for mobile robots, is efficiency of the classification procedure. The objective of this thesis is to address these challenges by proposing suitable image representations and classification algorithms. The first part of the thesis focuses on the representation task. We propose a bottom-up image descriptor capturing perceptually coherent structures independently of their position. In particular, our method separately pools features extracted from two perceptually different image regions: the most salient region and the remaining non-salient one. By complementing this Saliency-driven Perceptual Pooling (SPP) with an ad-hoc spatial pooling operation, we obtain compact and robust image representations, particularly suited for indoor and sports scenes. The second part of the thesis is concerned with the classification step. We propose an efficient multi-component classification algorithm, named Multiclass Latent Locally Linear SVM (ML3), able to automatically learn a set of sub-categorical linear models for each class, in a principled latent SVM framework. By linearly combining the sub-categorical models with sample- and class-specific weights, ML3 is able to efficiently learn smooth non-linear decision boundaries, competitive with those obtained by Gaussian kernel SVMs. ML3 also shows very competitive trade-offs between training time and performance, while ensuring high efficiency of the prediction phase. In the last part of the thesis, we use the ML3 algorithm to improve the efficiency and performance of a recently proposed image classification algorithm, named NBNN, designed to cope with classes with a large diversity. Specifically, we show how, with a modification of the NBNN scoring function, it is possible to use ML3 to learn a discriminative and compact set of prototypical local features for each class, and thus avoid the extensive Nearest Neighbor search used by NBNN. The resulting algorithm, named NBNL, greatly reduces the memory requirements and testing complexity of NBNN, while significantly improving its performance. The approaches proposed in this thesis effectively exploit the spatial, salient and task-driven structures present in the images, producing compact representations and relatively efficient classification procedures. The SPP representations provide competitive scene recognition performance when coupled with non-linear kernels, while the ML3 algorithm can be used to partially fill the gap between linear and non-linear kernels. Although the performance of NBNN-based methods on scene recognition tasks is still below that obtained by traditional SVM-based approaches, the proposed NBNL algorithm reduces the performance gap, while significantly speeding up the testing phase. Experiments on three publicly available scene recognition datasets (MIT-Indoor-67, 15-Scenes and UIUC-Sports) show the value of the proposed approaches.
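The saliency-driven pooling idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the mean-pooling operator, and the `threshold` used to split locations into salient and non-salient regions are all assumptions made for illustration, and the additional spatial pooling step mentioned in the abstract is omitted.

```python
from typing import List, Sequence


def mean_pool(vectors: Sequence[Sequence[float]], dim: int) -> List[float]:
    """Average a list of feature vectors; return zeros if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def saliency_pooling(
    features: Sequence[Sequence[float]],
    saliency: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the thesis
) -> List[float]:
    """Pool local features separately over salient and non-salient locations.

    features: one feature vector per image location
    saliency: one saliency score per location, in the same order
    Returns the two pooled vectors concatenated (salient part first).
    """
    if not features:
        raise ValueError("at least one feature vector is required")
    if len(features) != len(saliency):
        raise ValueError("features and saliency must have the same length")

    dim = len(features[0])
    salient = [f for f, s in zip(features, saliency) if s >= threshold]
    rest = [f for f, s in zip(features, saliency) if s < threshold]

    # The descriptor ignores where each feature was located in the image;
    # only its membership in the salient or non-salient region matters.
    return mean_pool(salient, dim) + mean_pool(rest, dim)


if __name__ == "__main__":
    feats = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]]
    sal = [0.9, 0.7, 0.1]
    print(saliency_pooling(feats, sal))
```

The resulting descriptor has twice the dimension of a single local feature, regardless of how many locations the image contains, which is one way to read the abstract's claim of compactness.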

Related concepts (32)

Learning

Learning is a set of mechanisms leading to the acquisition of know-how, knowledge, or understanding. The one who learns is called the learner. One can contrast learning…

Algorithm

An algorithm is a finite and unambiguous sequence of instructions and operations for solving a class of…

Mobile robot

A mobile robot is an automatic machine that is capable of locomotion. Mobile robotics is usually considered to be a subfield of robotics and information engineering.
Mobile robots have the capabilit

Related publications (67)

Machine Learning is a modern and actively developing field of computer science, devoted to extracting and estimating dependencies from empirical data. It combines such fields as statistics, optimization theory and artificial intelligence. In practical tasks, the general aim of Machine Learning is to construct algorithms able to generalize and predict in previously unseen situations based on some set of examples. Given some finite information, Machine Learning provides ways to extract knowledge, describe, explain and predict from data. Kernel Methods are one of the most successful branches of Machine Learning. They allow applying linear algorithms with well-founded properties, such as generalization ability, to non-linear real-life problems. The Support Vector Machine is a well-known example of a kernel method, which has found a wide range of applications in data analysis. In many practical applications, some additional prior knowledge is often available. This can be knowledge about the data domain, invariant transformations, inner geometrical structures in data, some properties of the underlying process, etc. If used smartly, this information can provide significant improvement to any data processing algorithm. Thus, it is important to develop methods for incorporating prior knowledge into data-dependent models. The main objective of this thesis is to investigate approaches towards learning with kernel methods using prior knowledge. Invariant learning with kernel methods is considered in more detail. In the first part of the thesis, kernels are developed which incorporate prior knowledge on invariant transformations. They apply when the desired transformation produces an object around every example, assuming that all points in the given object share the same class. Different types of objects, including hard geometrical objects and distributions, are considered. These kernels were then applied to image classification with Support Vector Machines. Next, algorithms which specifically include prior knowledge are considered. An algorithm which linearly classifies distributions by their domain was developed. It is constructed such that kernels can be applied to solve non-linear tasks. Thus, it combines the discriminative power of support vector machines and the well-developed framework of generative models. It can be applied to a number of real-life tasks which include data represented as distributions. In the last part of the thesis, the use of unlabelled data as a source of prior knowledge is considered. The technique of modelling the unlabelled data with a graph is taken as a baseline from semi-supervised manifold learning. For classification problems, we use this approach for building graph models of invariant manifolds. For regression problems, we use unlabelled data to take into account the inner geometry of the input space. To conclude, in this thesis we developed a number of approaches for incorporating some prior knowledge into kernel methods. We proposed invariant kernels for existing algorithms, developed new algorithms and adapted a technique taken from semi-supervised learning for invariant learning. In all these cases, links with related state-of-the-art approaches were investigated. Several illustrative experiments were carried out on real data on optical character recognition, face image classification, brain-computer interfaces, and a number of benchmark and synthetic datasets.
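One simple way to picture a transformation "producing an object around every example" is to replace each example by a small set of transformed copies and compare sets instead of points. The sketch below averages a Gaussian (RBF) base kernel over all pairs of copies. This is a generic set-averaging construction chosen for illustration, not necessarily one of the kernels developed in the thesis; `shifted_copies` and its `offsets` are hypothetical stand-ins for whatever invariance is known in advance.

```python
import math
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian base kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def set_kernel(
    xs: Sequence[Sequence[float]],
    ys: Sequence[Sequence[float]],
    gamma: float = 1.0,
) -> float:
    """Average the base kernel over all pairs of points from two sets."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def shifted_copies(
    x: Sequence[float],
    offsets: Sequence[float] = (-0.1, 0.0, 0.1),  # hypothetical transformation
) -> List[List[float]]:
    """Build the 'object' around x: copies with every coordinate shifted."""
    return [[v + d for v in x] for d in offsets]


def invariant_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """Compare two examples through their sets of transformed copies."""
    return set_kernel(shifted_copies(x), shifted_copies(y), gamma)


if __name__ == "__main__":
    a, b = [0.0, 0.0], [0.1, 0.1]
    print(rbf_kernel(a, b), invariant_kernel(a, b))
```

Because the average of a positive semi-definite kernel over pairs of set elements is itself positive semi-definite, a function like `invariant_kernel` can be used wherever a Support Vector Machine accepts a custom kernel.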

Learning from experience and adapting to changing stimuli are fundamental capabilities for artificial cognitive systems. This calls for on-line learning methods able to achieve high accuracy while using limited computing power. Research on autonomous agents has been actively investigating these issues, mostly using probabilistic frameworks and within the context of navigation and learning by imitation. Still, recent results on robot localization have clearly pointed out the potential of discriminative classifiers for cognitive systems. In this paper we follow this approach and propose an on-line version of the Support Vector Machine (SVM) algorithm. Our method, which we call On-line Independent SVM, builds a solution on-line, achieving an excellent accuracy vs. compactness trade-off. In particular, the size of the obtained solution is always bounded, implying a bounded testing time. At the same time, the algorithm converges to the optimal solution at each incremental step, as opposed to similar approaches where optimality is achieved only in the limit of an infinite number of training data. These statements are supported by experiments on standard benchmark databases as well as on two real-world applications, namely (a) place recognition by a mobile robot in an indoor environment, and (b) human grasping posture classification.
