**Êtes-vous un étudiant de l'EPFL à la recherche d'un projet de semestre?**

Travaillez avec nous sur des projets en science des données et en visualisation, et déployez votre projet sous forme d'application sur GraphSearch.

Personne# Johann Paratte

Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.

Unités associées

Chargement

Cours enseignés par cette personne

Chargement

Domaines de recherche associés

Chargement

Publications associées

Chargement

Personnes menant des recherches similaires

Chargement

Cours enseignés par cette personne

Aucun résultat

Publications associées (7)

Domaines de recherche associés

Aucun résultat

Chargement

Chargement

Chargement

Unités associées (3)

Personnes menant des recherches similaires

Aucun résultat

The amount of data that we produce and consume is larger than it has been at any point in the history of mankind, and it keeps growing exponentially. All this information, gathered in overwhelming volumes, often comes with two problematic characteristics: it is complex and deprived of semantical context. A common step to address those issues is to embed raw data in lower dimensions, by finding a mapping which preserves the similarity between data points from their original space to a new one. Measuring similarity between large sets of high-dimensional objects is, however, problematic for two main reasons: first, high-dimensional points are subject to the curse of dimensionality and second, the number of pairwise distances between points is quadratic with respect to the amount of data points. Both problems can be addressed by using nearest neighbours graphs to understand the structure in data. As a matter of fact, most dimensionality reduction methods use similarity matrices that can be interpreted as graph adjacency matrices. Yet, despite recent progresses, dimensionality reduction is still very challenging when applied to very large datasets. Indeed, although recent methods specifically address the problem of scaleability, processing datasets of millions of elements remain a very lengthy process. In this thesis, we propose new contributions which address the problem of scaleability using the framework of Graph Signal Processing, which extends traditional signal processing to graphs. We do so motivated by the premise that graphs are well suited to represent the structure of the data. In the first part of this thesis, we look at quantitative measures for the evaluation of dimensionality reduction methods. Using tools from graph theory and Graph Signal Processing, we show that specific characteristics related to quality can be assessed by taking measures on the graph, which indirectly validates the hypothesis relating graph to structure. The second contribution is a new method for a fast eigenspace approximation of the graph Laplacian. Using principles of GSP and random matrices, we show that an approximated eigensubpace can be recovered very efficiently, which be used for fast spectral clustering or visualization. Next, we propose a compressive scheme to accelerate any dimensionality reduction technique. The idea is based on compressive sampling and transductive learning on graphs: after computing the embedding for a small subset of data points, we propagate the information everywhere using transductive inference. The key components of this technique are a good sampling strategy to select the subset and the application of transductive learning on graphs. Finally, we address the problem of over-discriminative feature spaces by proposing a hierarchical clustering structure combined with multi-resolution graphs. Using efficient coarsening and refinement procedures on this structure, we show that dimensionality reduction algorithms can be run on intermediate levels and up-sampled to all points leading to a very fast dimensionality reduction method. For all contributions, we provide extensive experiments on both synthetic and natural datasets, including large-scale problems. This allows us to show the pertinence of our models and the validity of our proposed algorithms. Following reproducible principles, we provide everything needed to repeat the examples and the experiments presented in this work.

Johann Paratte, Nathanaël Perraudin, Pierre Vandergheynst

Visualizing high-dimensional data has been a focus in data analysis communities for decades, which has led to the design of many algorithms, some of which are now considered references (such as t-SNE for example). In our era of overwhelming data volumes, the scalability of such methods have become more and more important. In this work, we present a method which allows to apply any visualization or embedding algorithm on very large datasets by considering only a fraction of the data as input and then extending the information to all data points using a graph encoding its global similarity. We show that in most cases, using only O(log(N)) samples is sufficient to diffuse the information to all N data points. In addition, we propose quantitative methods to measure the quality of embeddings and demonstrate the validity of our technique on both synthetic and real-world datasets.

Johann Paratte, Yann Mikaël Schoenenberger, Pierre Vandergheynst

Noisy 3D point clouds arise in many applications. They may be due to errors when creating a 3D model from images or simply to imprecise depth sensors. Point clouds can be given geometrical structure using graphs created from the similarity information between points. This paper introduces a method that uses this graph structure and convex optimization methods to denoise 3D point clouds. A short discussion presents how those methods naturally generalize to time-varying inputs such as 3D point cloud time series.

2015