The goal of transductive learning is to recover the labels of a large dataset from only a few known samples. In this work, we work on graphs for two reasons. First, it is possible to construct a graph from a given dataset with features. The main assumption we make is that if two vertices are close (connected by an edge with a small weight), they will have similar labels; graph theory thus allows us to solve a large class of problems. Second, graph problems can be solved distributively. Imagine a sensor network with limited transmission power, where each sensor can only communicate with its closest neighbours. Implementing a distributed algorithm allows you to compute the solution directly on the sensors, rather than at a single special point. In that case, regression lets you filter out the noise, detect whether a sensor is broken, estimate the value a broken sensor should have reported, and extrapolate the data to points with no sensor. This could also be done centrally by a computer, but then all the data would have to be transferred to that computer, which consumes a lot of resources (here, energy).

In [1], Zhou et al. present a recursive algorithm to solve the transductive learning problem; this recursive algorithm can also be implemented in a distributed fashion. In this work, we compare it with a Chebyshev polynomial approximation, presented in [2] and [3]. The main advantage of Chebyshev polynomials is their recursive properties. We also study the stability of the different algorithms. The criterion used to evaluate an algorithm is its communication cost, that is, the number of messages exchanged between all the vertices.

This work is divided into two main parts. In the first one, we consider the general graph regression problem. This part is itself divided into two subproblems. The first is a general regression problem with the prior ∥∇x∥₂ (where x is the solution). After deriving the solution, we discuss Chebyshev polynomial approximation and the graph Fourier transform, and give some examples based on the Minnesota road graph. In the second subproblem, we consider ridge regression. We study the convergence of our algorithm as a function of the basis, compare it with some known recursive algorithms, and quantify the impact on the solution of the sparsity needed to make our algorithm efficient. We have worked on different datasets, but mainly on the DELVE Boston housing dataset.

In the second main part, we consider classification. In this case, the answer (the labels) is no longer continuous but discrete, so the algorithm has to take a decision. We show that this new problem can be treated as a one-shot filter. We then discuss the basis used for the graph Fourier transform, study how to construct a graph, and work through practical cases based on several datasets (the most studied being USPS). In that part, we compare our Chebyshev approximation to an iterative algorithm.

The link between all these problems is the way we use Chebyshev polynomial approximation: we choose a basis, take a graph Fourier transform, approximate the transfer function in the Fourier domain, and end with an inverse transform. In fact, we show that we can avoid the forward and inverse graph Fourier transforms altogether and compute the solution directly and/or distributively.
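Assuming [1] refers to Zhou et al.'s local-and-global-consistency method (an assumption on my part, since only the citation number is given here), the recursive algorithm in question is the fixed-point iteration

    f^{(t+1)} = \alpha S f^{(t)} + (1 - \alpha) y, \qquad S = D^{-1/2} W D^{-1/2},

where y encodes the known labels, W is the weight matrix, D the diagonal degree matrix, and 0 < \alpha < 1. Each step only combines the values held by neighbouring vertices, which is what makes a distributed implementation possible.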
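To make the first subproblem concrete, the prior ∥∇x∥₂ typically leads to the graph-Tikhonov problem sketched below; the observed signal y, the trade-off parameter \tau and the combinatorial Laplacian L are my notation, not necessarily the thesis's:

    \min_x \; \|x - y\|_2^2 + \tau \,\|\nabla x\|_2^2, \qquad \text{with } \|\nabla x\|_2^2 = x^\top L x,

whose closed-form solution

    x^\star = (I + \tau L)^{-1} y

acts in the graph Fourier domain as multiplication by the transfer function h(\lambda) = 1 / (1 + \tau \lambda), a low-pass filter. It is this kind of transfer function that the Chebyshev sketch below approximates.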
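As an illustration of how the Chebyshev approximation lets one skip the forward and inverse graph Fourier transforms, here is a minimal Python/NumPy sketch. The function name cheby_filter, the default degree K and the Gershgorin-type bound on the largest eigenvalue are illustrative choices of mine; only the underlying idea, replacing h(L)y by a polynomial in L applied to y through the three-term Chebyshev recurrence, comes from the text.

    import numpy as np

    def cheby_filter(L, y, h, K=30, lmax=None):
        # Approximate h(L) @ y with a degree-K Chebyshev expansion:
        # only matrix-vector products with L are needed, never its
        # eigendecomposition (i.e. no graph Fourier transform).
        # h must accept NumPy arrays (be vectorized).
        if lmax is None:
            # Gershgorin-type upper bound for a combinatorial
            # Laplacian: lambda_max <= 2 * max degree.
            lmax = 2.0 * np.asarray(L.diagonal()).max()
        # Chebyshev coefficients of h on [0, lmax], mapped to [-1, 1].
        c = np.polynomial.chebyshev.chebinterpolate(
            lambda t: h(lmax * (t + 1.0) / 2.0), K)
        # Three-term recurrence for T_k(Lt) @ y, with Lt = 2 L / lmax - I.
        t_prev = y
        t_curr = (2.0 / lmax) * (L @ y) - y
        out = c[0] * t_prev + c[1] * t_curr
        for k in range(2, K + 1):
            t_next = 2.0 * ((2.0 / lmax) * (L @ t_curr) - t_curr) - t_prev
            out = out + c[k] * t_next
            t_prev, t_curr = t_curr, t_next
        return out

For the regression filter above, one would call cheby_filter(L, y, lambda lam: 1.0 / (1.0 + tau * lam)). Each product L @ v only mixes the values of neighbouring vertices, so on a sensor network one polynomial degree costs one round of local message exchanges, which is precisely the communication cost studied here.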
Maximilien Claude Robert Dreveton