Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
Compressed Sensing teaches us that measurements can be traded for offline computation if the signal being sensed has a simple enough representation. Proper decoders can exactly recover the high-dimensional signal of interest from a lower-dimensional vector of that signal's observations. In graph domains -- like social, similarity, or interaction networks -- the relevant signals often have to do with the network's cluster structure. Partitioning a graph into different communities induces a piecewise-constant signal, an object that can be decoded via Graph Total Variation (G-TV) minimization even if it is not fully observed. In fact, assume that such a signal can only be accessed by querying vertices at random. Then, we could sensibly ask: what are the sampling probabilities that minimize the number of queries required for a successful G-TV recovery? This thesis is an attempt to answer this question through the study of the success conditions in G-TV minimization programs. I show that the recovery error in these programs undergoes a phase transition in terms of the number of measurements, with a threshold that explicitly depends on the vertex-sampling probabilities. It suffices to minimize this threshold to obtain an optimal sampling design. Yet, sampling optimally in practice has problems of its own. While numerical experiments reveal that it is important to focus on the places of the graph where the signal varies, implementing the optimal design without actually knowing the signal-to-be-sampled remains an open issue.