
Publication

# Timing is Everything

Abstract

In this thesis, timing is everything. In the first part, we mean this literally, as we tackle systems that encode information using timing alone. In the second part, we adopt the standard, metaphoric interpretation of this saying and show the importance of choosing the right time to sample a system when efficiency is a key consideration.

Time encoding machines, or, alternatively, integrate-and-fire neurons, encode their inputs using spikes, the timings of which depend on the input and therefore hold information about it. These devices can be made more power efficient than their clocked counterparts and have thus been studied in the fields of signal processing, event-based vision, computational neuroscience and machine learning. However, their timing-based spiking output has so far often been considered a nuisance that one must make do with, rather than a potential advantage. In this thesis, we show that this timing-based output equips spiking devices with capabilities that are out of reach for classical encoding and processing systems.

We first examine the benefits of time encoding for multi-channel encoding and recovery of a signal: with time encoding, clock alignment is easy to solve, whereas it poses problems in the classical sampling scenario. Then, we study the time encoding of low-dimensional signals and see that the asynchrony of spikes allows for a lower sample complexity than synchronous sampling. Thanks to this same asynchrony, time encoding of video results in an entanglement between spatial sampling density and temporal resolution, a relationship which is not present in frame-based video. Finally, we show that the all-or-none nature of the spikes allows training spiking neural networks in a layer-by-layer fashion, a feat that is impossible with clocked, artificial neural networks due to the credit assignment problem.

The second part of this thesis shows that choosing the right timing of samples can be crucial for efficiency when performing nanoscale magnetic sensing. We are given a stochastic process in which each sample at time t follows a Bernoulli distribution, and which is characterized by oscillation frequencies that we wish to recover. We search for an optimal approach to sampling this process, such that the variance of the frequency estimates is minimized under constraints on the measurement time. The models we assume stem from the field of nanoscale magnetic sensing, where the number of parameters to be estimated varies with the number of spins one is trying to sense. We present an adaptive approach to choosing samples in both the single-spin and two-spin cases and compare the adaptive algorithm's performance to classical sampling approaches.

In both parts of the thesis, we move away from classical amplitude sampling and consider cases where timing takes the forefront and amplitude information is merely binary, to show that timing can carry information and that it can control the amount of information gained.
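As a minimal sketch of the kind of device discussed above, the following integrate-and-fire encoder maps an input signal to spike times by integrating the biased input and emitting a spike whenever the integral crosses a threshold. The parameter names and values here are illustrative, not those analyzed in the thesis:

```python
import numpy as np

def time_encode(signal, dt, threshold, bias):
    """Integrate-and-fire time encoding (a sketch): integrate
    bias + signal over time and emit a spike time, then reset,
    whenever the running integral reaches the threshold."""
    integral = 0.0
    spike_times = []
    for i, x in enumerate(signal):
        integral += (bias + x) * dt
        if integral >= threshold:
            spike_times.append(i * dt)
            integral -= threshold
    return spike_times

# For a constant input, spikes are evenly spaced and their rate
# encodes the amplitude: a larger input yields denser spikes.
t = np.arange(0, 1, 1e-4)
low = time_encode(0.2 * np.ones_like(t), 1e-4, threshold=0.05, bias=0.5)
high = time_encode(0.8 * np.ones_like(t), 1e-4, threshold=0.05, bias=0.5)
assert len(high) > len(low)
```

The amplitude information in the output is indeed "merely binary" (a spike happened or it did not); all quantitative information lives in the spike timings.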


Related publications (106)

This thesis focuses on developing efficient algorithmic tools for processing large datasets. In many modern data analysis tasks, the sheer volume of available datasets far outstrips our abilities to process them. This scenario commonly arises in tasks including parameter tuning of machine learning models (e.g., Google Vizier) and training neural networks. These tasks often require solving numerical linear algebraic problems on large matrices, making the classical primitives prohibitively expensive. Hence, there is a crucial need to efficiently compress the available datasets. In other settings, even collecting the input dataset is extremely expensive, making it vital to design optimal data sampling strategies. This is common in applications such as MRI acquisition and spectrum sensing.
The fundamental questions above are often dual to each other, and hence can be addressed using the same set of core techniques. Indeed, exploiting structured Fourier sparsity is a recurring source of efficiency in this thesis, leading to both fast numerical linear algebra methods and sample efficient data acquisition schemes.
One of the main results that we present in this thesis is the first Sublinear-time Model-based Sparse FFT algorithm that achieves a nearly optimal sample complexity for recovery of every signal whose Fourier transform is well approximated by a small number of blocks (e.g., such structure is common in spectrum sensing). Our method matches in sublinear time the result of Baraniuk et al. (2010), which started the field of model-based compressed sensing. Another highlight of this thesis is the first Dimension-independent Sparse FFT algorithm, which computes the Fourier transform of a sparse signal in sublinear runtime in any dimension. This is the first algorithm that, like the FFT of Cooley and Tukey, is dimension-independent and avoids the curse of dimensionality inherent to all previously known techniques. Finally, we give a Universal Sampling Scheme for the reconstruction of structured Fourier signals from continuous measurements. Our approach matches the classical results of Slepian, Pollak, and Landau (1960s) on the reconstruction of bandlimited signals via Prolate Spheroidal Wave Functions and extends these results to a wide class of Fourier structure types.
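As a toy illustration of the block-sparse spectra described above (a sketch only: the frequencies and amplitudes are made up, and a plain dense FFT serves as the baseline that the sublinear-time algorithms improve upon):

```python
import numpy as np

n = 1024
# A spectrum supported on a few coefficients, including a small
# "block" of adjacent frequencies at 300-301 (hypothetical values).
true_freqs = {17: 1.0, 300: 0.5, 301: 0.5}
x = sum(a * np.exp(2j * np.pi * f * np.arange(n) / n)
        for f, a in true_freqs.items())

# Dense FFT baseline: O(n log n) time and all n samples. A sparse
# FFT recovers the same support from far fewer samples.
X = np.fft.fft(x) / n
support = {k for k in range(n) if abs(X[k]) > 1e-6}
assert support == set(true_freqs)
```

The model-based guarantee is exactly about exploiting the fact that the support forms a few contiguous blocks rather than arbitrary positions.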
Besides having classical applications in signal processing and data analysis, Fourier techniques have been at the core of many machine learning tasks such as Kernel Matrix Approximation. The second half of this thesis is dedicated to finding compressed and low-rank representations of kernel matrices, which are the primary means of computation with large kernel matrices in machine learning. We build on Fourier techniques and achieve spectral approximation guarantees to the Gaussian kernel using an optimal number of samples, significantly improving upon the classical Random Fourier Features of Rahimi and Recht (2008). Finally, we present a nearly-optimal Oblivious Subspace Embedding for high-degree Polynomial kernels which leads to nearly-optimal embeddings of the high-dimensional Gaussian kernel. This is the first result that does not suffer from an exponential loss in the degree of the polynomial kernel or the dimension of the input point set, providing exponential improvements over the prior works, including the TensorSketch of Pagh (2013) and application of the celebrated Fast Multipole Method of Greengard and Rokhlin (1986) to the kernel approximation problem.
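The classical Random Fourier Features construction mentioned above (the baseline that the thesis improves upon) can be sketched in a few lines; the dimensions and bandwidth below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 3, 2000          # input dimension, number of random features
sigma = 1.0             # Gaussian kernel bandwidth

# Random Fourier Features (Rahimi & Recht): draw w ~ N(0, I/sigma^2)
# and b ~ Unif[0, 2*pi); then z(x) = sqrt(2/D) * cos(W x + b)
# satisfies E[z(x) . z(y)] = exp(-||x - y||^2 / (2 sigma^2)).
W = rng.normal(scale=1.0 / sigma, size=(D, d))
b = rng.uniform(0, 2 * np.pi, size=D)
z = lambda x: np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, y = rng.normal(size=d), rng.normal(size=d)
exact = np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))
approx = z(x) @ z(y)
assert abs(exact - approx) < 0.15
```

The error of this vanilla estimator shrinks like 1/sqrt(D); the spectral approximation guarantees described above are stronger statements about the whole kernel matrix, not just pointwise entries.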

Graphs offer a simple yet meaningful representation of relationships between data. This representation is often used in machine learning algorithms in order to incorporate structural or geometric information about data. However, it can also be used in an inverted fashion: instead of modelling data through graphs, we model graphs through data distributions. In this thesis, we explore several applications of this new modelling framework.

Starting with the graph learning problem, we exploit the probabilistic model of data given through graphs to propose a multi-graph learning method for structured data mixtures. We explore various relations that data can have with the underlying graphs through the notion of graph filters. We propose an algorithm to jointly cluster a set of data and learn a graph for each of the clusters. Experiments demonstrate promising performance in data clustering and multiple graph inference, and show desirable properties in terms of interpretability and proper handling of high dimensionality on synthetic and real data. The model has further been applied to fMRI data, where the method is used to successfully identify different functional brain networks and their activation times.

This probabilistic model of data defined through graphs can be very meaningful even when no data is available. Thus, in the second part of this thesis, we use such models to represent each graph through the probabilistic distribution of data, which varies smoothly on the graph. Optimal transport allows for a comparison of two such distributions, which in turn gives a structurally meaningful measure for graph comparison. We follow by using this distance to formulate a new graph alignment problem based on the optimal transport framework, and propose an efficient stochastic algorithm based on Bayesian exploration to accommodate the nonconvexity of the graph alignment problem. We demonstrate the performance of our novel framework on different tasks like graph alignment, graph classification and graph signal prediction, and we show that our method leads to significant improvements with respect to state-of-the-art algorithms.

Furthermore, we cast a new formulation for the one-to-many graph alignment problem, allowing for comparison of graphs of different sizes. The resulting alignment problem is solved with stochastic gradient descent, where a novel Dykstra operator ensures that the solution is a one-to-many (soft) assignment matrix. Experiments on graph alignment and graph classification problems show that our method for one-to-many alignment leads to meaningful improvements with respect to the state-of-the-art algorithms for each of these tasks.

Finally, we explore a family of probabilistic distributions for data based on graph filters. Distances defined through a graph filter give a high level of flexibility in choosing which graph properties we want to emphasize. In addition, in order to make the above graph alignment problem more scalable, we formulate an approximation to our filter Wasserstein graph distance that allows for the exploitation of faster algorithms, without grossly sacrificing performance. We propose two algorithms, a simple one based on mirror gradient descent and another built on its stochastic version, which offers a trade-off between speed and accuracy. Our experiments show the performance benefits of our novel stochastic algorithm, as well as the strong value of flexibility offered by filter-based distances.
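Comparing graphs through distributions of smooth signals admits a closed form when those distributions are Gaussian. Below is a minimal numpy sketch (a simplified stand-in for the thesis's framework, using the Laplacian pseudoinverse as covariance) of the 2-Wasserstein distance between two graphs:

```python
import numpy as np

def laplacian(A):
    """Combinatorial graph Laplacian L = D - A."""
    return np.diag(A.sum(axis=1)) - A

def psd_sqrt(M):
    """Square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh((M + M.T) / 2)
    return V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T

def graph_w2(A1, A2):
    """2-Wasserstein distance between the zero-mean Gaussians
    N(0, L1^+) and N(0, L2^+) modelling smooth signals on each graph,
    using the closed form for Gaussian measures."""
    S1 = np.linalg.pinv(laplacian(A1))
    S2 = np.linalg.pinv(laplacian(A2))
    R = psd_sqrt(S2)
    cross = psd_sqrt(R @ S1 @ R)
    w2_sq = np.trace(S1 + S2 - 2 * cross)
    return float(max(w2_sq, 0.0)) ** 0.5

# A 3-node path graph versus a triangle: identical graphs are at
# distance zero, structurally different graphs are not.
A_path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A_tri = np.ones((3, 3)) - np.eye(3)
assert graph_w2(A_path, A_path) < 1e-6
assert graph_w2(A_path, A_tri) > 0.1
```

Replacing the pseudoinverse with a general graph filter of the Laplacian yields the filter-based family of distances described above.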

Related concepts (38)

Time is the continued sequence of existence and events that occurs in an apparently irreversible succession from the past, through the present, into the future.

Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed.

In signal processing, a signal is a function that conveys information about a phenomenon. Any quantity that can vary over space or time can be used as a signal.

The way our brain learns to disentangle complex signals into unambiguous concepts is fascinating but remains largely unknown. There is evidence, however, that hierarchical neural representations play a key role in the cortex. This thesis investigates biologically plausible models of unsupervised learning of hierarchical representations as found in the brain and modern computer vision models. We use computational modeling to address three main questions at the intersection of artificial intelligence (AI) and computational neuroscience.

The first question is: what are useful neural representations, and when are deep hierarchical representations needed? We approach this point with a systematic study of biologically plausible unsupervised feature learning in shallow 2-layer networks on digit (MNIST) and object (CIFAR10) classification. Surprisingly, random features support high performance, especially for large hidden layers. When combined with localized receptive fields, random feature networks approach the performance of supervised backpropagation on MNIST, but not on CIFAR10. We suggest that future models of biologically plausible learning should outperform such random feature benchmarks on MNIST, or that such models should be evaluated in different ways.

The second question is: how can hierarchical representations be learned with mechanisms supported by neuroscientific evidence? We cover this question by proposing a unifying Hebbian model, inspired by common models of V1 simple and complex cells based on unsupervised sparse coding and temporal invariance learning. In shallow 2-layer networks, our model reproduces the learning of simple and complex cell receptive fields, as found in V1. In deeper networks, we stack multiple layers of Hebbian learning but find that this does not yield hierarchical representations of increasing usefulness. From this, we hypothesise that standard Hebbian rules are too constrained to build increasingly useful representations, as observed in higher areas of the visual cortex or deep artificial neural networks.

The third question is: can AI inspire learning models that build deep representations and are still biologically plausible? We address this question by proposing a learning rule that takes inspiration from neuroscience and recent advances in self-supervised deep learning. The proposed rule is Hebbian, i.e. it only depends on pre- and post-synaptic neuronal activity, but includes additional local factors, namely predictive dendritic input and widely broadcast modulation factors. Algorithmically, this rule applies self-supervised contrastive predictive learning to a causal, biological setting using saccades. We find that networks trained with this generalised Hebbian rule build deep hierarchical representations of images, speech and video.

We see our modeling as a potential starting point both for new hypotheses that can be tested experimentally and for novel AI models that could benefit from added biological realism.
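As a minimal illustration of a local Hebbian mechanism of the kind discussed above (a sketch using Oja's classic rule, not the thesis's model), the following update depends only on pre- and post-synaptic activity yet recovers the dominant direction of its input data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data whose dominant direction lies along [1, 1] / sqrt(2).
u = np.array([1.0, 1.0]) / np.sqrt(2)
X = rng.normal(size=(5000, 2)) + 3.0 * rng.normal(size=(5000, 1)) * u

# Oja's rule: a Hebbian term (pre * post) plus a decay term that
# keeps the weights bounded; w converges to the top principal
# direction of the input covariance.
w = rng.normal(size=2)
lr = 1e-3
for x in X:
    y = w @ x                    # post-synaptic activity
    w += lr * y * (x - y * w)    # Hebbian update with decay

cosine = abs(w @ u) / np.linalg.norm(w)
assert cosine > 0.95
```

Rules of this purely local form learn useful single-layer features, which is consistent with the finding above that stacking them alone does not produce increasingly useful deep representations.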