# Constructive Training Methods for Feedforward Neural Networks with Binary Weights

Abstract

Quantization of the parameters of a Perceptron is a central problem in the hardware implementation of neural networks using digital technology. A neural model with each weight limited to a small integer range requires little silicon area. Moreover, according to Occam's razor, better generalization can be expected from a simpler computational model. The price to pay for these benefits is the difficulty of training such networks. This paper proposes two new ideas for constructive training algorithms and demonstrates their efficiency in generating feedforward networks composed of Boolean threshold gates with discrete weights. A proof of convergence of these algorithms is given. Numerical experiments were carried out, and the results are presented in terms of the size of the generated networks and their generalization abilities.
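
To make the basic building block concrete, the sketch below evaluates a Boolean threshold gate with integer weights. It then searches exhaustively for weights in a small discrete range, assumed here to be {-1, 0, 1}, that realize a given truth table. This is not the paper's constructive algorithm, which the abstract does not detail. `gate` and `find_discrete_gate` are hypothetical names, and brute force is only feasible for a handful of inputs.

```python
import itertools

import numpy as np


def gate(w, theta, x):
    """Boolean threshold gate: outputs 1 when the weighted input sum reaches theta."""
    return int(np.dot(w, x) >= theta)


def find_discrete_gate(truth_table, n, weight_values=(-1, 0, 1)):
    """Exhaustively search discrete weights and an integer threshold realizing truth_table.

    truth_table maps each n-tuple of 0/1 inputs to the desired 0/1 output.
    Returns (weights, theta), or None if no single gate with these weights works.
    """
    inputs = list(itertools.product((0, 1), repeat=n))
    # The weighted sum lies in [-bound, bound], so thresholds outside
    # [-bound, bound + 1] only reproduce the constant outputs already covered.
    bound = max(abs(v) for v in weight_values) * n
    for w in itertools.product(weight_values, repeat=n):
        for theta in range(-bound, bound + 2):
            if all(gate(w, theta, x) == truth_table[x] for x in inputs):
                return w, theta
    return None


# AND of three inputs is realizable with weights in {-1, 0, 1} ...
and3 = {x: int(all(x)) for x in itertools.product((0, 1), repeat=3)}
print(find_discrete_gate(and3, 3))  # ((1, 1, 1), 3)
# ... but XOR of two inputs is not linearly separable, so no single gate exists.
xor2 = {x: x[0] ^ x[1] for x in itertools.product((0, 1), repeat=2)}
print(find_discrete_gate(xor2, 2))  # None
```

A function that no single gate can realize, such as XOR, is where constructive methods come in: they add gates and layers until the network reproduces the training set.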

Related concepts (16)

Weight

Weight is the force of gravity, of gravitational and inertial origin, exerted, for example, by the Earth on a body with mass solely because of the body's proximity to the Earth. Its unit…

Algorithm

[Figure: an algorithm that cuts an arbitrary polygon into triangles (triangulation).]
An algorithm is a finite, unambiguous sequence of instructions and operations for solving a class of…

Neural network

A neural network can refer to a neural circuit of biological neurons (sometimes also called a biological neural network), a network of artificial neurons or nodes in the case of an artificial neur

Related publications (84)

In this thesis, we propose new algorithms to solve inverse problems in the context of biomedical images. Because these problems are ill-posed, solving them requires some prior knowledge of the statistics of the underlying images. Traditional algorithms in the field assume prior knowledge of the smoothness or sparsity of these images. Recently, they have been outperformed by second-generation algorithms, which harness the power of neural networks to learn the required statistics from training data. Even more recently, latest-generation deep-learning-based methods have emerged that require neither training nor training data. This thesis devises algorithms that progress through these generations. It extends these generations to novel formulations and applications while improving their robustness. In parallel, it also progresses in complexity, from problems with 1D data and an exactly known forward model to problems with 4D data and an unknown parametric forward model. We introduce five main contributions, the last three of which are latest-generation deep-learning-based algorithms that require no prior training. 1) We develop algorithms to solve the continuous-domain formulation of inverse problems with both classical Tikhonov and total-variation regularization. We formalize the problems, characterize the solution set, and devise numerical approaches to find the solutions. 2) We propose an algorithm that improves upon end-to-end neural-network-based second-generation algorithms. In our method, a neural network is first trained as a projector on a training set and is then plugged in as the projector inside projected gradient descent (PGD). Since the problem is nonconvex, we relax the PGD to ensure convergence to a local minimum under some constraints. This method outperforms all previous-generation algorithms for computed tomography (CT). 
3) We develop a novel time-dependent deep-image-prior algorithm for modalities that involve a temporal sequence of images. We parameterize the images as the output of an untrained neural network fed with a sequence of latent variables. To impose temporal directionality, the latent variables are assumed to lie on a 1D manifold. The network is then tuned to minimize the data-fidelity term. We obtain state-of-the-art results in dynamic magnetic resonance imaging (MRI) and even recover intra-frame images. 4) We propose a novel reconstruction paradigm for cryo-electron microscopy (CryoEM) called CryoGAN. Motivated by generative adversarial networks (GANs), we reconstruct a biomolecule's 3D structure such that its CryoEM measurements resemble the acquired data in a distributional sense. The algorithm requires neither pose nor likelihood estimation, needs no ab initio modeling, and comes with a theoretical guarantee of recovering the true structure. 5) We extend CryoGAN to reconstruct continuously varying conformations of a structure from heterogeneous data. We parameterize the conformations as the output of a neural network fed with latent variables on a low-dimensional manifold. The method is shown to recover continuous protein conformations and their energy landscape.
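
Contribution 2 plugs a trained projector into projected gradient descent and relaxes the iteration. The sketch below shows only a relaxed-PGD skeleton on a toy linear problem y = Hx. Several simplifications are assumed. A clip onto the box [0, 1] stands in for the trained neural-network projector. The relaxation weight `alpha` is fixed rather than adapted. The names `relaxed_pgd` and `clip_projector` are hypothetical. With a convex projector like this one, the iteration simply converges; the relaxation matters for the nonconvex, learned projector.

```python
import numpy as np


def relaxed_pgd(H, y, project, alpha=0.8, n_iter=500):
    """Relaxed projected gradient descent for min_x 0.5 * ||H x - y||^2.

    Each step takes a gradient step on the data-fidelity term, projects the
    result, and then moves only a fraction alpha of the way toward it.
    """
    gamma = 1.0 / np.linalg.norm(H, 2) ** 2  # step size 1/L with L = ||H||_2^2
    x = np.zeros(H.shape[1])
    for _ in range(n_iter):
        z = project(x - gamma * H.T @ (H @ x - y))  # plain PGD candidate
        x = (1 - alpha) * x + alpha * z              # relaxation toward the candidate
    return x


def clip_projector(v):
    """Hypothetical stand-in for a trained projector: clip onto the box [0, 1]."""
    return np.clip(v, 0.0, 1.0)


rng = np.random.default_rng(0)
H = rng.normal(size=(80, 40))             # toy forward model
x_true = rng.uniform(0.0, 1.0, size=40)   # ground truth inside the box
y = H @ x_true                            # noiseless measurements
x_hat = relaxed_pgd(H, y, clip_projector)
print(np.linalg.norm(H @ x_hat - y) / np.linalg.norm(y))  # relative residual
```

Because the iterate is always a convex combination of points in the box, it never leaves the set onto which the projector maps.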

Humans and some other animals are able to perform tasks that require coordinating movements across multiple temporal scales, ranging from hundreds of milliseconds to several seconds. The fast timescale at which neurons naturally operate, on the order of tens of milliseconds, is well suited to the motor control of rapid movements. In contrast, to coordinate movements on the order of seconds, a neural network should produce reliable dynamics on a similarly “slow” timescale. Neurons and synapses exhibit biophysical mechanisms whose timescales range from tens of milliseconds to hours, which suggests that these mechanisms may play a role in producing slow, reliable dynamics. However, how such mechanisms influence network dynamics is not yet understood. An alternative approach to achieving slow dynamics in a neural network is to modify its connectivity structure. Still, the limitations of this approach remain unclear, in particular the degree to which the weights require fine-tuning. The main goal of this thesis is to understand how both single-neuron mechanisms and the connectivity structure might shape network dynamics to produce slow timescales.
We first consider the possibility of obtaining slow dynamics in binary networks by tuning their connectivity. It is known that binary networks can produce sequential dynamics. However, if the sequences consist of random patterns, the typical length of the longest sequence that can be produced grows only linearly with the number of units. Here, we show that this limitation can be overcome by carefully designing the sequence structure. More precisely, we give a constructive proof that sequences whose length scales exponentially with the number of units can be obtained. To achieve this, however, the connectivity matrix must be exponentially fine-tuned.
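
For context on the random-pattern baseline, the sketch below stores a cyclic sequence of random ±1 patterns in a binary network. It uses the classic asymmetric Hebbian rule W = (1/N) Σ_μ ξ^(μ+1) (ξ^μ)^T and replays the sequence with synchronous sign updates. This is the standard construction whose sequence length grows linearly with N, not the thesis's exponential-length construction. The function names, network size, and sequence length are illustrative choices.

```python
import numpy as np


def store_sequence(patterns):
    """Asymmetric Hebbian weights mapping pattern mu onto pattern mu + 1 (cyclically)."""
    P, N = patterns.shape
    successors = np.roll(patterns, -1, axis=0)  # successors[mu] = patterns[mu + 1]
    return successors.T @ patterns / N          # W_ij = (1/N) sum_mu xi_i^{mu+1} xi_j^mu


def replay(W, x0, steps):
    """Synchronous binary dynamics x(t+1) = sign(W x(t)), taking sign(0) as +1."""
    states = [x0]
    for _ in range(steps):
        states.append(np.where(W @ states[-1] >= 0, 1, -1))
    return np.array(states)


rng = np.random.default_rng(0)
N, P = 200, 10  # low load P/N = 0.05, so crosstalk between patterns is small
patterns = rng.choice([-1, 1], size=(P, N))
W = store_sequence(patterns)
states = replay(W, patterns[0], steps=P - 1)
# Overlap of the network state with the pattern expected at each step (1.0 = exact).
overlaps = [states[t] @ patterns[t] / N for t in range(P)]
print(np.round(overlaps, 2))
```

Adding more random patterns increases the crosstalk term until replay fails. This is the linear capacity limit that a designed sequence structure is meant to overcome.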
Next, we focus on the interaction between single-neuron mechanisms and recurrent dynamics. Particular attention is paid to adaptation, which is known to span a broad range of timescales and is therefore particularly relevant to the subject of this thesis. We study the dynamics of a random network with adaptation using mean-field techniques and show that the network can enter a state of resonant chaos. Interestingly, the resonance frequency of this state is independent of the connectivity strength and depends only on the properties of the single-neuron model. The approach used to study networks with adaptation can also be applied to linear rate units with an arbitrary number of auxiliary variables. Based on a qualitative analysis of the mean-field theory for a random network whose neurons are described by a D-dimensional rate model, we conclude that the statistics of the chaotic dynamics are strongly influenced by the single-neuron model under investigation.
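
The point that the resonance is set by the single neuron can be illustrated with the linear response of one rate unit with a single adaptation variable. The sketch assumes the hypothetical model tau_x dx/dt = -x - gamma a + I and tau_a da/dt = -a + x. The transfer function is then H(ω) = 1 / (1 + iω tau_x + gamma / (1 + iω tau_a)). The code evaluates |H(ω)| and locates its peak, which involves no connectivity parameter at all. This is only the single-unit filter, not the thesis's mean-field theory of the chaotic network.

```python
import numpy as np


def adaptive_unit_gain(omega, tau_x=1.0, tau_a=10.0, gamma=1.0):
    """|H(omega)| for the linear unit tau_x x' = -x - gamma a + I, tau_a a' = -a + x."""
    H = 1.0 / (1.0 + 1j * omega * tau_x + gamma / (1.0 + 1j * omega * tau_a))
    return np.abs(H)


omega = np.linspace(0.0, 3.0, 3001)  # angular frequencies, in units of 1/tau_x
gain = adaptive_unit_gain(omega)
peak = omega[np.argmax(gain)]
print(f"peak at omega = {peak:.3f}, gain {gain.max():.3f} vs DC gain {gain[0]:.3f}")
```

Setting `gamma=0` removes adaptation, and the gain becomes a pure low-pass filter that peaks at ω = 0. Slow adaptation subtracts the low-frequency part of the response and turns the unit into a band-pass filter.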
Using a reservoir computing approach, we show preliminary evidence that slow adaptation can be beneficial for tasks that require slow timescales. The positive impact of adaptation on network performance is particularly strong in the presence of noise. Finally, we propose a network architecture in which the slowing-down effect of adaptation is combined with a hierarchical structure, with the aim of efficiently generating sequences that require multiple, hierarchically organized timescales.

The way our brain learns to disentangle complex signals into unambiguous concepts is fascinating but remains largely unknown. There is evidence, however, that hierarchical neural representations play a key role in the cortex. This thesis investigates biologically plausible models of unsupervised learning of hierarchical representations as found in the brain and in modern computer vision models. We use computational modeling to address three main questions at the intersection of artificial intelligence (AI) and computational neuroscience. The first question is: What are useful neural representations, and when are deep hierarchical representations needed? We approach this question with a systematic study of biologically plausible unsupervised feature learning in shallow two-layer networks on digit (MNIST) and object (CIFAR10) classification. Surprisingly, random features support high performance, especially for large hidden layers. When combined with localized receptive fields, random-feature networks approach the performance of supervised backpropagation on MNIST, but not on CIFAR10. We suggest that future models of biologically plausible learning should outperform such random-feature benchmarks on MNIST, or that such models should be evaluated in different ways. The second question is: How can hierarchical representations be learned with mechanisms supported by neuroscientific evidence? We address this question by proposing a unifying Hebbian model, inspired by common models of V1 simple and complex cells based on unsupervised sparse coding and temporal invariance learning. In shallow two-layer networks, our model reproduces the learning of simple- and complex-cell receptive fields, as found in V1. In deeper networks, we stack multiple layers of Hebbian learning but find that this does not yield hierarchical representations of increasing usefulness. 
From this, we hypothesise that standard Hebbian rules are too constrained to build increasingly useful representations, as observed in higher areas of the visual cortex or in deep artificial neural networks. The third question is: Can AI inspire learning models that build deep representations and are still biologically plausible? We address this question by proposing a learning rule that takes inspiration from neuroscience and from recent advances in self-supervised deep learning. The proposed rule is Hebbian, i.e. it depends only on pre- and post-synaptic neuronal activity, but it includes additional local factors, namely predictive dendritic input and widely broadcast modulation factors. Algorithmically, this rule applies self-supervised contrastive predictive learning to a causal, biological setting using saccades. We find that networks trained with this generalised Hebbian rule build deep hierarchical representations of images, speech, and video. We see our modeling as a potential starting point both for new, experimentally testable hypotheses and for novel AI models that could benefit from added biological realism.
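
The random-feature baseline from the first question can be sketched as a fixed random first layer whose hidden units each see only one square patch of the image, followed by a trained linear readout. The patch size, the ReLU nonlinearity, and the ridge-regression readout are assumptions made for this illustration, not the thesis's exact protocol. The synthetic data below only exercises the shapes and does not stand in for MNIST.

```python
import numpy as np


def localized_random_weights(n_hidden, side=28, patch=10, seed=0):
    """Fixed Gaussian weights; each hidden unit sees one random patch x patch window."""
    rng = np.random.default_rng(seed)
    W = np.zeros((n_hidden, side * side))
    for h in range(n_hidden):
        r0, c0 = rng.integers(0, side - patch + 1, size=2)  # top-left corner of the window
        mask = np.zeros((side, side), dtype=bool)
        mask[r0:r0 + patch, c0:c0 + patch] = True
        W[h, mask.ravel()] = rng.normal(size=patch * patch)
    return W


def random_features(X, W):
    """Hidden layer: ReLU of fixed random projections (this layer is never trained)."""
    return np.maximum(X @ W.T, 0.0)


def train_readout(Hid, Y, lam=1e-3):
    """Only the linear readout is learned, here by ridge regression on one-hot labels."""
    return np.linalg.solve(Hid.T @ Hid + lam * np.eye(Hid.shape[1]), Hid.T @ Y)


# Shape check on synthetic 28x28 "images" with 10 classes.
rng = np.random.default_rng(1)
X = rng.uniform(size=(500, 28 * 28))
labels = rng.integers(0, 10, size=500)
Y = np.eye(10)[labels]
W = localized_random_weights(n_hidden=300)
V = train_readout(random_features(X, W), Y)
predictions = np.argmax(random_features(X, W) @ V, axis=1)
```

Dropping the patch mask, so that every unit sees the whole image, gives the non-localized random-feature variant. Comparing the two isolates the effect of localized receptive fields.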
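
The locality structure of the third question's rule can be written as a generic three-factor update: a Hebbian pre × post term, gated by a dendritic prediction and a broadcast modulation signal. The sketch below is only that generic form, with hypothetical names (`pre`, `post`, `dendritic`, `modulator`, `eta`). It is not the thesis's exact contrastive predictive rule, whose details the abstract does not give.

```python
import numpy as np


def local_update(pre, post, dendritic, modulator, eta=0.01):
    """Generic gated Hebbian update for one layer's weight matrix (post x pre).

    Each synapse w_ij changes only through quantities available at that synapse:
    presynaptic activity pre_j, postsynaptic activity post_i, the dendritic input
    dendritic_i of the same postsynaptic neuron, and one scalar modulator shared by all.
    """
    return eta * modulator * np.outer(dendritic * post, pre)


rng = np.random.default_rng(0)
pre = rng.uniform(size=5)       # presynaptic rates
post = rng.uniform(size=3)      # postsynaptic rates
dendritic = rng.normal(size=3)  # predictive dendritic input per postsynaptic neuron
dW_plus = local_update(pre, post, dendritic, modulator=+1.0)
dW_minus = local_update(pre, post, dendritic, modulator=-1.0)
```

Flipping the sign of the broadcast modulator flips every update. This is one way a single global scalar can switch a local rule between potentiation and depression.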