
# Learning of Continuous and Piecewise-Linear Functions With Hessian Total-Variation Regularization

Shayan Aziznejad, Joaquim Gonçalves Garcia Barreto Campos, Michaël Unser

*IEEE, 2022*

Article


Abstract

We develop a novel 2D functional learning framework that employs a sparsity-promoting regularization based on second-order derivatives. Motivated by the nature of the regularizer, we restrict the search space to the span of piecewise-linear box splines shifted on a 2D lattice. Our formulation of the infinite-dimensional problem on this search space allows us to recast it exactly as a finite-dimensional one that can be solved using standard methods in convex optimization. Since our search space is composed of continuous and piecewise-linear functions, our work presents itself as an alternative to training networks that deploy rectified linear units, which also construct models in this family. The advantages of our method are fourfold: the ability to enforce sparsity, favoring models with fewer piecewise-linear regions; the use of a rotation, scale and translation-invariant regularization; a single hyperparameter that controls the complexity of the model; and a clear model interpretability that provides a straightforward relation between the parameters and the overall learned function. We validate our framework in various experimental setups and compare it with neural networks.
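As a rough illustration of the idea, and not the paper's actual algorithm, the sketch below fits a 1D continuous piecewise-linear model on a uniform grid with an l1 penalty on second-order finite differences, a discrete stand-in for the Hessian total-variation regularizer. All function names and parameter values here are my own, illustrative choices:

```python
import numpy as np

# Illustrative 1D analogue: fit samples y_i = f(x_i) with a continuous
# piecewise-linear function whose knot values c live on a uniform grid,
# while an l1 penalty on second-order finite differences (a discrete
# "Hessian total variation") favors few linear pieces. Solved here with
# plain subgradient descent for simplicity.

def second_diff_matrix(n):
    """D @ c gives the second-order finite differences of grid values c."""
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i], D[i, i + 1], D[i, i + 2] = 1.0, -2.0, 1.0
    return D

def fit_pwl(x, y, grid, lam=0.2, steps=8000, lr=0.02):
    """Fit grid values c minimizing ||A c - y||^2 + lam * ||D c||_1."""
    n = len(grid)
    h = grid[1] - grid[0]
    # A interpolates linearly between neighboring grid values ("hat" basis).
    A = np.zeros((len(x), n))
    for k, xk in enumerate(x):
        j = min(int((xk - grid[0]) / h), n - 2)
        t = (xk - grid[j]) / h
        A[k, j], A[k, j + 1] = 1 - t, t
    D = second_diff_matrix(n)
    c = np.zeros(n)
    for _ in range(steps):
        g = 2 * A.T @ (A @ c - y) + lam * D.T @ np.sign(D @ c)
        c -= lr * g / len(x)
    return c

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = np.abs(x - 0.5) + 0.01 * rng.normal(size=200)   # one kink at x = 0.5
grid = np.linspace(0, 1, 11)
c = fit_pwl(x, y, grid)
```

In this toy setup the fitted knot values recover the single kink of |x - 0.5| while the penalty keeps the remaining grid cells collinear; the 2D method of the paper plays the same game with box splines on a lattice and a proper convex solver.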

Official source

This page is generated automatically and may contain information that is not correct, complete, up to date, or relevant to your search. The same applies to every other page on this site. Make sure to verify the information against official EPFL sources.


Related concepts (11)

Learning

Learning is a set of mechanisms leading to the acquisition of know-how, skills, or knowledge. The person doing the learning is called the learner. One can contrast learning…

Neural network

A neural network can refer to a neural circuit of biological neurons (sometimes also called a biological neural network), or a network of artificial neurons or nodes in the case of an artificial neural network…

Machine learning

Machine learning (in French, apprentissage automatique), also known as artificial learning or statistical learning, is…

Related publications (32)


Our brain continuously self-organizes to construct and maintain an internal representation of the world based on the information arriving through sensory stimuli. Remarkably, cortical areas related to different sensory modalities appear to share the same functional unit, the neuron, and develop through the same learning mechanism, synaptic plasticity. This motivates the conjecture of a unifying theory to explain cortical representational learning across sensory modalities. In this thesis we present theories and computational models of learning and optimization in neural networks, postulating functional properties of synaptic plasticity that support the apparent universal learning capacity of cortical networks.

In the past decades, a variety of theories and models have been proposed to describe receptive field formation in sensory areas. They include normative models such as sparse coding, and bottom-up models such as spike-timing-dependent plasticity. We bring together candidate explanations by demonstrating that a single principle is sufficient to explain receptive field development. First, we show that many representative models of sensory development are in fact implementing variations of a common principle: nonlinear Hebbian learning. Second, we reveal that nonlinear Hebbian learning is sufficient for receptive field formation through sensory inputs. A surprising result is that our findings are independent of specific details and allow for robust predictions of the learned receptive fields. Thus nonlinear Hebbian learning and natural statistics can account for many aspects of receptive field formation across models and sensory modalities.

The Hebbian learning theory substantiates that synaptic plasticity can be interpreted as an optimization procedure implementing stochastic gradient descent. In stochastic gradient descent, inputs arrive sequentially, as in sensory streams. However, individual data samples carry very little information about the correct learning signal, and it becomes a fundamental problem to know how many samples are required for reliable synaptic changes. Through estimation theory, we develop a novel adaptive learning rate model that adapts the magnitude of synaptic changes based on the statistics of the learning signal, enabling an optimal use of data samples. Our model has a simple implementation and demonstrates improved learning speed, making it a promising candidate for large artificial neural network applications. The model also makes predictions on how cortical plasticity may modulate synaptic plasticity for optimal learning.

The optimal sampling size for reliable learning allows us to estimate optimal learning times for a given model. We apply this theory to derive analytical bounds on the time required to optimize synaptic connections. First, we show this optimization problem to have exponentially many saddle points, which lead to small gradients and slow learning. Second, we show that the number of input synapses to a neuron modulates the magnitude of the initial gradient, determining the duration of learning. Our final result reveals that the learning duration increases supra-linearly with the number of synapses, suggesting an effective limit on synaptic connections and receptive field sizes in developing neural networks.
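The nonlinear Hebbian principle described above can be sketched in a few lines. The following is my own minimal illustration, not code from the thesis: a rule dw ∝ f(y)·x with an Oja-style decay term f(y)·y·w that keeps the weight norm bounded; with f(y) = y it reduces to Oja's rule, whose weight vector aligns with the dominant direction of the input statistics (a toy "receptive field"):

```python
import numpy as np

# Minimal (non)linear Hebbian rule: y = w @ x is the postsynaptic
# activity, the update grows w along inputs that excite the neuron,
# and the -y * w decay term keeps ||w|| bounded (Oja-style).

def hebbian_learn(X, f=lambda y: y, eta=0.005, epochs=30, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in X:
            y = w @ x                      # postsynaptic activity
            w += eta * f(y) * (x - y * w)  # Hebbian growth + decay
    return w

# Inputs with most variance along direction u: the learned w aligns with u.
rng = np.random.default_rng(1)
u = np.array([1.0, 1.0]) / np.sqrt(2)
X = rng.normal(size=(500, 2)) + 2.0 * rng.normal(size=(500, 1)) * u
w = hebbian_learn(X)
```

Swapping in a nonlinear f (e.g. a rectified or cubic gain) is what the thesis argues turns this family of rules into a common explanation for receptive field development across models.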

In modern data-analysis applications, the abundance of data makes extracting meaningful information from it challenging in terms of computation, storage, and interpretability. In this setting, exploiting sparsity in data has been essential to the development of scalable methods for problems in machine learning, statistics, and signal processing. However, in various applications the input variables exhibit structure beyond simple sparsity. This motivated the introduction of structured sparsity models, which capture such sophisticated structures, leading to significant performance gains and better interpretability. Structured sparse approaches have been successfully applied in a variety of domains, including computer vision, text processing, medical imaging, and bioinformatics.

The goal of this thesis is to improve on these methods and expand their success to a wider range of applications. We thus develop novel methods to incorporate general structure a priori in learning problems, balancing computational and statistical efficiency trade-offs. To achieve this, our results bring together tools from the rich areas of discrete and convex optimization.

Applying structured sparsity approaches is challenging in general because the structures encountered in practice are naturally combinatorial. An effective approach to circumvent this computational challenge is to employ continuous convex relaxations. We thus start by introducing a new class of structured sparsity models, able to capture a large range of structures, which admit tight convex relaxations amenable to efficient optimization. We then present an in-depth study of the geometric and statistical properties of convex relaxations of general combinatorial structures. In particular, we characterize which structure is lost by imposing convexity and which is preserved.

We then focus on the optimization of the convex composite problems that result from the convex relaxations of structured sparsity models. We develop efficient algorithmic tools to solve these problems in a non-Euclidean setting, leading to faster convergence in some cases. Finally, to handle structures that do not admit meaningful convex relaxations, we propose, as a heuristic, a non-convex proximal gradient method that is efficient for several classes of structured sparsity models. We further extend this method to address a probabilistic structured sparsity model that we introduce to model approximately sparse signals.

Shayan Aziznejad, Joaquim Gonçalves Garcia Barreto Campos, Harshit Gupta, Michaël Unser

We introduce a variational framework to learn the activation functions of deep neural networks. Our aim is to increase the capacity of the network while controlling an upper bound on the actual Lipschitz constant of the input-output relation. To that end, we first establish a global bound for the Lipschitz constant of neural networks. Based on this bound, we then formulate a variational problem for learning activation functions. Our variational problem is infinite-dimensional and is not computationally tractable. However, we prove that there always exists a solution that has continuous and piecewise-linear (linear-spline) activations. This reduces the original problem to a finite-dimensional minimization where an l1 penalty on the parameters of the activations favors the learning of sparse nonlinearities. We numerically compare our scheme with standard ReLU networks and their variations, PReLU and LeakyReLU, and we empirically demonstrate the practical aspects of our framework.
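One standard global Lipschitz bound of the kind the abstract refers to has a simple form for a network with 1-Lipschitz activations: the product of the layers' spectral norms. A minimal numerical check (my own illustration, not the authors' code):

```python
import numpy as np

# For x -> W2 @ phi(W1 @ x) with a 1-Lipschitz activation phi (here ReLU),
# the product of the spectral norms ||W2|| * ||W1|| upper-bounds the
# Lipschitz constant of the input-output map.

def lipschitz_upper_bound(weights):
    """Product of spectral norms of the weight matrices."""
    bound = 1.0
    for W in weights:
        bound *= np.linalg.norm(W, 2)   # ord=2 -> largest singular value
    return bound

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=(1, 8))

def net(x):
    return W2 @ np.maximum(W1 @ x, 0.0)   # ReLU is 1-Lipschitz

bound = lipschitz_upper_bound([W1, W2])

# Empirical check: a finite-difference slope never exceeds the bound.
x, d = rng.normal(size=4), rng.normal(size=4)
d /= np.linalg.norm(d)
slope = np.linalg.norm(net(x + 1e-4 * d) - net(x)) / 1e-4
```

Learning the activations themselves, as the paper proposes, then amounts to optimizing phi within the linear-spline family while keeping such a bound under control.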