Publications related to the Softmax function

Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks

Nicolas Henri Bernard Flammarion, Hristo Georgiev Papazov, Scott William Pesme

In this work, we investigate the effect of momentum on the optimisation trajectory of gradient descent. We leverage a continuous-time approach in the analysis of momentum gradient descent with step size $\gamma$ and momentum parameter $\beta$ that allows u ...

2024
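As a rough illustration of the setting in the momentum abstract above, the following Python sketch runs discrete momentum gradient descent in the standard heavy-ball form $x_{k+1} = x_k - \gamma \nabla f(x_k) + \beta (x_k - x_{k-1})$, with step size $\gamma$ and momentum parameter $\beta$. The function name `heavy_ball` and the quadratic test objective are hypothetical choices for illustration. This is only the discrete iteration, not the paper's continuous-time analysis.

```python
import numpy as np


def heavy_ball(grad, x0, gamma, beta, n_steps):
    """Momentum gradient descent in heavy-ball form (assumed here):
    x_{k+1} = x_k - gamma * grad(x_k) + beta * (x_k - x_{k-1}), with x_{-1} = x_0."""
    x_prev = np.array(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(n_steps):
        x_next = x - gamma * grad(x) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x


# Hypothetical test problem: f(x) = 0.5 * x^T A x, minimised at the origin.
A = np.diag([1.0, 10.0])
x_final = heavy_ball(lambda x: A @ x, x0=[1.0, 1.0], gamma=0.05, beta=0.9, n_steps=500)
print(np.linalg.norm(x_final))  # distance to the minimiser
```

With these values both eigen-directions of $A$ are in the oscillatory regime, so the error shrinks by roughly a factor $\sqrt{\beta} \approx 0.95$ per step.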

On the number of regions of piecewise linear neural networks

Michaël Unser, Alexis Marie Frederic Goujon

Many feedforward neural networks (NNs) generate continuous and piecewise-linear (CPWL) mappings. Specifically, they partition the input domain into regions on which the mapping is affine. The number of these so-called linear regions offers a natural metric ...

2024
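To make the notion of linear regions concrete, here is a minimal sketch for the simplest case only: a one-hidden-layer ReLU network $f(x) = \sum_i a_i \,\mathrm{ReLU}(w_i x + b_i)$ with scalar input. Each neuron with $w_i \neq 0$ contributes a candidate breakpoint at $-b_i/w_i$, and a new region starts only where the slope actually changes. The function name `count_linear_regions_1d` and the restriction to one input dimension and one hidden layer are assumptions; the paper addresses far more general CPWL networks.

```python
import numpy as np


def count_linear_regions_1d(w, b, a):
    """Count the affine pieces of f(x) = sum_i a[i] * relu(w[i] * x + b[i]) on the real line."""
    w, b, a = (np.asarray(v, dtype=float) for v in (w, b, a))
    active = w != 0
    # Candidate breakpoints: where some neuron switches on or off (sorted, duplicates merged).
    knots = np.unique(-b[active] / w[active])
    if knots.size == 0:
        return 1  # no neuron ever switches, so f is affine everywhere

    # One probe point strictly inside each interval delimited by the knots.
    probes = np.concatenate(([knots[0] - 1.0], (knots[:-1] + knots[1:]) / 2, [knots[-1] + 1.0]))

    def slope(x):
        # Derivative of f at a non-breakpoint x: sum of a_i * w_i over the active neurons.
        return np.sum(a * w * (w * x + b > 0))

    slopes = [slope(x) for x in probes]
    # f is continuous, so a new affine piece starts only where the slope changes.
    return 1 + sum(not np.isclose(s0, s1) for s0, s1 in zip(slopes, slopes[1:]))


# "Hat" function relu(x) - 2 relu(x - 1) + relu(x - 2): slopes 0, 1, -1, 0.
print(count_linear_regions_1d([1, 1, 1], [0, -1, -2], [1, -2, 1]))  # 4
# Two neurons that cancel exactly: f(x) = 0, a single region despite one breakpoint.
print(count_linear_regions_1d([1, 1], [0, 0], [1, -1]))  # 1
```

The second call shows why counting breakpoints alone only gives an upper bound on the number of regions.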

Deep Learning Theory Through the Lens of Diagonal Linear Networks

Scott William Pesme

In this PhD manuscript, we explore optimisation phenomena which occur in complex neural networks through the lens of 2-layer diagonal linear networks. This rudimentary architecture, which consists of a two-layer feedforward linear network with a diagonal ...

EPFL, 2024
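For readers unfamiliar with the architecture in the manuscript above, a 2-layer diagonal linear network predicts $f(x) = \langle u \odot v, x \rangle$, so the effective linear predictor is the elementwise product $w = u \odot v$. The Python sketch below trains one with full-batch gradient descent on a squared loss. The function name, the initialisation ($u = \alpha$, $v = 0$), the step size, and the toy data are all assumptions chosen for illustration, not taken from the manuscript.

```python
import numpy as np


def train_diagonal_linear_net(X, y, alpha=0.1, gamma=0.1, n_steps=5000):
    """Full-batch gradient descent on a 2-layer diagonal linear network
    f(x) = <u * v, x> with squared loss L(u, v) = ||X (u * v) - y||^2 / (2n)."""
    n, d = X.shape
    u = np.full(d, alpha)  # hypothetical initialisation: u = alpha, v = 0
    v = np.zeros(d)
    for _ in range(n_steps):
        residual = X @ (u * v) - y
        g = X.T @ residual / n  # gradient of L w.r.t. the effective weights w = u * v
        # Chain rule: dL/du = v * g and dL/dv = u * g; update both from the old values.
        u, v = u - gamma * v * g, v - gamma * u * g
    return u, v


# Toy data (assumption): identity design, so each coordinate is fitted independently.
X = np.eye(3)
y = np.array([1.0, -0.5, 2.0])
u, v = train_diagonal_linear_net(X, y)
w = u * v  # effective linear predictor
print(np.round(w, 4))
```

The multiplicative structure of the gradients ($v \odot g$ for $u$, $u \odot g$ for $v$) is what makes the training dynamics of such networks non-trivial even though the model is linear in $x$.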

Towards Trustworthy Deep Learning for Image Reconstruction

Alexis Marie Frederic Goujon

The remarkable ability of deep learning (DL) models to approximate high-dimensional functions from samples has sparked a revolution, whose importance cannot be overemphasized, across numerous scientific and industrial domains. In sensitive applications, the good perform ...

EPFL, 2024

Benign Overfitting in Deep Neural Networks under Lazy Training

Volkan Cevher, Grigorios Chrysos, Fanghui Liu, Zhenyu Zhu

This paper focuses on over-parameterized deep neural networks (DNNs) with ReLU activation functions and proves that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification while obtaining (nearly) zero-trai ...

2023

A Theory of Finite-Width Neural Networks: Generalization, Scaling Laws, and the Loss Landscape

Berfin Simsek

Deep learning has achieved remarkable success in various challenging tasks such as generating images from natural language or engaging in lengthy conversations with humans. The success in practice stems from the ability to successfully train massive neural ...

EPFL, 2023

Inverse Reinforcement Learning of Pedestrian-Robot Coordination

Aude Billard, David Julian Gonon

We apply inverse reinforcement learning (IRL) with a novel cost feature to the problem of robot navigation in human crowds. Consistent with prior empirical work on pedestrian behavior, the feature anticipates collisions between agents. We efficiently learn ...

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2023

From Kernel Methods to Neural Networks: A Unifying Variational Formulation

Michaël Unser

The minimization of a data-fidelity term and an additive regularization functional gives rise to a powerful framework for supervised learning. In this paper, we present a unifying regularization functional that depends on an operator $\mathrm{L}$ ...

Springer, 2023

Bayes-optimal Learning of Deep Random Networks of Extensive-width

Florent Gérard Krzakala, Lenka Zdeborová, Hugo Chao Cui

We consider the problem of learning a target function corresponding to a deep, extensive-width, non-linear neural network with random Gaussian weights. We consider the asymptotic limit where the number of samples, the input dimension and the network width ...

2023

ReLU Neural Network Galerkin BEM

Fernando José Henriquez Barraza

We introduce Neural Network (NN for short) approximation architectures for the numerical solution of Boundary Integral Equations (BIEs for short). We exemplify the proposed NN approach for the boundary reduction of the potential problem in two spatial dime ...

SPRINGER/PLENUM PUBLISHERS, 2023

Stochastic Second-Order Methods Improve Best-Known Sample Complexity of SGD for Gradient-Dominated Function

Patrick Thiran, Negar Kiyavash, Saber Salehkaleybar

We study the performance of Stochastic Cubic Regularized Newton (SCRN) on a class of functions satisfying the gradient dominance property with $1\le\alpha\le2$, which holds in a wide range of applications in machine learning and signal processing. This conditio ...

NeurIPS, 2022
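The gradient dominance property mentioned in the SCRN abstract above is commonly written as below, where $f^\star$ denotes the minimum value of $f$ and $\tau_f > 0$ is a constant. This standard form and its symbols are assumed here; the paper's exact statement may differ in notation or constants. For $\alpha = 2$ it recovers the Polyak-Łojasiewicz condition.

```latex
f(x) - f^\star \le \tau_f \, \|\nabla f(x)\|^{\alpha} \quad \text{for all } x, \qquad 1 \le \alpha \le 2.
```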

Face Reconstruction from Deep Facial Embeddings Using a Convolutional Neural Network

Sébastien Marcel, Hatef Otroshi Shahreza

State-of-the-art (SOTA) face recognition systems generally use deep convolutional neural networks (CNNs) to extract deep features, called embeddings, from face images. The face embeddings are stored in the system's database and are used for recognition of ...

IEEE, 2022

Memory of Motion for Initializing Optimization in Robotics

Teguh Santoso Lembono

Many robotics problems are formulated as optimization problems. However, most optimization solvers in robotics are locally optimal, and their performance depends heavily on the initial guess. For challenging problems, the solver will often get stuck at poor loc ...

EPFL, 2022

Generalization Properties of NAS under Activation and Skip Connection Search

Volkan Cevher, Grigorios Chrysos, Fanghui Liu, Zhenyu Zhu

Neural Architecture Search (NAS) has fostered the automatic discovery of state-of-the-art neural architectures. Despite the progress achieved with NAS, little attention has so far been paid to theoretical guarantees on NAS. In this work, we study the generaliza ...

2022

Binary Perceptron: Efficient Algorithms Can Find Solutions in a Rare Well-Connected Cluster

Emmanuel Abbé

It was recently shown that almost all solutions in the symmetric binary perceptron are isolated, even at low constraint densities, suggesting that finding typical solutions is hard. In contrast, some algorithms have been shown empirically to succeed in fin ...

ASSOC COMPUTING MACHINERY, 2022

Impulsive noise removal via a blind CNN enhanced by an iterative post-processing

Seyedeh Sahar Sadrizadeh, Hatef Otroshi Shahreza

In digital imaging, especially in the process of data acquisition and transmission, images are often affected by impulsive noise. Therefore, it is essential to remove impulsive noise from images before any further processing. Due to the remarkable performa ...

ELSEVIER, 2022

STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization

Volkan Cevher, Ali Kavis

In this work, we investigate stochastic non-convex optimization problems where the objective is an expectation over smooth loss functions, and the goal is to find an approximate stationary point. The most popular approach to handling such problems is varianc ...

2021

Learning Lipschitz-Controlled Activation Functions in Neural Networks for Plug-and-Play Image Reconstruction Methods

Michaël Unser, Dimitris Perdios, Pakshal Narendra Bohra, Alexis Marie Frederic Goujon, Sébastien Alexandre Emery

Ill-posed linear inverse problems are frequently encountered in image reconstruction tasks. Image reconstruction methods that combine the Plug-and-Play (PnP) priors framework with convolutional neural network (CNN) based denoisers have shown impressive per ...

2021

Yapa: Accelerated Proximal Algorithm For Convex Composite Problems

Mireille El Gheche, Giovanni Chierchia

Proximal splitting methods are standard tools for nonsmooth optimization. While primal-dual methods have become very popular in the last decade for their flexibility, primal methods may still be preferred for two reasons: acceleration schemes are more effe ...

IEEE, 2021

Estimating an extreme Bayesian network via scalings

Mario Krali

A recursive max-linear vector models causal dependence between its components by expressing each node variable as a max-linear function of its parental nodes in a directed acyclic graph and some exogenous innovation. Motivated by extreme value theory, inno ...

ELSEVIER INC, 2021