Publication

Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances

Abstract

We study how permutation symmetries in overparameterized multi-layer neural networks generate 'symmetry-induced' critical points. Assuming a network with $L$ layers of minimal widths $r_1^*, \ldots, r_{L-1}^*$ reaches a zero-loss minimum at $r_1^*! \cdots r_{L-1}^*!$ isolated points that are permutations of one another, we show that adding one extra neuron to each layer is sufficient to connect all these previously discrete minima into a single manifold. For a two-layer overparameterized network of width $r^* + h =: m$ we explicitly describe the manifold of global minima: it consists of $T(r^*, m)$ affine subspaces of dimension at least $h$ that are connected to one another. For a network of width $m$, we identify the number $G(r, m)$ of affine subspaces containing only symmetry-induced critical points that are related to the critical points of a smaller network of width $r < r^*$. Via a combinatorial analysis, we derive closed-form formulas for $T$ and $G$ and show that the number of symmetry-induced critical subspaces dominates the number of affine subspaces forming the global minima manifold in the mildly overparameterized regime (small $h$) and vice versa in the vastly overparameterized regime ($h \gg r^*$). Our results provide new insights into the minimization of the non-convex loss function of overparameterized neural networks.
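
The two symmetry mechanisms the abstract refers to can be checked numerically on a toy example. The sketch below is not taken from the paper: the tanh activation, the sizes `d` and `r`, and the helper names `two_layer` and `split_neuron` are assumptions chosen for illustration. It first confirms that all $r!$ relabelings of the hidden neurons of a two-layer network compute the same function, then adds one extra neuron by duplicating an existing one and splitting its outgoing weight, a standard construction that yields a whole line of parameters with identical output.

```python
import itertools
import math
import numpy as np

def two_layer(x, W, a):
    """Two-layer network f(x) = sum_j a_j * tanh(w_j . x); W has shape (width, d)."""
    return np.tanh(x @ W.T) @ a

rng = np.random.default_rng(0)
d, r = 3, 4                       # illustrative input dimension and hidden width
W = rng.normal(size=(r, d))       # incoming weights, one row per hidden neuron
a = rng.normal(size=r)            # outgoing weights
x = rng.normal(size=(16, d))      # random probe inputs
base = two_layer(x, W, a)

# 1) Permutation symmetry: relabeling hidden neurons leaves the function unchanged,
#    so every parameter point comes with r! equivalent copies.
perms = list(itertools.permutations(range(r)))
all_equal = all(
    np.allclose(two_layer(x, W[list(p)], a[list(p)]), base) for p in perms
)
num_perms = len(perms)

# 2) One extra neuron (h = 1): duplicate neuron 0 and split its outgoing weight
#    into mu * a_0 and (1 - mu) * a_0. The output is the same for every mu,
#    which traces out a one-dimensional affine line in parameter space.
def split_neuron(W, a, mu):
    W_ext = np.vstack([W, W[:1]])            # copy of neuron 0's incoming weights
    a_ext = np.append(a, (1.0 - mu) * a[0])  # new neuron carries (1 - mu) * a_0
    a_ext[0] = mu * a[0]                     # original neuron keeps mu * a_0
    return W_ext, a_ext

mus = np.linspace(-1.0, 2.0, 7)
line_invariant = all(
    np.allclose(two_layer(x, *split_neuron(W, a, mu)), base) for mu in mus
)

print(num_perms == math.factorial(r), all_equal, line_invariant)
```

Since the output does not depend on `mu`, any zero-loss point of the width-`r` network extends to a line of zero-loss points in the width-`r + 1` network. This matches the abstract's statement that the affine subspaces of global minima have dimension at least $h$, here with $h = 1$. The sketch does not attempt to reproduce the closed-form counts $T$ and $G$.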

Related concepts (34)
Affine space
In mathematics, an affine space is a geometric structure that generalizes some of the properties of Euclidean spaces in such a way that these are independent of the concepts of distance and measure of angles, keeping only the properties related to parallelism and ratio of lengths for parallel line segments. In an affine space, there is no distinguished point that serves as an origin. Hence, no vector has a fixed origin and no vector can be uniquely associated to a point.
Affine transformation
In Euclidean geometry, an affine transformation or affinity (from the Latin, affinis, "connected with") is a geometric transformation that preserves lines and parallelism, but not necessarily Euclidean distances and angles. More generally, an affine transformation is an automorphism of an affine space (Euclidean spaces are specific affine spaces), that is, a function which maps an affine space onto itself while preserving both the dimension of any affine subspaces (meaning that it sends points to points, lines to lines, planes to planes, and so on) and the ratios of the lengths of parallel line segments.
Computer network
A computer network is a set of computers sharing resources located on or provided by network nodes. Computers use common communication protocols over digital interconnections to communicate with each other. These interconnections are made up of telecommunication network technologies based on physically wired, optical, and wireless radio-frequency methods that may be arranged in a variety of network topologies. The nodes of a computer network can include personal computers, servers, networking hardware, or other specialized or general-purpose hosts.
Related MOOCs (8)
Introduction to optimization on smooth manifolds: first order methods
Learn to optimize on smooth, nonlinear spaces: Join us to build your foundations (starting at "what is a manifold?") and confidently implement your first algorithm (Riemannian gradient descent).
Smart Cities, Management of Smart Urban Infrastructures
Learn about the principles of management of urban infrastructures in the era of Smart Cities. The introduction of Smart urban technologies into legacy infrastructures has already resulted and will con
