Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances

We study how permutation symmetries in overparameterized multi-layer neural networks generate `symmetry-induced' critical points. Assuming a network with $L$ layers of minimal widths $r_1^*, \ldots, r_{L-1}^*$ reaches a zero-loss minimum at $r_1^*! \cdots r_{L-1}^*!$ isolated points that are permutations of one another, we show that adding one extra neuron to each layer is sufficient to connect all these previously discrete minima into a single manifold. For a two-layer overparameterized network of width $r^*+ h =: m$ we explicitly describe the manifold of global minima: it consists of $T(r^*, m)$ affine subspaces of dimension at least $h$ that are connected to one another. For a network of width $m$ , we identify the number $G(r,m)$ of affine subspaces containing only symmetry-induced critical points that are related to the critical points of a smaller network of width $r<r^*$ . Via a combinatorial analysis, we derive closed-form formulas for $T$ and $G$ and show that the number of symmetry-induced critical subspaces dominates the number of affine subspaces forming the global minima manifold in the mildly overparameterized regime (small $h$ ) and vice versa in the vastly overparameterized regime ( $h \gg r^*$ ). Our results provide new insights into the minimization of the non-convex loss function of overparameterized neural networks.

Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances

Graph Chatbot

Chat with Graph Search

Altruism, reciprocity, and tokens to reward forwarding data: Is that fair?

Towards an Understanding of Hydraulic Sensitivity: Graph Theory Contributions to Water Distribution Analysis

A Tutorial-Cum-Survey on Percolation Theory With Applications in Large-Scale Wireless Networks

Altruism, reciprocity, and tokens to reward forwarding data: Is that fair?

A Tutorial-Cum-Survey on Percolation Theory With Applications in Large-Scale Wireless Networks

Towards an Understanding of Hydraulic Sensitivity: Graph Theory Contributions to Water Distribution Analysis