Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances

ICML 2021


We study how permutation symmetries in overparameterized multi-layer neural networks generate ‘symmetry-induced’ critical points. Assuming a network with L layers of minimal widths $r_1^∗, \ldots, r_{L-1}^∗$ reaches a zero-loss minimum at $r_1^∗! · · · r_{L-1}^∗!$ isolated points that are permutations of one another, we show that adding one extra neuron to each layer is sufficient to connect all these previously discrete minima into a single manifold. For a two-layer overparameterized network of width $r^∗ + h ≕ m$ we explicitly describe the manifold of global minima: it consists of $T (r^∗, m)$ affine subspaces of dimension at least h that are connected to one another. For a network of width m, we identify the number $G(r, m)$ of affine subspaces containing only symmetry-induced critical points that are related to the critical points of a smaller network of width $r < r^∗$. Via a combinatorial analysis, we derive closed-form formulas for $T$ and $G$ and show that the number of symmetry-induced critical subspaces dominates the number of affine subspaces forming the global minima manifold in the mildly overparameterized regime (small $h$) and vice versa in the vastly overparameterized regime ($h \gg r^∗$). Our results provide new insights into the minimization of the nonconvex loss function of overparameterized neural networks.

arXiv:2105.12221 [cs]
comments powered by Disqus