Generalization of Scaled Deep ResNets in the Mean-Field Regime

Volkan Cevher, Grigorios Chrysos, Fanghui Liu
2024
Conference paper

Abstract

Despite the widespread empirical success of ResNets, the generalization properties of deep ResNets are rarely explored beyond the lazy training regime. In this work, we investigate scaled ResNets in the limit of infinitely deep and wide neural networks, where the gradient flow is described by a partial differential equation in the large-network limit, i.e., the mean-field regime. To derive generalization bounds in this setting, our analysis requires a shift from the conventional time-invariant Gram matrix used in the lazy training regime to a time-variant, distribution-dependent version. To this end, we provide a global lower bound on the minimum eigenvalue of the Gram matrix in the mean-field regime. In addition, to keep the dynamics of the Kullback-Leibler (KL) divergence tractable, we establish linear convergence of the empirical error and derive an upper bound on the KL divergence over the parameter distribution. Finally, we establish a uniform-convergence generalization bound via Rademacher complexity. Our results offer new insights into the generalization ability of deep ResNets beyond the lazy training regime and contribute to the understanding of the fundamental properties of deep neural networks.
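The key technical object above is a Gram matrix that depends on the current parameter distribution rather than being fixed at initialization. The following is a minimal numerical illustration, not the paper's construction: it assumes a generic two-layer feature map sigma(<w, x>) with sigma = tanh as a stand-in for the scaled ResNet, approximates the entries K_ij(rho) = E_{w ~ rho}[sigma(<w, x_i>) sigma(<w, x_j>)] with M particles sampled from a parameter distribution rho, and compares the minimum eigenvalue under two hypothetical distributions. The names `empirical_gram`, `min_eigenvalue`, `W_init`, and `W_shift` are illustrative only.

```python
import numpy as np


def empirical_gram(X, W, activation=np.tanh):
    """Monte Carlo estimate of K_ij = E_{w ~ rho}[sigma(<w, x_i>) sigma(<w, x_j>)].

    X: (n, d) inputs. W: (M, d) particles sampled from a parameter
    distribution rho (a hypothetical stand-in for the mean-field measure).
    """
    features = activation(X @ W.T)              # (n, M) feature matrix
    return features @ features.T / W.shape[0]   # (n, n), symmetric PSD


def min_eigenvalue(K):
    # eigvalsh assumes a symmetric matrix and returns eigenvalues in ascending order
    return np.linalg.eigvalsh(K)[0]


rng = np.random.default_rng(0)
n, d, M = 20, 5, 5000
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs

# Two parameter distributions yield two different Gram matrices.
W_init = rng.standard_normal((M, d))            # assumed "initial" distribution
W_shift = W_init + 0.5                          # hypothetical "later" distribution (shifted mean)

lam_init = min_eigenvalue(empirical_gram(X, W_init))
lam_shift = min_eigenvalue(empirical_gram(X, W_shift))
print(f"lambda_min at init:  {lam_init:.4e}")
print(f"lambda_min shifted:  {lam_shift:.4e}")
```

Because the matrix changes whenever the parameter distribution moves, a lower bound on its minimum eigenvalue that holds only at initialization would not cover the training trajectory; the sketch merely makes that distribution dependence visible on a toy kernel.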

Official source

https://infoscience.epfl.ch/entities/publication/e3f9b431-201a-4575-a0a8-e420d6c70f67

Deep Learning Theory Through the Lens of Diagonal Linear Networks

Random matrix methods for high-dimensional machine learning models

Task-driven neural network models predict neural dynamics of proprioception: Experimental data, activations and predictions of neural network models