Publication

Understanding generalization and robustness in modern deep learning

Maksym Andriushchenko
2024
EPFL thesis
Abstract

In this thesis, we study two closely related directions: robustness and generalization in modern deep learning. Deep learning models based on empirical risk minimization are often non-robust to small, worst-case perturbations known as adversarial examples, which can easily fool state-of-the-art deep neural networks into making wrong predictions. Their existence can be seen as a generalization problem: despite impressive average-case performance, deep learning models tend to learn non-robust features that can be exploited for adversarial manipulation. We delve into a range of questions related to robustness and generalization, such as how to accurately evaluate robustness, how to make robust training more efficient, and why some optimization algorithms lead to better generalization and learn qualitatively different features.

We begin the first direction by exploring computationally efficient methods for adversarial training and its failure mode, catastrophic overfitting, in which the model suddenly loses its robustness at some point during training. We then improve the understanding of robustness evaluation and of progress in the field by proposing new query-efficient black-box adversarial attacks based on random search, which do not rely on gradient information and can thus complement a typical robustness evaluation based on gradient-based methods. Finally, toward the same goal, we propose RobustBench, a new community-driven robustness benchmark that aims to systematically track progress in the field in a standardized way.

We begin the second direction by investigating the reasons behind the success of sharpness-aware minimization, a recent algorithm that increases robustness in the parameter space during training and improves generalization for deep networks. We then discuss why overparameterized models trained with stochastic gradient descent tend to generalize surprisingly well even without any explicit regularization, and we study the implicit regularization induced by stochastic gradient descent with large step sizes and its effect on the features learned by the model. Finally, we rigorously study the relationship between the sharpness of minima (i.e., robustness in the parameter space) and generalization, which prior works observed to correlate with each other. Our study suggests that, contrary to common belief, sharpness is not a good indicator of generalization: it tends to correlate with hyperparameters such as the learning rate rather than with generalization itself.
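To make the first direction concrete, here is a minimal sketch of single-step (FGSM) adversarial training with a random start, the kind of computationally efficient robust training the abstract refers to. It assumes PyTorch; model, loader, optimizer, epsilon, and alpha are illustrative names and values, not the thesis's exact algorithm.

    import torch
    import torch.nn.functional as F

    def fgsm_train_epoch(model, loader, optimizer, epsilon=8/255, alpha=10/255):
        model.train()
        for x, y in loader:
            # Random start inside the L-inf ball, then one FGSM step on the input.
            delta = torch.empty_like(x).uniform_(-epsilon, epsilon)
            delta.requires_grad_(True)
            loss = F.cross_entropy(model(x + delta), y)
            grad = torch.autograd.grad(loss, delta)[0]
            delta = (delta + alpha * grad.sign()).clamp(-epsilon, epsilon).detach()
            # Train on the perturbed example (clamped to the valid pixel range).
            optimizer.zero_grad()
            F.cross_entropy(model((x + delta).clamp(0, 1)), y).backward()
            optimizer.step()

With an overly aggressive inner step size, exactly this kind of single-step training can catastrophically overfit: robustness to multi-step attacks suddenly collapses even while accuracy against the single-step attack keeps improving.

For the second direction, the following is a minimal sketch of one sharpness-aware minimization (SAM) step under the same caveats: the weights are first perturbed toward a nearby high-loss point, and the descent step then uses the gradient taken there. The perturbation radius rho is an illustrative hyperparameter.

    import torch

    def sam_step(model, loss_fn, x, y, optimizer, rho=0.05):
        # First forward/backward pass: gradient at the current weights w.
        loss_fn(model(x), y).backward()
        params = [p for p in model.parameters() if p.grad is not None]
        grad_norm = torch.sqrt(sum((p.grad ** 2).sum() for p in params))
        # Ascend to w + rho * g / ||g||, an approximate worst-case point nearby.
        eps = [rho * p.grad / (grad_norm + 1e-12) for p in params]
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.add_(e)
        optimizer.zero_grad()
        # Second forward/backward pass: gradient taken at the perturbed weights.
        loss_fn(model(x), y).backward()
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.sub_(e)  # restore the original weights
        optimizer.step()  # descend using the sharpness-aware gradient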

Related concepts (34)
Deep learning
Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. The adjective "deep" refers to the use of multiple layers in the network. Learning can be supervised, semi-supervised, or unsupervised.
Stochastic gradient descent
Stochastic gradient descent (often abbreviated SGD) is an iterative method for optimizing an objective function with suitable smoothness properties (e.g. differentiable or subdifferentiable). It can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient (calculated from the entire data set) by an estimate thereof (calculated from a randomly selected subset of the data).
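As a minimal illustration of this idea (assuming nothing beyond NumPy; the data and step size are made up), the following sketch runs SGD on a least-squares problem, using the gradient of a single randomly drawn example in place of the full gradient at every step:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    w_true = rng.normal(size=5)
    y = X @ w_true + 0.1 * rng.normal(size=1000)

    w = np.zeros(5)
    step = 0.01
    for t in range(10_000):
        i = rng.integers(len(X))          # pick one example at random
        grad = (X[i] @ w - y[i]) * X[i]   # gradient of 0.5 * (x_i . w - y_i)^2
        w -= step * grad                  # noisy descent step
    print(np.linalg.norm(w - w_true))     # small: w approaches the true weights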
Gradient descent
In mathematics, gradient descent (also often called steepest descent) is an iterative optimization algorithm for finding a local minimum of a differentiable function. The idea is to take repeated steps in the opposite direction of the gradient (or approximate gradient) of the function at the current point, because this is the direction of steepest descent. Conversely, stepping in the direction of the gradient will lead to a local maximum of that function; the procedure is then known as gradient ascent.
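As a minimal sketch of the update rule, here is gradient descent on the toy quadratic f(x, y) = x^2 + 10*y^2 (the function and step size are illustrative):

    import numpy as np

    def grad_f(p):
        # Gradient of f(x, y) = x^2 + 10 * y^2.
        return np.array([2 * p[0], 20 * p[1]])

    p = np.array([5.0, 2.0])
    step = 0.05
    for _ in range(200):
        p = p - step * grad_f(p)  # step opposite the gradient
    print(p)                      # close to the minimum at [0, 0]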
Related publications (125)

Optimization Algorithms for Decentralized, Distributed and Collaborative Machine Learning

Anastasiia Koloskova

Distributed learning is key to enabling the training of modern large-scale machine learning models by parallelising the learning process. Collaborative learning is essential for learning from privacy-sensitive data that is distributed across various ...
EPFL, 2024

Efficient local linearity regularization to overcome catastrophic overfitting

Volkan Cevher, Grigorios Chrysos, Fanghui Liu, Elias Abad Rocamora

Catastrophic overfitting (CO) in single-step adversarial training (AT) results in abrupt drops in the adversarial test accuracy (even down to 0%). For models trained with multi-step AT, it has been observed that the loss function behaves locally linearly w ...
2024
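The local linearity mentioned in this abstract can be probed, for instance, by comparing input gradients at a clean point and at a randomly perturbed point inside the L-inf ball; such gradient alignment is known to drop sharply when catastrophic overfitting occurs. The sketch below illustrates that measurement and is not the paper's regularizer:

    import torch
    import torch.nn.functional as F

    def input_grad(model, x, y):
        # Gradient of the loss with respect to the input.
        x = x.clone().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        return torch.autograd.grad(loss, x)[0]

    def grad_alignment(model, x, y, epsilon=8/255):
        g_clean = input_grad(model, x, y)
        delta = torch.empty_like(x).uniform_(-epsilon, epsilon)
        g_pert = input_grad(model, x + delta, y)
        # Average cosine similarity between the two gradients over the batch.
        return F.cosine_similarity(g_clean.flatten(1), g_pert.flatten(1)).mean()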

Towards Trustworthy Deep Learning for Image Reconstruction

Alexis Marie Frederic Goujon

The remarkable ability of deep learning (DL) models to approximate high-dimensional functions from samples has sparked a revolution, whose impact cannot be overemphasized, across numerous scientific and industrial domains. In sensitive applications, the good perform ...
EPFL, 2024
Related MOOCs (9)
Introduction to optimization on smooth manifolds: first order methods
Learn to optimize on smooth, nonlinear spaces: join us to build your foundations (starting at "what is a manifold?") and confidently implement your first algorithm (Riemannian gradient descent).