Understanding why deep nets can classify data in large dimensions remains a challenge. It has been proposed that they do so by becoming stable to diffeomorphisms, yet existing empirical measurements suggest that this is often not the case. We revisit this question by defining a maximum-entropy distribution on diffeomorphisms, which allows us to study typical diffeomorphisms of a given norm. We confirm that stability toward diffeomorphisms does not strongly correlate with performance on benchmark image data sets. By contrast, we find that the {\it stability toward diffeomorphisms relative to that of generic transformations} $R_f$ correlates remarkably with the test error $\epsilon_t$. It is of order unity at initialization but decreases by several decades during training for state-of-the-art architectures. For CIFAR10 and 15 known architectures, we find $\epsilon_t \approx 0.2\sqrt{R_f}$, suggesting that obtaining a small $R_f$ is important to achieve good performance. We study how $R_f$ depends on the size of the training set and compare it to a simple model of invariant learning.
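In practice, the relative stability $R_f$ can be estimated as the ratio between how much a network's output moves under a small smooth deformation of the input and how much it moves under isotropic noise of the same input norm. Below is a minimal sketch of such a measurement, not the paper's reference implementation: the `smooth_deform` helper, its low-frequency random warp (a crude stand-in for the paper's maximum-entropy diffeomorphisms), and all function names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def smooth_deform(x, amplitude=0.02):
    """Warp an image batch with a random low-frequency displacement field.
    Illustrative stand-in for a small-norm max-entropy diffeomorphism."""
    n, _, h, w = x.shape
    # coarse random displacement, upsampled so the warp varies smoothly
    coarse = torch.randn(n, 2, 4, 4, device=x.device) * amplitude
    disp = F.interpolate(coarse, size=(h, w), mode="bilinear", align_corners=False)
    # identity sampling grid in [-1, 1]^2, as expected by grid_sample
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=x.device),
        torch.linspace(-1, 1, w, device=x.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=-1).expand(n, h, w, 2)
    grid = grid + disp.permute(0, 2, 3, 1)
    return F.grid_sample(x, grid, align_corners=False)

def relative_stability(f, x):
    """Estimate R_f: output change under a smooth deformation divided by
    output change under additive noise of matched per-sample norm.
    R_f << 1 means f is comparatively invariant to diffeomorphisms."""
    with torch.no_grad():
        y = f(x)
        x_d = smooth_deform(x)
        d_f = ((f(x_d) - y) ** 2).sum(dim=1).mean()
        # isotropic noise rescaled to the deformation's per-sample norm
        delta = x_d - x
        eta = torch.randn_like(x)
        scale = delta.flatten(1).norm(dim=1) / eta.flatten(1).norm(dim=1)
        eta = eta * scale.view(-1, 1, 1, 1)
        g_f = ((f(x + eta) - y) ** 2).sum(dim=1).mean()
    return (d_f / g_f).item()
```

On a trained model, calling `relative_stability(model, images)` with a batch of CIFAR10 images returns a scalar; under the assumptions above, values well below one reflect the comparative invariance to smooth deformations that the abstract associates with low test error.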