In this PhD manuscript, we explore optimisation phenomena that occur in complex neural networks through the lens of 2-layer diagonal linear networks. This rudimentary architecture, which consists of a two-layer feedforward linear network with a diagonal inner weight matrix, has the advantage of revealing interesting training characteristics while keeping the theoretical analysis clean and insightful. The manuscript is composed of four parts. The first serves as a general introduction to the architecture and provides results on the optimisation trajectory of gradient flow, upon which the rest of the manuscript is built. The second part focuses on saddle-to-saddle dynamics: taking the initialisation scale of the gradient flow to zero, we prove the existence of an asymptotic learning trajectory in which coordinates are learnt incrementally, and we describe it. In the third part we focus on the effect of various hyperparameters (namely the batch size, the stepsize and the momentum parameter) on the solution recovered by the corresponding gradient method. The fourth and last part takes a slightly different point of view. An underlying mirror-descent structure emerges when analysing gradient descent on diagonal linear networks and on slightly more complex architectures, which motivates a deeper understanding of mirror-descent trajectories. In this context, we prove the convergence of the mirror flow in the linear classification setting towards a maximum-margin separating hyperplane.
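For readers unfamiliar with the architecture, a minimal sketch of the parametrisation commonly used in this line of work may help; the symbols $u$, $v$, $\beta$ and $\alpha$ below are illustrative conventions and not necessarily those of the manuscript. The network computes

\[
  f_{u,v}(x) \;=\; \langle v,\; \mathrm{diag}(u)\, x \rangle \;=\; \langle u \odot v,\; x \rangle,
  \qquad u, v \in \mathbb{R}^d,
\]

so the effective linear predictor is $\beta = u \odot v$. Running gradient flow on the non-convex loss in the variables $(u, v)$, from an initialisation of scale $\alpha$, induces a trajectory on $\beta$ that can be rewritten as a mirror flow on a convex potential depending on $\alpha$; this is the mirror-descent structure referred to in the fourth part.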
Nicolas Henri Bernard Flammarion, Hristo Georgiev Papazov, Scott William Pesme