In this PhD manuscript, we explore optimisation phenomena that occur in complex neural networks through the lens of 2-layer diagonal linear networks. This rudimentary architecture, which consists of a two-layer feedforward linear network with a diagonal inner weight matrix, has the advantage of revealing interesting training characteristics while keeping the theoretical analysis clean and insightful. The manuscript is composed of four parts. The first serves as a general introduction to the architecture and provides results on the optimisation trajectory of gradient flow, upon which the rest of the manuscript is built. The second part focuses on saddle-to-saddle dynamics: taking the initialisation scale of the gradient flow to zero, we prove the existence of an asymptotic learning trajectory in which coordinates are learnt incrementally, and we describe it. In the third part we focus on the effect of various hyperparameters (namely the batch size, the stepsize and the momentum parameter) on the solution recovered by the corresponding gradient method. The fourth and last part takes a slightly different point of view. An underlying mirror-descent structure emerges when analysing gradient descent on diagonal linear networks and on slightly more complex architectures, which motivates a deeper understanding of mirror-descent trajectories. In this context, we prove the convergence of the mirror flow in the linear classification setting towards a maximum-margin separating hyperplane.
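For readers unfamiliar with the architecture, a minimal sketch of the parametrisation commonly used in this line of work may help; the symbols $u$, $v$, $\beta$ and $\alpha$ below are illustrative conventions and not necessarily those of the manuscript. The network computes

\[
  f_{u,v}(x) \;=\; \langle v,\; \mathrm{diag}(u)\, x \rangle \;=\; \langle u \odot v,\; x \rangle,
  \qquad u, v \in \mathbb{R}^d,
\]

so the effective linear predictor is $\beta = u \odot v$. Running gradient flow on the non-convex loss in the variables $(u, v)$, from an initialisation of scale $\alpha$, induces a trajectory on $\beta$ that can be rewritten as a mirror flow on a convex potential depending on $\alpha$; this is the mirror-descent structure referred to in the fourth part.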
Nicolas Henri Bernard Flammarion, Hristo Georgiev Papazov, Scott William Pesme