During the Artificial Intelligence (AI) revolution of the past decades, deep neural networks have been widely adopted and have achieved tremendous success in visual recognition. Unfortunately, deploying deep models remains challenging because of their large model size and high computational complexity. Compact neural networks are therefore in high demand for the embedded, mobile, and edge devices that are omnipresent in our modern AI age.

The main goal of this thesis is to improve the training of \emph{arbitrary, given} compact networks. To achieve this, we introduce several methods, including linear over-parameterization and two novel knowledge distillation approaches, to facilitate the training of such compact models and thus improve their performance.

Over-parameterization has been shown to be key to the success of conventional deep models: it facilitates optimization during training, even though not all the model weights are necessary at inference. Motivated by this observation, we first present a general optimization method, ExpandNets, which leverages linear over-parameterization to train a compact network from scratch. Specifically, we introduce two expansion strategies for convolutional layers and one for fully-connected layers, linearly expanding these linear operations into consecutive linear layers without adding any nonlinearity. Our linear expansion empirically improves both the optimization behavior and the generalization ability during training. At test time, the expanded network can be algebraically contracted back to the original compact network without any information loss (a sketch of this contraction is given below), yet yields better prediction performance. The effectiveness of ExpandNets is evidenced on several visual recognition tasks, including image classification, object detection, and semantic segmentation.

Our first knowledge distillation approach targets object detection and is motivated by the fact that the recent knowledge distillation literature remains limited to scenarios where the student and the teacher tackle the same task with similar network architectures. By contrast, we propose a classifier-to-detector knowledge distillation method instead of the standard detector-to-detector strategy. Our method improves the student detector in terms of both classification and localization; in other words, it successfully transfers knowledge not only across architectures but also across tasks.

We then extend our knowledge distillation work to 6D pose estimation, a task for which knowledge distillation had remained completely unstudied. Specifically, we observe that keypoint-based models are less sensitive than dense-prediction ones to a decrease in model size. We therefore introduce the first knowledge distillation method for 6D pose estimation, relying on optimal transport theory to align the keypoint distributions of the student and teacher networks (sketched below). Our experiments on several benchmarks show that our distillation method yields better keypoint predictions and achieves state-of-the-art results with different compact student models.

To summarize, this thesis presents multiple investigations into improving the training phase of \emph{arbitrary, given} compact networks for different visual recognition tasks. Our diverse strategies consistently improve the performance of these compact networks at inference time.
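To make the algebraic contraction behind ExpandNets concrete, the following is a minimal sketch for a fully-connected layer expanded into two consecutive linear layers; the expansion strategies for convolutional layers follow the same principle, since a convolution is itself a linear operation. Because no nonlinearity is inserted between the expanded layers, they collapse exactly back into a single layer:
\[
y \;=\; W_2\bigl(W_1 x + b_1\bigr) + b_2 \;=\; \underbrace{(W_2 W_1)}_{W}\,x \;+\; \underbrace{(W_2 b_1 + b_2)}_{b},
\]
so the compact layer $(W, b)$ is recovered exactly after training, and the inference-time architecture and cost are unchanged.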
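For context, the standard knowledge distillation template trains a student to match the teacher's temperature-softened predictions; it is recalled here only as background, and the classifier-to-detector losses developed in this thesis are not restricted to it:
\[
\mathcal{L} \;=\; (1-\alpha)\,\mathcal{L}_{\mathrm{task}} \;+\; \alpha\,\tau^{2}\,\mathrm{KL}\!\left(\sigma\!\left(z_t/\tau\right) \,\middle\|\, \sigma\!\left(z_s/\tau\right)\right),
\]
where $z_s$ and $z_t$ denote the student and teacher logits, $\sigma$ the softmax, $\tau$ a temperature, and $\alpha$ a weighting factor.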
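As a rough illustration of the optimal-transport alignment used for 6D pose distillation (the exact formulation and solver are detailed in the corresponding chapter), let $\{s_i\}$ and $\{t_j\}$ denote the keypoints predicted by the student and the teacher, with weights $\mu$ and $\nu$. A transport-based alignment cost takes the form
\[
\mathcal{L}_{\mathrm{OT}} \;=\; \min_{T \in \Pi(\mu,\nu)} \sum_{i,j} T_{ij}\,\bigl\| s_i - t_j \bigr\|,
\qquad
\Pi(\mu,\nu) = \bigl\{\, T \ge 0 \;:\; T\mathbf{1} = \mu,\; T^{\top}\mathbf{1} = \nu \,\bigr\},
\]
where the optimal plan $T$ matches student keypoints to teacher keypoints; minimizing this cost with respect to the student's predictions pulls its keypoint distribution toward the teacher's. In practice, such problems are commonly solved with an entropy-regularized (Sinkhorn) formulation so that gradients can propagate through the matching.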