Within the context of contemporary machine learning problems, the efficiency of the optimization process depends on the properties of the model and the nature of the available data, which becomes a significant challenge as the complexity of either keeps increasing. An overarching goal is to design fast, adaptive algorithms that are provably robust to increasing complexity and to unknown properties of the optimization landscape, while admitting scalable implementations. That said, there are two main perspectives to consider: (i) standard proof techniques require precise knowledge of the model and loss-function parameters, which are usually prohibitively expensive to estimate; (ii) state-of-the-art methods that show superior performance in practice are mostly heuristics lacking a theoretical basis.

This dissertation studies several fundamental problem formulations in machine learning from these two perspectives. Specifically, the focus is on two fundamental concepts: (i) adaptivity, the ability of an algorithm to converge without knowing the problem-dependent parameters, and (ii) universality, the ability of an algorithm to converge adaptively under multiple problem settings simultaneously, without any modifications. In light of this terminology, the goal is to bridge the gap between the theory of adaptive algorithms and the heuristic approaches employed in practice. To this end, the results are organized in three chapters according to the properties of the optimization problem: convex minimization, non-convex optimization, and monotone variational inequalities.

We begin with a universal and adaptive algorithm for compactly constrained convex minimization, which simultaneously achieves order-optimal convergence rates for smooth and non-smooth problems under deterministic and stochastic oracles. We identify an alternative acceleration scheme, together with an appropriate adaptive step-size, that enables optimal convergence rates without knowing, a priori, either the problem parameters or the problem setting at hand. Then, we propose the first noise-adaptive second-order algorithm, which adapts individually to the noise in the gradient and Hessian estimates.

Moving on to non-convex minimization, we present two sets of results: high-probability convergence of adaptive gradient methods, and adaptive variance-reduction methods under two scenarios. For the former, we analyze the AdaGrad algorithm under two noise models, bounded variance and sub-Gaussian noise; we provide order-optimal high-probability rates while establishing a set of side results on the boundedness of the iterate sequence. For the latter, we study both the generic setting of streaming data and the more practical sub-setting of finite-sum minimization; under both scenarios, we develop the first parameter-free variance-reduction methods with optimal rates in their respective problem settings.

Finally, we study the problem of solving monotone variational inequalities under two noise models: the standard bounded-variance model and the relative noise model, in which the error in the operator computation is proportional to the operator's norm at the evaluation point. Under a compatible, mild set of assumptions, we prove that a class of extra-gradient algorithms with a particular adaptive step-size universally adapts to both noise models without knowing the setting a priori.
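As an illustration of the notion of adaptivity described above, consider the AdaGrad-norm update, whose step-size is built from the observed gradients rather than from the (typically unknown) smoothness or noise constants. This is a minimal sketch of a standard variant; the exact algorithm and assumptions analyzed in the dissertation may differ:

\[
x_{t+1} \;=\; x_t \;-\; \frac{\eta}{\sqrt{b_0^2 + \sum_{s=1}^{t} \lVert g_s \rVert^2}}\, g_t,
\qquad g_t \approx \nabla f(x_t),
\]

where \(\eta > 0\) and \(b_0 > 0\) are free constants that need not encode the smoothness level or the variance of the gradient noise.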
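For the variational-inequality part, the following sketch indicates the type of extra-gradient scheme and adaptive step-size referred to above; the notation (operator estimate \(\hat A\), projection \(P_{\mathcal{X}}\), step-size \(\gamma_t\)) and the particular step-size formula are illustrative choices in the spirit of adaptive extra-gradient methods, not necessarily the exact ones used in the thesis:

\[
X_{t+1/2} = P_{\mathcal{X}}\!\big(X_t - \gamma_t \hat A(X_t)\big),
\qquad
X_{t+1} = P_{\mathcal{X}}\!\big(X_t - \gamma_t \hat A(X_{t+1/2})\big),
\]
\[
\text{e.g.}\quad
\gamma_t = \frac{\gamma_0}{\sqrt{1 + \sum_{s<t} \lVert \hat A(X_{s+1/2}) - \hat A(X_s) \rVert^2}}.
\]

One common formalization of the relative noise model is \(\mathbb{E}\big[\lVert \hat A(x) - A(x) \rVert^2\big] \le \sigma_{\mathrm{rel}}^2\, \lVert A(x) \rVert^2\), in contrast to the absolute bound \(\mathbb{E}\big[\lVert \hat A(x) - A(x) \rVert^2\big] \le \sigma^2\) of the standard bounded-variance model.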
Volkan Cevher, Kimon Antonakopoulos