# Accelerated Stochastic Matrix Inversion: General Theory and Speeding up BFGS Rules for Faster Second-Order Optimization

Abstract

We present the first accelerated randomized algorithm for solving linear systems in Euclidean spaces. One essential problem of this type is the matrix inversion problem. In particular, our algorithm can be specialized to invert positive definite matrices in such a way that all iterates (approximate solutions) generated by the algorithm are positive definite matrices themselves. This opens the way for many applications in the field of optimization and machine learning. As an application of our general theory, we develop the first accelerated (deterministic and stochastic) quasi-Newton updates. Our updates lead to provably more aggressive approximations of the inverse Hessian, and lead to speed-ups over classical non-accelerated rules in numerical experiments. Experiments with empirical risk minimization show that our rules can accelerate training of machine learning models.
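For context, the classical (non-accelerated) BFGS rule that the paper's accelerated updates improve upon maintains an approximation H of the inverse Hessian and refreshes it from the step s and gradient difference y. The sketch below is the textbook update, not the paper's accelerated variant; the function name and the quadratic test problem are illustrative choices.

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """Classical (non-accelerated) BFGS update of the inverse-Hessian
    approximation H, given step s = x_{k+1} - x_k and gradient
    difference y = grad f(x_{k+1}) - grad f(x_k).
    Preserves symmetry and positive definiteness whenever s @ y > 0."""
    rho = 1.0 / (s @ y)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

# On a quadratic f(x) = 0.5 * x @ A @ x the gradient difference is y = A @ s,
# and the updated matrix satisfies the secant condition H_new @ y == s.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
s = np.array([1.0, -0.5])
y = A @ s
H_new = bfgs_inverse_update(np.eye(2), s, y)
```

The secant condition is what makes the update a one-dimensional "inversion step" toward the true inverse Hessian; the paper's contribution is to randomize and accelerate a whole family of such steps.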

Official source


Related concepts (14)

Machine learning

Machine learning (in French, apprentissage automatique), also called artificial or statistical learning, is a field of study concerned with algorithms that improve from data.

Algorithm

An algorithm is a finite, unambiguous sequence of instructions and operations for solving a class of problems.

Optimization (mathematics)

Optimization is a branch of mathematics that seeks to model, analyze, and solve, analytically or numerically, problems of minimizing or maximizing a function over a given set.

Related publications (31)


Graphs offer a simple yet meaningful representation of relationships between data. This representation is often used in machine learning algorithms in order to incorporate structural or geometric information about data. However, it can also be used in an inverted fashion: instead of modelling data through graphs, we model graphs through data distributions. In this thesis, we explore several applications of this new modelling framework. Starting with the graph learning problem, we exploit the probabilistic model of data given through graphs to propose a multi-graph learning method for structured data mixtures. We explore various relations that data can have with the underlying graphs through the notion of graph filters. We propose an algorithm to jointly cluster a set of data and learn a graph for each of the clusters. Experiments demonstrate promising performance in data clustering and multiple graph inference, and show desirable properties in terms of interpretability and proper handling of high dimensionality on synthetic and real data. The model has further been applied to fMRI data, where the method is used to successfully identify different functional brain networks and their activation times. This probabilistic model of data defined through graphs can be very meaningful even when no data is available. Thus, in the second part of this thesis, we use such models to represent each graph through the probabilistic distribution of data, which varies smoothly on the graph. Optimal transport allows for a comparison of two such distributions, which in turn gives a structurally meaningful measure for graph comparison.

We follow by using this distance to formulate a new graph alignment problem based on the optimal transport framework, and propose an efficient stochastic algorithm based on Bayesian exploration to accommodate for the nonconvexity of the graph alignment problem. We demonstrate the performance of our novel framework on different tasks like graph alignment, graph classification and graph signal prediction, and we show that our method leads to significant improvement with respect to state-of-the-art algorithms. Furthermore, we cast a new formulation for the one-to-many graph alignment problem, allowing for comparison of graphs of different sizes. The resulting alignment problem is solved with stochastic gradient descent, where a novel Dykstra operator ensures that the solution is a one-to-many (soft) assignment matrix. Experiments on graph alignment and graph classification problems show that our method for one-to-many alignment leads to meaningful improvements with respect to the state-of-the-art algorithms for each of these tasks. Finally, we explore a family of probabilistic distributions for data based on graph filters. Distances defined through a graph filter give a high level of flexibility in choosing which graph properties we want to emphasize. In addition, in order to make the above graph alignment problem more scalable, we formulate an approximation to our filter Wasserstein graph distance that allows for the exploitation of faster algorithms, without grossly sacrificing the performance. We propose two algorithms, a simple one based on mirror gradient descent and another one built on its stochastic version, which offers a trade-off between speed and accuracy. Our experiments show the performance benefits of our novel stochastic algorithm, as well as the strong value of flexibility offered by filter-based distances.
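The core idea of comparing graphs through distributions of smooth signals can be sketched in closed form for Gaussians: if each graph induces a zero-mean Gaussian with covariance derived from its Laplacian, the 2-Wasserstein distance between the two Gaussians compares the graphs. The snippet below is a simplified instance of that idea, not the thesis's exact filter-based distance; the function name `graph_w2` and the choice of covariance inv(L + alpha*I) are illustrative assumptions, and both graphs must have the same number of nodes for this closed form.

```python
import numpy as np
from scipy.linalg import sqrtm

def graph_w2(L1, L2, alpha=1.0):
    """2-Wasserstein distance between zero-mean Gaussian models of
    smooth graph signals, with covariance inv(L + alpha*I) for each
    Laplacian L (alpha regularizes the Laplacian's null space).
    Closed form for Gaussians:
    W2^2 = tr(C1) + tr(C2) - 2 tr((C1^{1/2} C2 C1^{1/2})^{1/2})."""
    n = len(L1)
    C1 = np.linalg.inv(L1 + alpha * np.eye(n))
    C2 = np.linalg.inv(L2 + alpha * np.eye(n))
    R = sqrtm(C1)
    cross = sqrtm(R @ C2 @ R)
    w2sq = np.trace(C1) + np.trace(C2) - 2.0 * np.trace(cross)
    return float(np.sqrt(max(np.real(w2sq), 0.0)))

# Path graph vs. complete graph on three nodes: identical graphs are at
# distance ~0, structurally different graphs at a positive distance.
path = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])
complete = np.array([[2.0, -1.0, -1.0], [-1.0, 2.0, -1.0], [-1.0, -1.0, 2.0]])
d_same = graph_w2(path, path)
d_diff = graph_w2(path, complete)
```

Choosing a different function of the Laplacian as covariance (a "graph filter") changes which structural properties the distance emphasizes, which is the flexibility the thesis exploits.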

Martin Jaggi, Hossein Shokri Ghadikolaei, Sebastian Urban Stich

In distributed optimization, parameter updates from the gradient computing node devices have to be aggregated in every iteration on the orchestrating server. When these updates are sent over an arbitrary commodity network, bandwidth and latency can be limiting factors. We propose a communication framework where nodes may skip unnecessary uploads. Every node locally accumulates an error vector in memory and self-triggers the upload of the memory contents to the parameter server using a significance filter. The server then uses a history of the nodes' gradients to update the parameter. We characterize the convergence rate of our algorithm in smooth settings (strongly-convex, convex, and nonconvex) and show that it enjoys the same convergence rate as when sending gradients every iteration, with substantially fewer uploads. Numerical experiments on real data indicate a significant reduction of used network resources (total communicated bits and latency), especially in large networks, compared to state-of-the-art algorithms. Our results provide important practical insights for using machine learning over resource-constrained networks, including Internet-of-Things and geo-separated datasets across the globe.
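The error-memory and significance-filter mechanism described above can be sketched in a few lines. This is a simplified local view of one node, not the authors' implementation; the function name `node_step` and the plain norm threshold are illustrative assumptions.

```python
import numpy as np

def node_step(grad, memory, threshold):
    """One local round of a significance-filter scheme (simplified sketch):
    the node adds its current update to an error memory and uploads the
    memory contents only when their norm exceeds the threshold; otherwise
    the upload is skipped and the error is kept for later rounds."""
    memory = memory + grad
    if np.linalg.norm(memory) >= threshold:
        upload = memory.copy()
        memory = np.zeros_like(memory)  # error is reset after an upload
        return upload, memory
    return None, memory                 # skip this round's upload

# Small updates accumulate silently until they become significant.
mem = np.zeros(2)
up1, mem = node_step(np.array([0.3, 0.0]), mem, threshold=0.5)  # skipped
up2, mem = node_step(np.array([0.3, 0.0]), mem, threshold=0.5)  # triggered
```

Because skipped updates stay in memory rather than being discarded, no gradient information is lost, which is what lets the scheme match the convergence rate of sending every iteration.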


2018