Polynomial Escape-Time From Saddle Points In Distributed Non-Convex Optimization
We present a discriminative clustering approach in which the feature representation can be learned from data and can, moreover, leverage labeled data. Representation learning can give a similarity-based clustering method the ability to automatically adapt to an ...
It is well-known that for any integral domain R, the Serre conjecture ring R(X), i.e., the localization of the univariate polynomial ring R[X] at monic polynomials, is a Bezout domain of Krull dimension
Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing loca ...
In this thesis, we study two closely related directions: robustness and generalization in modern deep learning. Deep learning models based on empirical risk minimization are known to often be non-robust to small, worst-case perturbations known as adversari ...
Self-attention mechanisms and non-local blocks have become crucial building blocks for state-of-the-art neural architectures thanks to their unparalleled ability to capture long-range dependencies in the input. However, their cost is quadratic with the nu ...
The diffusion strategy for distributed learning from streaming data employs local stochastic gradient updates along with exchange of iterates over neighborhoods. In Part I [3] of this work we established that agents cluster around a network centroid and pr ...
Driven by the need to solve increasingly complex optimization problems in signal processing and machine learning, there has been increasing interest in understanding the behavior of gradient-descent algorithms in non-convex environments. Most available wor ...
Determination of the local void fraction in BWRs from in-core neutron noise measurements requires the knowledge of the axial velocity of the void. The purpose of this paper is to revisit the problem of determining the axial void velocity profile from the t ...
In this work we investigate stochastic non-convex optimization problems where the objective is an expectation over smooth loss functions, and the goal is to find an approximate stationary point. The most popular approach to handling such problems is varianc ...
Federated learning is a useful framework for centralized learning from distributed data under practical considerations of heterogeneity, asynchrony, and privacy. Federated architectures are frequently deployed in deep learning settings, which generally giv ...
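The diffusion strategy described in one of the abstracts above (local stochastic gradient updates followed by an exchange of iterates over neighborhoods) can be illustrated with a minimal adapt-then-combine sketch. Everything concrete here is an assumption chosen for illustration, not taken from the listed works: the scalar quadratic local costs J_k(w) = 0.5 * (w - targets[k])**2, the ring topology with uniform 1/3 combination weights, the Gaussian gradient noise, and the hypothetical helper names `ring_combination_matrix` and `diffusion_atc`. Because these costs are convex, the sketch only shows agents clustering around the network centroid; it does not exhibit saddle-point escape.

```python
import random


def ring_combination_matrix(num_agents):
    """Hypothetical combination matrix: on a ring, each agent weights itself
    and its two neighbours by 1/3. The result is symmetric and doubly stochastic."""
    if num_agents < 3:
        raise ValueError("a ring with distinct neighbours needs at least 3 agents")
    a = [[0.0] * num_agents for _ in range(num_agents)]
    for k in range(num_agents):
        for l in (k - 1, k, k + 1):
            a[l % num_agents][k] = 1.0 / 3.0  # weight agent k assigns to agent l
    return a


def diffusion_atc(targets, step_size=0.01, noise_std=0.1, num_iters=5000, seed=0):
    """Adapt-then-combine diffusion on assumed local costs
    J_k(w) = 0.5 * (w - targets[k])**2, whose aggregate minimizer is mean(targets)."""
    rng = random.Random(seed)
    n = len(targets)
    a = ring_combination_matrix(n)
    w = [0.0] * n
    for _ in range(num_iters):
        # Adapt: each agent steps along a noisy estimate of its local gradient.
        psi = [
            w[k] - step_size * ((w[k] - targets[k]) + rng.gauss(0.0, noise_std))
            for k in range(n)
        ]
        # Combine: each agent averages the intermediate iterates of its neighbourhood.
        w = [sum(a[l][k] * psi[l] for l in range(n)) for k in range(n)]
    return w


if __name__ == "__main__":
    estimates = diffusion_atc([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    centroid = sum(estimates) / len(estimates)
    print("centroid:", centroid)
    print("agents:", estimates)
```

With a small constant step size and a doubly stochastic combination matrix, the centroid of the iterates tracks the aggregate minimizer (here 3.5) up to small fluctuations, while individual agents stay within an O(step_size) neighbourhood of that centroid. The adapt-then-combine ordering is one common diffusion variant; a combine-then-adapt ordering would swap the two steps inside the loop.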