Distributed Learning in Non-Convex Environments-Part I: Agreement at a Linear Rate
Within the context of contemporary machine learning problems, the efficiency of the optimization process depends on the properties of the model and the nature of the data available, which poses a significant problem as the complexity of either increases ad infinitum ...
In this thesis, we study two closely related directions: robustness and generalization in modern deep learning. Deep learning models based on empirical risk minimization are known to often be non-robust to small, worst-case perturbations known as adversarial ...
We study the performance of Stochastic Cubic Regularized Newton (SCRN) on a class of functions satisfying the gradient dominance property with 1 ≤ α ≤ 2, which holds in a wide range of applications in machine learning and signal processing. This condition ...
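For readers unfamiliar with the terminology, gradient dominance of order α is commonly written as a generalized Polyak-Łojasiewicz inequality; the sketch below uses a generic constant τ, which is notation assumed here and not taken from the abstract.

```latex
% Gradient dominance of order \alpha (a generalized Polyak--Lojasiewicz condition):
% there exists \tau > 0 such that, for all x and some 1 \le \alpha \le 2,
f(x) - \min_{y} f(y) \;\le\; \tau \,\|\nabla f(x)\|^{\alpha}
% \alpha = 2 recovers the classical PL inequality.
```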
In this article, we study the problem of Byzantine fault-tolerance in a federated optimization setting, where there is a group of agents communicating with a centralized coordinator. We allow up to f Byzantine-faulty agents, which may not follow a prescr ...
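As a rough illustration of the federated setting described above (and not the aggregation rule studied in the article), a coordinator can limit the influence of up to f Byzantine agents with a coordinate-wise trimmed mean; the function name and parameters below are hypothetical.

```python
import numpy as np

def trimmed_mean(gradients: np.ndarray, f: int) -> np.ndarray:
    """Coordinate-wise trimmed mean over n agent gradients (shape: n x d).

    Drops the f largest and f smallest values in every coordinate before
    averaging, a common robust aggregation rule when at most f agents are
    Byzantine-faulty. Requires n > 2 * f.
    """
    n, _ = gradients.shape
    if n <= 2 * f:
        raise ValueError("need more than 2*f agents for a trimmed mean")
    sorted_grads = np.sort(gradients, axis=0)   # sort each coordinate independently
    kept = sorted_grads[f:n - f]                # discard f extremes on each side
    return kept.mean(axis=0)

# Example: 7 agents, 2 of them sending adversarial gradients.
rng = np.random.default_rng(0)
honest = rng.normal(loc=1.0, scale=0.1, size=(5, 3))
byzantine = np.full((2, 3), 100.0)              # arbitrary faulty reports
agg = trimmed_mean(np.vstack([honest, byzantine]), f=2)
print(agg)  # stays close to the honest mean (~1.0 per coordinate)
```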
Adaptive first-order methods in optimization are prominent in machine learning and data science owing to their ability to automatically adapt to the landscape of the function being optimized. However, their convergence guarantees are typically stated in te ...
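To make "adapting to the landscape" concrete, here is a minimal AdaGrad-style update that rescales each coordinate by its accumulated squared gradients; this is a generic illustration of an adaptive first-order method, not the specific scheme analyzed in the work above, and all names are illustrative.

```python
import numpy as np

def adagrad_step(x, grad, accum, lr=0.1, eps=1e-8):
    """One AdaGrad-style update: per-coordinate step sizes shrink where
    gradients have historically been large, without manual tuning."""
    accum = accum + grad ** 2
    x = x - lr * grad / (np.sqrt(accum) + eps)
    return x, accum

# Minimize f(x) = 0.5 * x^T D x with an ill-conditioned diagonal D.
D = np.array([100.0, 1.0])
x, accum = np.array([1.0, 1.0]), np.zeros(2)
for _ in range(200):
    x, accum = adagrad_step(x, D * x, accum)   # D * x is the gradient
print(x)  # both coordinates shrink toward 0 despite very different curvature
```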
We present a discriminative clustering approach in which the feature representation can be learned from data and can moreover leverage labeled data. Representation learning can give a similarity-based clustering method the ability to automatically adapt to an ...
2022
Machine learning and signal processing on the edge are poised to influence our everyday lives with devices that will learn and infer from data generated by smart sensors and other devices for the Internet of Things. The next leap toward ubiquitous electron ...
2022
In the domains of machine learning, data science and signal processing, graph or network data is becoming increasingly popular. It represents a large portion of the data in computer, transportation systems, energy networks, social, biological, and other s ...
We propose a stochastic conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms. Existing CGM variants for this template either suffer from slow convergence rates or require carefully inc ...
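For context, the core of any conditional gradient (Frank-Wolfe) template is a linear minimization oracle followed by a convex combination step. The sketch below shows a plain deterministic step over the l1 ball; it is a generic illustration of the template under assumed names and constraints, not the stochastic variant proposed above.

```python
import numpy as np

def frank_wolfe_step(x, grad, radius, t):
    """One conditional-gradient (Frank-Wolfe) step over the l1 ball.

    The linear minimization oracle over {||s||_1 <= radius} returns a signed,
    scaled basis vector at the coordinate with the largest |gradient|.
    """
    s = np.zeros_like(x)
    i = np.argmax(np.abs(grad))
    s[i] = -radius * np.sign(grad[i])   # LMO solution: a vertex of the l1 ball
    gamma = 2.0 / (t + 2.0)             # standard step-size schedule
    return (1 - gamma) * x + gamma * s  # convex combination keeps x feasible

# Example: minimize 0.5 * ||A x - b||^2 over the l1 ball of radius 1.
rng = np.random.default_rng(1)
A, b = rng.normal(size=(20, 5)), rng.normal(size=20)
x = np.zeros(5)
for t in range(100):
    grad = A.T @ (A @ x - b)
    x = frank_wolfe_step(x, grad, radius=1.0, t=t)
print(x, np.abs(x).sum())  # feasible iterate with ||x||_1 <= 1
```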
Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing loca ...