In empirical risk optimization, it has been observed that gradient descent implementations that rely on random reshuffling of the data achieve better performance than implementations that sample the data uniformly and independently. Recent works have sought to justify this behavior by examining the convergence rate of the learning process under diminishing step-sizes. Some of these justifications rely on loose bounds, or their conclusions depend on the sample size, which is problematic for large datasets. This work focuses instead on constant step-size adaptation, where the agent is continuously learning. In this case, convergence is only guaranteed to a small neighborhood of the minimizer, albeit at a linear rate. The analysis establishes analytically that random reshuffling outperforms independent sampling by showing that the iterate at the end of each run approaches a smaller neighborhood of size O(μ²) around the minimizer, rather than O(μ). Simulation results illustrate the theoretical findings.
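To make the two sampling schemes concrete, the sketch below (not taken from the paper; the least-squares risk, the step-size value mu, and all variable names are illustrative assumptions) compares constant step-size stochastic gradient descent under random reshuffling, where each epoch processes a fresh permutation of the data, against independent uniform sampling with replacement, and reports how far the final iterate lands from the empirical minimizer.

```python
import numpy as np

# Illustrative sketch only: constant step-size SGD on a least-squares
# empirical risk, comparing random reshuffling with independent sampling.
rng = np.random.default_rng(0)
n, d = 500, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
w_star = np.linalg.lstsq(A, b, rcond=None)[0]   # empirical risk minimizer

def sgd(reshuffle, mu=0.01, n_epochs=200):
    w = np.zeros(d)
    for _ in range(n_epochs):
        # Random reshuffling: one pass over a permutation of the data.
        # Independent sampling: n draws with replacement per epoch.
        idx = rng.permutation(n) if reshuffle else rng.integers(0, n, size=n)
        for i in idx:
            grad = (A[i] @ w - b[i]) * A[i]      # per-sample gradient
            w -= mu * grad
    return np.linalg.norm(w - w_star)            # distance to the minimizer

print("random reshuffling  :", sgd(reshuffle=True))
print("independent sampling:", sgd(reshuffle=False))
```

Under the theory summarized above, the end-of-epoch iterate under random reshuffling should settle in a neighborhood of size O(μ²) around the minimizer, versus O(μ) for independent sampling, which the printed distances are meant to illustrate qualitatively.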