,
In this paper, we focus on a theory-practice gap for Adam and its variants (AMSgrad, AdamNC, etc.). In practice, these algorithms are used with a constant first-order moment parameter 1 (typically between 0:9 and 0:99). In theory, regret guarantees for onl ...
2020