The design and implementation of efficient concurrent data structures have received significant attention. Most of this work, however, has focused on concurrent data structures providing good worst-case guarantees, even though, in real workloads, objects are often ...
State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD). Recently, many variations have been explored: perturbing parameters for better accuracy (such as in Extra-gradient), limiting SGD updates to a sub ...
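As a rough illustration of the "perturbing parameters" family of variations mentioned above, the sketch below contrasts a plain SGD step with an extragradient-style step, which evaluates the gradient at a look-ahead (perturbed) point before committing the update. The toy scalar objective, step size, and function names are assumptions made only for this example; they are not taken from the abstract.

```python
# Toy objective (w - 3)^2 and its gradient (illustrative assumption).
def grad(w):
    return 2.0 * (w - 3.0)

def sgd_step(w, lr):
    # Plain SGD: step against the gradient at the current point.
    return w - lr * grad(w)

def extragradient_step(w, lr):
    # Extragradient-style update: take a look-ahead step first,
    # then apply the gradient evaluated at that perturbed point.
    w_lookahead = w - lr * grad(w)
    return w - lr * grad(w_lookahead)

w_sgd, w_eg = 10.0, 10.0
for _ in range(50):
    w_sgd = sgd_step(w_sgd, lr=0.1)
    w_eg = extragradient_step(w_eg, lr=0.1)
print(w_sgd, w_eg)  # both approach the minimizer w = 3
```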
It has been observed experimentally that the efficiency of distributed training with stochastic gradient descent (SGD) depends decisively on the batch size and, in asynchronous implementations, on the gradient staleness. In particular, it has been observed that the spe ...
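To make the notion of gradient staleness concrete, the following minimal sketch simulates an asynchronous setup in which every applied gradient was computed on a copy of the parameters from a fixed number of steps earlier. The fixed-staleness model, the toy objective, and all names are simplifying assumptions for illustration; real asynchronous systems exhibit variable, workload-dependent staleness.

```python
from collections import deque

def grad(w):
    return 2.0 * (w - 3.0)  # gradient of the toy objective (w - 3)^2

def async_sgd(w0, lr, staleness, steps):
    """Simulate SGD where each applied gradient was computed on the
    parameters from `staleness` steps ago (illustrative model only)."""
    w = w0
    pending = deque([grad(w0)] * staleness)  # gradients still "in flight"
    for _ in range(steps):
        pending.append(grad(w))        # worker reads the current parameters
        stale_grad = pending.popleft() # server applies the oldest gradient
        w -= lr * stale_grad
    return w

# In this toy model, larger staleness slows convergence toward w = 3.
for s in (0, 4, 16):
    print(s, async_sgd(w0=10.0, lr=0.02, staleness=s, steps=300))
```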
This thesis investigates methods to construct more capable models more efficiently, focusing on two aspects: improved architectures and optimization. We examine principled architectural modifications that reduce computational costs or introduce features fo ...