Model compression techniques have led to reductions in the size and computational cost of Deep Learning models. However, techniques such as pruning mostly lack true co-optimization with hardware platforms. For instance, implementing unstructured pruning in dedicated hardware is not straightforward: it increases memory overhead and reduces effective bandwidth usage. Moreover, such pruning algorithms should be adapted to hardware requirements, such as the use of tiling. Therefore, in this work, we leverage Gumbel-Softmax relaxation sampling to structurally prune tiles, which benefits subsequent hardware implementations and additionally allows joint optimization with quantization. We also show that combining different pruning scenarios leads to greater sparsity. Finally, we demonstrate the benefit of applying structured pruning to fine-grained elements (weights) in an FPGA design.
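The idea of sampling per-tile keep/prune decisions via the Gumbel-Softmax relaxation can be illustrated with a minimal NumPy sketch. This is a hypothetical, simplified illustration, not the paper's implementation: the tile size, matrix shapes, and the two-column [prune, keep] logits layout are assumptions, and the hard decision shown here would use a straight-through estimator inside a real training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=1.0):
    # Sample from the Gumbel-Softmax (Concrete) distribution:
    # add Gumbel noise to the logits, then apply a temperature-scaled softmax.
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = np.exp((logits + g) / tau)
    return y / y.sum(axis=-1, keepdims=True)

def tile_mask(logits, tau=0.5):
    # logits: (num_tiles, 2), columns = [prune, keep] scores (assumed layout).
    y = gumbel_softmax(logits, tau)
    # Hard 0/1 decision; training would combine this with a
    # straight-through gradient from the soft samples.
    return (y.argmax(axis=-1) == 1).astype(float)

def prune_tiles(W, keep, tile=4):
    # Zero out entire (tile x tile) blocks whose keep flag is 0,
    # so the sparsity pattern stays hardware-friendly.
    out_tiles, in_tiles = W.shape[0] // tile, W.shape[1] // tile
    m = keep.reshape(out_tiles, in_tiles)
    m = np.kron(m, np.ones((tile, tile)))  # expand mask to element level
    return W * m

W = rng.standard_normal((8, 8))      # toy weight matrix
logits = np.zeros((4, 2))            # 2x2 grid of 4x4 tiles, uniform scores
keep = tile_mask(logits)
Wp = prune_tiles(W, keep)
```

Because whole tiles are either kept or zeroed, the resulting mask maps directly onto tiled hardware dataflows, unlike unstructured element-wise pruning.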
David Atienza Alonso, Miguel Peon Quiros, Pasquale Davide Schiavone, Rubén Rodríguez Álvarez, Denisa-Andreea Constantinescu, Dimitrios Samakovlis, Stefano Albini