Publication

Fully-Asynchronous Cache-Efficient Simulation of Detailed Neural Networks

Related publications (72)

TiC-SAT: Tightly-coupled Systolic Accelerator for Transformers

David Atienza Alonso, Giovanni Ansaloni, Alireza Amirshahi, Joshua Alexander Harrison Klein

Transformer models have achieved impressive results in various AI scenarios, ranging from vision to natural language processing. However, their computational complexity and their vast number of parameters hinder their implementations on resource-constraine ...
2023

GEAR-RT: Towards Exa-Scale Moment Based Radiative Transfer For Cosmological Simulations Using Task-Based Parallelism And Dynamic Sub-Cycling with SWIFT

Mladen Ivkovic

Numerical simulations have become an indispensable tool in astrophysics and cosmology. The constant need for higher accuracy, higher resolutions, and models of ever-increasing sophistication and complexity drives the development of modern tools which target ...
EPFL, 2023

Reliable Microsecond-Scale Distributed Computing

Athanasios Xygkis

The landscape of computing is changing, thanks to the advent of modern networking equipment that allows machines to exchange information in as little as one microsecond. Such advancement has enabled microsecond-scale distributed computing, where entire dis ...
EPFL, 2023

An Open-Hardware Coarse-Grained Reconfigurable Array for Edge Computing

David Atienza Alonso, Giovanni Ansaloni, José Angel Miranda Calero, Rubén Rodríguez Álvarez, Juan Pablo Sapriza Araujo, Benoît Walter Denkinger, Ruben Rodriguez

In this work, we propose an open-hardware low-power coarse-grained reconfigurable array connected to a lightweight microcontroller and enclosed in an application mapping framework. The latter provides complete support to configure kernels in the reconfigur ...
2023

Request, Coalesce, Serve, and Forget: Miss-Optimized Memory Systems for Bandwidth-Bound Cache-Unfriendly Applications on FPGAs

Paolo Ienne, Mikhail Asiatici

Applications such as large-scale sparse linear algebra and graph analytics are challenging to accelerate on FPGAs due to the short irregular memory accesses, resulting in low cache hit rates. Nonblocking caches reduce the bandwidth required by misses by re ...
ASSOC COMPUTING MACHINERY, 2022

Micro-architectural analysis of in-memory OLTP: Revisited

Anastasia Ailamaki, Danica Porobic, Utku Sirin

Micro-architectural behavior of traditional disk-based online transaction processing (OLTP) systems has been investigated extensively over the past couple of decades. Results show that traditional OLTP systems mostly under-utilize the available micro-archi ...
2021

Micro-architectural Analysis of Database Workloads

Utku Sirin

Database workloads have significantly evolved in the past twenty years. Traditional database systems that are mainly used to serve Online Transactional Processing (OLTP) workloads evolved into specialized database systems that are optimized for particular ...
EPFL, 2021

Rebooting Virtual Memory with Midgard

Babak Falsafi, Mathias Josef Payer, Siddharth Gupta, Atri Bhattacharyya, Yunho Oh, Abhishek Bhattacharjee

Computer systems designers are building cache hierarchies with higher capacity to capture the ever-increasing working sets of modern workloads. Cache hierarchies with higher capacity improve system performance but shift the performance bottleneck to addres ...
2021

NrOS: Effective Replication and Sharing in an Operating System

Sanidhya Kashyap, Ankit Bhardwaj

Writing a correct operating system kernel is notoriously hard. Kernel code requires manual memory management and type-unsafe code and must efficiently handle complex, asynchronous events. In addition, increasing CPU core counts further complicate kernel de ...
USENIX ASSOC, 2021

Analytic performance modeling and analysis of detailed neuron simulations

Francesco Cremonesi, Gerhard Wellein

Big science initiatives are trying to reconstruct and model the brain by attempting to simulate brain tissue at larger scales and with increasingly more biological detail than previously thought possible. The exponential growth of parallel computer perform ...
2020
