Fully-Asynchronous Cache-Efficient Simulation of Detailed Neural Networks
Related publications (72)
Graph Chatbot
Chat with Graph Search
Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.
DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.
Way Stealing is a simple architectural modification to a cache-based processor that increases the data bandwidth to and from application-specific instruction set extensions (ISEs), which increase performance and reduce energy consumption. Way Stealing offe ...
Institute of Electrical and Electronics Engineers2014
When it comes to performance, embedded systems share many problems with their higher-end counterparts. The growing gap between top processor frequency and memory access speed, the memory wall, is one such problem. Driven, in part, by low energy consumption ...
At the heart of distributed computing lies the fundamental result that the level of agreement that can be obtained in an asynchronous shared memory model where t processes can crash is exactly t + 1. In other words, an adversary that can crash any subset o ...
This report details the design of two new concurrent data structures, a hash table, called CLHT, and a binary search tree (BST), called BST-TK. Both designs are based on asynchronized concurrency (ASCY), a paradigm consisting of four complementary programm ...
Parametric polymorphism in Scala suffers from the usual drawback of erasure on the Java Virtual Machine: primitive values are boxed, leading to indirect access, wasteful use of heap memory and lack of cache locality. For performance-critical parts of the c ...
Increases in on-chip communication delay and the large working sets of server and scientific workloads complicate the design of the on-chip last- level cache for multicore processors. The large working sets favor a shared cache design that maximizes the ag ...
At the heart of distributed computing lies the fundamental result that the level of agreement that can be obtained in an asynchronous shared memory model where t processes can crash is exactly t + 1. In other words, an adversary that can crash any subset o ...
Chip-multiprocessors require a coherence directory to track data sharing and order accesses to the shared data. Scaling coherence directories to support a large number of cores is challenging due to excessive area requirements of the directories. The state ...
Conventional directory coherence operates at the finest granularity possible, that of a cache block. While simple, this organization fails to exploit frequent application behavior: at any given point in time, large, continuous chunks of memory are often ac ...
Distributed shared-memory (DSM) multi- processors provide a scalable hardware platform, but lack the necessary redundancy for mainframe-level reliability and availability. Chip-level redundancy in a DSM server faces a key challenge: the increased latency t ...