Convolution Engine: Balancing Efficiency and Flexibility in Specialized Computing

Related publications (91)

Solving Wave Equations on Unstructured Geometries

Jan Sickmann Hesthaven

Every wave solver serving the computational study of waves faces a trade-off between two figures of merit: its computational speed and its accuracy. The use of Discontinuous Galerkin (DG) methods on graphics processing units (GPUs) significantly lowers the cost ...

Morgan Kaufmann, 2012

Fine-grained Parallel Traversals of Irregular Data Structures

James Richard Larus

Fine-grain data parallelism is increasingly common in mainstream processors in the form of long vectors and on-chip GPUs. This paper develops compiler and runtime support to exploit such data parallelism for non-numeric, non-graphic, irregular parallel tas ...

ACM, 2012

Dataflow Programming for Systems Design Space Exploration for Multicore Platforms

Christophe Lucarz

Nowadays, processing systems are asked to support increasingly complex and demanding high-performance applications, especially in the signal processing and video processing domains. The design of these systems is becoming extremely challenging because of sev ...

EPFL, 2011

GPGPU-Accelerated Parallel and Fast Simulation of Thousand-core Platforms

David Atienza Alonso, Luca Benini, Martino Ruggiero, Shivani Raghav

The multicore revolution and the ever-increasing complexity of computing systems are dramatically changing the system design, analysis and programming of computing platforms. Future architectures will feature hundreds to thousands of simple processors and on-ch ...

IEEE/ACM Press, 2011

An FPGA-based processing pipeline for high definition stereo video

Andreas Peter Burg

This paper presents a real-time processing platform for high-definition stereo video. The system is capable of processing stereo video streams at resolutions up to 1920x1080 at 30 frames per second (1080p30). In the hybrid FPGA-GPU-CPU system, a high-density ...

Hindawi Publishing Corporation, 2011

Optimization of Portable Parallel Signal Processing Applications by Design Space Exploration of Dataflow Programs

Marco Mattavelli, Christophe Lucarz

This paper describes a methodology for the optimization of portable parallel signal processing applications specified by dataflow programs. The use of dataflow as a programming model for signal processing applications targeting parallel platforms provides ...

IEEE Service Center, 445 Hoes Lane, PO Box 1331, Piscataway, NJ 08855-1331, USA, 2011

High level design space exploration of RVC codec specifications for multi-core heterogeneous platforms

Marco Mattavelli, Christophe Lucarz, Ghislain Roquier

Nowadays, the design flow of complex signal processing embedded systems starts with a specification of the application in the form of a large, sequential program (usually in C/C++). As we enter the multicore era, sequential programs are no longer ...

2010

A Tighter Analysis of Work Stealing

Nicolas Gabriel Gast, Denis Trystram

Classical list scheduling is a very popular and efficient technique for scheduling jobs in parallel platforms. However, with the increasing number of processors, the cost for managing a single centralized list becomes prohibitive. The objective of this wor ...

Springer Berlin Heidelberg, 2010

Montgomery Multiplication on the Cell

Joppe Willem Bos, Marcelo Kaihara

A technique to speed up Montgomery multiplication targeted at the Synergistic Processor Elements (SPE) of the Cell Broadband Engine is proposed. The technique consists of splitting a number into four consecutive parts. These parts are placed one by one in ...

Springer-Verlag New York, Ms Ingrid Cunningham, 175 Fifth Ave, New York, NY 10010, USA, 2010

A hardware-software codesign framework for cellular computing

Pierre-André Mudry

Until recently, the ever-increasing demand for computing power has been met on the one hand by increasing the operating frequency of processors and on the other hand by designing architectures capable of exploiting parallelism at the instruction level through h ...

EPFL, 2009