Today's continued increase in demand for processing power, despite the slowdown of Moore's law, has led to an increase in processor count, which has in turn resulted in energy consumption and distribution problems. To address this, there is a growing trend toward building more complex heterogeneous systems in which multicore CPUs, many-core processors, GPUs, FPGAs, and DSPs are combined in a single system. This poses challenges in terms of how to take advantage of such systems and how to efficiently program, evaluate, and profile applications whose sub-components run on different hardware. Dataflow programming languages such as RVC-CAL have proven to be an appropriate methodology for achieving this goal, thanks to their intrinsic portability and the ease with which a network of actors can be decomposed across different processing units, matching the heterogeneous hardware. Previous research has shown the efficacy of this methodology for systems combining multicore and many-core CPUs, FPGAs, and other devices. It has also been shown that the performance of programs executed on heterogeneous parallel platforms depends largely on the design choices regarding how the computation is partitioned across the various processing units; in other words, on the parameters that define the partitioning, mapping, scheduling, and allocation of data exchanges among the processing elements of the platform executing the program. The advantage of programs written in languages based on the dataflow model of computation is that executing the program with different configurations and parameter settings does not require rewriting the application software for each configuration; it only requires regenerating the execution code for the chosen parameters using automatic code generation tools. Another competitive advantage of dataflow software methodologies is that they are well suited to designs on heterogeneous parallel systems, as they are inherently free of memory access contention issues and naturally expose the available intrinsic parallelism.

However, it remains an open research question whether dataflow programming languages such as RVC-CAL can fit massively parallel SIMD architectures such as GPUs. Recent GPU architectures expose numbers of parallel processing units that exceed those of CPU architectures by orders of magnitude. While programs written in dataflow programming languages are well suited to programming parallel heterogeneous systems, they may not expose a sufficient degree of parallelism to efficiently exploit the resources available on today's GPUs. Furthermore, the dynamic nature of the RVC-CAL model may conflict with the rigid SIMD pipeline.

The objective of this thesis is to develop a full suite of tools, based on the dataflow programming language RVC-CAL, that provides an automated design flow for programming, analyzing, and optimizing application programs running on CPU/GPU heterogeneous systems. The main contributions of this thesis are: a high-level compiler infrastructure that targets CPU/GPU heterogeneous processing platforms and supports the full specification of the RVC-CAL dataflow programming language; facilities for generating instrumented applications for profiling purposes; and a set of design space exploration pipelines that automatically optimize the resulting application by suggesting efficient partitioning and mapping configurations.
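To illustrate the actor-based model referred to above, the sketch below shows a minimal RVC-CAL actor; it is written for illustration only (the actor name, ports, and behavior are assumptions, not taken from the thesis). A dataflow program is a network of such actors connected by FIFO channels, and a partitioning/mapping configuration assigns each actor instance to a processing element (a CPU core or the GPU) without any change to the actor code itself.

// Minimal RVC-CAL actor, shown for illustration only.
// It consumes one token from each of its two input ports
// and produces their sum on its single output port.
actor Add () int In1, int In2 ==> int Out :

    // Fires whenever one token is available on both inputs;
    // emits one output token per firing.
    action In1:[ a ], In2:[ b ] ==> Out:[ a + b ]
    end

end

Because the actor only describes token consumption, computation, and production, the same source can be compiled for a CPU core or a GPU by an automatic code generator, which is what makes exploring different partitioning and mapping configurations possible without rewriting the application.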