This lecture covers the GPU memory hierarchy, including global, local, and shared memory as well as caches. It explains the CUDA processing flow, GPU optimizations, and control-flow divergence. The instructor discusses strategies for adapting algorithms to GPUs, exploiting shared memory, and coalescing memory accesses. Techniques for using GPU parallelism and resources efficiently are explored, such as reduction operations and avoiding shared-memory bank conflicts. The lecture concludes by examining how performance scales with array size and summarizing the main guidelines for optimizing code on GPUs.
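Several of the techniques mentioned above (shared memory, coalesced loads, divergence-free reduction, bank-conflict avoidance) come together in the classic block-level sum reduction. The sketch below is illustrative, not taken from the lecture; the kernel name, block configuration, and float element type are assumptions.

```cuda
#include <cstdio>

// Minimal sketch of a shared-memory tree reduction: each block sums
// blockDim.x elements of `in` and writes one partial sum to `out`.
__global__ void blockSum(const float *in, float *out, int n) {
    extern __shared__ float sdata[];      // shared memory sized at launch
    unsigned tid = threadIdx.x;
    unsigned i   = blockIdx.x * blockDim.x + threadIdx.x;

    // Coalesced load: consecutive threads read consecutive addresses.
    sdata[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Sequential addressing: the halving stride keeps the active
    // threads contiguous (no divergent warps) and avoids the
    // shared-memory bank conflicts of interleaved addressing.
    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    // Thread 0 holds the block's partial sum.
    if (tid == 0) out[blockIdx.x] = sdata[0];
}
```

A typical launch would pass the shared-memory size explicitly, e.g. `blockSum<<<numBlocks, 256, 256 * sizeof(float)>>>(d_in, d_out, n);`, then reduce the per-block partial sums in a second pass or on the host.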