Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
Portability and efficiency are usually antagonists in multi-core computing. In order to develop efficient code, one needs to take into account the topology of the target multi-cores (e.g., for locality). This clearly hampers code portability. In this paper, we show that you can have the cake and eat it too. We introduce MCTOP, an abstraction of multi-core topologies augmented with important low-level hardware information, such as memory bandwidths and communication latencies. We show how to automatically generate MCTOP using libmctop, our library that leverages the determinism of cache-coherence protocols to infer the topology of multi-cores using only latency measurements. MCTOP enables developers to accurately and portably define high-level performance optimization policies. We illustrate several such policies through four examples: (i-ii) thread placement in OpenMP and in a MapReduce library, (iii) a topology-aware mergesort algorithm, as well as (iv) automatic backoff schemes for locks. We illustrate the portability of these optimizations on five processors from Intel, AMD, and Oracle, with low effort.
David Atienza Alonso, Giovanni Ansaloni, Alireza Amirshahi
David Atienza Alonso, Marina Zapater Sancho, Giovanni Ansaloni, Rafael Medina Morillas, Yasir Mahmood Qureshi, Joshua Alexander Harrison Klein
David Atienza Alonso, Marina Zapater Sancho, Luis Maria Costero Valero, Darong Huang, Ali Pahlevan