In a non-uniform memory access machine, the placement of software threads on hardware cores can have a significant effect on the performance of concurrent applications. Detecting the best possible placement for each application is therefore a necessity for thread scheduling. Yet, due to the difficulty of this problem, operating-system schedulers do not really try to understand the needs of applications, but rather rely on (non-portable) scheduling heuristics. In this paper, we introduce thread-placement learning (TPLE), a technique for understanding the placement requirements of applications. TPLE uses machine learning and performance counters to choose between different placement policies. To feed the machine learning model, TPLE requires a set of portable microbenchmarks that produce training data, i.e., performance-counter measurements for all the target placement policies. We use this data to train a classifier that is able to choose between these policies online in order to change the thread placement of a running application. We demonstrate the practicality of TPLE by implementing a thread-placement algorithm named Slate. Slate is able to select automatically and online (i.e., at runtime) between the two most commonly used placement policies, namely locality and round-robin placement on the nodes of a multicore. To the best of our knowledge, Slate is the first online thread-placement algorithm that uses machine learning in combination with performance counters. We evaluate Slate and show that it achieves up to 93% accuracy in its decisions and outperforms the Linux scheduler by up to 16%.
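To make the two placement policies and the classification step more concrete, the following is a minimal Python sketch and not Slate's actual implementation. Everything in it is an assumption made for illustration: the node topology is given as lists of core ids, the function names (`locality_placement`, `round_robin_placement`, `train_centroids`, `choose_policy`) are hypothetical, and a simple nearest-centroid rule stands in for whatever classifier the paper actually trains. The feature vectors are meant to represent normalized performance-counter measurements, but which counters are used is not stated in the abstract.

```python
from typing import Dict, List, Sequence, Tuple


def locality_placement(num_threads: int, nodes: List[List[int]]) -> List[int]:
    """Fill all cores of one node before moving on to the next node."""
    cores = [core for node in nodes for core in node]
    if num_threads > len(cores):
        raise ValueError("more threads than available cores")
    return cores[:num_threads]


def round_robin_placement(num_threads: int, nodes: List[List[int]]) -> List[int]:
    """Place consecutive threads on consecutive nodes, cycling over the nodes."""
    if num_threads > sum(len(node) for node in nodes):
        raise ValueError("more threads than available cores")
    placement: List[int] = []
    next_free = [0] * len(nodes)  # index of the next unused core on each node
    node = 0
    while len(placement) < num_threads:
        if next_free[node] < len(nodes[node]):
            placement.append(nodes[node][next_free[node]])
            next_free[node] += 1
        node = (node + 1) % len(nodes)
    return placement


def train_centroids(
    samples: List[Tuple[Sequence[float], str]]
) -> Dict[str, List[float]]:
    """Average the feature vectors of each policy label (nearest-centroid model).

    Each sample pairs a vector of (hypothetical) normalized counter values
    with the policy that performed best for that microbenchmark run.
    """
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def choose_policy(centroids: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Return the policy whose centroid is closest to the measured features."""
    def squared_distance(label: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(centroids[label], features))
    return min(centroids, key=squared_distance)
```

With an assumed topology of two nodes with four cores each, `nodes = [[0, 1, 2, 3], [4, 5, 6, 7]]`, placing four threads yields cores `[0, 1, 2, 3]` under locality and `[0, 4, 1, 5]` under round-robin. An online scheduler in the spirit of the abstract would periodically sample counters from the running application, call something like `choose_policy`, and re-pin threads if the chosen policy changes; the re-pinning step is omitted here.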