Publication

Low-Overhead Dynamic Instruction Mix Generation using Hybrid Basic Block Profiling

Abstract

Dynamic instruction mixes form an important part of the toolkits of performance tuners, compiler writers, and CPU architects. Instruction mixes are traditionally generated using software instrumentation, an accurate yet slow method that is normally limited to user-mode code. We present a new method for generating instruction mixes using the Performance Monitoring Unit (PMU) of the CPU. It has very low overhead, extends coverage to kernel-mode execution, and causes only a very modest decrease in accuracy compared to software instrumentation. To achieve this level of accuracy, we develop a new PMU-based data collection method, Hybrid Basic Block Profiling (HBBP). HBBP uses simple machine learning techniques to choose, on a per-basic-block basis, between data from two conventional sampling methods, Event Based Sampling (EBS) and Last Branch Records (LBR). We implement a profiling tool based on HBBP, and we report on experiments with the industry-standard SPEC CPU2006 suite, as well as with two large-scale scientific codes. We observe a runtime improvement of up to 76x over software instrumentation on the tested benchmarks, reducing wait times from hours to minutes. Instruction attribution errors average 2.1%. The results indicate that HBBP provides a favorable tradeoff between accuracy and speed, making it a suitable candidate for use in production environments.
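
The idea of picking, for each basic block, one of two execution-count estimates and then expanding those counts into an instruction mix can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not describe the learned selector or its features, so the `choose_count` rule (a simple block-length threshold), the `BasicBlock` fields, and the sample numbers are hypothetical and are not the tool's actual implementation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class BasicBlock:
    name: str
    instructions: List[str]  # mnemonics of the instructions in the block
    ebs_count: float         # execution count estimated from EBS samples
    lbr_count: float         # execution count estimated from LBR samples


def choose_count(block: BasicBlock, length_threshold: int = 4) -> float:
    """Hypothetical stand-in for the learned per-block selector.

    Assumption: blocks with at least `length_threshold` instructions use
    the EBS estimate, shorter blocks use the LBR estimate. The real
    decision rule is not given in the abstract.
    """
    if len(block.instructions) >= length_threshold:
        return block.ebs_count
    return block.lbr_count


def instruction_mix(blocks: List[BasicBlock], length_threshold: int = 4) -> Counter:
    """Expand per-block execution counts into per-mnemonic totals."""
    mix = Counter()
    for block in blocks:
        count = choose_count(block, length_threshold)
        # Every instruction in a basic block executes once per block execution.
        for mnemonic in block.instructions:
            mix[mnemonic] += count
    return mix


# Made-up sample data, for illustration only.
blocks = [
    BasicBlock("a", ["mov", "add"], ebs_count=100, lbr_count=120),
    BasicBlock("b", ["mov", "mul", "add", "jmp"], ebs_count=50, lbr_count=40),
]
mix = instruction_mix(blocks)
```

Because every instruction in a basic block runs exactly once per execution of that block, an accurate per-block count is enough to derive the mix; the sketch only shows where a per-block choice between the two sources would plug in.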
