Anastasia Ailamaki, Georgios Psaropoulos
Index joins present a case of pointer-chasing code that causes data cache misses. In principle, we can hide these cache misses by overlapping them with computation: The lookups involved in an index join are parallel tasks whose execution can be interleaved, so that, when a cache miss occurs in one task, the processor executes independent instructions from another one. Yet, the literature provides no concrete performance model for such interleaved execution and, more importantly, production systems still waste processor cycles on cache misses because (a) hardware and compiler limitations prohibit automatic task interleaving and (b) existing techniques that involve the programmer produce unmaintainable code and are thus avoided in practice. In this paper, we address these shortcomings: we model interleaved execution explaining how to estimate the speedup of any interleaving technique, and we propose interleaving with coroutines, i.e., functions that suspend their execution for later resumption. We deploy coroutines on index joins running in SAP HANA and show that interleaving with coroutines performs like other state-of-the-art techniques, retains close resemblance to the original code, and supports both interleaved and non-interleaved execution in the same implementation. Thus, we establish the first systematic and practical approach for interleaving index joins of any type.