Georgios Psaropoulos
Database systems access memory either sequentially or randomly. Contrary to sequential access and despite the extensive efforts of
computer architects, compiler writers, and system builders, random access to data larger than the processor cache has been synonymous to inefficient execution. Especially in the big data era, data processing is memory bound, and accesses to DRAM and non-volatile memory
each take several tens or hundreds of nanoseconds respectively, posing a great challenge to current processors. Due to the mismatch between the way humans write code and the way processors execute this code, workload execution stalls on main memory access, instead of executing the other parallel work that typically exists in big data workloads.
This thesis establishes cooperative multitasking as the principal way to hide memory latency in operations that consist of parallel tasks.
We first systematize cooperative multitasking presenting an analogue of Amdahl's law for latency hiding. More importantly,
we then introduce interleaving with coroutines, a general-purpose and practical technique to interleave the execution of parallel tasks
within one thread interleaved execution and thereby hide memory access. This form of cooperative multitasking enables significant performance improvements for analytical and transactional use cases that suffer from unavoidable memory accesses, such as index joins and tuple reconstruction, as well as concurrent GET and PUT operations in key-value stores. Given enough parallel work, sufficient hardware support for concurrent memory accesses, and a low-overhead mechanism to switch between tasks, interleaved execution renders important database operations oblivious to memory latency and makes their execution behave as if all accessed data resides in the processor cache.
EPFL2019