Key-Value Stores (KVS) are foundational infrastructure components for online services. Due to their latency-critical nature, today’s best-performing KVS contain a plethora of full-stack optimizations commonly targeting read-mostly, popularity-skewed workloads. Motivated by production studies showing the increased prevalence of write-intensive workloads, we break down the KVS workload space into four distinct classes, and argue that current designs are only sufficient for two of them. The reason is that KVS concurrency control protocols expose a fundamental tradeoff: avoiding synchronization by partitioning writes across threads is mandatory for high throughput, but necessarily creates load imbalance that grows with core count and write fraction. We break this tradeoff with C-4, a codesign between NIC hardware and KVS software that judiciously separates write requests into two classes: independent ones that can be balanced across threads, and dependent ones which must be queued. C-4 dynamically partitions independent writes with the NIC to increase the load balancing flexibility of current KVS designs, and adds a software layer to the KVS to compact dependent writes into batches. Our evaluation shows that for write-intensive workloads, C-4 reduces 99th-percentile tail latency by 1.3–5× and improves throughput by up to 1.7×.
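To make the independent/dependent split concrete, here is a minimal C++ sketch of the classification idea described above. It is an illustration under assumed semantics, not C-4's actual NIC or software interface: the `WriteDispatcher` class, its conflict check against in-flight writes, and the last-writer-wins compaction of dependent writes are all hypothetical simplifications.

```cpp
// Illustrative sketch (not the paper's implementation): classify incoming
// writes as "independent" (no other in-flight write to the same key) or
// "dependent" (conflicts with an in-flight write). Independent writes are
// load-balanced to the least-loaded worker; dependent writes are queued
// per key so they can later be compacted into a single batch.
#include <cstddef>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

struct Write {
    std::string key;
    std::string value;
};

class WriteDispatcher {
public:
    explicit WriteDispatcher(std::size_t num_workers)
        : worker_queues_(num_workers) {}

    // Route one write: balance it if independent, otherwise queue it
    // behind the earlier in-flight write to the same key.
    void dispatch(const Write& w) {
        auto it = inflight_.find(w.key);
        if (it == inflight_.end()) {
            std::size_t worker = least_loaded_worker();
            worker_queues_[worker].push_back(w);
            inflight_[w.key] = worker;        // remember who owns this key
        } else {
            dependent_[w.key].push_back(w);   // must wait; kept in order per key
        }
    }

    // Compact dependent writes: assuming blind overwrites, only the last
    // value per key matters, so each per-key queue collapses to one write.
    std::vector<Write> compact_dependents() {
        std::vector<Write> batch;
        for (auto& [key, writes] : dependent_) {
            batch.push_back(writes.back());
        }
        dependent_.clear();
        return batch;
    }

private:
    std::size_t least_loaded_worker() const {
        std::size_t best = 0;
        for (std::size_t i = 1; i < worker_queues_.size(); ++i) {
            if (worker_queues_[i].size() < worker_queues_[best].size()) best = i;
        }
        return best;
    }

    std::vector<std::vector<Write>> worker_queues_;                  // independent writes
    std::unordered_map<std::string, std::size_t> inflight_;         // key -> owning worker
    std::unordered_map<std::string, std::vector<Write>> dependent_; // queued conflicts
};

int main() {
    WriteDispatcher d(4);
    d.dispatch({"a", "1"});
    d.dispatch({"b", "2"});
    d.dispatch({"a", "3"});   // conflicts with the earlier write to "a"
    std::cout << "compacted dependent writes: "
              << d.compact_dependents().size() << "\n";
}
```

In the paper's design the first role (spreading independent writes) is played by the NIC and the second (batching dependent writes) by a software layer in the KVS; the sketch only shows the two roles side by side in one process for readability.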