Large commercial latency-sensitive services, such as web search, run on dedicated clusters provisioned for peak load to ensure responsiveness and tolerate data center outages. As a result, the average load is far lower than the peak load used for provisioning, leading to resource under-utilization. The idle resources can be used to run batch jobs, completing useful work and reducing overall data center provisioning costs. However, this is challenging in practice due to the complexity and stringent tail-latency requirements of latency-sensitive services. Left unmanaged, the competition for machine resources can lead to severe response-time degradation and unmet service-level objectives (SLOs).