Publication

Scrub: Online TroubleShooting for Large Mission-Critical Applications

Willy Zwaenepoel
2018
Conference paper
Abstract

Scrub is a troubleshooting tool for distributed applications that operate under the strict SLOs common in production environments. It allows users to formulate queries on events occurring during execution in order to assess the correctness of the application's operation. Scrub has been in use for two years at Turn, where developers and users have relied on it to resolve numerous issues in its online advertisement bidding platform. This platform spans thousands of machines across the globe, serves several million bid requests per second, and dispenses many millions of dollars in advertising budgets. Troubleshooting distributed applications is notoriously hard, and its difficulty is exacerbated by the presence of strict SLOs, which require the troubleshooting tool to have only minimal impact on the hosts running the application. Furthermore, with large amounts of money at stake, users expect to be able to run frequent diagnostics and demand quick evaluation and remediation of any problems. These constraints have led to a number of design and implementation decisions that run counter to conventional wisdom. First, Scrub supports only a restricted form of joins. Second, its query execution strategy eschews imposing any overhead on the application hosts: joins, group-by operations and aggregations are sent to a dedicated centralized facility. Third, in terms of implementation, Scrub avoids the overhead and security concerns of dynamic instrumentation. Finally, at all levels of the system, accuracy is traded for minimal impact on the hosts. We present the design and implementation of Scrub and contrast its choices to those made in earlier systems. We illustrate its power by describing a number of use cases, and we demonstrate its negligible overhead on the underlying application. On average, we observe a maximum CPU overhead of up to 2.5% on application hosts and a 1% increase in request latency. These overheads allow the advertisement bidding platform to operate well within its SLOs.
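
The division of labor described in the abstract can be illustrated with a minimal sketch. It is not Scrub's actual query language or API, which this page does not show: the function names `host_side` and `central_group_count`, the event fields, and the use of random sampling as the way accuracy is traded for lower host impact are all assumptions made for illustration. The idea is only that application hosts do cheap per-event work (sampling, filtering, projecting), while grouping and aggregation run on a separate central facility.

```python
import random
from collections import Counter


def host_side(events, predicate, fields, sample_rate=1.0, rng=None):
    """Hypothetical host-side step: sample, filter, and project events only."""
    rng = rng or random.Random(0)
    selected = []
    for event in events:
        # Assumed mechanism: dropping events trades accuracy for less host work.
        if sample_rate < 1.0 and rng.random() >= sample_rate:
            continue
        if predicate(event):
            # Ship only the requested fields off the host.
            selected.append({name: event[name] for name in fields})
    return selected


def central_group_count(batches, key):
    """Hypothetical central step: group-by and count over all host batches."""
    counts = Counter()
    for batch in batches:
        for event in batch:
            counts[event[key]] += 1
    return dict(counts)


# Made-up events from two application hosts.
host_a = [
    {"type": "bid", "exchange": "x1", "price": 1.2},
    {"type": "no_bid", "exchange": "x1", "price": 0.0},
]
host_b = [
    {"type": "bid", "exchange": "x2", "price": 0.7},
    {"type": "bid", "exchange": "x1", "price": 2.0},
]


def is_bid(event):
    return event["type"] == "bid"


batches = [host_side(host, is_bid, ["exchange"]) for host in (host_a, host_b)]
result = central_group_count(batches, "exchange")
print(result)
```

In this sketch the hosts never build a grouping table or hold state across events; lowering `sample_rate` reduces their work further at the cost of approximate counts.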

Related concepts (34)
Online advertising
Online advertising (or e-advertising) refers to any action intended to promote a product, service, brand, or organization to a group of internet and/or mobile users in exchange for payment. Online advertising is often paid for according to the number of clicks users make on the ad.
Native advertising
Native advertising, also called integrated advertising or chameleon advertising, is a type of advertising, mainly online, that blends in with the standard editorial content alongside which it appears. It often takes the form of an article or video produced by an advertiser with the specific intent of promoting a product or message, while respecting the formatting and style of the medium.
Digital display advertising
Digital display advertising is online graphic advertising through banners, text, images, video, and audio. The main purpose of digital display advertising is to post company ads on third-party websites. A display ad is usually interactive (i.e. clickable), which allows brands and advertisers to engage deeper with the users. A display ad can also be a companion ad for a non-clickable video ad. According to eMarketer, Facebook and Twitter were set to take 33 percent of display ad spending market share by 2017.
Related publications (27)

Incentive Mechanism in the Sponsored Content Market With Network Effects

Olga Fink, Mina Montazeri

We propose an incentive mechanism for the sponsored content provider (CP) market in which the communication of users can be represented by a graph, and the private information of the users is assumed to have a continuous distribution function. The CP stipu ...
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2023

Reliable Microsecond-Scale Distributed Computing

Athanasios Xygkis

The landscape of computing is changing, thanks to the advent of modern networking equipment that allows machines to exchange information in as little as one microsecond. Such advancement has enabled microsecond-scale distributed computing, where entire dis ...
EPFL, 2023

Hardware and Software Support for RPC-Centric Server Architecture

Mark Johnathon Sutherland

Online services have become ubiquitous in technological society, the global demand for which has driven enterprises to construct gigantic datacenters that run their software. Such facilities have also recently become a substrate for third-party organizatio ...
EPFL, 2022