Publication

Generalizing Bulk-Synchronous Parallel Processing for Data Science: From Data to Threads and Agent-Based Simulations

Zilu Tian
2023
EPFL thesis
Abstract

Agent-based simulations have been widely applied in many disciplines, by scientists and engineers alike. Scientists use agent-based simulations to tackle global problems, including alleviating poverty, reducing violence, and predicting the impact of pandemics. In industry, engineers use agent-based simulations to reduce cost and improve efficiency, by creating virtual worlds to model different scenarios and explore various designs with fast feedback at low cost. Agent-based simulations play an increasingly prominent role in modern society.

Despite their significance, agent-based simulations have benefited little from recent progress in computer science, especially in parallel computing and data management. While there is a growing need to simulate at ever-increasing scale and in finer detail, the development of systems that support fast execution of large-scale simulations, and efficient integration of simulations with existing data science pipeline operators, is lagging behind. This creates new challenges and opportunities for computer scientists.

In this work, we make the first foray into defining a clean semantics that serves as the foundation of agent-based simulations, an abstraction that helps users integrate simulations into data science pipelines, a scalable system architecture with efficient optimizations, and a high-level, user-friendly programming model. In particular, we generalize the bulk-synchronous parallel (BSP) processing model so that it better supports agent-based simulations. Such simulations frequently exhibit hierarchical structure in their communication patterns, which can be exploited to improve performance. We allow for the creation of temporary artificial network partitions, during which agents synchronize only locally within their group, in a way that does not compromise the correctness of a simulation.
We also propose to encapsulate simulations via a Simulate operator, which enables users to compose and nest simulations just like other data science pipeline operators. In addition, we have designed and developed an open-source distributed system for large-scale agent-based simulations, CloudCity, which implements our semantics to improve the locality of computation, communication, and synchronization in simulations. This system contains efficient optimizations that allow fast execution and efficient querying of simulation results. To accommodate users from different backgrounds, we have also developed a user-friendly domain-specific language (DSL) embedded in the programming language Scala, which allows users to write parallel agent programs easily, even with little or no background in distributed computing. We experimentally evaluate the performance of our system on a benchmark suite of agent-based simulations and compare it against existing state-of-the-art BSP-like distributed systems, including Spark, GraphX, Giraph, and Flink Gelly, obtaining insights into the impact of various system design choices and optimizations on simulation engine performance.
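
For readers unfamiliar with the baseline model that the abstract generalizes, the following is a minimal single-process sketch of plain BSP supersteps applied to agents. It is not CloudCity's API or the thesis's DSL, and it does not implement the hierarchical, group-local synchronization described above. The function name `run_bsp`, the integer agent states, and the "adopt the maximum value seen" update rule are all assumptions chosen only for illustration.

```python
from collections import defaultdict


def run_bsp(states, neighbors, supersteps):
    """Run a toy BSP simulation (hypothetical helper, not from the thesis).

    states:     dict mapping agent id -> integer state
    neighbors:  dict mapping agent id -> list of agent ids it sends to
    supersteps: number of global rounds to execute
    """
    states = dict(states)       # work on a copy, leave the input untouched
    inbox = defaultdict(list)   # messages delivered at the previous barrier

    for _ in range(supersteps):
        outbox = defaultdict(list)

        # Compute phase: each agent sees only messages from the last barrier.
        for agent_id in states:
            received = inbox.get(agent_id, [])
            # Assumed update rule: keep the largest value seen so far.
            states[agent_id] = max([states[agent_id]] + received)
            # Communication phase: send the new state to every neighbor.
            for other in neighbors[agent_id]:
                outbox[other].append(states[agent_id])

        # Global barrier: messages become visible only in the next superstep.
        inbox = outbox

    return states


# Four agents on a line: 0 - 1 - 2 - 3, with agent 0 holding the value 5.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(run_bsp({0: 5, 1: 0, 2: 0, 3: 0}, line, 4))
```

In this sketch every agent waits at one global barrier per superstep, so a value needs one superstep per hop to spread. The abstract's proposal targets exactly this cost: groups of agents that mostly talk to each other would synchronize locally rather than at every global barrier.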

