Publication

DiNoDB: Efficient Large-Scale Raw Data Analytics

Anastasia Ailamaki, Marko Vukolic, Ioannis Alagiannis, Erietta Liarou, Yongchao Tian
2014
Conference paper

Abstract

Modern big data workflows, found e.g. in machine learning use cases, often involve iterated cycles of batch analytics and interactive analytics on temporary data. Whereas batch analytics solutions for large volumes of raw data are well established (e.g., Hadoop, MapReduce), state-of-the-art interactive analytics solutions (e.g., distributed shared-nothing RDBMSs) require a data loading and/or transformation phase, which is inherently expensive for temporary data. In this paper, we propose a novel scalable distributed solution for in-situ data analytics that offers both scalable batch and interactive data analytics on raw data, hence avoiding the loading-phase bottleneck of RDBMSs. Our system combines a MapReduce-based platform with the recently proposed NoDB paradigm, which optimizes traditional centralized RDBMSs for in-situ queries of raw files. We revisit NoDB's centralized design and scale it out, supporting multiple clients and data processing nodes, to produce a new distributed data analytics system we call Distributed NoDB (DiNoDB). DiNoDB leverages MapReduce batch queries to produce critical pieces of metadata (e.g., distributed positional maps and vertical indices) that speed up interactive queries without the overheads of the data loading and data movement phases, allowing users to quickly and efficiently exploit their data. Our experimental analysis demonstrates that DiNoDB significantly reduces the data-to-query latency with respect to comparable state-of-the-art distributed query engines, such as Shark, Hive and HadoopDB.
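
To make the idea of a positional map more concrete, the following is a minimal single-machine sketch in Python. It is not DiNoDB's or NoDB's implementation: the function names `build_positional_map` and `read_column` are hypothetical, the input is assumed to be simple comma-delimited text without quoted fields, and the map is held in memory rather than distributed. It only illustrates the general principle that byte offsets recorded during one pass over a raw file let later queries jump straight to a field without re-tokenizing each row.

```python
def build_positional_map(raw: bytes, columns: set[int], delimiter: bytes = b","):
    """One pass over the raw data: for every row, record the (start, end)
    byte offsets of the requested columns. Assumes no quoted fields."""
    pmap = []
    row_start = 0
    for line in raw.splitlines(keepends=True):
        body = line.rstrip(b"\r\n")
        offsets = {}
        field_start = 0
        for idx, field in enumerate(body.split(delimiter)):
            field_end = field_start + len(field)
            if idx in columns:
                offsets[idx] = (row_start + field_start, row_start + field_end)
            field_start = field_end + len(delimiter)
        pmap.append(offsets)
        row_start += len(line)  # includes the line terminator
    return pmap


def read_column(raw: bytes, pmap, column: int) -> list[str]:
    """Answer a later query by slicing the raw bytes at the stored offsets,
    without splitting the rows again."""
    return [raw[start:end].decode() for start, end in (row[column] for row in pmap)]


raw = b"1,alice,30\n2,bob,25\n"
pmap = build_positional_map(raw, {2})
print(read_column(raw, pmap, 2))  # ['30', '25']
```

In this sketch the map is built eagerly for a chosen set of columns. According to the abstract, DiNoDB instead produces such metadata as a by-product of MapReduce batch queries.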

Official source

https://infoscience.epfl.ch/record/210623?ln=fr

Related publications (35)

E-Scan: Consuming Contextual Data with Model Plugins

Interactive-time Exploration, Querying, and Analysis of Large High-dimensional Datasets

Skadi: Building a Distributed Runtime for Data Systems in Disaggregated Data Centers