Cheap Data Analytics using Cold Storage Devices

Anastasia Ailamaki, Raja Appuswamy, Renata Borovica-Gajic
2016
Conference paper

Abstract

Enterprise databases use storage tiering to lower capital and operational expenses. In such a setting, data waterfalls from an SSD-based high-performance tier when it is "hot" (frequently accessed) to a disk-based capacity tier, and finally to a tape-based archival tier when it is "cold" (rarely accessed). To address the unprecedented growth in the amount of cold data, hardware vendors have introduced new devices, named Cold Storage Devices (CSD), explicitly targeted at cold data workloads. With access latencies in the tens of seconds and costs as low as $0.01/GB/month, CSD provide a middle ground between the low-latency (milliseconds), high-cost, HDD-based capacity tier and the high-latency (minutes to hours), low-cost, tape-based archival tier. Driven by the price/performance characteristics of CSD, this paper makes a case for using CSD as a replacement for both the capacity and archival tiers of enterprise databases. Although CSD offer major cost savings, we show that current database systems can suffer a severe performance drop when CSD replace HDD, because the design assumptions made by the query execution engine do not match the actual storage characteristics of CSD. We then build a CSD-driven query execution framework, called Skipper, that modifies both the database execution engine and the CSD scheduling algorithms to be aware of each other. Using results from our implementation of the architecture based on PostgreSQL and OpenStack Swift, we show that Skipper is capable of completely masking the high latency overhead of CSD, thereby opening up CSD for wider adoption as a storage tier for cheap data analytics over cold data.

Official source

https://infoscience.epfl.ch/record/220295?ln=en



Energy Management of Price-Maker Community Energy Storage by Stochastic Dynamic Programming

Analysis of the influence of errors in DNA-based image coding

Efficient Concurrent Analytical Query Processing using Data and Workload-conscious Sharing
