This lecture introduces BlinkDB, a framework for approximate query processing that supports interactive SQL-like aggregate queries over massive datasets by returning fast, approximate answers computed from samples. The instructor explains BlinkDB's strategy, its workflow, the speed/accuracy trade-off, and the concept of learning to sample. The lecture covers several types of samples, including uniform and stratified samples, along with error estimation using closed-form formulas for aggregate functions and the statistical bootstrap.
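To make the sampling and error-estimation ideas concrete, here is a minimal Python sketch. It is not BlinkDB's implementation: the function names (`closed_form_mean_ci`, `bootstrap_mean_ci`, `stratified_sample`), the per-group `cap` parameter, and the synthetic data are all assumptions made for illustration. It estimates an average from a sample and attaches an approximate 95% error bar in two ways: with the closed-form standard error of the mean, and with a bootstrap that resamples the sample itself. It also shows a simple capped stratified sample, which keeps rare groups that a uniform sample could miss.

```python
import math
import random
import statistics


def closed_form_mean_ci(sample, z=1.96):
    """Return (estimate, half_width) for the mean via the closed-form
    standard error s / sqrt(n). z=1.96 gives roughly a 95% interval."""
    n = len(sample)
    stderr = statistics.stdev(sample) / math.sqrt(n)
    return statistics.fmean(sample), z * stderr


def bootstrap_mean_ci(sample, rng, resamples=500, z=1.96):
    """Return (estimate, half_width) for the mean via the bootstrap:
    resample with replacement and measure the spread of the resampled
    means. The same recipe applies to aggregates with no closed form."""
    n = len(sample)
    means = [statistics.fmean(rng.choices(sample, k=n)) for _ in range(resamples)]
    return statistics.fmean(sample), z * statistics.stdev(means)


def stratified_sample(rows, key, cap, rng):
    """Group rows by key(row) and keep at most `cap` rows per group
    (hypothetical helper). Small groups are kept whole."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row)
    return {
        g: members if len(members) <= cap else rng.sample(members, cap)
        for g, members in groups.items()
    }


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic table: one very common group and one rare group.
    rows = [("big", rng.gauss(50, 10)) for _ in range(20000)]
    rows += [("small", rng.gauss(200, 10)) for _ in range(50)]

    # Uniform sample: good for the overall average.
    uniform = [value for _, value in rng.sample(rows, 1000)]
    print("closed form:", closed_form_mean_ci(uniform))
    print("bootstrap:  ", bootstrap_mean_ci(uniform, rng))

    # Stratified sample: every group is represented, so a per-group
    # average can be answered even for the rare group.
    strata = stratified_sample(rows, key=lambda r: r[0], cap=100, rng=rng)
    for group, members in strata.items():
        print(group, closed_form_mean_ci([value for _, value in members]))
```

For a simple mean the two half-widths should be close, which is why a closed-form estimate is preferable when one exists: it needs a single pass over the sample, whereas the bootstrap recomputes the aggregate once per resample. Larger samples shrink both error bars at the cost of more work per query, which is the speed/accuracy trade-off in miniature.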