Retaining data from streams of social platforms with minimal regret

Today's social platforms, such as Twitter and Facebook, continuously generate massive volumes of data. The resulting data streams exceed any reasonable limit for permanent storage, especially since the data is often redundant, overlapping, sparse, and generally of low value. This calls for means to retain only a small fraction of the data in an online manner. In this paper, we propose techniques to effectively decide which data to retain, such that the induced loss of information, that is, the regret of neglecting certain data, is minimized. These techniques not only enable efficient processing of massive streaming data, but are also adaptive and address the dynamic nature of social media. Experiments on large-scale real-world datasets illustrate the feasibility of our approach in terms of both runtime and information quality.


Interactive-time Exploration, Querying, and Analysis of Large High-dimensional Datasets

Diversity and neocolonialism in Big Data research: Avoiding extractivism while struggling with paternalism

The Impact of Data Persistence Bias on Social Media Studies
