CleanM: An Optimizable Query Language for Unified Scale-Out Data Cleaning
In the past two decades, the use of ontologies has been proven to be an effective tool for enriching existing information systems in the digital data modelling domain and exploiting those assets for semantic interoperability. With the rise of Industry 4.0, ...
Recent years have seen an exponential increase in the amount of data available in all sciences and application domains. Macroecology is part of this "Big Data" trend, with a strong rise in the volume of data that we are using for our research. Here, we sum ...
We propose fingerprinting, a new technique that consists of constructing compact, fast-to-compute and privacy-preserving binary representations of datasets. We illustrate the effectiveness of our approach on the emblematic big data problem of K-Nearest-Nei ...
Motivation: Unbiased clustering methods are needed to analyze growing numbers of complex data sets. Currently available clustering methods often depend on user-set parameters, lack stability, and are not applicable to small data sets. ...
Modern industrial, government, and academic organizations are collecting massive amounts of data at an unprecedented scale and pace. The ability to perform timely, predictable and cost-effective analytical processing of such large data sets in order to ext ...
There is a growing need for unbiased clustering algorithms, ideally automated to analyze complex data sets. Topological data analysis (TDA) has been used to approach this problem. This recent field of mathematics discerns characteristic features of a space ...
Many software systems consist of data processing components that analyse large datasets to gather information and learn from them. Often, only part of the data is relevant for analysis. Data processing systems contain an initial preprocessing step that fi ...
With the emergence of brain research initiatives around the world, the need for standards to facilitate neuroscience data sharing is growing. A crucial first step will be to establish a minimal metadata standard that allows the discovery of and access to s ...
Industry and academia are continuously becoming more data-driven and data-intensive, relying on the analysis of a wide variety of datasets to gain insights. At the same time, data variety increases continuously across multiple axes. First, data comes in mu ...
Improper handling, storage, or disposal of chemicals can cause great damage, whether it occurs in industrial plants, in academia, or at home. Among the many causes, lack of knowledge and haste are the most common. Except for a few substances subj ...