
E-Scan: Consuming Contextual Data with Model Plugins

Anastasia Ailamaki, Viktor Sanca
2023
Conference paper
Abstract

Extracting value and insights from increasingly heterogeneous data sources involves multiple systems combining and consuming the data. With multi-modal and context-rich data such as strings, text, videos, or images, the problem of standardizing the data model and format for interchangeable use is further exacerbated by non-uniform ways of processing, extracting, and preserving content and context from the data. This makes data movement, reuse, and exchange between systems a manual, non-composable process. On the other hand, increasingly powerful and popular machine learning-driven data representation models map input data into uniform, high-dimensional vector embeddings for further processing, where the embeddings are informed by the particular model used. However, running models is expensive, and manual integration effort can add unnecessary costs. Thus, we propose E-Scan, a contextual data exchange plugin for using, exchanging, and caching context-rich data. We outline the need for a common interface that separates concerns and allows smooth and cost-effective data exchange. First, because vector embeddings by themselves are context-less, the model information is stored to preserve the context and preprocessing steps. Next, a lightweight vector engine lazily caches and stores the uniform intermediate data representation, lowering the cost of transformation and of data access, exchange, and retrieval. Finally, a pull-based interface enables uniform data consumption across components through a common plugin interface. This way, various context-rich data types are stored, processed, and exchanged in a standardized way while allowing plugin-based customization for subsequent context interpretation.
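The abstract names three mechanisms: model information kept next to context-less embeddings, a lazy cache for embeddings, and pull-based consumption through a common plugin interface. The Python sketch below shows one way these pieces could fit together. It is an illustration under stated assumptions, not the authors' implementation or API: every name in it (`ModelInfo`, `ModelPlugin`, `CharHistogramPlugin`, `LazyVectorCache`, `scan`) is hypothetical, and a toy letter-count "embedding" stands in for a real machine learning model.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterator, List, Sequence, Tuple


@dataclass(frozen=True)
class ModelInfo:
    """Hypothetical context record stored alongside context-less vectors:
    which model (and which preprocessing steps) produced them."""
    name: str
    version: str
    preprocessing: Tuple[str, ...]


class ModelPlugin(ABC):
    """Hypothetical plugin interface: maps a raw item to a fixed-size vector."""

    @property
    @abstractmethod
    def info(self) -> ModelInfo: ...

    @abstractmethod
    def embed(self, item: str) -> List[float]: ...


class CharHistogramPlugin(ModelPlugin):
    """Toy stand-in for an ML embedding model (an assumption, not a real model):
    lowercase the text, then count occurrences of each letter a-z."""

    @property
    def info(self) -> ModelInfo:
        return ModelInfo("char-histogram", "0.1", ("lowercase",))

    def embed(self, item: str) -> List[float]:
        vec = [0.0] * 26
        for ch in item.lower():
            if "a" <= ch <= "z":
                vec[ord(ch) - ord("a")] += 1.0
        return vec


class LazyVectorCache:
    """Hypothetical lazy vector store: an embedding is computed only on its first
    request and reused afterwards. Keys include the model info, so vectors
    produced by different models (or preprocessing) are never mixed."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[ModelInfo, str], List[float]] = {}
        self.model_calls = 0  # how many times a model was actually invoked

    def get(self, plugin: ModelPlugin, item_id: str, item: str) -> List[float]:
        key = (plugin.info, item_id)
        if key not in self._store:
            self.model_calls += 1
            self._store[key] = plugin.embed(item)
        return self._store[key]


def scan(source: Sequence[Tuple[str, str]], plugin: ModelPlugin,
         cache: LazyVectorCache) -> Iterator[Tuple[str, ModelInfo, List[float]]]:
    """Hypothetical pull-based scan: nothing is computed until the consumer asks
    for the next (id, model info, vector) tuple."""
    for item_id, item in source:
        yield item_id, plugin.info, cache.get(plugin, item_id, item)


# Usage: two consumers pull the same data; only the first pays for the model.
docs = [("d1", "Hello"), ("d2", "World")]
plugin = CharHistogramPlugin()
cache = LazyVectorCache()
first = list(scan(docs, plugin, cache))   # invokes the model twice
second = list(scan(docs, plugin, cache))  # served entirely from the cache
print(cache.model_calls)  # prints 2
```

Two design choices in this sketch mirror the abstract. Keying the cache by `(ModelInfo, item_id)` keeps each vector tied to the model context that gives it meaning, and the generator-based `scan` lets a consumer pull only as many items as it needs, so embeddings for unread items are never computed.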
