E-Scan: Consuming Contextual Data with Model Plugins

Anastasia Ailamaki, Viktor Sanca
2023
Conference paper

Abstract

Extracting value and insights from increasingly heterogeneous data sources involves multiple systems combining and consuming the data. With multi-modal and context-rich data such as strings, text, videos, or images, the problem of standardizing the data model and format for interchangeable use is further exacerbated by a non-uniform way of processing, extracting, and preserving content and context from the data. This makes data movement, reuse, and exchange between different systems a non-composable, manual process. On the other hand, increasingly powerful and popular machine learning-driven data representation models map the input data into uniform high-dimensional vector embeddings for further processing, informed by particular models. However, using models is expensive, and manual integration effort can add unnecessary cost. Thus, we propose E-Scan, a contextual data exchange plugin for using, exchanging, and caching context-rich data. We outline the need for a common interface that separates the concerns and allows smooth and cost-effective data exchange. First, because vector embeddings are context-less on their own, the model information is saved alongside them to preserve the context and preprocessing steps. Next, a lightweight vector engine lazily caches and stores the uniform intermediate data representation to lower the cost of transformation and of data access, exchange, and retrieval. Finally, a pull-based interface allows uniform data consumption between components under a common plugin interface. This way, various context-rich data types are stored, processed, and exchanged in a standardized way while allowing plugin-based customization for subsequent context interpretation.
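
The three ideas named in the abstract (model information stored with each vector, a lazy cache, and a pull-based interface) can be illustrated with a small sketch. This is not the authors' implementation or API: `ModelInfo`, `Embedding`, `ToyHashModel`, and `EScanSketch` are hypothetical names, and the hash-based "model" is only a stand-in for an expensive embedding model.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class ModelInfo:
    """Context kept next to a vector: which model and preprocessing produced it."""
    name: str
    dim: int
    preprocessing: Tuple[str, ...]


@dataclass(frozen=True)
class Embedding:
    """A vector together with the model information that gives it meaning."""
    vector: Tuple[float, ...]
    model: ModelInfo


class ToyHashModel:
    """Hypothetical stand-in for a costly embedding model (assumes dim <= 32)."""

    def __init__(self, dim: int = 4) -> None:
        self.info = ModelInfo("toy-hash", dim, ("strip", "lowercase"))
        self.calls = 0  # counts model invocations, to make the cost visible

    def embed(self, item: str) -> Tuple[float, ...]:
        self.calls += 1
        text = item.strip().lower()  # the preprocessing recorded in ModelInfo
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return tuple(b / 255.0 for b in digest[: self.info.dim])


class EScanSketch:
    """Pull-based scan with a lazy cache keyed by (model name, raw input)."""

    def __init__(self, model: ToyHashModel) -> None:
        self.model = model
        self._cache: Dict[Tuple[str, str], Embedding] = {}

    def scan(self, items: Iterable[str]) -> Iterator[Embedding]:
        # A generator: nothing is embedded until the consumer pulls a value.
        for item in items:
            key = (self.model.info.name, item)
            if key not in self._cache:
                vector = self.model.embed(item)
                self._cache[key] = Embedding(vector, self.model.info)
            yield self._cache[key]


model = ToyHashModel(dim=4)
engine = EScanSketch(model)
stream = engine.scan(["cat", "dog", "cat"])
first = next(stream)      # first pull triggers the first model call
remaining = list(stream)  # "dog" is embedded, the second "cat" hits the cache
```

In this sketch the consumer drives the work by iterating, repeated inputs are served from the cache instead of re-running the model, and every returned vector carries the `ModelInfo` needed to interpret it later.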

Official source

https://infoscience.epfl.ch/record/304204?ln=fr

