Summary
In connection-oriented communication, a data stream is the transmission of a sequence of digitally encoded signals to convey information. Typically, the transmitted symbols are grouped into a series of packets. Data streaming has become ubiquitous: anything transmitted over the Internet is transmitted as a data stream, and a conversation on a mobile phone transmits the sound as a data stream.

Formally, a data stream is any ordered pair (s, Δ), where s is a sequence of tuples and Δ is a sequence of positive real time intervals.

A data stream contains different sets of data, depending on the chosen data format:
Attributes: each attribute of the data stream represents a certain type of data, e.g. segment / data point ID, timestamp, geodata.
Timestamp: an attribute that identifies when an event occurred.
Subject ID: an algorithmically encoded ID that has been extracted from a cookie.
Raw data: information that comes straight from the data provider without having been processed by an algorithm or a human.
Processed data: data that has been prepared (modified, validated, or cleaned) for use in future actions.

Data streams are used in various areas:
Fraud detection & scoring: raw data is used as source data for an anti-fraud algorithm (data analysis techniques for fraud detection). For example, timestamps, cookie occurrences, or analysis of data points are used within the scoring system to detect fraud or to make sure that a message receiver is not a bot (so-called Non-Human Traffic).
Artificial intelligence: raw data serves as the training set and the test set when building AI and machine learning algorithms.
Profiling and personalization: raw data is used to customize user profiles and divide them into segments, e.g., by gender or location (based on data point).
Business intelligence: raw data is a source of information for BI systems, used for enriching user profiles with detailed information about them, e.g., purchase path or geodata. This information is used for business analysis and predictive research.
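
The formal definition in the summary, a sequence of tuples paired with a sequence of positive real time intervals, can be illustrated with a minimal Python sketch. The class name `DataStream`, its field names, the sample tuples, and the reading of each interval as "time elapsed before the corresponding tuple arrives" are all assumptions made for illustration; the summary itself does not prescribe them, and a real stream is usually unbounded rather than a finite list.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Sequence, Tuple


@dataclass(frozen=True)
class DataStream:
    """Finite stand-in for the formal pair: tuples plus time intervals."""

    tuples: Sequence[Tuple[Any, ...]]  # the sequence of tuples
    intervals: Sequence[float]         # positive time intervals (seconds assumed)

    def __post_init__(self) -> None:
        # Assumption: one interval per tuple.
        if len(self.tuples) != len(self.intervals):
            raise ValueError("each tuple needs exactly one time interval")
        # The definition requires every interval to be positive.
        if any(dt <= 0 for dt in self.intervals):
            raise ValueError("time intervals must be positive")

    def arrivals(self, start: float = 0.0) -> Iterator[Tuple[float, Tuple[Any, ...]]]:
        """Yield (arrival_time, tuple) by accumulating the intervals from `start`."""
        t = start
        for item, dt in zip(self.tuples, self.intervals):
            t += dt
            yield t, item


# Hypothetical sample data: (segment ID, event) tuples.
stream = DataStream(
    tuples=[("seg-1", "click"), ("seg-2", "view"), ("seg-1", "purchase")],
    intervals=[0.5, 1.25, 0.25],
)
timeline = list(stream.arrivals())
```

Under these assumptions, `timeline` pairs each tuple with a timestamp, which is one way the timestamp attribute mentioned in the summary could be derived from the intervals.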
Related courses (1)
CS-422: Database systems
This course is intended for students who want to understand modern large-scale data analysis systems and database systems. It covers a wide range of topics and technologies, and will prepare students
Related lectures (4)
Data Stream Processing: Management and Challenges
Explores data stream management, real-time applications, challenges in analysis, and efficient stream management strategies.
Piecewise Continuous Functions
Covers piecewise continuous functions, their properties, classification based on continuity, and integral types, including improper integrals.
Urban Metabolism Analysis
Explores systemic environmental assessment, national material flow analysis, and urban metabolism dashboard development for Zurich using open data.
Related publications (25)

Layer-Wise Learning Framework for Efficient DNN Deployment in Biomedical Wearable Systems

David Atienza Alonso, Amir Aminifar, Tomas Teijeiro Campo, Alireza Amirshahi, Saleh Baghersalimi

The development of low-power wearable systems requires specialized techniques to accommodate their unique requirements and constraints. While significant advancements have been made in the inference phase of artificial intelligence, the training phase rema ...
2023

Storage Management in Smart Data Lake

Anastasia Ailamaki, Haoqiong Bian, Bikash Chandra, Ioannis Mytilinis

Data lakes are complex ecosystems where heterogeneity prevails. Raw data of diverse formats are stored and processed, while long and expensive ETL processes are avoided. Apart from data heterogeneity, data lakes also entail hardware heterogeneity. Typical ...
2021

Network utility maximization for delay-sensitive applications in unknown communication settings

Stefano D'Aronco

In the last decades the Internet traffic has greatly evolved. The advent of new Internet services and applications has, in fact, led to a significant growth of the amount of data transmitted, as well as to a transformation of the data type. As a matter of ...
EPFL, 2018
Related concepts (2)
Transport layer
In computer networking, the transport layer is a conceptual division of methods in the layered architecture of protocols in the network stack, both in the Internet protocol suite and in the OSI model. The protocols of this layer provide end-to-end communication services for applications, such as connection-oriented communication, reliability, flow control, and multiplexing. The implementation details and semantics of the transport layer differ between the Internet protocol suite, which is the foundation of the Internet, and the OSI model of general networking.
Network packet
In telecommunications and computer networking, a network packet is a formatted unit of data carried by a packet-switched network. A packet consists of control information and user data; the latter is also known as the payload. Control information provides data for delivering the payload (e.g., source and destination network addresses, error detection codes, or sequencing information). Typically, control information is found in packet headers and trailers.
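
As a rough illustration of the packet structure described above, the following Python sketch splits a byte stream into packets that carry control information (source and destination addresses, a sequence number, an error detection code) alongside the payload. The `Packet` class, the `packetize` and `reassemble` helpers, the 4-byte payload size, the sample addresses, and the choice of CRC-32 are all assumptions made for illustration; real protocols define their own header layouts and checksums.

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    # Control information (hypothetical header fields).
    source: str
    destination: str
    sequence: int
    checksum: int  # CRC-32 of the payload, used here as the error detection code
    # User data.
    payload: bytes


def packetize(data: bytes, source: str, destination: str, size: int = 4) -> List[Packet]:
    """Group a byte stream into packets of at most `size` payload bytes."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [
        Packet(source, destination, seq, zlib.crc32(chunk), chunk)
        for seq, chunk in enumerate(chunks)
    ]


def reassemble(packets: List[Packet]) -> bytes:
    """Restore the original byte order and verify each payload's checksum."""
    ordered = sorted(packets, key=lambda p: p.sequence)
    for p in ordered:
        if zlib.crc32(p.payload) != p.checksum:
            raise ValueError(f"packet {p.sequence} failed its checksum")
    return b"".join(p.payload for p in ordered)


packets = packetize(b"hello stream", "10.0.0.1", "10.0.0.2")
# Simulate out-of-order delivery; the sequence numbers restore the order.
restored = reassemble(list(reversed(packets)))
```

The sequence number is what lets the receiver rebuild the stream even when packets arrive out of order, and the checksum lets it reject a corrupted payload.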