Summary
In connection-oriented communication, a data stream is the transmission of a sequence of digitally encoded signals to convey information. Typically, the transmitted symbols are grouped into a series of packets. Data streaming has become ubiquitous: anything transmitted over the Internet is transmitted as a data stream, and a conversation on a mobile phone transmits the sound as a data stream.
Formally, a data stream is any ordered pair (s, Δ) where s is a sequence of tuples and Δ is a sequence of positive real time intervals.
A data stream contains different sets of data, depending on the chosen data format:
Attributes: each attribute of the data stream represents a certain type of data, e.g. segment / data point ID, timestamp, geodata.
Timestamp: an attribute that identifies when an event occurred.
Subject ID: an algorithmically encoded ID that has been extracted from a cookie.
Raw data: information taken straight from the data provider, without having been processed by an algorithm or a human.
Processed data: data that has been prepared (modified, validated or cleaned) for use in future actions.
Data streams are used in various areas:
Fraud detection and scoring: raw data is used as source data for an anti-fraud algorithm (data analysis techniques for fraud detection). For example, timestamps, cookie occurrences or analysis of data points are used within the scoring system to detect fraud or to make sure that a message receiver is not a bot (so-called non-human traffic).
Artificial intelligence: raw data serves as a training set and a test set when building AI and machine learning algorithms.
Profiling and personalization: raw data is used to customize user profiles and divide them into segments, e.g. by gender or location (based on data point).
Business intelligence: raw data is a source of information for BI systems, used for enriching user profiles with detailed information about them, e.g. purchase path or geodata. This information is used for business analysis and predictive research.
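The formal definition in the summary, a data stream as an ordered pair of a sequence of tuples and a sequence of positive real time intervals, can be made concrete with a small sketch. Writing the pair as (s, Δ), the Python below is one possible reading and not a reference implementation: the class name DataStream, the method arrival_times, the rule that s and Δ have equal length, and the interpretation of each interval as the time elapsed before the corresponding tuple arrives are all assumptions made for illustration. The sample records are invented.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import Any, List, Tuple


@dataclass
class DataStream:
    """Hypothetical model of a data stream as the ordered pair (s, delta)."""

    s: List[Tuple[Any, ...]]  # sequence of tuples (the transmitted records)
    delta: List[float]        # positive real time intervals, one per tuple

    def __post_init__(self) -> None:
        # Assumption: one interval per tuple.
        if len(self.s) != len(self.delta):
            raise ValueError("s and delta must have the same length")
        # From the definition: every interval is a positive real number.
        if any(d <= 0 for d in self.delta):
            raise ValueError("every time interval must be positive")

    def arrival_times(self, start: float = 0.0) -> List[float]:
        """Assumed reading: delta[i] is the time elapsed before tuple i arrives."""
        return [start + t for t in accumulate(self.delta)]


# Invented example records: (segment ID, event type).
stream = DataStream(
    s=[("seg-1", "click"), ("seg-2", "view"), ("seg-1", "purchase")],
    delta=[0.5, 1.5, 2.0],
)

for record, t in zip(stream.s, stream.arrival_times()):
    print(t, record)
```

Under this reading, the running sum of the intervals plays the role of the timestamp attribute: it places each tuple at a point in time relative to the start of the stream.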
Related courses (1)
CS-422: Database systems
This course is intended for students who want to understand modern large-scale data analysis systems and database systems. It covers a wide range of topics and technologies, and will prepare students