Summary
In database management, an aggregate function (or aggregation function) is a function that processes the values of multiple rows together to form a single summary value.
Common aggregate functions include average (i.e., arithmetic mean), count, maximum, median, minimum, mode, range, and sum. Others include nanmean (the mean ignoring NaN values, also known as "nil" or "null") and stddev.
Formally, an aggregate function takes as input a set, a multiset (bag), or a list from some input domain I and outputs an element of an output domain O. The input and output domains may be the same, such as for SUM, or may be different, such as for COUNT.
Aggregate functions occur commonly in numerous programming languages, in spreadsheets, and in relational algebra. The listagg function, as defined in the SQL:2016 standard, aggregates data from multiple rows into a single concatenated string.
In the entity relationship diagram, aggregation is represented as seen in Figure 1 with a rectangle around the relationship and its entities to indicate that it is being treated as an aggregate entity.
Aggregate functions present a bottleneck, because they potentially require having all input values at once. In distributed computing, it is desirable to divide such computations into smaller pieces and distribute the work, usually computing in parallel, via a divide and conquer algorithm.
Some aggregate functions can be computed by computing the aggregate for subsets and then aggregating these aggregates; examples include COUNT, MAX, MIN, and SUM. In other cases the aggregate can be computed by computing auxiliary numbers for subsets, aggregating these auxiliary numbers, and finally computing the overall number at the end; examples include AVERAGE (tracking sum and count, dividing at the end) and RANGE (tracking max and min, subtracting at the end). In still other cases the aggregate cannot be computed without analyzing the entire set at once, though in some cases approximations can be distributed; examples include DISTINCT COUNT (the count-distinct problem), MEDIAN, and MODE.
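The distinction between these cases can be illustrated with a minimal Python sketch. It assumes the input has already been split into non-empty subsets; the names Partial, summarize, merge, and aggregate are hypothetical and chosen only for this illustration. Each subset is reduced to a small auxiliary state (count, sum, min, max), the states are merged, and AVERAGE and RANGE are derived only at the end. The last lines contrast this with MEDIAN, where combining per-subset medians does not in general give the true median.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Partial:
    """Auxiliary values kept for one subset (hypothetical helper type)."""
    count: int
    total: float
    minimum: float
    maximum: float


def summarize(chunk: List[float]) -> Partial:
    """Reduce one non-empty subset to its partial state."""
    return Partial(len(chunk), sum(chunk), min(chunk), max(chunk))


def merge(a: Partial, b: Partial) -> Partial:
    """Combine two partial states; the result does not depend on merge order."""
    return Partial(
        a.count + b.count,
        a.total + b.total,
        min(a.minimum, b.minimum),
        max(a.maximum, b.maximum),
    )


def aggregate(chunks: List[List[float]]) -> Dict[str, float]:
    """Aggregate subset by subset, then derive AVERAGE and RANGE at the end."""
    parts = [summarize(c) for c in chunks]  # each call could run on a separate worker
    state = parts[0]
    for part in parts[1:]:
        state = merge(state, part)
    return {
        "count": state.count,
        "sum": state.total,
        "min": state.minimum,
        "max": state.maximum,
        "average": state.total / state.count,    # sum and count, divided at the end
        "range": state.maximum - state.minimum,  # max and min, subtracted at the end
    }


chunks = [[3.0, 1.0, 4.0], [1.0, 5.0], [9.0, 2.0, 6.0]]
result = aggregate(chunks)

# MEDIAN does not decompose this way: the median of the per-subset medians
# differs from the median of the full data set for this input.
median_of_medians = statistics.median([statistics.median(c) for c in chunks])
true_median = statistics.median([x for c in chunks for x in c])
```

For this input the merged state gives an average of 3.875 and a range of 8.0, the same as a single pass over all eight values, while the median of subset medians (3.0) differs from the true median (3.5).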
Related courses (1)
CS-422: Database systems
This course is intended for students who want to understand modern large-scale data analysis systems and database systems. It covers a wide range of topics and technologies, and will prepare students
Related lectures (9)
OLAP Queries: Aggregations, Cube Operator, and Optimizations
Explores OLAP queries, cube operator, super-aggregates, and optimization techniques.
OLAP Queries and Optimizations
Covers OLAP queries, examples of comparisons, ranking, and optimizations for query performance.
Distributed Query Processing: Execution Models and Declustering Tradeoffs
Covers analytical query processing, declustering strategies, and distributed operations.
Related concepts (3)
Pivot table
A pivot table is a table of grouped values that aggregates the individual items of a more extensive table (such as from a database, spreadsheet, or business intelligence program) within one or more discrete categories. This summary might include sums, averages, or other statistics, which the pivot table groups together using a chosen aggregation function applied to the grouped values. Although pivot table is a generic term, Microsoft held a trademark on the term in the United States from 1994 to 2020.
OLAP cube
An OLAP cube is a multi-dimensional array of data. Online analytical processing (OLAP) is a computer-based technique of analyzing data to look for insights. The term cube here refers to a multi-dimensional dataset, which is also sometimes called a hypercube if the number of dimensions is greater than three. A cube can be considered a multi-dimensional generalization of a two- or three-dimensional spreadsheet. For example, a company might wish to summarize financial data by product, by time-period, and by city to compare actual and budget expenses.
Online analytical processing
Online analytical processing, or OLAP (ˈoʊlæp), is an approach to answering multi-dimensional analytical (MDA) queries swiftly. OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture.
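The pivot table and OLAP cube entries above both come down to grouping rows by one or more dimensions and applying an aggregation function to each group. The following is a minimal Python sketch of that idea, not a description of how any particular spreadsheet or OLAP engine works; the pivot helper and the sample rows are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

# Hypothetical sales rows: (product, city, amount). Illustrative data only.
Row = Tuple[str, str, float]
rows: List[Row] = [
    ("widget", "Lausanne", 10.0),
    ("widget", "Geneva", 20.0),
    ("gadget", "Lausanne", 5.0),
    ("widget", "Lausanne", 15.0),
]


def pivot(
    rows: List[Row],
    key: Callable[[Row], Hashable],
    value: Callable[[Row], float],
    agg: Callable[[List[float]], float],
) -> Dict[Hashable, float]:
    """Group rows by key(row) and apply agg to the values of each group."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for row in rows:
        groups[key(row)].append(value(row))
    return {k: agg(v) for k, v in groups.items()}


# One dimension: total amount per product.
by_product = pivot(rows, key=lambda r: r[0], value=lambda r: r[2], agg=sum)

# Two dimensions: total amount per (product, city) cell, as in a small cube.
by_product_city = pivot(rows, key=lambda r: (r[0], r[1]), value=lambda r: r[2], agg=sum)

# A different aggregation function over the same grouping.
max_by_city = pivot(rows, key=lambda r: r[1], value=lambda r: r[2], agg=max)
```

Passing the aggregation function as a parameter mirrors the "chosen aggregation function" in the pivot table description: the grouping stays the same while sum, max, or another aggregate is swapped in.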