Data cleansing
Data cleansing, or data cleaning, is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. It involves identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data. Data cleansing may be performed interactively with data-wrangling tools, or as batch processing through scripting or a data quality firewall. After cleansing, a data set should be consistent with other similar data sets in the system.
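As a concrete illustration, here is a minimal batch-cleansing sketch in Python with pandas; the DataFrame and its columns ("age", "email", "signup_date") are hypothetical, and a real pipeline would add validation rules specific to its domain.

```python
import pandas as pd
import numpy as np

# Hypothetical dirty records: a duplicate row, an impossible age,
# a malformed email, and an impossible calendar date.
df = pd.DataFrame({
    "age": [34, -1, 29, 34, None],
    "email": ["a@x.com", "b@x.com", "not-an-email", "a@x.com", "c@x.com"],
    "signup_date": ["2021-01-05", "2021-02-30", "2021-03-10",
                    "2021-01-05", "2021-04-01"],
})

# Remove exact duplicate records.
df = df.drop_duplicates()

# Replace values outside a plausible range with missing markers.
df.loc[(df["age"] < 0) | (df["age"] > 120), "age"] = np.nan

# Drop rows whose email does not match a simple well-formedness pattern.
df = df[df["email"].str.contains(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", regex=True)]

# Coerce dates; impossible dates such as 2021-02-30 become NaT.
df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")

print(df)
```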
Data warehouse
In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence. Data warehouses are central repositories of data integrated from one or more disparate sources. They store current and historical data in a single place and are used to create analytical reports for workers throughout the enterprise. This benefits companies by enabling them to interrogate their data, draw insights from it, and make decisions.
Petersen graph
In the mathematical field of graph theory, the Petersen graph is an undirected graph with 10 vertices and 15 edges. It is a small graph that serves as a useful example and counterexample for many problems in graph theory. The Petersen graph is named after Julius Petersen, who in 1898 constructed it as the smallest bridgeless cubic graph with no three-edge-coloring. Although the graph is generally credited to Petersen, it had in fact first appeared 12 years earlier, in an 1886 paper by A. B. Kempe.
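The counts and properties named above are easy to check in Python with networkx, which ships the Petersen graph as a built-in small graph; the particular assertions below are illustrative choices, not part of the original text.

```python
import networkx as nx

# The Petersen graph is a named generator in networkx.
G = nx.petersen_graph()

# The defining counts: 10 vertices and 15 edges.
assert G.number_of_nodes() == 10
assert G.number_of_edges() == 15

# Cubic: every vertex has degree 3.
assert all(deg == 3 for _, deg in G.degree())

# Bridgeless: no edge whose removal disconnects the graph.
assert not nx.has_bridges(G)
```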
Data
In common usage and statistics, data (US: /ˈdætə/; UK: /ˈdeɪtə/) is a collection of discrete or continuous values that convey information, describing quantity, quality, fact, statistics, or other basic units of meaning, or simply sequences of symbols that may be further interpreted formally. A datum is an individual value in a collection of data. Data is usually organized into structures such as tables that provide additional context and meaning, and which may themselves be used as data in larger structures.
Complement graph
In the mathematical field of graph theory, the complement or inverse of a graph G is a graph H on the same vertices such that two distinct vertices of H are adjacent if and only if they are not adjacent in G. That is, to generate the complement of a graph, one fills in all the missing edges required to form a complete graph, and removes all the edges that were previously there. The complement is not the set complement of the graph; only the edges are complemented. Formally, let G = (V, E) be a simple graph and let K consist of all 2-element subsets of V; then the complement of G is H = (V, K \ E), whose edge set consists of exactly those 2-element subsets of V that are not edges of G.
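A short Python sketch of this definition using networkx; the example path graph is hypothetical, and the by-hand construction mirrors the formal definition above.

```python
import networkx as nx
from itertools import combinations

# A hypothetical simple graph: a path on 4 vertices.
G = nx.Graph([(1, 2), (2, 3), (3, 4)])

# networkx computes the complement directly.
H = nx.complement(G)

# Equivalent by-hand construction: K is all 2-element subsets of V,
# and the complement's edge set is K minus E.
K = {frozenset(p) for p in combinations(G.nodes, 2)}
E = {frozenset(e) for e in G.edges}
assert {frozenset(e) for e in H.edges} == K - E
```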
Hearing loss
Hearing loss is a partial or total inability to hear. Hearing loss may be present at birth or acquired at any time afterwards. Hearing loss may occur in one or both ears. In children, hearing problems can affect the ability to acquire spoken language, and in adults it can create difficulties with social interaction and at work. Hearing loss can be temporary or permanent. Hearing loss related to age usually affects both ears and is due to cochlear hair cell loss. In some people, particularly older people, hearing loss can result in loneliness.
Sensorineural hearing loss
Sensorineural hearing loss (SNHL) is a type of hearing loss in which the root cause lies in the inner ear or sensory organ (cochlea and associated structures) or the vestibulocochlear nerve (cranial nerve VIII). SNHL accounts for about 90% of reported hearing loss. SNHL is usually permanent and can be mild, moderate, severe, profound, or total. Various other descriptors can be used depending on the shape of the audiogram, such as high frequency, low frequency, U-shaped, notched, peaked, or flat.
Cross-validation (statistics)
Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. Cross-validation is a resampling method that uses different portions of the data to test and train a model on different iterations. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice.
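A minimal k-fold cross-validation sketch in Python with scikit-learn; the choice of dataset, model, and five folds is illustrative, not prescriptive.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: on each iteration a different fifth of the
# data is held out for testing and the remainder is used for training.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

# The mean held-out accuracy estimates out-of-sample performance.
print(scores, scores.mean())
```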
Locality-sensitive hashing
In computer science, locality-sensitive hashing (LSH) is an algorithmic technique that hashes similar input items into the same "buckets" with high probability. (The number of buckets is much smaller than the universe of possible input items.) Since similar items end up in the same buckets, this technique can be used for data clustering and nearest neighbor search. It differs from conventional hashing techniques in that hash collisions are maximized, not minimized.
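One common LSH family uses random hyperplanes for cosine similarity: each hash bit records which side of a random hyperplane a vector falls on, so vectors at a small angle agree on most bits. The sketch below, with hypothetical dimensions and bit counts, shows near-duplicate vectors landing in the same bucket.

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_bucket(x, planes):
    # One bit per hyperplane: the sign of the projection of x onto
    # each plane's normal vector.
    return tuple((planes @ x) > 0)

dim, n_bits = 50, 8          # hypothetical sizes
planes = rng.normal(size=(n_bits, dim))

a = rng.normal(size=dim)
b = a + 0.05 * rng.normal(size=dim)   # a near-duplicate of a
c = rng.normal(size=dim)              # an unrelated vector

print(lsh_bucket(a, planes) == lsh_bucket(b, planes))  # likely True
print(lsh_bucket(a, planes) == lsh_bucket(c, planes))  # likely False
```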
K-d tree
In computer science, a k-d tree (short for k-dimensional tree) is a space-partitioning data structure for organizing points in a k-dimensional space. k-d trees are a useful data structure for several applications, such as searches involving a multidimensional search key (e.g. range searches and nearest neighbor searches) and creating point clouds. k-d trees are a special case of binary space partitioning trees. The k-d tree is a binary tree in which every node is a k-dimensional point.
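A compact Python sketch of one standard construction (median split, cycling through the axes) and a nearest-neighbor query; the sample points are hypothetical.

```python
from collections import namedtuple

Node = namedtuple("Node", "point left right")

def build(points, depth=0):
    # Split on coordinate axes in rotation, placing the median point
    # at each node so the tree stays balanced.
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid],
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))

def nearest(node, q, depth=0, best=None):
    # Descend toward the query, then backtrack into the far subtree
    # only if it could still contain a closer point.
    if node is None:
        return best
    dist = sum((a - b) ** 2 for a, b in zip(node.point, q))
    if best is None or dist < best[1]:
        best = (node.point, dist)
    axis = depth % len(q)
    diff = q[axis] - node.point[axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, q, depth + 1, best)
    if diff ** 2 < best[1]:  # the splitting plane is closer than the best so far
        best = nearest(far, q, depth + 1, best)
    return best

tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nearest(tree, (9, 2))[0])  # expect (8, 1)
```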