
Lecture: Kernel K-Means Method

Description

This lecture covers the kernel k-means method. It begins by addressing suboptimal solutions in k-means, initializing centroids so as to maximize their dispersion among the data, then introduces kernels to describe data in non-Euclidean spaces, which allows the formation of non-convex clusters. The lecture derives the kernel k-means algorithm, highlighting how distances between observations and centroids are computed. It also discusses how support vector machines (SVMs) handle non-linear problems by mapping the data into Hilbert spaces. Finally, the lecture explores clustering by density, emphasizing the identification of dense regions in datasets without predefined labels.
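The two ideas in the summary, dispersion-maximizing seeding and kernelized distance computation, can be combined in a short sketch. The following is a minimal NumPy illustration, not the lecture's exact derivation; the names `rbf_kernel` and `kernel_kmeans`, the Gaussian kernel choice, and the farthest-point seeding scheme are assumptions made for the example. The key fact it uses is that the squared distance from observation i to the centroid of cluster c expands entirely in kernel evaluations: K_ii − (2/|c|) Σ_{j∈c} K_ij + (1/|c|²) Σ_{j,l∈c} K_jl.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negative round-off

def kernel_kmeans(K, k, n_iter=100):
    """Lloyd-style kernel k-means on a precomputed kernel matrix K."""
    n = K.shape[0]
    diag = np.diag(K)
    # Farthest-point seeding: greedily pick seeds that maximize dispersion
    # in feature space, where d^2(i, s) = K_ii - 2 K_is + K_ss.
    seeds = [0]
    for _ in range(k - 1):
        d = np.min([diag - 2.0 * K[:, s] + K[s, s] for s in seeds], axis=0)
        seeds.append(int(np.argmax(d)))
    labels = np.argmin([diag - 2.0 * K[:, s] + K[s, s] for s in seeds], axis=0)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                continue  # empty cluster: leave its column at infinity
            # ||phi(x_i) - mu_c||^2 expands purely in kernel evaluations:
            # K_ii - (2/|c|) sum_{j in c} K_ij + (1/|c|^2) sum_{j,l in c} K_jl
            dist[:, c] = (diag
                          - 2.0 * K[:, mask].sum(axis=1) / m
                          + K[np.ix_(mask, mask)].sum() / m ** 2)
        new = dist.argmin(axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels
```

Note that the centroids mu_c never need explicit coordinates: the algorithm only ever touches K, which is what allows non-convex clusters to form under a suitable kernel.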


In course

EE-311: Fundamentals of machine learning

This course presents a general overview of machine learning techniques, reviewing the algorithms, the theoretical formalism, and the experimental protocols.

Related concepts (34)

K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster center or cluster centroid), serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. k-means clustering minimizes within-cluster variances (squared Euclidean distances), but not regular Euclidean distances, which would be the more difficult Weber problem: the mean optimizes squared errors, whereas only the geometric median minimizes Euclidean distances.
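For contrast with the kernelized variant discussed in the lecture, the plain algorithm can be sketched as straightforward Lloyd iterations in NumPy. This is an illustrative sketch, not a reference implementation; the function name `kmeans` and the random seeding are assumptions for the example.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iterations: assign to the nearest mean, recompute means."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct observations chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance of every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Each new centroid is the mean of its assigned points (an empty
        # cluster keeps its old centroid).
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                        else centroids[c] for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

Because the mean minimizes summed squared errors, each iteration can only decrease the within-cluster variance, which is why Lloyd's algorithm converges, though possibly to a local optimum.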

Cluster analysis

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data analysis, and a common technique for statistical data analysis, used in many fields, including pattern recognition, information retrieval, bioinformatics, data compression, computer graphics and machine learning.

Hierarchical clustering

In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two categories: Agglomerative: This is a "bottom-up" approach: Each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. Divisive: This is a "top-down" approach: All observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.

Single-linkage clustering

In statistics, single-linkage clustering is one of several methods of hierarchical clustering. It is based on grouping clusters in bottom-up fashion (agglomerative clustering), at each step combining two clusters that contain the closest pair of elements not yet belonging to the same cluster as each other. This method tends to produce long thin clusters in which nearby elements of the same cluster have small distances, but elements at opposite ends of a cluster may be much farther from each other than two elements of other clusters.
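The single-link merge rule can be sketched naively in NumPy. This is a quadratic-memory, cubic-time illustration, fine for small data but not an efficient or library implementation; `single_linkage` is a hypothetical helper written for this example.

```python
import numpy as np

def single_linkage(X, k):
    """Naive agglomerative clustering with the single-link criterion:
    repeatedly merge the two clusters whose closest members are nearest,
    until only k clusters remain."""
    n = len(X)
    # Pairwise Euclidean distances between observations.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = [{i} for i in range(n)]
    while len(clusters) > k:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: distance between the closest pair of members.
                d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] |= clusters.pop(b)
    labels = np.empty(n, dtype=int)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels
```

Running this on points spaced 1 apart along a line, plus a distant second group, shows the chaining behaviour described above: the ends of the chain are far apart yet belong to one cluster, because every consecutive pair is close.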

Reproducing kernel Hilbert space

In functional analysis (a branch of mathematics), a reproducing kernel Hilbert space (RKHS) is a Hilbert space of functions in which point evaluation is a continuous linear functional. Roughly speaking, this means that if two functions f and g in the RKHS are close in norm, i.e., ‖f − g‖ is small, then f and g are also pointwise close, i.e., |f(x) − g(x)| is small for all x. The converse does not need to be true: measuring closeness with the supremum norm instead, a sequence of functions can converge pointwise without converging uniformly.
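The norm-to-pointwise implication follows from the Cauchy–Schwarz inequality once the reproducing kernel is in hand. A compact statement of these standard RKHS facts, with H the space and K its kernel:

```latex
% Point evaluation is a bounded linear functional on an RKHS \mathcal{H}:
%   L_x(f) = f(x), \qquad |L_x(f)| \le M_x \, \|f\|_{\mathcal{H}} .
% By the Riesz representation theorem there is a reproducing kernel K with
\[
  f(x) = \langle f,\, K(\cdot, x) \rangle_{\mathcal{H}}
  \qquad \text{for all } f \in \mathcal{H},\ x \in X,
\]
% and therefore, by Cauchy--Schwarz,
\[
  |f(x) - g(x)|
  = \bigl| \langle f - g,\, K(\cdot, x) \rangle_{\mathcal{H}} \bigr|
  \le \| f - g \|_{\mathcal{H}} \, \sqrt{K(x, x)} ,
\]
% so functions that are close in norm are close at every point x.
```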

Related lectures (42)

Clustering: k-means (PHYS-467: Machine learning for physicists)

Explains k-means clustering, assigning data points to clusters based on proximity and minimizing squared distances within clusters.

Unsupervised Learning: Clustering Methods (CS-401: Applied data analysis)

Covers unsupervised learning focusing on clustering methods and the challenges faced in clustering algorithms like K-means and DBSCAN.

Clustering Methods

Covers K-means, hierarchical, and DBSCAN clustering methods with practical examples.

Clustering & Density Estimation (DH-406: Machine learning for DH)

Covers dimensionality reduction, PCA, clustering techniques, and density estimation methods.

Predicting Rainfall: Miniproject (BIO-322)

Introduces a miniproject where students predict rainfall in Pully using machine learning, focusing on reproducibility and code quality.