
Lecture

# Dimensionality Reduction: PCA and LDA

Description

This lecture covers dimensionality reduction through techniques such as Principal Component Analysis (PCA) and Fisher Linear Discriminant Analysis (LDA). It explains how PCA retains the important signal in the data while discarding noise by projecting onto directions of maximal variance, and how LDA seeks projections that keep samples from the same class close together while separating samples from different classes. The lecture also introduces Kernel PCA for nonlinear data and t-SNE for visualization, and discusses clustering methods such as K-means. It then turns to Gaussian Mixture Models (GMM) for density estimation, kernel density estimation (KDE) for smooth distribution estimates, and Mean Shift, which clusters points by following the estimated density to its local maxima. The presentation concludes with a comparison of KDE and histograms for representing data distributions.
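The PCA idea summarized above (project onto the directions of maximal variance) can be sketched with plain NumPy via an eigendecomposition of the covariance matrix; this is an illustrative sketch, not the code used in the lecture:

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = np.cov(Xc, rowvar=False)          # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: symmetric matrices, ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort by explained variance, descending
    components = eigvecs[:, order[:k]]      # top-k directions of maximal variance
    return Xc @ components, eigvals[order]

# Toy data whose variance lies mostly along the first axis
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.diag([3.0, 0.5])
Z, explained = pca(X, k=1)
```

Here the eigenvector with the largest eigenvalue is the direction along which the projected data has maximal variance; keeping only the top-k such directions is what "removing noise while retaining signal" means in the PCA setting.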

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

In course

DH-406: Machine learning for DH

This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and imple…

Related concepts (208)

Cluster analysis

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data analysis, and a common technique for statistical data analysis, used in many fields, including pattern recognition, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning.

K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (the cluster center or centroid), which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. k-means clustering minimizes within-cluster variances (squared Euclidean distances), but not regular Euclidean distances, which would be the more difficult Weber problem: the mean optimizes squared errors, whereas only the geometric median minimizes Euclidean distances.
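The standard iteration behind k-means, Lloyd's algorithm, alternates the assignment and mean-update steps described above. A minimal NumPy sketch (it assumes no cluster empties out during the iterations, which holds for the toy data below):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and mean updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init from data points
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster with the nearest center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points
        # (assumes every cluster stays non-empty)
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Two well-separated groups of three points each
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, labels = kmeans(X, k=2)
```

Because the update step uses the mean, each iteration can only decrease the within-cluster sum of squared distances, which is exactly the objective named in the paragraph above.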

Kernel density estimation

In statistics, kernel density estimation (KDE) is the application of kernel smoothing for probability density estimation, i.e., a non-parametric method to estimate the probability density function of a random variable based on kernels as weights. KDE answers a fundamental data smoothing problem where inferences about the population are made, based on a finite data sample. In some fields such as signal processing and econometrics it is also termed the Parzen–Rosenblatt window method, after Emanuel Parzen and Murray Rosenblatt, who are usually credited with independently creating it in its current form.
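In one dimension, a Gaussian KDE is simply an average of normal-density kernels centered at the samples, rescaled by the bandwidth; a minimal sketch of that definition:

```python
import numpy as np

def gaussian_kde(samples, xs, bandwidth):
    """Evaluate a Gaussian kernel density estimate at the points xs."""
    z = (xs[:, None] - samples[None, :]) / bandwidth       # scaled distances to samples
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)   # standard normal kernel
    return kernels.mean(axis=1) / bandwidth                # average, rescaled by bandwidth

samples = np.array([-1.0, 0.0, 2.0])
xs = np.linspace(-8.0, 9.0, 1701)
density = gaussian_kde(samples, xs, bandwidth=0.5)
```

The bandwidth plays the role the bin width plays for a histogram: small values give a spiky estimate that follows individual samples, large values give an oversmoothed one.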

Data and information visualization

Data and information visualization (data viz or info viz) is the practice of designing and creating easy-to-communicate and easy-to-understand graphic or visual representations of a large amount of complex quantitative and qualitative data and information with the help of static, dynamic or interactive visual items.

Hierarchical clustering

In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two categories:

- Agglomerative: a "bottom-up" approach in which each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
- Divisive: a "top-down" approach in which all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
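The agglomerative ("bottom-up") strategy can be sketched naively: start with singleton clusters and repeatedly merge the closest pair under the chosen linkage. This O(n^3) version is for illustration only, not an efficient implementation:

```python
import numpy as np

def agglomerative(X, k, linkage="single"):
    """Bottom-up clustering: merge the closest pair of clusters until k remain."""
    clusters = [[i] for i in range(len(X))]                    # every point starts alone
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    while len(clusters) > k:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sub = D[np.ix_(clusters[a], clusters[b])]
                # single linkage: closest members; complete linkage: farthest members
                d = sub.min() if linkage == "single" else sub.max()
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)                         # merge b into a
    return clusters

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
clusters = agglomerative(X, k=2)
```

Recording the sequence of merges (and the distance at each merge) instead of stopping at k clusters yields the dendrogram that gives hierarchical clustering its name.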

Related lectures (1,000)

Clustering & Density Estimation (DH-406: Machine learning for DH)

Covers dimensionality reduction, PCA, clustering techniques, and density estimation methods.

Clustering & Density Estimation (DH-406: Machine learning for DH)

Covers clustering, PCA, LDA, K-means, GMM, KDE, and Mean Shift algorithms for density estimation and clustering.

Clustering & Density Estimation (DH-406: Machine learning for DH)

Covers dimensionality reduction, clustering, and density estimation techniques, including PCA, K-means, GMM, and Mean Shift.

Document Analysis: Topic Modeling (DH-406: Machine learning for DH)

Explores document analysis, topic modeling, and generative models for data generation in machine learning.

Clustering: k-means (PHYS-467: Machine learning for physicists)

Explains k-means clustering, assigning data points to clusters based on proximity and minimizing squared distances within clusters.