This lecture introduces the principles of clustering in machine learning, where algorithms group datapoints without knowing the true labels or the number of groups. Clustering is used for feature extraction and data compression: each cluster is summarized by a prototype, such as a typical datapoint or a centroid. The lecture covers similarity measures, PCA projection, and the K-means and soft-K-means algorithms. It explains why K-means is sensitive to initialization and how this affects clustering performance, compares clustering methods using internal and external evaluation measures, and demonstrates how to determine the optimal number of clusters using the residual sum of squares (RSS).
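
As a rough illustration of two points from the summary, the sketch below runs a plain K-means on a toy dataset to show how the result depends on initialization and how RSS changes with the number of clusters. It is not taken from the lecture: the function names (`squared_distance`, `kmeans`), the four-point dataset, and the chosen initial centroids are all hypothetical, and RSS is assumed to mean the sum of squared distances from each datapoint to its assigned centroid.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init, max_iter=100):
    """Plain K-means from explicit initial centroids (illustrative sketch).

    Returns (centroids, assignments, rss), where rss is the sum of squared
    distances from each point to the centroid of its cluster.
    """
    centroids = list(init)
    k = len(centroids)
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the centroids are final
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    rss = sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignments))
    return centroids, assignments, rss


# Toy data (assumed for illustration): corners of a 4-by-1 rectangle,
# i.e. two natural groups, one on the left and one on the right.
points = [(0, 0), (4, 0), (0, 1), (4, 1)]

# Sensitivity to initialization: same data, same k, different starting centroids.
_, _, rss_good = kmeans(points, init=[(0, 0), (4, 0)])  # one seed per group
_, _, rss_bad = kmeans(points, init=[(0, 0), (0, 1)])   # both seeds in the left group
print(rss_good, rss_bad)  # 1.0 16.0

# RSS as a function of k, seeding with the first k datapoints.
rss_by_k = {k: kmeans(points, init=points[:k])[2] for k in range(1, 5)}
print(rss_by_k)  # {1: 17.0, 2: 1.0, 3: 0.5, 4: 0.0}
```

With the second initialization the algorithm converges to a top/bottom split and stays there, although the left/right split has a much lower RSS. In the scan over k, RSS falls sharply from k = 1 to k = 2 and only marginally afterwards; a bend of this kind is one common heuristic for picking the number of clusters, since RSS alone always favors more clusters.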