This lecture covers the fundamentals of cluster analysis for genomic data. Topics include classification, clustering of gene expression data, visualization techniques, similarity and dissimilarity measures, distance metrics, and various clustering algorithms. The instructor explains why clusters are hard to define, why the choice of distance measure matters, and how hierarchical and partitioning clustering work. The lecture also discusses practical examples and tools such as R packages for clustering tasks, along with criteria for estimating the number of clusters and for assessing the confidence and homogeneity of cluster assignments.
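
To illustrate why the choice of distance measure matters, here is a minimal sketch in Python using only the standard library. It is not taken from the lecture: the gene names, the expression values, and the helper functions `euclidean`, `correlation_distance`, and `agglomerative` are all hypothetical, chosen for illustration. The sketch runs a simple average-linkage hierarchical clustering on three toy expression profiles, once with Euclidean distance and once with a correlation-based distance (1 minus the Pearson correlation).

```python
import math
from itertools import combinations


def euclidean(x, y):
    """Euclidean distance: sensitive to the absolute expression level."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correlation_distance(x, y):
    """1 - Pearson correlation: compares the shape of two profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def agglomerative(profiles, dist, k):
    """Average-linkage agglomerative clustering, stopped at k clusters."""
    clusters = [[name] for name in profiles]

    def avg_link(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find the closest pair of clusters (i < j) and merge them.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg_link(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy data: geneA and geneB rise together at different
# magnitudes; geneC has a similar magnitude to geneA but falls over time.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [10.0, 20.0, 30.0, 40.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}

euclid_clusters = agglomerative(profiles, euclidean, 2)
corr_clusters = agglomerative(profiles, correlation_distance, 2)
print("Euclidean:  ", euclid_clusters)
print("Correlation:", corr_clusters)
```

With these toy values, Euclidean distance groups geneA with geneC because their magnitudes are similar, while the correlation-based distance groups geneA with geneB because their profiles have the same shape. Which grouping is more useful depends on the biological question, so the distance measure is worth choosing deliberately.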