This lecture covers the analysis of genomic data, focusing on clustering methods applied to cancer gene expression experiments. It walks through the pre-processing steps, filtering genes based on their expression values, and determining the number of clusters. The instructor discusses various clustering algorithms and distance measures, and shows how the choice of variables affects the clustering results. The lecture then explores how variables are associated with clusters and with survival times, using Kaplan-Meier survival curves. It highlights the identification of genes associated with survival, along with the statistical significance of Cox model coefficients. Finally, it addresses the limitations of single-gene tests and the importance of careful follow-up in data analysis, stressing the need for caution when interpreting cluster analysis results.
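To make the Kaplan-Meier step concrete, here is a minimal sketch of the product-limit estimator in plain Python. It is not taken from the lecture: the function name `kaplan_meier`, the input layout, and the follow-up values for the two clusters are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where a death was observed.

    times  : follow-up time for each patient
    events : 1 if the death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # each patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored patients at t are both removed from the risk set.
        at_risk -= leaving[t]
    return curve


# Hypothetical follow-up data (months, event flag) for two clusters.
cluster_a = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0])
cluster_b = kaplan_meier([2, 3, 3, 7, 9], [1, 1, 1, 0, 1])

for name, curve in (("A", cluster_a), ("B", cluster_b)):
    print(name, [(t, round(s, 3)) for t, s in curve])
```

Computing one curve per cluster, as above, is the usual way to compare survival between groups found by clustering. In practice a formal comparison such as a log-rank test or a Cox model would follow, and the cautions about interpreting cluster results still apply.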