This lecture covers the dangers of 'big' models in statistics for data science, focusing on multicollinearity and the analysis of model fit. It discusses what happens when variables are added to a model, what multicollinearity is, and how it inflates the variances of the coefficient estimates and makes them unreliable. Remedies such as deleting variables and choosing an orthogonal basis are explored, along with diagnostic tools such as variance inflation factors and condition indices. Body fat data are used as practical examples to illustrate these concepts.
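
The two diagnostics named above can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's own code. The data are synthetic (they are not the body fat data), the function names `variance_inflation_factors` and `condition_indices` are hypothetical, and the condition indices use one common convention (centred columns scaled to unit length), which may differ from the one used in the lecture.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from
    regressing column j on all other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other columns
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


def condition_indices(X):
    """Ratio of the largest singular value to each singular value of the
    centred, unit-length-scaled design matrix (assumed convention)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    Xs = Xc / np.linalg.norm(Xc, axis=0)
    s = np.linalg.svd(Xs, compute_uv=False)  # sorted in descending order
    return s[0] / s


# Synthetic example: x2 is almost a copy of x1, x3 is unrelated to both.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

vifs = variance_inflation_factors(X)
cond = condition_indices(X)
print("VIF:", vifs)
print("Condition indices:", cond)
```

In this sketch the nearly collinear pair `x1`, `x2` gets large variance inflation factors and the largest condition index is well above 1, while `x3` keeps a factor close to 1. Deleting either `x1` or `x2`, one of the remedies mentioned above, removes the near-dependence.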