This lecture covers advanced techniques in data preprocessing: encoding categorical data, handling missing data, and working with unbalanced datasets. It explains methods such as one-hot encoding, replacing missing values with the mean or with a regression-based estimate, and down-sampling or oversampling to rebalance unbalanced datasets. The instructor emphasizes the importance of choosing appropriate performance metrics and offers insights on using expectation maximization to fill in missing values. The lecture also discusses the use of confusion matrices to evaluate classifiers on unbalanced datasets and to compare their performance. Supplementary material on dataset selection and clustering is briefly mentioned.
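To make the techniques named in the summary concrete, here is a minimal Python sketch using only the standard library. It is not taken from the lecture: the function names (`one_hot`, `impute_mean`, `oversample`, `confusion_counts`) and the toy data are hypothetical, and the oversampling shown is plain random duplication of minority-class rows, which is only one of several possible schemes.

```python
import random
from collections import Counter


def one_hot(values):
    """Encode categorical values as 0/1 indicator vectors (categories sorted)."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [x for x in column if x is not None]
    mean = sum(observed) / len(observed)
    return [mean if x is None else x for x in column]


def oversample(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until every class
    has as many rows as the largest class (assumed scheme)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["tp" if t == positive else "fp"] += 1
        else:
            counts["fn" if t == positive else "tn"] += 1
    return counts


# Toy unbalanced problem: 9 negatives, 1 positive, and a classifier
# that always predicts the majority class.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
cm = confusion_counts(y_true, y_pred)
accuracy = (cm["tp"] + cm["tn"]) / len(y_true)
recall = cm["tp"] / (cm["tp"] + cm["fn"])
```

In this toy case the always-negative classifier reaches high accuracy while its recall on the minority class is zero, which is the kind of effect a confusion matrix exposes and a single accuracy figure hides.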