This lecture covers document classification: constructing a classifier from a training set of labeled documents and using it to assign labels to unlabeled documents. Topics include document features such as bag of words, phrases, and word fragments; the challenges posed by vocabulary size and high dimensionality, and how feature selection addresses them; classification algorithms such as k-Nearest-Neighbors and Naïve Bayes; and the use of word embeddings for classification, as in fastText. The characteristics of each method are compared, and the lecture concludes with a summary of document classification methods and their applications.
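As an illustration of the pipeline described above, the following is a minimal sketch of a bag-of-words Naïve Bayes classifier. It is not taken from the lecture: the whitespace tokenizer, the add-one (Laplace) smoothing, the function names, and the four-document training set are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Bag of words: lowercase and split on whitespace; word order is ignored.
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Count documents per class and words per class from (text, label) pairs."""
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        class_doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_doc_counts, word_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log-posterior for the given text."""
    class_doc_counts, word_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_doc_counts.items():
        # Log prior: fraction of training documents carrying this label.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing keeps unseen (word, class) pairs from zeroing the score.
            smoothed = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set, for illustration only.
training_set = [
    ("the team won the match", "sports"),
    ("a great goal in the final match", "sports"),
    ("the election results were announced", "politics"),
    ("parliament passed the new law", "politics"),
]

model = train_naive_bayes(training_set)
predicted = classify("who won the final", model)
print(predicted)
```

Even in this toy setting the feature space has one dimension per distinct word, which hints at why vocabulary size and feature selection become central concerns on realistic corpora.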