Lecture

Handling Text Data: Document Retrieval and Classification

Description

This lecture covers handling text data, with a focus on document retrieval and classification. It surveys typical tasks such as sentiment analysis and topic detection, introduces the bag-of-words representation and TF-IDF matrices, and discusses the challenges posed by sparsity in text data as well as the application of matrix factorization techniques. The lecture then turns to contextualized word vectors, such as those produced by BERT, and their importance in modern NLP models. It also explains the NLP pipeline, from tokenization to coreference resolution.
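As a rough illustration of the bag-of-words and TF-IDF ideas named in the description, the following Python sketch builds a small TF-IDF matrix, measures its sparsity, and compares documents by cosine similarity. It is not taken from the lecture: the toy documents, the whitespace tokenizer, the helper names (`tokenize`, `tfidf_matrix`, `cosine`), and the weighting (raw term count times `log(N / df)`) are all assumptions chosen for brevity, and other TF-IDF variants exist.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase + whitespace split; real pipelines tokenize more carefully.
    return text.lower().split()


def tfidf_matrix(docs):
    """Return (vocabulary, dense TF-IDF rows) for a list of raw strings."""
    bags = [Counter(tokenize(d)) for d in docs]      # bag-of-words: word order is discarded
    vocab = sorted(set().union(*bags))               # one column per distinct term
    n_docs = len(docs)
    df = {t: sum(1 for b in bags if t in b) for t in vocab}   # document frequency
    idf = {t: math.log(n_docs / df[t]) for t in vocab}        # assumed idf variant
    rows = [[b[t] * idf[t] for t in vocab] for b in bags]     # tf * idf per document
    return vocab, rows


def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus, for illustration only.
docs = [
    "the movie was great",
    "the movie was terrible",
    "stocks fell sharply today",
]
vocab, rows = tfidf_matrix(docs)

# Fraction of zero entries: most terms do not occur in most documents.
sparsity = sum(x == 0 for r in rows for x in r) / (len(rows) * len(vocab))

sim_reviews = cosine(rows[0], rows[1])   # two movie reviews share terms
sim_other = cosine(rows[0], rows[2])     # review vs. finance sentence share none
```

Even in this three-document corpus more than half of the matrix entries are zero, which hints at why sparsity becomes a practical concern with realistic vocabularies. The dense list-of-lists layout is only for readability; a sparse matrix format would be the usual choice at scale.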

About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Related lectures (42)
Handling Text: Document Retrieval, Classification, Sentiment Analysis
Explores document retrieval, classification, sentiment analysis, TF-IDF matrices, nearest-neighbor methods, matrix factorization, regularization, LDA, contextualized word vectors, and BERT.
Document Retrieval and Classification
Covers document retrieval, classification, sentiment analysis, and topic detection using TF-IDF matrices and contextualized word vectors like BERT.
Text Handling: Matrix, Documents, Topics
Explores text handling, focusing on matrices, documents, and topics, including challenges in document classification and advanced models like BERT.
Vector Space Semantics (and Information Retrieval)
Explores the Vector Space model, Bag of Words, tf-idf, cosine similarity, Okapi BM25, and Precision and Recall in Information Retrieval.
Latent Semantic Indexing
Covers Latent Semantic Indexing, word embeddings, and the skipgram model with negative sampling.