Lecture

Data Annotation: Collection and Biases in NLP

Description

This lecture covers data collection, annotation, and the biases that can arise in natural language processing (NLP). It opens with a recap of fine-tuning techniques before turning to data annotation: the processes involved and the biases that can degrade model performance. The instructor discusses the role of benchmarks in evaluating models, emphasizing that benchmarks are built from human-created datasets and can therefore inherit human flaws. The lecture outlines the steps for building effective benchmarks, including defining the task, designing annotation guidelines, and ensuring data quality. It then examines the consequences of biases such as spurious correlations and annotation artifacts, which can lead models to learn shortcuts rather than the intended task. The session closes with a reflection on the necessity of high-quality data for training robust NLP models and the ongoing challenges of creating reliable evaluation metrics.
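One common data-quality check in the annotation pipeline described above is inter-annotator agreement. A minimal sketch of Cohen's kappa, the standard chance-corrected agreement statistic for two annotators, is shown below; the annotator labels are invented for illustration and do not come from the lecture.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labelled independently
    # according to their own marginal label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two hypothetical annotators labelling the same 8 sentences for sentiment.
ann_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
ann_b = ["pos", "pos", "neg", "pos", "pos", "neg", "neg", "neg"]
print(round(cohens_kappa(ann_a, ann_b), 2))  # → 0.5
```

Raw percent agreement here is 75%, but kappa discounts the agreement expected by chance, which is why annotation guidelines are often iterated until kappa, not raw agreement, reaches an acceptable level.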
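The annotation artifacts mentioned above can often be surfaced with a simple statistical probe: if a surface cue in one part of the input predicts the label far better than the label's base rate, a model can exploit it as a shortcut. The sketch below illustrates the idea on toy NLI-style data; the examples, cue list, and numbers are invented for illustration only.

```python
from collections import Counter

# Toy NLI-style hypotheses with gold labels (illustrative, not real data).
examples = [
    ("a man is sleeping", "contradiction"),
    ("nobody is playing outside", "contradiction"),
    ("the dog is not running", "contradiction"),
    ("a woman is eating lunch", "entailment"),
    ("someone is outdoors", "entailment"),
    ("two people are talking", "neutral"),
]

NEGATION_CUES = {"no", "not", "never", "nobody", "nothing"}

def cue_precision(examples, cues, label):
    """Fraction of cue-bearing hypotheses that carry `label`.

    A value far above the label's overall base rate suggests the cue
    is a shortcut a model could exploit without reading the premise.
    """
    hits = [lbl for hyp, lbl in examples if cues & set(hyp.split())]
    return sum(l == label for l in hits) / len(hits)

base_rate = Counter(lbl for _, lbl in examples)["contradiction"] / len(examples)
print(f"base rate of 'contradiction': {base_rate:.2f}")  # → 0.50
print(f"P(contradiction | negation cue): "
      f"{cue_precision(examples, NEGATION_CUES, 'contradiction'):.2f}")  # → 1.00
```

A large gap between the conditional rate and the base rate, as in this toy case, is the kind of spurious correlation that lets a hypothesis-only classifier beat chance, one sign that a benchmark rewards shortcuts rather than understanding.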
