Lecture

Transformers in Vision: Applications and Architectures

Description

This lecture discusses the transformative impact of transformers across machine learning, with a focus on computer vision. It opens with an overview of transformers and their unifying role across domains such as natural language processing and speech recognition. The instructor reviews the foundational paper 'Attention Is All You Need' and explains the transformer architecture, including its encoder-decoder structure. The lecture highlights the effectiveness of transformer-based models in image classification and semantic segmentation, showcasing recent advances and benchmark leaderboards, and extends the discussion to applications in visual perception, including embodied AI and static vision tasks. The instructor also covers the role of tokenization and positional encoding in processing different data types, such as text and images. The lecture concludes with insights into the future of transformers in vision, including their scalability and potential for further innovation in the field.
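To make the tokenization and positional-encoding discussion concrete, here is a minimal sketch (not taken from the lecture) of how an image can be turned into a token sequence for a vision transformer: the image is split into non-overlapping patches, each patch is flattened into a vector, and fixed sinusoidal positional encodings in the style of 'Attention Is All You Need' are added so the model can recover spatial order. The function names and the toy image size are illustrative assumptions.

```python
import numpy as np

def image_to_patch_tokens(image, patch_size):
    """Split an image of shape (H, W, C) into flattened non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C) --
    the 'tokens' a vision transformer would then project and attend over.
    """
    h, w, c = image.shape
    ph, pw = h // patch_size, w // patch_size
    patches = image[:ph * patch_size, :pw * patch_size].reshape(
        ph, patch_size, pw, patch_size, c)
    # Reorder so each patch's pixels are contiguous, then flatten per patch.
    return patches.transpose(0, 2, 1, 3, 4).reshape(ph * pw, -1)

def sinusoidal_positional_encoding(num_tokens, dim):
    """Fixed sinusoidal positional encodings ('Attention Is All You Need')."""
    positions = np.arange(num_tokens)[:, None]                      # (num_tokens, 1)
    freqs = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim)   # (dim // 2,)
    enc = np.zeros((num_tokens, dim))
    enc[:, 0::2] = np.sin(positions * freqs)  # even dims: sine
    enc[:, 1::2] = np.cos(positions * freqs)  # odd dims: cosine
    return enc

# Toy example: a 32x32 RGB image with 8x8 patches gives 16 tokens of 192 values.
image = np.random.rand(32, 32, 3)
tokens = image_to_patch_tokens(image, patch_size=8)
pos = sinusoidal_positional_encoding(tokens.shape[0], tokens.shape[1])
tokens_with_position = tokens + pos  # input sequence to the transformer encoder
```

In a full vision transformer the flattened patches would pass through a learned linear projection before the positional encodings are added, and many implementations learn the positional embeddings instead of using fixed sinusoids; this sketch only shows the tokenization step the lecture refers to.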

