This lecture covers three advanced transformer architectures in deep learning: the Swin Transformer, HuBERT, and Flamingo. The instructor begins by recapping previous topics, including vision and audio transformers and their use with multimodal inputs, and stresses the importance of understanding how these models handle different data types such as images, audio, and text. The Swin Transformer is introduced as an efficient vision model that addresses the wide range of scales in images by restricting self-attention to local, shifted windows; HuBERT is presented for its self-supervised speech representation learning; and Flamingo is highlighted for its approach of interleaving visual and textual data, enabling complex cross-modal interactions. The instructor encourages students to apply these concepts in their mini-projects, emphasizing practical implementation and experimentation. Throughout the lecture, the instructor engages with students, answering questions and offering perspectives on the future of deep learning and its societal implications.
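To make the efficiency point about the Swin Transformer concrete, the sketch below (an illustration assuming NumPy; the function name and shapes are chosen for this example, not taken from the lecture) shows the window partitioning step: the feature map is split into non-overlapping windows so that self-attention is computed within each window, keeping the cost linear in image size instead of quadratic.

```python
import numpy as np

def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping windows.

    Returns an array of shape (num_windows, window_size*window_size, C);
    self-attention is then computed within each window independently,
    which is the key to Swin's efficiency on large images.
    """
    H, W, C = x.shape
    M = window_size
    assert H % M == 0 and W % M == 0, "H and W must be divisible by window_size"
    # Group rows and columns into M-sized blocks, then flatten each block
    # into a sequence of M*M tokens.
    x = x.reshape(H // M, M, W // M, M, C)
    windows = x.transpose(0, 2, 1, 3, 4).reshape(-1, M * M, C)
    return windows

# Example: an 8x8 feature map with 16 channels, partitioned into 4x4 windows
feat = np.random.rand(8, 8, 16)
wins = window_partition(feat, 4)
print(wins.shape)  # (4, 16, 16): 4 windows, 16 tokens each, 16 channels
```

In the full Swin architecture the windows are additionally shifted between consecutive layers so that information can flow across window boundaries; this sketch shows only the unshifted partitioning.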