This lecture discusses state space models and their expressivity in comparison to transformers. It begins by defining state space models and their capabilities, with particular focus on attention mechanisms and their computational cost. The instructor highlights the quadratic cost of attention in sequence length and explores whether sub-quadratic runtimes are achievable through alternative algorithms. The lecture then examines the transformer architecture, explaining how it processes sequences of tokens and produces updates through attention. The concepts of filtering and convolution are introduced, emphasizing the role of implicit filters in state space models. The instructor also covers the mathematical foundations of state updates and the roles of the state-transition, input, and output matrices in that process. The lecture concludes with recent advances in state space models, in particular the Hyena architecture, which uses Fourier transforms to compute long convolutions efficiently. Overall, the lecture provides a comprehensive overview of the relationship between state space models and transformers, highlighting their respective strengths and weaknesses across applications.
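To make the connection between state updates, implicit filters, and Fourier transforms concrete, here is a minimal sketch (not code from the lecture) of a linear state space model written in two equivalent ways: as a stateful recurrence with matrices A, B, C, and as a convolution with the implicit filter k_t = C A^t B computed via FFT, which is the kind of O(N log N) trick used by Hyena-style architectures. The function names and the toy random parameters are illustrative assumptions, not the lecturer's notation.

```python
import numpy as np

# Sketch of a single-input, single-output linear SSM:
#   x_t = A x_{t-1} + B u_t,   y_t = C x_t
# Unrolling the recurrence gives y = k * u with implicit filter k_t = C A^t B,
# so the whole sequence can be processed with one FFT-based convolution.

def ssm_recurrence(A, B, C, u):
    """Step the recurrence token by token (sequential, O(N) steps)."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:
        x = A @ x + B * u_t   # state update with input projection B
        ys.append(C @ x)      # readout through C
    return np.array(ys)

def ssm_convolution(A, B, C, u):
    """Compute the same outputs as a causal convolution with k_t = C A^t B."""
    N = len(u)
    k = np.empty(N)
    AtB = B.copy()
    for t in range(N):
        k[t] = C @ AtB        # k_t = C A^t B
        AtB = A @ AtB
    # FFT-based linear convolution (zero-padded to avoid circular wrap-around).
    L = 2 * N
    y = np.fft.irfft(np.fft.rfft(k, L) * np.fft.rfft(u, L), L)[:N]
    return y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, N = 4, 64
    A = 0.9 * np.eye(d) + 0.05 * rng.standard_normal((d, d))  # kept near-stable
    B = rng.standard_normal(d)
    C = rng.standard_normal(d)
    u = rng.standard_normal(N)
    # Both views agree up to floating-point error.
    print(np.allclose(ssm_recurrence(A, B, C, u), ssm_convolution(A, B, C, u)))
```

The recurrence view is what makes constant-memory autoregressive generation possible, while the convolutional view is what admits the FFT-based, sub-quadratic training pass discussed in the lecture; real implementations parameterize the filter implicitly rather than materializing it from A, B, C as done here for clarity.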