This lecture discusses state space models and expressivity results for transformers. The instructor begins by explaining why a state space model needs sufficient state storage to copy a sequence: the model's output depends only on its current state, so the state must retain enough information about the earlier tokens to reproduce them accurately.

The lecture then turns to transformers and a theorem about their expressivity. After elaborating on attention heads, which are central to the transformer architecture, the instructor explains how transformers can copy sequences of length exponential in the number of attention heads. The construction behind this result is an n-gram copying algorithm that uses a hash table mapping each n-gram to the token that follows it.

The lecture concludes by relating the size of this hash table to the length of the input sequence, emphasizing how efficiently transformers can learn and implement the copying mechanism. Overall, the lecture provides insight into the theoretical underpinnings of transformers and their practical implications for sequence-copying tasks.
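As a back-of-the-envelope version of the state-storage argument above (a hedged reconstruction under the assumption that the copy must be exact, not a formula from the lecture): if the model must reproduce an arbitrary length-$L$ sequence over a vocabulary of size $V$, and its output after the input ends depends only on its state, then the state must distinguish all $V^L$ possible inputs,

\[
\#\{\text{reachable states}\} \;\ge\; V^{L}
\quad\Longrightarrow\quad
\text{state size in bits} \;\ge\; \log_2\!\left(V^{L}\right) \;=\; L \log_2 V ,
\]

so the amount of state must grow linearly with the length of the sequence to be copied.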
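The n-gram copying algorithm can also be sketched directly. The following minimal Python sketch is an illustration of the hash-table idea summarized above, not code from the lecture; the function name `ngram_copy`, the parameter `n`, and the assumption that every n-gram in the input is unique are all illustrative choices.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

def ngram_copy(sequence: Sequence[Hashable], n: int) -> List[Hashable]:
    """Copy `sequence` via an n-gram hash table.

    Assumes every n-gram in `sequence` is unique, so each lookup has a
    single unambiguous successor token.
    """
    # Build the hash table: each n-gram maps to the token that follows it.
    successor: Dict[Tuple[Hashable, ...], Hashable] = {}
    for i in range(len(sequence) - n):
        successor[tuple(sequence[i:i + n])] = sequence[i + n]

    # Seed the output with the first n tokens, then repeatedly hash the
    # last n emitted tokens and append the stored successor.
    output: List[Hashable] = list(sequence[:n])
    while len(output) < len(sequence):
        output.append(successor[tuple(output[-n:])])
    return output

# Example: copy a character sequence using 3-gram lookups.
tokens = list("the quick brown fox")
assert ngram_copy(tokens, n=3) == tokens
```

In this sketch the table holds one entry per input position; the point emphasized in the lecture is that a transformer can realize the same lookup with its attention heads rather than with an explicit table, which is what makes the mechanism efficient to learn and implement.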