This lecture covers Transformer networks and self-attention layers, explaining how they map a set of input vectors to a set of output vectors and introducing the concept of multi-head attention. It also discusses how the attention weights are computed from learned query, key, and value projections, why positional encoding is needed, and how individual attention heads can be interpreted.
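
To make these mechanics concrete, here is a minimal NumPy sketch of scaled dot-product self-attention with multiple heads and sinusoidal positional encoding. All variable names, shapes, and the use of random untrained weights are illustrative assumptions, not the lecture's exact notation or implementation.

```python
# Illustrative sketch (assumed names/shapes): multi-head self-attention
# with sinusoidal positional encoding, using plain NumPy.
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding, as in Vaswani et al. (2017)."""
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                 # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                   # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])              # even dims: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])              # odd dims: cosine
    return pe

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)            # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Map a set of inputs X (seq_len, d_model) to a set of outputs."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                   # learned projections
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # pairwise logits
    weights = softmax(scores, axis=-1)                 # each row sums to 1
    return weights @ V                                 # weighted sum of values

def multi_head_attention(X, heads, Wo):
    """Concatenate the outputs of independent heads, then project."""
    outs = [self_attention(X, Wq, Wk, Wv) for (Wq, Wk, Wv) in heads]
    return np.concatenate(outs, axis=-1) @ Wo

# Usage example with random (untrained) weights.
rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 5, 16, 4
d_head = d_model // n_heads
X = rng.normal(size=(seq_len, d_model)) + positional_encoding(seq_len, d_model)
heads = [tuple(rng.normal(size=(d_model, d_head)) for _ in range(3))
         for _ in range(n_heads)]
Wo = rng.normal(size=(n_heads * d_head, d_model))
print(multi_head_attention(X, heads, Wo).shape)        # -> (5, 16)
```

Note that self-attention alone is permutation-equivariant over the input set, which is why the positional encoding is added to the inputs before attention; without it, the model could not distinguish different orderings of the same tokens.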