This lecture traces the evolution of machine translation from statistical machine translation (SMT) to neural machine translation (NMT), with a focus on sequence-to-sequence models and attention mechanisms. It explains the challenges of modeling translation, the architecture of NMT models, the role attention plays in NMT, and the benefits of replacing recurrence with attention. It also covers self-attention, position representations, nonlinearities, and masking as building blocks for NMT systems.