This lecture focuses on integrating visual perception with robotic action in the context of embodied AI. It begins with an overview of the convolutional neural network (CNN) architectures used in perceptual robotics, noting the role GPUs play in processing visual data. The instructor then discusses the relationship between visual perception and an agent's actions, emphasizing how ecological factors shape design choices in robotics, and introduces key concepts such as embodied AI, multimodal learning, and perceptual priors. The lecture surveys several robotic agents and their capabilities, including target-navigation tasks, and illustrates how simple mechanisms can produce complex behaviors, using the BristleBot as an example. The discussion then turns to pre-training visual representations, which improves learning efficiency and generalization in robotic tasks. Finally, the lecture outlines standardized tasks in embodied vision, including visual navigation and rearrangement, setting the stage for practical applications in the course project.
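
To make the pre-training idea concrete, below is a minimal sketch of the common recipe: take a visual encoder pre-trained on a large image dataset, freeze it as a perceptual prior, and train only a small policy head on the robotic task. The lecture does not prescribe a specific framework or model; this sketch assumes PyTorch with a torchvision ResNet-18 backbone, a 224x224 RGB observation, and a hypothetical four-action navigation space.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained ResNet-18 and drop its classification layer,
# keeping the convolutional trunk + average pool as a frozen visual encoder.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
encoder = nn.Sequential(*list(backbone.children())[:-1])  # output: (B, 512, 1, 1)
encoder.eval()  # fix batch-norm statistics from pre-training
for p in encoder.parameters():
    p.requires_grad = False  # only the policy head will be trained

# Small policy head mapping visual features to navigation actions
# (hypothetical action set: forward, turn-left, turn-right, stop).
policy_head = nn.Sequential(
    nn.Flatten(),
    nn.Linear(512, 128),
    nn.ReLU(),
    nn.Linear(128, 4),
)

# Forward pass on a batch of RGB observations (B, 3, 224, 224).
obs = torch.randn(8, 3, 224, 224)
with torch.no_grad():
    features = encoder(obs)          # pre-trained perceptual features
action_logits = policy_head(features)  # shape: (8, 4)
```

Because the encoder is frozen, the policy has far fewer trainable parameters than an end-to-end network, which is one reason pre-trained representations tend to improve sample efficiency and generalization on downstream robotic tasks.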