This lecture covers the application of computer vision techniques to embodied artificial intelligence, focusing on tasks such as object navigation and video generation from a single image. The models and approaches discussed include transformers and graph neural networks. The lecture also presents experimental results on datasets such as Cityscapes and Syn2Real, highlighting the importance of robustness and precision in visual tasks.