Despite their remarkable success, deep learning algorithms still rely heavily on annotated data, and unsupervised settings pose many challenges, such as finding the right inductive bias in diverse scenarios. In this paper, we propose an object-centric model for image sequence representation that uses the prediction task for self-supervision. By disentangling object representation and motion dynamics, our novel compositional structure explicitly handles occlusion and inpaints inferred objects and background when composing the predicted frame. Using auxiliary losses to promote spatially and temporally consistent object representations, we train our self-supervised framework without the help of any annotation or pretrained network. Initial experiments confirm that our new pipeline is a promising step towards object-centric video prediction.
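The compositional idea above (separate object appearance from motion, then compose a frame while handling occlusion) can be illustrated with a generic back-to-front ("painter's") alpha-compositing sketch. This is not the authors' implementation. The `ObjectLayer` structure, the per-object scalar `depth` used to order occlusion, and the integer-translation motion model are hypothetical simplifications. Each layer is also assumed to be already inpainted, meaning it carries a complete appearance and a soft mask even where it will be occluded.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ObjectLayer:
    """Hypothetical per-object representation (not taken from the paper)."""
    rgb: np.ndarray    # (H, W, 3) inpainted appearance, values in [0, 1]
    alpha: np.ndarray  # (H, W) soft object mask, values in [0, 1]
    depth: float       # assumed convention: larger value = farther from camera


def _shift_slices(n: int, d: int) -> tuple[slice, slice]:
    """Source/destination slices for shifting an axis of length n by d pixels."""
    if d >= 0:
        return slice(0, n - d), slice(d, n)
    return slice(-d, n), slice(0, n + d)


def translate(layer: ObjectLayer, dy: int, dx: int) -> ObjectLayer:
    """Hypothetical motion model: move appearance and mask by an integer offset.

    Pixels shifted outside the frame are discarded; uncovered pixels get alpha 0.
    """
    h, w = layer.alpha.shape
    rgb = np.zeros_like(layer.rgb)
    alpha = np.zeros_like(layer.alpha)
    if abs(dy) < h and abs(dx) < w:
        ys, yd = _shift_slices(h, dy)
        xs, xd = _shift_slices(w, dx)
        rgb[yd, xd] = layer.rgb[ys, xs]
        alpha[yd, xd] = layer.alpha[ys, xs]
    return ObjectLayer(rgb, alpha, layer.depth)


def composite(background: np.ndarray, layers: list[ObjectLayer]) -> np.ndarray:
    """Compose a frame by painting layers from farthest to nearest ("over" operator).

    Nearer objects are painted last, so they occlude farther ones where masks overlap.
    """
    frame = background.astype(float).copy()
    for layer in sorted(layers, key=lambda l: l.depth, reverse=True):
        a = layer.alpha[..., None]  # broadcast the mask over the color channels
        frame = a * layer.rgb + (1.0 - a) * frame
    return frame
```

Because the layers are sorted by depth inside `composite`, the input order does not matter, and a predicted frame could be sketched as `composite(background, [translate(o, dy, dx) for o in objects])` with per-object offsets. A learned model would replace the integer shift with a differentiable warp and infer depth ordering, but those choices are beyond what the abstract specifies.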
Mathieu Salzmann, Delphine Ribes Lemay, Nicolas Henchoz, Romain Simon Collaud, Syed Talal Wasim
Devis Tuia, Gaston Jean Lenczner, Thiên-Anh Claris Nguyen, Marc Conrad Russwurm