Amir Roshan Zamir, Roman Christian Bachmann, Andrei Atanov, David Mizrahi
We propose a pre-training strategy called Multi-modal Multi-task Masked Autoencoders (MultiMAE). It differs from standard Masked Autoencoding in two key aspects: I) it can optionally accept additional modalities of information in the input besides the RGB image (hence "multi-modal"), and II) its training objective accordingly includes predicting multiple outputs besides the RGB image (hence "multi-task").
Springer International Publishing AG, 2022
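To make the idea in the abstract concrete, below is a minimal PyTorch sketch of multi-modal, multi-task masked autoencoding: each modality is patchified by its own projection, a random subset of tokens from all modalities is encoded jointly by a shared Transformer, and a per-modality decoder reconstructs that modality's full patch grid. The class name, dimensions, masking ratio, the omission of positional embeddings, and the simplified self-attention decoders are all illustrative assumptions, not the authors' architecture; see the authors' code release for the actual implementation.

```python
import torch
import torch.nn as nn


class MultiModalMAESketch(nn.Module):
    """Toy multi-modal masked autoencoder: per-modality patch projections,
    a shared Transformer encoder over the visible tokens of all modalities,
    and a per-modality decoder + linear head that reconstructs every patch."""

    def __init__(self, modalities=("rgb", "depth"), in_chans=(3, 1),
                 img_size=224, patch_size=16, dim=256, depth=4, heads=8):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # Per-modality "input adapters": linear patch projections.
        self.patch_embed = nn.ModuleDict({
            m: nn.Conv2d(c, dim, kernel_size=patch_size, stride=patch_size)
            for m, c in zip(modalities, in_chans)
        })
        # Shared ViT-style encoder applied to the concatenated visible tokens.
        enc_layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, depth)
        # Learned mask token plus one shallow decoder and head per modality.
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        dec_layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.decoders = nn.ModuleDict({m: nn.TransformerEncoder(dec_layer, 1)
                                       for m in modalities})
        self.heads = nn.ModuleDict({
            m: nn.Linear(dim, c * patch_size ** 2)
            for m, c in zip(modalities, in_chans)
        })

    def forward(self, inputs, keep_ratio=0.25):
        batch = next(iter(inputs.values())).shape[0]
        visible, keep_idx = [], {}
        for m, x in inputs.items():
            tokens = self.patch_embed[m](x).flatten(2).transpose(1, 2)  # (B, N, dim)
            idx = torch.randperm(self.num_patches)[:int(keep_ratio * self.num_patches)]
            keep_idx[m] = idx
            visible.append(tokens[:, idx])        # keep a random subset ("masking")
        lengths = [v.shape[1] for v in visible]
        encoded = self.encoder(torch.cat(visible, dim=1))  # joint multi-modal encoding
        parts = dict(zip(inputs, torch.split(encoded, lengths, dim=1)))
        recon = {}
        for m in inputs:
            # Scatter the encoded visible tokens into a full grid of mask tokens,
            # then predict pixel values for every patch of this modality.
            full = self.mask_token.expand(batch, self.num_patches, -1).clone()
            full[:, keep_idx[m]] = parts[m]
            recon[m] = self.heads[m](self.decoders[m](full))  # (B, N, patch_pixels)
        return recon


# Usage: jointly encode masked RGB and depth, then reconstruct both modalities.
model = MultiModalMAESketch()
recon = model({"rgb": torch.randn(2, 3, 224, 224),
               "depth": torch.randn(2, 1, 224, 224)})
print({m: r.shape for m, r in recon.items()})  # 196 patches, 768 / 256 pixels each
```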