Karl Aberer, Rémi Philippe Lebret, Negar Foroutan Eghlidi
Vision-Language Pre-training (VLP) has advanced the performance of many vision-language tasks, such as image-text retrieval, visual entailment, and visual reasoning. The pre-training mostly utilizes lexical databases and image queries in English. Previous w ...
Association for Computational Linguistics (ACL 2023)