Êtes-vous un étudiant de l'EPFL à la recherche d'un projet de semestre?
Travaillez avec nous sur des projets en science des données et en visualisation, et déployez votre projet sous forme d'application sur Graph Search.
This paper introduces TACOSS a text-image alignment approach that allows explainable land cover semantic segmentation by directly integrating semantic concepts encoded from texts. TACOSS combines convolutional neural networks for visual feature extraction with semantic embeddings provided by a language model. By leveraging contrastive learning approaches, we learn an alignment between the visual and the (fixed) textual representations. In addition to producing standard semantic segmentation outputs, our model enables interactive queries with RS images using natural language prompts. The experimental results obtained on 50cm resolution aerial data from Switzerland show that TACOSS performs similarly to a standard semantic segmentation model while allowing the flexible usage of in- and out-of-vocabulary terms for the interactions with the image.