Better Generic Objects Counting When Asking Questions to Images: A Multitask approach for Remote Sensing Visual Question Answering

Devis Tuia, Benjamin Alexander Kellenberger, Sylvain Lobry
2020
Conference paper

Abstract

Visual Question Answering for Remote Sensing (RSVQA) aims at extracting information from remote sensing images through queries formulated in natural language. Since the answer to the query is also provided in natural language, the system is accessible to non-experts, and therefore dramatically increases the value of remote sensing images as a source of information, for example for journalism purposes or interactive land planning. Ideally, an RSVQA system should be able to provide an answer to questions that vary both in terms of topic (presence, localization, counting) and image content. However, aiming at such flexibility generates problems related to the variability of the possible answers. A striking example is counting, where the number of objects present in a remote sensing image can vary by multiple orders of magnitude, depending on both the scene and type of objects. This represents a challenge for traditional Visual Question Answering (VQA) methods, which either become intractable or result in an accuracy loss, as the number of possible answers has to be limited. To this end, we introduce a new model that jointly solves a classification problem (which is the most common approach in VQA) and a regression problem (to answer numerical questions more precisely). An evaluation of this method on the RSVQA dataset shows that this finer numerical output comes at the cost of a small loss of performance on non-numerical questions.
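
The joint classification and regression objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss function, the architecture, or any weighting, so everything below is an assumption made for illustration: the function names, the `reg_weight` parameter, and the choice of cross-entropy plus squared error are hypothetical and not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability of the correct answer class."""
    return -math.log(softmax(logits)[target_index])


def multitask_loss(class_logits, class_target, count_pred,
                   count_target=None, reg_weight=1.0):
    """Hypothetical joint loss (an assumption, not the paper's formula).

    Every question contributes a classification term. Counting questions
    (those with a numeric count_target) add a weighted squared-error term
    on the regression output.
    """
    loss = cross_entropy(class_logits, class_target)
    if count_target is not None:
        loss += reg_weight * (count_pred - count_target) ** 2
    return loss


# Non-numerical question: only the classification term applies.
presence_loss = multitask_loss([0.0, 0.0], 0, count_pred=5.0)

# Counting question: the regression term is added to the classification term.
counting_loss = multitask_loss([0.0, 0.0], 0, count_pred=5.0,
                               count_target=3.0, reg_weight=0.5)
```

In this sketch the regression output is unbounded, so a count does not have to belong to a fixed answer vocabulary; how the two terms are balanced in practice is a design choice this example does not settle.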

Official source

https://infoscience.epfl.ch/record/283353?ln=fr



Multi-task prompt-RSVQA to explicitly count objects on aerial images

Distributional Regression and Autoregression via Optimal Transport

Bayes-optimal Learning of Deep Random Networks of Extensive-width
